[jira] [Created] (ARTEMIS-3061) getting AMQP duplicate property can save many String comparisons and class checks
Francesco Nigro created ARTEMIS-3061: Summary: getting AMQP duplicate property can save many String comparisons and class checks Key: ARTEMIS-3061 URL: https://issues.apache.org/jira/browse/ARTEMIS-3061 Project: ActiveMQ Artemis Issue Type: Improvement Components: AMQP Affects Versions: 2.16.0 Reporter: Francesco Nigro Assignee: Francesco Nigro -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3059) AMQP message reencoding should save creating Netty heap arenas
[ https://issues.apache.org/jira/browse/ARTEMIS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3059: - Summary: AMQP message reencoding should save creating Netty heap arenas (was: AMQP reencoding should save creating Netty heap arenas) > AMQP message reencoding should save creating Netty heap arenas > -- > > Key: ARTEMIS-3059 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3059 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: AMQP, Broker >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > AMQP reencoding is using Netty pooled heap buffers to encode the message, > creating heap arenas that would affect the broker heap memory footprint: this > could be saved by using off-heap/direct arenas that are already allocated by > the broker for networking. > > What can cause messages to be re-encoded is sending them across bridges, which > means that cluster connections (which are special bridges) can stealthily affect > the broker memory footprint. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (ARTEMIS-3059) AMQP reencoding should save creating Netty heap arenas
[ https://issues.apache.org/jira/browse/ARTEMIS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on ARTEMIS-3059 started by Francesco Nigro. > AMQP reencoding should save creating Netty heap arenas > -- > > Key: ARTEMIS-3059 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3059 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: AMQP, Broker >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > AMQP reencoding is using Netty pooled heap buffers to encode the message, > creating heap arenas that would affect the broker heap memory footprint: this > could be saved by using off-heap/direct arenas that are already allocated by > the broker for networking. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-3059) AMQP reencoding should save creating Netty heap arenas
Francesco Nigro created ARTEMIS-3059: Summary: AMQP reencoding should save creating Netty heap arenas Key: ARTEMIS-3059 URL: https://issues.apache.org/jira/browse/ARTEMIS-3059 Project: ActiveMQ Artemis Issue Type: Improvement Components: AMQP, Broker Reporter: Francesco Nigro Assignee: Francesco Nigro AMQP reencoding is using Netty pooled heap buffers to encode the message, creating heap arenas that would affect the broker heap memory footprint: this could be saved by using off-heap/direct arenas that are already allocated by the broker for networking. -- This message was sent by Atlassian Jira (v8.3.4#803005)
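The heap-vs-direct distinction the issue relies on can be illustrated with plain JDK buffers. This is only a sketch: the class and method names are hypothetical, and java.nio stands in for Netty's ByteBufAllocator, whose heapBuffer/directBuffer calls make the same choice between a heap arena and the off-heap arenas already used for networking.

```java
import java.nio.ByteBuffer;

// Sketch of the heap vs off-heap distinction behind ARTEMIS-3059.
// A heap buffer is backed by a byte[] counted against the Java heap;
// a direct buffer lives outside the heap, like the arenas the broker
// already allocates for networking.
public class BufferFootprint {
   static ByteBuffer encodeIntoHeap(int size) {
      return ByteBuffer.allocate(size);        // backed by byte[], grows the heap footprint
   }

   static ByteBuffer encodeIntoDirect(int size) {
      return ByteBuffer.allocateDirect(size);  // off-heap, no heap arena created
   }

   public static void main(String[] args) {
      ByteBuffer heap = encodeIntoHeap(1024);
      ByteBuffer direct = encodeIntoDirect(1024);
      System.out.println("heap isDirect=" + heap.isDirect()
            + ", direct isDirect=" + direct.isDirect());
   }
}
```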
[jira] [Work started] (ARTEMIS-3021) OOM due to wrong CORE message memory estimation
[ https://issues.apache.org/jira/browse/ARTEMIS-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on ARTEMIS-3021 started by Francesco Nigro. > OOM due to wrong CORE message memory estimation > --- > > Key: ARTEMIS-3021 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3021 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Durable CORE messages can get their internal buffer enlarged by > encodeHeadersAndProperties while being persisted on the journal, but the > address size memory estimation using the estimated memory of a message is > performed before that, making it less precise. > This badly timed estimation, together with the Netty ByteBuf auto-sizing mechanism, > can cause the broker to underestimate the message footprint, causing it to go > OOM. -- This message was sent by Atlassian Jira (v8.3.4#803005)
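The timing bug above can be sketched in a few lines: the estimate is sampled before the encode step enlarges the buffer, so the accounting misses the growth. Method names and sizes here are illustrative, not the broker's real API.

```java
import java.nio.ByteBuffer;

// Sketch of the ARTEMIS-3021 timing bug: the memory estimate is taken
// from the message buffer *before* encodeHeadersAndProperties enlarges it,
// so the address-size accounting undercounts the real footprint.
public class EstimateTiming {
   static ByteBuffer buffer = ByteBuffer.allocate(64);

   static int estimateBeforePersist() {
      return buffer.capacity();                 // estimate taken too early: 64 bytes
   }

   static int encodeHeadersAndProperties(int extraBytes) {
      // auto-sizing growth, as a Netty ByteBuf would do on a large write
      buffer = ByteBuffer.allocate(buffer.capacity() + extraBytes);
      return buffer.capacity();                 // real footprint after encoding
   }

   public static void main(String[] args) {
      int estimated = estimateBeforePersist();
      int actual = encodeHeadersAndProperties(192);
      // the accounting is off by (actual - estimated) bytes for every such message
      System.out.println("estimated=" + estimated + " actual=" + actual);
   }
}
```

Multiplied across many durable messages, that per-message undercount is what lets the broker run out of heap before paging kicks in.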
[jira] [Updated] (ARTEMIS-3049) Reduce live page lookup cost
[ https://issues.apache.org/jira/browse/ARTEMIS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3049: - Description: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to an array lookup. [https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939] clearly shows the issue with the current implementation. The ideal approaches to improve it could be: # to replace the chunked list with a copy-on-write array list # to use a cursor/iterator API over the chunk list, binding one to each consumer, in order to get a linear stride over the live paged messages Sadly, the latter approach seems not doable because the live page cache is accessed for each message lookup in an anonymous way, making it impossible to have a 1:1 binding with the consumers, while the former seems not doable because of the array copy cost on appending. There is still one case that could be improved using the former approach, instead, delivering a huge speedup on lookup cost: while reloading live pages. A reloaded live page already knows the number of loaded live paged messages, making it possible to store them in a simple array, allowing a much faster lookup. was: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to an array lookup. [https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939] clearly shows the issue with the current implementation. 
The ideal approaches to improve it could be: # to replace the chunked list with a copy-on-write array list # to use a cursor/iterator API over the chunk list, binding one to each consumer, in order to get a linear stride over the live paged messages Sadly, the latter approach seems not doable because the live page cache is accessed for each message lookup in an anonymous way, making it impossible to have a 1:1 binding with the consumers, while the former seems not doable because of the array copy cost on appending. There is still one case that could be improved using the former approach, instead, delivering a huge speedup on lookup cost: while reloading live pages. A reloaded live page already knows the number of loaded live paged messages, making it possible to store them in a simple array. > Reduce live page lookup cost > > > Key: ARTEMIS-3049 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > LivePageCacheImpl::getMessage is performing a linked-list-like lookup that > can be rather slow if compared to an array lookup. > [https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939] > clearly shows the issue with the current implementation. > The ideal approaches to improve it could be: > # to replace the chunked list with a copy-on-write array list > # to use a cursor/iterator API over the chunk list, binding one to each > consumer, in order to get a linear stride over the live paged messages > Sadly, the latter approach seems not doable because the live page cache is > accessed for each message lookup in an anonymous way, making it impossible to > have a 1:1 binding with the consumers, while the former seems not doable > because of the array copy cost on appending. 
> > There is still one case that could be improved using the former approach, > instead, delivering a huge speedup on lookup cost: while reloading live pages. > A reloaded live page already knows the number of loaded live paged > messages, making it possible to store them in a simple array, allowing a much > faster lookup. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
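The lookup-cost gap described above can be sketched as follows. Class and method names are illustrative, not the broker's real API: a LinkedList stands in for the chunked list, and the reload path, where the message count is known up front, copies into a plain array with O(1) indexed access.

```java
import java.util.LinkedList;
import java.util.List;

// Sketch of the lookup-cost difference behind ARTEMIS-3049: walking a
// linked/chunked list to position i is O(i), while a reloaded page whose
// message count is known up front can use a plain array with O(1) lookup.
public class LivePageLookup {
   static String getMessageLinked(LinkedList<String> chunks, int i) {
      return chunks.get(i);   // traverses up to i nodes before returning
   }

   static String getMessageArray(String[] reloaded, int i) {
      return reloaded[i];     // single indexed load
   }

   // reload path: the count is known, so the messages fit a fixed-size array
   static String[] reload(List<String> loaded) {
      return loaded.toArray(new String[0]);
   }

   public static void main(String[] args) {
      LinkedList<String> live = new LinkedList<>(List.of("m0", "m1", "m2"));
      String[] reloaded = reload(live);
      System.out.println(getMessageLinked(live, 2) + " " + getMessageArray(reloaded, 2));
   }
}
```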
[jira] [Updated] (ARTEMIS-3051) Fix MessageReferenceImpl::getMemoryEstimate
[ https://issues.apache.org/jira/browse/ARTEMIS-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3051: - Description: MessageReferenceImpl::memoryOffset is used by MessageReferenceImpl::getMemoryEstimate: it reports 64 bytes. [https://github.com/openjdk/jol] is reporting 72 bytes for OpenJDK 64 bit using COOPS and 8-byte alignment, which is very common, and that's a more accurate estimated footprint value. To be honest, a full-fat 64 bit JVM would use 112 bytes instead, but that could be improved in a bigger follow-up PR using JOL on the test suite to double check the footprint. The interesting thing is that paging will be positively affected by this change, because the broker won't under-estimate the memory footprint of many references, triggering paging sooner. was: MessageReferenceImpl::memoryOffset is used by MessageReferenceImpl::getMemoryEstimate: it reports 64 bytes. [https://github.com/openjdk/jol] is reporting 72 bytes for OpenJDK 64 bit using COOPS and 8-byte alignment, which is very common, and that's a more accurate estimated footprint value. To be honest, a full-fat 64 bit JVM would use 112 bytes instead, but that could be improved in a bigger follow-up PR. The interesting thing is that paging will be positively affected by this change, because the broker won't under-estimate the memory footprint of many references, triggering paging sooner. > Fix MessageReferenceImpl::getMemoryEstimate > --- > > Key: ARTEMIS-3051 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3051 > Project: ActiveMQ Artemis > Issue Type: Bug >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > MessageReferenceImpl::memoryOffset is used by > MessageReferenceImpl::getMemoryEstimate: it reports 64 bytes. 
> [https://github.com/openjdk/jol] is reporting 72 bytes for OpenJDK 64 bit > using COOPS and 8-byte alignment, which is very common, and that's a more > accurate estimated footprint value. > To be honest, a full-fat 64 bit JVM would use 112 bytes instead, but that > could be improved in a bigger follow-up PR using JOL on the test suite to > double check the footprint. > > The interesting thing is that paging will be positively affected by this > change, because the broker won't under-estimate the memory footprint of many > references, triggering paging sooner. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
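The 64 -> 72 byte correction above comes down to the JVM's 8-byte object alignment, which can be sketched with a one-line rounding helper. The exact raw size used in the example is an assumption for illustration; real layouts come from JOL, as the issue notes.

```java
// Sketch of the 8-byte alignment arithmetic behind the 64 -> 72 byte
// correction in ARTEMIS-3051: the JVM rounds each object's size up to the
// alignment boundary, so any raw header+field sum from 65 to 72 bytes
// occupies 72 bytes. (Authoritative layouts come from JOL.)
public class AlignedSize {
   static final int ALIGNMENT = 8;

   static int alignedSize(int rawBytes) {
      return (rawBytes + ALIGNMENT - 1) & -ALIGNMENT;  // round up to a multiple of 8
   }

   public static void main(String[] args) {
      System.out.println(alignedSize(64)); // 64: already aligned
      System.out.println(alignedSize(66)); // 72: rounded up, matching the JOL report
   }
}
```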
[jira] [Created] (ARTEMIS-3051) Fix MessageReferenceImpl::getMemoryEstimate
Francesco Nigro created ARTEMIS-3051: Summary: Fix MessageReferenceImpl::getMemoryEstimate Key: ARTEMIS-3051 URL: https://issues.apache.org/jira/browse/ARTEMIS-3051 Project: ActiveMQ Artemis Issue Type: Bug Affects Versions: 2.16.0 Reporter: Francesco Nigro Assignee: Francesco Nigro MessageReferenceImpl::memoryOffset is used by MessageReferenceImpl::getMemoryEstimate: it reports 64 bytes. [https://github.com/openjdk/jol] is reporting 72 bytes for OpenJDK 64 bit using COOPS and 8-byte alignment, which is very common, and that's a more accurate estimated footprint value. To be honest, a full-fat 64 bit JVM would use 112 bytes instead, but that could be improved in a bigger follow-up PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3050) Reduce PagedReferenceImpl memory footprint
[ https://issues.apache.org/jira/browse/ARTEMIS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3050: - Description: PagedReferenceImpl is never used as a QueueImpl node, hence it doesn't make sense for it to extend the node class; dropping that saves a few bytes of memory footprint, e.g. a COOPS 64 bit JVM gets 88 bytes vs 104 bytes -> ~18% saved memory for each message ref with no semantic impact. Priority: Trivial (was: Major) > Reduce PagedReferenceImpl memory footprint > -- > > Key: ARTEMIS-3050 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3050 > Project: ActiveMQ Artemis > Issue Type: Improvement >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Trivial > > PagedReferenceImpl is never used as a QueueImpl node, hence it doesn't > make sense for it to extend the node class; dropping that saves a few bytes of memory > footprint, e.g. a COOPS 64 bit JVM gets 88 bytes vs 104 bytes -> ~18% saved > memory for each message ref with no semantic impact. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3050) Reduce PagedReferenceImpl memory footprint
[ https://issues.apache.org/jira/browse/ARTEMIS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3050: - Summary: Reduce PagedReferenceImpl memory footprint (was: Reduce PagedMessage memory footprint) > Reduce PagedReferenceImpl memory footprint > -- > > Key: ARTEMIS-3050 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3050 > Project: ActiveMQ Artemis > Issue Type: Improvement >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-3050) Reduce PagedMessage memory footprint
Francesco Nigro created ARTEMIS-3050: Summary: Reduce PagedMessage memory footprint Key: ARTEMIS-3050 URL: https://issues.apache.org/jira/browse/ARTEMIS-3050 Project: ActiveMQ Artemis Issue Type: Improvement Reporter: Francesco Nigro Assignee: Francesco Nigro -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (ARTEMIS-3049) Reduce live page lookup cost
[ https://issues.apache.org/jira/browse/ARTEMIS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on ARTEMIS-3049 started by Francesco Nigro. > Reduce live page lookup cost > > > Key: ARTEMIS-3049 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > LivePageCacheImpl::getMessage is performing a linked-list-like lookup that > can be rather slow if compared to an array lookup. > [https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939] > clearly shows the issue with the current implementation. > The ideal approaches to improve it could be: > # to replace the chunked list with a copy-on-write array list > # to use a cursor/iterator API over the chunk list, binding one to each > consumer, in order to get a linear stride over the live paged messages > Sadly, the latter approach seems not doable because the live page cache is > accessed for each message lookup in an anonymous way, making it impossible to > have a 1:1 binding with the consumers, while the former seems not doable > because of the array copy cost on appending. > > There is still one case that could be improved using the former approach, > instead, delivering a huge speedup on lookup cost: reloading of live pages. > A reloaded live page already knows the number of loaded live paged > messages, making it possible to store them in a simple array. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3049) Reduce live page lookup cost
[ https://issues.apache.org/jira/browse/ARTEMIS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3049: - Description: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to an array lookup. [https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939] clearly shows the issue with the current implementation. The ideal approaches to improve it could be: # to replace the chunked list with a copy-on-write array list # to use a cursor/iterator API over the chunk list, binding one to each consumer, in order to get a linear stride over the live paged messages Sadly, the latter approach seems not doable because the live page cache is accessed for each message lookup in an anonymous way, making it impossible to have a 1:1 binding with the consumers, while the former seems not doable because of the array copy cost on appending. There is still one case that could be improved using the former approach, instead, delivering a huge speedup on lookup cost: reloading of live pages. A reloaded live page already knows the number of loaded live paged messages, making it possible to store them in a simple array. was: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to an O(1) lookup on an ArrayList-like data structure. It's possible to speed it up by: # using a last-accessed buffer cache on the append-only chunked list used in LivePageCacheImpl, to speed up the most recent (& nearest) accesses # using an array with the freshly reloaded paged messages, in case of cache reload https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939 clearly shows the issue with the current implementation. 
> Reduce live page lookup cost > > > Key: ARTEMIS-3049 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > LivePageCacheImpl::getMessage is performing a linked-list-like lookup that > can be rather slow if compared to an array lookup. > [https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939] > clearly shows the issue with the current implementation. > The ideal approaches to improve it could be: > # to replace the chunked list with a copy-on-write array list > # to use a cursor/iterator API over the chunk list, binding one to each > consumer, in order to get a linear stride over the live paged messages > Sadly, the latter approach seems not doable because the live page cache is > accessed for each message lookup in an anonymous way, making it impossible to > have a 1:1 binding with the consumers, while the former seems not doable > because of the array copy cost on appending. > > There is still one case that could be improved using the former approach, > instead, delivering a huge speedup on lookup cost: reloading of live pages. > A reloaded live page already knows the number of loaded live paged > messages, making it possible to store them in a simple array. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARTEMIS-2852) Huge performance decrease between versions 2.2.0 and 2.13.0
[ https://issues.apache.org/jira/browse/ARTEMIS-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258743#comment-17258743 ] Francesco Nigro commented on ARTEMIS-2852: -- [~adamw1pl] I've looked at the new post on [https://softwaremill.com/mqperf/] and it seems that the new behaviour shown on: !https://softwaremill.com/user/pages/blog/mqperf/artemis.png?g-d425a3da|width=749,height=484! is different from the one on https://softwaremill.com/mqperf-2017/ !https://softwaremill.com/user/themes/softwaremill/assets/_old-website/uploads/2017/07/mqperf/artemis.png! https://issues.apache.org/jira/browse/ARTEMIS-2877 should have already fixed the scalability issue, but I see that https://issues.apache.org/jira/browse/ARTEMIS-3045 could be a reasonable step forward to improve the current behavior: I still don't get why 2.2.0 should scale better than master itself, but I'll investigate that as part of https://issues.apache.org/jira/browse/ARTEMIS-3045. Is there any chance you could check if [https://github.com/franz1981/activemq-artemis/tree/batching_replication_manager] improves things? I don't know how that works on your side, or whether there is any chance to get the results in the blog post updated at some point (before the next round); let me know... > Huge performance decrease between versions 2.2.0 and 2.13.0 > --- > > Key: ARTEMIS-2852 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2852 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Kasper Kondzielski >Assignee: Francesco Nigro >Priority: Major > Fix For: 2.16.0 > > Attachments: Selection_433.png, Selection_434.png, Selection_440.png, > Selection_441.png, Selection_451.png > > > Hi, > Recently, we started to prepare a new revision of our blog post in which we > test various implementations of replicated queues. 
Previous version can be > found here: [https://softwaremill.com/mqperf/] > We updated artemis binary to 2.13.0, regenerated configuration file and > applied all the performance tricks you told us last time. In particular these > were: > * the {{Xmx}} java parameter bumped to {{16G (now bumped to 48G)}} > * in {{broker.xml}}, the {{global-max-size}} setting changed to {{8G (this > one we forgot to set, but we suspect that it is not the issue)}} > * {{journal-type}} set to {{MAPPED}} > * {{journal-datasync}}, {{journal-sync-non-transactional}} and > {{journal-sync-transactional}} all set to false > Apart from that we changed machines' type we use to r5.2xlarge ( 8 cores, 64 > GIB memory, Network bandwidth Up to 10 Gbps, Storage bandwidth Up to 4,750 > Mbps) and we decided to always run twice as much receivers as senders. > From our tests it looks like version 2.13.0 is not scaling as well, with the > increase of senders and receivers, as version 2.2.0 (previously tested). > Basically is not scaling at all as the throughput stays almost at the same > level, while previously it used to grow linearly. > Here you can find our tests results for both versions: > [https://docs.google.com/spreadsheets/d/1kr9fzSNLD8bOhMkP7K_4axBQiKel1aJtpxsBCOy9ugU/edit?usp=sharing] > We are aware that now there is a dedicated page in documentation about > performance tuning, but we are surprised that same settings as before > performs much worse. > Maybe there is an obvious property which we overlooked which should be turned > on? > All changes between those versions together with the final configuration can > be found on this merged PR: > [https://github.com/softwaremill/mqperf/commit/6bfae489e11a250dc9e6ef59719782f839e8874a] > > Charts showing machines' usage in attachments. Memory consumed by artemis > process didn't exceed ~ 16 GB. Bandwidht and cpu weren't also a bottlenecks. > p.s. 
I wanted to ask this question on mailing list/nabble forum first but it > seems that I don't have permissions to do so even though I registered & > subscribed. Is that intentional? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3049) Reduce live page lookup cost
[ https://issues.apache.org/jira/browse/ARTEMIS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3049: - Description: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup the most recent (& nearest) accesses # using an array with the fresh reloaded paged messages, in case of cache reload https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939 clearly show the issue with the current implementation. was: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup the most recent (& nearest) accesses # using an array with the fresh reloaded paged messages, in case of cache reload https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939 explains clearly the issue with the current implementation. > Reduce live page lookup cost > > > Key: ARTEMIS-3049 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LivePageCacheImpl::getMessage is performing a linked-list-like lookup that > can be rather slow if compared to a O(1) lookup on ArrayList-like data > structure. 
> it's possible to speed it up by: > # using a last accessed buffer cache on the append only chunked list used on > LivePageCacheImpl, to speedup the most recent (& nearest) accesses > # using an array with the fresh reloaded paged messages, in case of cache > reload > https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939 > clearly show the issue with the current implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3049) Reduce live page lookup cost
[ https://issues.apache.org/jira/browse/ARTEMIS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3049: - Description: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup the most recent (& nearest) accesses # using an array with the fresh reloaded paged messages, in case of cache reload https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939 explains clearly the issue with the current implementation. was: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup the most recent (& nearest) accesses # using an array with the fresh reloaded paged messages, in case of cache reload > Reduce live page lookup cost > > > Key: ARTEMIS-3049 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LivePageCacheImpl::getMessage is performing a linked-list-like lookup that > can be rather slow if compared to a O(1) lookup on ArrayList-like data > structure. 
> it's possible to speed it up by: > # using a last accessed buffer cache on the append only chunked list used on > LivePageCacheImpl, to speedup the most recent (& nearest) accesses > # using an array with the fresh reloaded paged messages, in case of cache > reload > https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939 > explains clearly the issue with the current implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3049) Reduce live page lookup cost
[ https://issues.apache.org/jira/browse/ARTEMIS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3049: - Description: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup the most recent (& nearest) accesses # using an array with the fresh reloaded paged messages, in case of cache reload was: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup the most recent (& nearest) accesses # using an array with the any fresh reloaded paged messages, in case of cache reload > Reduce live page lookup cost > > > Key: ARTEMIS-3049 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LivePageCacheImpl::getMessage is performing a linked-list-like lookup that > can be rather slow if compared to a O(1) lookup on ArrayList-like data > structure. > it's possible to speed it up by: > # using a last accessed buffer cache on the append only chunked list used on > LivePageCacheImpl, to speedup the most recent (& nearest) accesses > # using an array with the fresh reloaded paged messages, in case of cache > reload -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3049) Reduce live page lookup cost
[ https://issues.apache.org/jira/browse/ARTEMIS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3049: - Description: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup the most recent (& nearest) accesses # using an array with the any fresh reloaded paged messages, in case of cache reload was: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup nearest accesses (very likely to happen with a single consumer) # using an array with the any fresh reloaded paged messages, in case of cache reload > Reduce live page lookup cost > > > Key: ARTEMIS-3049 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LivePageCacheImpl::getMessage is performing a linked-list-like lookup that > can be rather slow if compared to a O(1) lookup on ArrayList-like data > structure. > it's possible to speed it up by: > # using a last accessed buffer cache on the append only chunked list used on > LivePageCacheImpl, to speedup the most recent (& nearest) accesses > # using an array with the any fresh reloaded paged messages, in case of cache > reload -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-3049) Reduce live page lookup cost
Francesco Nigro created ARTEMIS-3049: Summary: Reduce live page lookup cost Key: ARTEMIS-3049 URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 Project: ActiveMQ Artemis Issue Type: Improvement Components: Broker Affects Versions: 2.16.0 Reporter: Francesco Nigro Assignee: Francesco Nigro LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup nearest accesses (very likely to happen with a single consumer) # using an array with the any fresh reloaded paged messages, in case of cache reload -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets
Francesco Nigro created ARTEMIS-3045: Summary: ReplicationManager can batch sent replicated packets Key: ARTEMIS-3045 URL: https://issues.apache.org/jira/browse/ARTEMIS-3045 Project: ActiveMQ Artemis Issue Type: Improvement Reporter: Francesco Nigro Assignee: Francesco Nigro -- This message was sent by Atlassian Jira (v8.3.4#803005)
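The ticket above is only a summary; as a rough illustration of the idea (names are hypothetical, this is not the ReplicationManager API), batching means accumulating encoded replication packets and flushing them to the channel in one write, amortizing the per-packet flush cost:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class BatchingReplicationSketch {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final List<byte[]> flushedBatches = new ArrayList<>();
    private final int flushThreshold;

    BatchingReplicationSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Instead of channel.writeAndFlush per packet, buffer until the threshold.
    void send(byte[] encodedPacket) {
        pending.write(encodedPacket, 0, encodedPacket.length);
        if (pending.size() >= flushThreshold) {
            flush();
        }
    }

    void flush() {
        if (pending.size() == 0) return;
        // Stand-in for a single write+flush of the whole accumulated batch.
        flushedBatches.add(pending.toByteArray());
        pending.reset();
    }

    int batchCount() { return flushedBatches.size(); }

    public static void main(String[] args) {
        BatchingReplicationSketch mgr = new BatchingReplicationSketch(64);
        for (int i = 0; i < 16; i++) mgr.send(new byte[16]); // 256 bytes total
        mgr.flush(); // drain whatever is left below the threshold
        System.out.println("flushes for 16 packets: " + mgr.batchCount());
    }
}
```

A real implementation would also need a time-based flush (so low-rate traffic isn't delayed indefinitely) and would batch on the event loop rather than via a byte-array copy.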
[jira] [Updated] (ARTEMIS-3025) JsonReader char[] leak
[ https://issues.apache.org/jira/browse/ARTEMIS-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3025: - Description: The default Json provider ie https://github.com/apache/johnzon is using several pools while parsing eg {{org.apache.johnzon.max-string-length}} that wouldn't pool char[] if the JsonReader isn't properly closed. Currently we're not properly closing such readers and that means that we allocate {{org.apache.johnzon.max-string-length}} * 2 bytes on each handled notification ~ 20 MiB. Until https://bugs.openjdk.java.net/browse/JDK-8027959 ie JDK u40, with G1GC the mentioned char[] was (very likely) allocated into the old generation as Humongous Allocation, needing a Full GC to release it. was: The default Json provider ie https://github.com/apache/johnzon is using several pools while parsing eg {{org.apache.johnzon.max-string-length}} that wouldn't pool char[] if the JsonReader isn't properly closed. Currently we're not properly closing such readers and that means that we allocate {{org.apache.johnzon.max-string-length}} * 2 bytes on each handled notification ~ 20 MiB. Until https://bugs.openjdk.java.net/browse/JDK-8027959 ie JDK u40, with G1GC the mentioned char[] was (very likely) performed into the old generation, needing a Full GC to release it. > JsonReader char[] leak > -- > > Key: ARTEMIS-3025 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3025 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The default Json provider ie https://github.com/apache/johnzon is using > several pools while parsing eg {{org.apache.johnzon.max-string-length}} that > wouldn't pool char[] if the JsonReader isn't properly closed. 
> Currently we're not properly closing such readers and that means that we > allocate {{org.apache.johnzon.max-string-length}} * 2 bytes on each handled > notification ~ 20 MiB. > Until https://bugs.openjdk.java.net/browse/JDK-8027959 ie JDK u40, with G1GC > the mentioned char[] was (very likely) allocated into the old generation as > Humongous Allocation, needing a Full GC to release it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
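The fix the issue implies is simply closing the JsonReader so Johnzon can return its pooled char[]. Since javax.json is not part of the JDK, the sketch below uses a stdlib stand-in (a counting AutoCloseable) to show the pattern; the try-with-resources shape is what matters, as it guarantees close() runs even on a parse error:

```java
public class ReaderCloseSketch {
    static int openReaders = 0;

    // Stand-in for a pooling JsonReader: close() must run to return the buffer.
    static class PooledReader implements AutoCloseable {
        PooledReader() { openReaders++; }
        String readObject() { return "{}"; }
        @Override public void close() { openReaders--; }
    }

    // The buggy shape: the reader is never closed, so the pooled buffer leaks.
    static String parseLeaky(String json) {
        return new PooledReader().readObject();
    }

    // The fixed shape: try-with-resources closes the reader on every path.
    static String parseSafely(String json) {
        try (PooledReader reader = new PooledReader()) {
            return reader.readObject();
        }
    }

    public static void main(String[] args) {
        parseLeaky("{}");
        System.out.println("open after leaky parse: " + openReaders);
        openReaders = 0;
        parseSafely("{}");
        System.out.println("open after safe parse: " + openReaders);
    }
}
```

With the real Johnzon reader the same shape applies: `try (JsonReader r = Json.createReader(...)) { ... }`.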
[jira] [Updated] (ARTEMIS-3025) JsonReader char[] leak
[ https://issues.apache.org/jira/browse/ARTEMIS-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3025: - Description: The default Json provider ie https://github.com/apache/johnzon is using several pools while parsing eg {{org.apache.johnzon.max-string-length}} that wouldn't pool char[] if the JsonReader isn't properly closed. Currently we're not properly closing such readers and that means that we allocate {{org.apache.johnzon.max-string-length}} * 2 bytes on each handled notification ~ 20 MiB. Until https://bugs.openjdk.java.net/browse/JDK-8027959 ie JDK u40, with G1GC the mentioned char[] was (very likely) performed into the old generation, needing a Full GC to release it. was: The default Json provider ie https://github.com/apache/johnzon is using several pools while parsing eg {{org.apache.johnzon.max-string-length}} that would leak char[] if a JsonReader isn't properly closed. Currently we're not properly closing such readers and that means leaking up to {{org.apache.johnzon.max-string-length}} * 2 bytes on each handled notification. > JsonReader char[] leak > -- > > Key: ARTEMIS-3025 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3025 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The default Json provider ie https://github.com/apache/johnzon is using > several pools while parsing eg {{org.apache.johnzon.max-string-length}} that > wouldn't pool char[] if the JsonReader isn't properly closed. > Currently we're not properly closing such readers and that means that we > allocate {{org.apache.johnzon.max-string-length}} * 2 bytes on each handled > notification ~ 20 MiB. > Until https://bugs.openjdk.java.net/browse/JDK-8027959 ie JDK u40, with G1GC > the mentioned char[] was (very likely) performed into the old generation, > needing a Full GC to release it. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-3025) JsonReader char[] leak
Francesco Nigro created ARTEMIS-3025: Summary: JsonReader char[] leak Key: ARTEMIS-3025 URL: https://issues.apache.org/jira/browse/ARTEMIS-3025 Project: ActiveMQ Artemis Issue Type: Bug Reporter: Francesco Nigro Assignee: Francesco Nigro The default Json provider ie https://github.com/apache/johnzon is using several pools while parsing eg {{org.apache.johnzon.max-string-length}} that would leak char[] if a JsonReader isn't properly closed. Currently we're not properly closing such readers and that means leaking up to {{org.apache.johnzon.max-string-length}} * 2 bytes on each handled notification. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-3021) OOM due to wrong CORE message memory estimation
Francesco Nigro created ARTEMIS-3021: Summary: OOM due to wrong CORE message memory estimation Key: ARTEMIS-3021 URL: https://issues.apache.org/jira/browse/ARTEMIS-3021 Project: ActiveMQ Artemis Issue Type: Bug Reporter: Francesco Nigro Assignee: Francesco Nigro Durable CORE messages can get their internal buffer enlarged by encodeHeadersAndProperties while being persisted on the journal, but the address-size memory estimation, which uses the estimated memory of a message, is performed before that, making it less precise. This badly timed estimation, together with Netty's ByteBuf auto-sizing mechanism, can cause the broker to underestimate the message footprint and go OOM. -- This message was sent by Atlassian Jira (v8.3.4#803005)
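A toy sketch of the timing bug described above (hypothetical names, not Artemis code): if the footprint is read before encodeHeadersAndProperties() grows the internal buffer, the accounting misses the growth; reading it after the encode, or applying the delta, closes the gap.

```java
public class MemoryEstimateSketch {
    static class Message {
        byte[] buffer = new byte[512];          // initial internal buffer
        int memoryEstimate() { return buffer.length; }
        void encodeHeadersAndProperties() {
            // stand-in for a Netty ByteBuf auto-sizing its capacity on encode
            buffer = new byte[4096];
        }
    }

    public static void main(String[] args) {
        Message msg = new Message();

        int estimatedBefore = msg.memoryEstimate();  // what gets accounted: 512
        msg.encodeHeadersAndProperties();            // journal persistence path
        int actualAfter = msg.memoryEstimate();      // what the heap pays: 4096

        // The address-size counter is off by this much for every such message,
        // which is how the broker can blow past global-max-size and go OOM.
        System.out.println("under-counted bytes: " + (actualAfter - estimatedBefore));
    }
}
```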
[jira] [Created] (ARTEMIS-3016) Reduce DuplicateIDCache memory footprint
Francesco Nigro created ARTEMIS-3016: Summary: Reduce DuplicateIDCache memory footprint Key: ARTEMIS-3016 URL: https://issues.apache.org/jira/browse/ARTEMIS-3016 Project: ActiveMQ Artemis Issue Type: Improvement Reporter: Francesco Nigro Assignee: Francesco Nigro DuplicateIDCache uses many boxed Long and Integer instances, which makes duplicate ID caches too memory hungry. This could be improved by using better data structures and a pooling mechanism. -- This message was sent by Atlassian Jira (v8.3.4#803005)
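To illustrate the "better data structures" direction (a hypothetical sketch, not the DuplicateIDCache implementation): a `HashMap<Long, Integer>` pays a boxed key, a boxed value, and an entry object per cached id, while parallel primitive arrays in a fixed-size ring pay none of that.

```java
import java.util.Arrays;

public class PrimitiveIdCacheSketch {
    private final long[] ids;        // primitive ids: no boxed Long per entry
    private final int[] positions;   // primitive journal positions: no Integer
    private int next;

    PrimitiveIdCacheSketch(int capacity) {
        ids = new long[capacity];
        Arrays.fill(ids, Long.MIN_VALUE);   // sentinel marking an empty slot
        positions = new int[capacity];
    }

    void put(long id, int position) {
        ids[next] = id;
        positions[next] = position;
        next = (next + 1) % ids.length;     // ring: overwrite the oldest entry
    }

    // Linear scan is fine for small caches; a primitive-keyed open-addressing
    // hash (long -> int) would serve larger capacities.
    boolean contains(long id) {
        for (long candidate : ids) {
            if (candidate == id) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        PrimitiveIdCacheSketch cache = new PrimitiveIdCacheSketch(3);
        cache.put(101L, 0);
        cache.put(102L, 1);
        cache.put(103L, 2);
        cache.put(104L, 3); // capacity 3: evicts 101
        System.out.println("101 still cached: " + cache.contains(101L));
    }
}
```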
[jira] [Commented] (ARTEMIS-2852) Huge performance decrease between versions 2.2.0 and 2.13.0
[ https://issues.apache.org/jira/browse/ARTEMIS-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233450#comment-17233450 ] Francesco Nigro commented on ARTEMIS-2852: -- [~adamw1pl] I can help to review the results, if needed, but my primary concern is: are the numbers reasonably close to what was achieved with 2.2.0, assuming the configurations were comparable? > Huge performance decrease between versions 2.2.0 and 2.13.0 > --- > > Key: ARTEMIS-2852 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2852 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Kasper Kondzielski >Assignee: Francesco Nigro >Priority: Major > Fix For: 2.16.0 > > Attachments: Selection_433.png, Selection_434.png, Selection_440.png, > Selection_441.png, Selection_451.png > > > Hi, > Recently, we started to prepare a new revision of our blog-post in which we > test various implementations of replicated queues. Previous version can be > found here: [https://softwaremill.com/mqperf/] > We updated artemis binary to 2.13.0, regenerated configuration file and > applied all the performance tricks you told us last time. In particular these > were: > * the {{Xmx}} java parameter bumped to {{16G (now bumped to 48G)}} > * in {{broker.xml}}, the {{global-max-size}} setting changed to {{8G (this > one we forgot to set, but we suspect that it is not the issue)}} > * {{journal-type}} set to {{MAPPED}} > * {{journal-datasync}}, {{journal-sync-non-transactional}} and > {{journal-sync-transactional}} all set to false > Apart from that we changed machines' type we use to r5.2xlarge ( 8 cores, 64 > GIB memory, Network bandwidth Up to 10 Gbps, Storage bandwidth Up to 4,750 > Mbps) and we decided to always run twice as much receivers as senders. > From our tests it looks like version 2.13.0 is not scaling as well, with the > increase of senders and receivers, as version 2.2.0 (previously tested). 
> Basically is not scaling at all as the throughput stays almost at the same > level, while previously it used to grow linearly. > Here you can find our tests results for both versions: > [https://docs.google.com/spreadsheets/d/1kr9fzSNLD8bOhMkP7K_4axBQiKel1aJtpxsBCOy9ugU/edit?usp=sharing] > We are aware that now there is a dedicated page in documentation about > performance tuning, but we are surprised that same settings as before > performs much worse. > Maybe there is an obvious property which we overlooked which should be turned > on? > All changes between those versions together with the final configuration can > be found on this merged PR: > [https://github.com/softwaremill/mqperf/commit/6bfae489e11a250dc9e6ef59719782f839e8874a] > > Charts showing machines' usage in attachments. Memory consumed by artemis > process didn't exceed ~ 16 GB. Bandwidht and cpu weren't also a bottlenecks. > p.s. I wanted to ask this question on mailing list/nabble forum first but it > seems that I don't have permissions to do so even though I registered & > subscribed. Is that intentional? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARTEMIS-2852) Huge performance decrease between versions 2.2.0 and 2.13.0
[ https://issues.apache.org/jira/browse/ARTEMIS-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233415#comment-17233415 ] Francesco Nigro commented on ARTEMIS-2852: -- [~adamw1pl] any news on the numbers of the new version? :D > Huge performance decrease between versions 2.2.0 and 2.13.0 > --- > > Key: ARTEMIS-2852 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2852 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Kasper Kondzielski >Assignee: Francesco Nigro >Priority: Major > Fix For: 2.16.0 > > Attachments: Selection_433.png, Selection_434.png, Selection_440.png, > Selection_441.png, Selection_451.png > > > Hi, > Recently, we started to prepare a new revision of our blog-post in which we > test various implementations of replicated queues. Previous version can be > found here: [https://softwaremill.com/mqperf/] > We updated artemis binary to 2.13.0, regenerated configuration file and > applied all the performance tricks you told us last time. In particular these > were: > * the {{Xmx}} java parameter bumped to {{16G (now bumped to 48G)}} > * in {{broker.xml}}, the {{global-max-size}} setting changed to {{8G (this > one we forgot to set, but we suspect that it is not the issue)}} > * {{journal-type}} set to {{MAPPED}} > * {{journal-datasync}}, {{journal-sync-non-transactional}} and > {{journal-sync-transactional}} all set to false > Apart from that we changed machines' type we use to r5.2xlarge ( 8 cores, 64 > GIB memory, Network bandwidth Up to 10 Gbps, Storage bandwidth Up to 4,750 > Mbps) and we decided to always run twice as much receivers as senders. > From our tests it looks like version 2.13.0 is not scaling as well, with the > increase of senders and receivers, as version 2.2.0 (previously tested). > Basically is not scaling at all as the throughput stays almost at the same > level, while previously it used to grow linearly. 
> Here you can find our tests results for both versions: > [https://docs.google.com/spreadsheets/d/1kr9fzSNLD8bOhMkP7K_4axBQiKel1aJtpxsBCOy9ugU/edit?usp=sharing] > We are aware that now there is a dedicated page in documentation about > performance tuning, but we are surprised that same settings as before > performs much worse. > Maybe there is an obvious property which we overlooked which should be turned > on? > All changes between those versions together with the final configuration can > be found on this merged PR: > [https://github.com/softwaremill/mqperf/commit/6bfae489e11a250dc9e6ef59719782f839e8874a] > > Charts showing machines' usage in attachments. Memory consumed by artemis > process didn't exceed ~ 16 GB. Bandwidht and cpu weren't also a bottlenecks. > p.s. I wanted to ask this question on mailing list/nabble forum first but it > seems that I don't have permissions to do so even though I registered & > subscribed. Is that intentional? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-2996) Provide JMH Benchmarks for Artemis
Francesco Nigro created ARTEMIS-2996: Summary: Provide JMH Benchmarks for Artemis Key: ARTEMIS-2996 URL: https://issues.apache.org/jira/browse/ARTEMIS-2996 Project: ActiveMQ Artemis Issue Type: Bug Components: Tests Reporter: Francesco Nigro Assignee: Francesco Nigro In order to reliably measure the performance of many Artemis components, it would be welcome to implement some https://github.com/openjdk/jmh benchmarks to be used for development purposes, i.e. not part of the release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARTEMIS-2852) Huge performance decrease between versions 2.2.0 and 2.13.0
[ https://issues.apache.org/jira/browse/ARTEMIS-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230761#comment-17230761 ] Francesco Nigro commented on ARTEMIS-2852: -- Thanks [~adamw1pl] for the info as well!! > Huge performance decrease between versions 2.2.0 and 2.13.0 > --- > > Key: ARTEMIS-2852 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2852 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Kasper Kondzielski >Priority: Major > Attachments: Selection_433.png, Selection_434.png, Selection_440.png, > Selection_441.png, Selection_451.png > > > Hi, > Recently, we started to prepare a new revision of our blog-post in which we > test various implementations of replicated queues. Previous version can be > found here: [https://softwaremill.com/mqperf/] > We updated artemis binary to 2.13.0, regenerated configuration file and > applied all the performance tricks you told us last time. In particular these > were: > * the {{Xmx}} java parameter bumped to {{16G (now bumped to 48G)}} > * in {{broker.xml}}, the {{global-max-size}} setting changed to {{8G (this > one we forgot to set, but we suspect that it is not the issue)}} > * {{journal-type}} set to {{MAPPED}} > * {{journal-datasync}}, {{journal-sync-non-transactional}} and > {{journal-sync-transactional}} all set to false > Apart from that we changed machines' type we use to r5.2xlarge ( 8 cores, 64 > GIB memory, Network bandwidth Up to 10 Gbps, Storage bandwidth Up to 4,750 > Mbps) and we decided to always run twice as much receivers as senders. > From our tests it looks like version 2.13.0 is not scaling as well, with the > increase of senders and receivers, as version 2.2.0 (previously tested). > Basically is not scaling at all as the throughput stays almost at the same > level, while previously it used to grow linearly. 
> Here you can find our tests results for both versions: > [https://docs.google.com/spreadsheets/d/1kr9fzSNLD8bOhMkP7K_4axBQiKel1aJtpxsBCOy9ugU/edit?usp=sharing] > We are aware that now there is a dedicated page in documentation about > performance tuning, but we are surprised that same settings as before > performs much worse. > Maybe there is an obvious property which we overlooked which should be turned > on? > All changes between those versions together with the final configuration can > be found on this merged PR: > [https://github.com/softwaremill/mqperf/commit/6bfae489e11a250dc9e6ef59719782f839e8874a] > > Charts showing machines' usage in attachments. Memory consumed by artemis > process didn't exceed ~ 16 GB. Bandwidht and cpu weren't also a bottlenecks. > p.s. I wanted to ask this question on mailing list/nabble forum first but it > seems that I don't have permissions to do so even though I registered & > subscribed. Is that intentional? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (ARTEMIS-2984) Compressed large messages can leak native resources
[ https://issues.apache.org/jira/browse/ARTEMIS-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on ARTEMIS-2984 started by Francesco Nigro. > Compressed large messages can leak native resources > --- > > Key: ARTEMIS-2984 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2984 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Compressed large messages use native resources in the form of Inflater and > Deflater and should release them in a timely manner (instead of relying on > finalization) to prevent OOM from happening (of direct memory, to be precise). > This issue is also a chance to simplify the large message controllers, because much > of the existing code on the controllers (including the compressed one) isn't needed > at runtime but only for testing purposes; a proper fix can move that dead code > there too, avoiding having to maintain the leaky behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2984) Compressed large messages can leak native resources
[ https://issues.apache.org/jira/browse/ARTEMIS-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2984: - Description: Compressed large messages use native resources in the form of Inflater and Deflater and should release them in a timely manner (instead of relying on finalization) to save OOM to happen (of direct memory, to be precise). This issue has the chance to simplify a lot the large message controller, because much of the existing code on controllers (including compressed one) isn't needed at runtime, but just for testing purposes and a proper fix can move dead code there too, saving leaky behavior to be maintained. > Compressed large messages can leak native resources > --- > > Key: ARTEMIS-2984 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2984 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Compressed large messages use native resources in the form of Inflater and > Deflater and should release them in a timely manner (instead of relying on > finalization) to save OOM to happen (of direct memory, to be precise). > This issue has the chance to simplify a lot the large message controller, > because much of the existing code on controllers (including compressed one) > isn't needed at runtime, but just for testing purposes and a proper fix can > move dead code there too, saving leaky behavior to be maintained. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2984) Compressed large messages can leak native resources
[ https://issues.apache.org/jira/browse/ARTEMIS-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2984: - Description: Compressed large messages use native resources in the form of Inflater and Deflater and should release them in a timely manner (instead of relying on finalization) to prevent an OOM (of direct memory, to be precise). This issue is also a chance to simplify the large message controllers, because much of the existing controller code (including the compressed one) isn't needed at runtime but only for testing purposes; a proper fix can move that dead code there too, avoiding having to maintain leaky behavior. was: Compressed large messages use native resources in the form of Inflater and Deflater and should release them in a timely manner (instead of relying on finalization) to save OOM to happen (of direct memory, to be precise). This issue has the chance to simplify a lot the large message controller, because much of the existing code on controllers (including compressed one) isn't needed at runtime, but just for testing purposes and a proper fix can move dead code there too, saving leaky behavior to be maintained. > Compressed large messages can leak native resources > --- > > Key: ARTEMIS-2984 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2984 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Compressed large messages use native resources in the form of Inflater and > Deflater and should release them in a timely manner (instead of relying on > finalization) to prevent an OOM (of direct memory, to be precise). 
> This issue is also a chance to simplify the large message controllers, because > much of the existing controller code (including the compressed one) isn't > needed at runtime but only for testing purposes; a proper fix can move that > dead code there too, avoiding having to maintain leaky behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2984) Compressed large messages can leak native resources
[ https://issues.apache.org/jira/browse/ARTEMIS-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2984: - Summary: Compressed large messages can leak native resources (was: Compressed large messages retain native resources on errors) > Compressed large messages can leak native resources > --- > > Key: ARTEMIS-2984 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2984 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-2984) Compressed large messages retain native resources on errors
Francesco Nigro created ARTEMIS-2984: Summary: Compressed large messages retain native resources on errors Key: ARTEMIS-2984 URL: https://issues.apache.org/jira/browse/ARTEMIS-2984 Project: ActiveMQ Artemis Issue Type: Bug Reporter: Francesco Nigro Assignee: Francesco Nigro -- This message was sent by Atlassian Jira (v8.3.4#803005)
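The timely-release pattern the issue asks for can be sketched as follows. This is an illustrative standalone example, not the Artemis controller code; `TimelyZipRelease` and `roundTrip` are hypothetical names. The key point is that `Deflater.end()` / `Inflater.end()` free the native (direct) memory immediately in a `finally` block, instead of waiting for finalization:

```java
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class TimelyZipRelease {

    // Compresses and decompresses a payload, releasing the native
    // Deflater/Inflater resources deterministically in finally blocks.
    public static byte[] roundTrip(byte[] input) {
        byte[] compressed = new byte[input.length * 2 + 64];
        int compressedLen;
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            compressedLen = deflater.deflate(compressed);
        } finally {
            deflater.end(); // frees native memory now, not at finalization
        }

        byte[] output = new byte[input.length];
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed, 0, compressedLen);
            int n = inflater.inflate(output);
            if (n != input.length) {
                throw new IllegalStateException("short inflate: " + n);
            }
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        } finally {
            inflater.end(); // same for the inflater
        }
        return output;
    }
}
```

Relying on finalization instead would keep the native buffers alive until a GC cycle happens to run, which is exactly the direct-memory OOM risk the issue describes.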
[jira] [Commented] (ARTEMIS-2975) Not able to know if the artemis server is working properly or not.
[ https://issues.apache.org/jira/browse/ARTEMIS-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226546#comment-17226546 ] Francesco Nigro commented on ARTEMIS-2975: -- Hi [~Karanbvp], help us understand what a "shared storage outage" is and in which cases it occurs (e.g. server start, server running, failback, failover, under load, etc.), possibly with a reproducer. AFAIK Artemis fully relies on sane, reliable disks, and it should go into an I/O critical error state before shutting down in case of any I/O error. The sole exception to this seems to be file lock loss, which uses a background thread to validate and recover the lock. > Not able to know if the artemis server is working properly or not. > -- > > Key: ARTEMIS-2975 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2975 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Karan Aggarwal >Priority: Major > > Observed that sometimes, Artemis server is stuck due to shared storage outage. > It is neither in running state nor in stopped state. > Is there a way to get the current state? To identify that there is some issue > in server and restart it programmatically? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARTEMIS-2808) Artemis HA with shared storage strategy does not reconnect with shared storage if reconnection happens at shared storage
[ https://issues.apache.org/jira/browse/ARTEMIS-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226495#comment-17226495 ] Francesco Nigro commented on ARTEMIS-2808: -- Hi [~Karanbvp], sadly the restart-allowed feature has proven too dangerous due to memory/resource leaks and has been replaced with something more appropriate for the use case we would like to cover (DBMS connectivity loss on shared-store HA). > Artemis HA with shared storage strategy does not reconnect with shared > storage if reconnection happens at shared storage > > > Key: ARTEMIS-2808 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2808 > Project: ActiveMQ Artemis > Issue Type: Bug >Affects Versions: 2.11.0 > Environment: Windows 10 >Reporter: Karan Aggarwal >Priority: Blocker > Attachments: Scenario_1.zip, Scenario_2.zip > > > We verified the behavior of Artemis HA by bringing down the shared storage > (VM) while run is in progress and here is the observation: > *Scenario:* > * When Artemis services are up and running and run is in progress we > restarted the machine hosting the shared storage > * Shared storage was back up in 5 mins > * Both Artemis master and slave did not connect back to the shared storage > * We tried stopping the Artemis brokers. The slave stopped, but the master > did not stop. We had to kill the process. > * We tried to start the Artemis brokers. The master did not start up at all. > The slave started successfully. > * We restarted the master Artemis server. Server started successfully and > acquired back up. > Shared Storage type: NFS > Impact: The run is stopped and Artemis servers needs to be started again > every time shared storage connection goes down momentarily. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (ARTEMIS-2823) Improve JDBC connection management
[ https://issues.apache.org/jira/browse/ARTEMIS-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on ARTEMIS-2823 started by Francesco Nigro. > Improve JDBC connection management > -- > > Key: ARTEMIS-2823 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2823 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Reporter: Mikko >Assignee: Francesco Nigro >Priority: Major > Time Spent: 16h 20m > Remaining Estimate: 0h > > I have a case where the whole clustering reliability and HA must rely on HA > capabilities of clustered database, and running on top of application server > is not an option. > The current JDBC store implementation is rather bare bones on the connection > management side. JDBC driver is used directly with no management layer. At > startup, the broker just opens couple of direct connections to database and > expects them to be available forever. This is something that cannot be > expected in HA production environment. So, similarly to the discussion linked > below, in our case we lose the db connection after one hour, and all the > brokers need to be restarted to get new connections: > [http://activemq.2283324.n4.nabble.com/Artemis-does-not-reconnect-to-MySQL-after-connection-timeout-td4751956.html] > > This is something that could be resolved by simply using JDBC4 isValid > checks, but proper connection handling and pooling through a datasource would > be preferable. > I have implemented a solution for this by using a DBCP2 datasource. Our test > cluster has been successfully running this forked version since the release > of Artemis 2.13.0. I will prepare a pull request if this is seen as > something that can be useful. -- This message was sent by Atlassian Jira (v8.3.4#803005)
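The JDBC4 `isValid` check the reporter mentions can be sketched as below. This is a minimal illustration with hypothetical names (`ValidatingConnectionHolder` is not an Artemis or DBCP2 class): validate the cached connection before use and transparently reopen it when the database has dropped it.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.function.Supplier;

public class ValidatingConnectionHolder {
    private final Supplier<Connection> factory; // e.g. a DataSource::getConnection call
    private Connection connection;

    public ValidatingConnectionHolder(Supplier<Connection> factory) {
        this.factory = factory;
    }

    // Connection.isValid(timeoutSeconds) pings the server (JDBC 4.0+);
    // any SQLException is treated as "connection is dead".
    private static boolean valid(Connection c) {
        try {
            return c != null && !c.isClosed() && c.isValid(5);
        } catch (SQLException e) {
            return false;
        }
    }

    // Returns the cached connection if it still answers, otherwise
    // replaces it with a fresh one from the factory.
    public synchronized Connection get() {
        if (!valid(connection)) {
            connection = factory.get();
        }
        return connection;
    }
}
```

A pooled datasource such as DBCP2 does this (and more: eviction, sizing, borrow validation) out of the box, which is why the issue prefers it over hand-rolled checks.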
[jira] [Assigned] (ARTEMIS-2926) Scheduled task executions are skipped randomly
[ https://issues.apache.org/jira/browse/ARTEMIS-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro reassigned ARTEMIS-2926: Assignee: Francesco Nigro > Scheduled task executions are skipped randomly > -- > > Key: ARTEMIS-2926 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2926 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker >Affects Versions: 2.13.0 >Reporter: Apache Dev >Assignee: Francesco Nigro >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > Scheduled tasks extending {{ActiveMQScheduledComponent}} could randomly skip > an execution, logging: > {code} > Execution ignored due to too many simultaneous executions, probably a > previous delayed execution > {code} > The problem is in the "ActiveMQScheduledComponent#runForExecutor" Runnable. > Times to be compared ({{currentTimeMillis()}} and {{lastTime}}) are taken > inside the runnable execution itself. So, depending on relative execution > times, it could happen that the difference is less than the given period > (e.g. 1 ms), resulting in a skipped execution. -- This message was sent by Atlassian Jira (v8.3.4#803005)
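The race described in ARTEMIS-2926 can be reduced to a few lines. This is a simplified model with hypothetical names, not the actual {{ActiveMQScheduledComponent}} code: because `now` is sampled inside the runnable rather than when the execution was scheduled, a late executor hand-off makes `now - lastTime` smaller than the period and the run is dropped.

```java
public class SkipCheckDemo {
    static long lastTime;

    // Mirrors the guard described in the issue. Returns true if the
    // execution runs, false if it is skipped with the
    // "Execution ignored due to too many simultaneous executions" warning.
    static boolean runGuard(long now, long periodMillis) {
        if (now - lastTime < periodMillis) {
            return false; // skipped: the hand-off arrived "too soon"
        }
        lastTime = now;
        return true; // the task body would run here
    }
}
```

Sampling the timestamp at scheduling time (or comparing against the expected fire time instead of the previous run's wall clock) removes the sensitivity to executor hand-off latency.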
[jira] [Updated] (ARTEMIS-2958) Timed out waiting pool stop on backup restart
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes in ARTEMIS-2823 moved the AMQ thread pool stop, on server stop, to after {{callDeActiveCallbacks()}}, while the changes in ARTEMIS-2838, which added {{server.getServer().getManagementService().unregisterHawtioSecurity()}} to {{callDeActiveCallbacks()}}, moved the HawtioSecurity de-registration to happen on server stop. This means that on server restart, if the thread pool is slow to stop, JMX won't be available until a new start. The thread pool stop can block if a long task is still running/blocked in the pool, and the default strategy while stopping the broker is to wait 10 seconds before forcing a shutdown of the pending tasks, i.e. JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation registers it again. This stealthy behaviour has been captured by random failures of {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in the thread pool. The core issue that causes the thread pool to be randomly blocked had been present for a long time, but the JMX unavailability window introduced by the mentioned JIRAs was the change that triggered the bomb. The test checks the availability of the backup JMX connection for 5 seconds during a backup restart (on failback): given that the default thread pool await time is 10 seconds, a longer thread pool stop will make the test fail. 
It seems by a thread dump inspection that the pending task is: {code:java} jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at sun.misc.Unsafe.park(Native Method) jmx-failback2-out: - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) jmx-failback2-out: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504) jmx-failback2-out: - locked java.lang.Object@607e79a2 jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) jmx-failback2-out: at 
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) jmx-failback2-out: jmx-failback2-out: Number of locked synchronizers = 1 jmx-failback2-out: - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8 {code} And indeed it seems that both {{BackupManager::stop}} and {{BackupManager::activated}} call {{BackupConnector::close}}, which calls {{closeLocator(backupServerLocator)}} without unblocking {{clusterControl.authorize()}}. A possible fix would be to correctly unblock any blocking call in both cases, so that {{BackupManager}} stops cleanly and the broker thread pool can stop immediately. was: The changes on ARTEMIS-2823 have caused the AMQ thread pool stop on server stop to be moved after {{callDeActiveCallbacks()}}, while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. The thread pool stop can block stopping if there is a long task still running/blocked in the pool and the default strategy while stopping the
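The fix idea sketched in the description can be illustrated with a minimal example. `UnblockableCall` and `awaitUnblock` are hypothetical names, not the Artemis implementation: the point is that a blocking call should park on something a concurrent `close()` can release, so `stop()` never has to wait out the full 10-second pool timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class UnblockableCall {
    private final CountDownLatch closed = new CountDownLatch(1);

    // Stands in for a blocking call like authorize(): parks until either
    // close() releases the latch (returns true) or the timeout elapses
    // (returns false).
    public boolean awaitUnblock(long timeoutMillis) {
        try {
            return closed.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Called from the stop path: wakes any thread parked in awaitUnblock()
    // immediately instead of leaving it to block the pool shutdown.
    public void close() {
        closed.countDown();
    }
}
```

In the issue's terms, {{BackupConnector::close}} closing only the locator leaves the parked {{authorize()}} call behind; signalling it (as `close()` does here) lets the pool drain at once.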
[jira] [Updated] (ARTEMIS-2958) Timed out waiting pool stop on backup restart
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Summary: Timed out waiting pool stop on backup restart (was: Timed out waiting for pool slow down backup restart on failback)
> Timed out waiting pool stop on backup restart
> ---------------------------------------------
>
>                 Key: ARTEMIS-2958
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2958
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker, JMX
>            Reporter: Francesco Nigro
>            Assignee: Francesco Nigro
>            Priority: Minor
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The changes on ARTEMIS-2823 moved the await of the AMQ thread pool, on server stop, to after {{callDeActiveCallbacks()}}, while the changes on ARTEMIS-2838, adding {{server.getServer().getManagementService().unregisterHawtioSecurity()}} to {{callDeActiveCallbacks()}}, moved the HawtioSecurity de-registration to happen on server stop.
> It means that on server restart, if the thread pool is slow to stop, JMX won't be available until the next start.
> A slow thread pool stop can happen if any long task is still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before forcing a shutdown of the pending tasks, i.e. JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation registers it again.
> This stealthy behaviour has been captured by random failures of {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in the thread pool.
> The core issue causing the thread pool to be randomly blocked had been present for a long time, but the JMX unavailability window introduced by the mentioned JIRAs is the change that triggered the bomb.
> The test checks for 5 seconds the availability of the backup JMX connection during a backup restart (on failback), i.e. {{Wait.assertTrue(() -> testConnection(url2, objectNameBuilder2), 5_000, 100);}}: given that the default thread pool await time is 10 seconds, a longer thread pool stop makes the test fail.
> It seems, by a thread dump inspection, that the pending task is:
> {code:java}
> jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1
> jmx-failback2-out:    at sun.misc.Unsafe.park(Native Method)
> jmx-failback2-out:    - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1
> jmx-failback2-out:    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> jmx-failback2-out:    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
> jmx-failback2-out:    at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504)
> jmx-failback2-out:    - locked java.lang.Object@607e79a2
> jmx-failback2-out:    at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434)
> jmx-failback2-out:    at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80)
> jmx-failback2-out:    at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271)
> jmx-failback2-out:    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
> jmx-failback2-out:    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
> jmx-failback2-out:    at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
> jmx-failback2-out:    at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source)
> jmx-failback2-out:    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> jmx-failback2-out:    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> jmx-failback2-out:    at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
> jmx-failback2-out:
> jmx-failback2-out: Number of locked synchronizers = 1
> jmx-failback2-out:    - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8
> {code}
> And indeed it seems that {{BackupManager::stop}}, through {{BackupConnector::close}}, just calls {{closeLocator(backupServerLocator)}}, which won't unblock {{clusterControl.authorize()}}.
> A first fix would be to correctly unblock any blocking call, to cleanly stop {{BackupManager}} and let the broker thread pool stop immediately.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
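For context, the 5-second check mentioned above is a poll-until-timeout assertion. A minimal sketch of that polling shape, assuming a simplified stand-in for Artemis' {{Wait.assertTrue(condition, timeout, sleep)}} ({{WaitSketch}} and {{waitFor}} are hypothetical names, not the real helper):

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the Wait.assertTrue(condition, timeout, sleep)
// shape used by JmxFailbackTest: poll the condition every sleepMillis until
// it holds or timeoutMillis elapses.
final class WaitSketch {
   static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long sleepMillis)
         throws InterruptedException {
      final long deadline = System.currentTimeMillis() + timeoutMillis;
      while (System.currentTimeMillis() < deadline) {
         if (condition.getAsBoolean()) {
            return true;
         }
         Thread.sleep(sleepMillis);
      }
      // one last check at the deadline before giving up
      return condition.getAsBoolean();
   }
}
```

With a 5,000 ms window and a broker stop that keeps the pool blocked for the full 10-second shutdown wait, the condition never becomes true before the deadline, which is exactly how the blocked task surfaces as a test failure.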
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in the thread pool. The core issue that cause the thread pool to be randomly blocked was present by long time, but the unavailability time window of JMX introduced by the mentioned JIRAs was the change that has triggered the bomb. The test check for 5 seconds the availability of backup JMX connection during a backup restart (on failback): given that the default thread pool await time is 10 seconds, a longer thread pool stop will make the test to fail. 
It seems, by a thread dump inspection that the pending task is: {code:java} jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at sun.misc.Unsafe.park(Native Method) jmx-failback2-out: - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) jmx-failback2-out: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504) jmx-failback2-out: - locked java.lang.Object@607e79a2 jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) jmx-failback2-out: at 
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) jmx-failback2-out: jmx-failback2-out: Number of locked synchronizers = 1 jmx-failback2-out: - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8 {code} And indeed it seems that {{BackupManager::stop}} that's calling {{BackupConnector::close}} just {{closeLocator(backupServerLocator)}} that won't unblock {{clusterControl.authorize()}}. A first fix would be to correctly unblock any blocking call to clean stop {{BackupManager}} and let the broker thread pool to immediately stop. was: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in the thread pool. The core issue that cause the thread pool to be randomly blocked was present by long time, but the unavailability time window of JMX introduced by the mentioned JIRAs was the change that has triggered the bomb. The test check for 5 seconds the availability of backup JMX connection during a backup restart (on failback): given that the default thread pool await time is 10 seconds, a longer thread pool stop will make the test to fail. 
It seems, by a thread dump inspection that the pending task is: {code:java} jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at sun.misc.Unsafe.park(Native Method) jmx-failback2-out: - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) jmx-failback2-out: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504) jmx-failback2-out: - locked java.lang.Object@607e79a2 jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) jmx-failback2-out: at 
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) jmx-failback2-out: jmx-failback2-out: Number of locked synchronizers = 1 jmx-failback2-out: - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8 {code} And indeed it seems that {{BackupManager::stop}} that's calling {{BackupConnector::close}} just {{closeLocator(backupServerLocator)}} that won't unblock {{clusterControl.authorize()}}. A first fix would be to correctly unblock any blocking call to clean stop {{BackupManager}} and let the broker thread pool to stop immediately. was: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in the thread pool. The core issue that cause the thread pool to be randomly blocked was present by long time, but the unavailability time window of JMX introduced by the mentioned JIRAs was the change that has triggered the bomb. The test check for 5 seconds the availability of backup JMX connection during a backup restart (on failback): given that the default thread pool await time is 10 seconds, a longer thread pool stop will make the test to fail. 
It seems, by a thread dump inspection that the pending task is: {code:java} jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at sun.misc.Unsafe.park(Native Method) jmx-failback2-out: - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) jmx-failback2-out: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504) jmx-failback2-out: - locked java.lang.Object@607e79a2 jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) jmx-failback2-out: at 
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) jmx-failback2-out: jmx-failback2-out: Number of locked synchronizers = 1 jmx-failback2-out: - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8 {code} was: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in the thread pool. The core issue that cause the thread pool to be randomly blocked was present by long time, but the unavailability time window of JMX introduced by the mentioned JIRAs was the change that has triggered the bomb. This test check for 5 seconds the availability of backup JMX connection during a backup restart (on failback): given that the default thread pool await time is 10 seconds, a longer thread pool stop will make the test to fail. 
It seems, by a thread dump inspection that the pending task is: {code:java} jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at sun.misc.Unsafe.park(Native Method) jmx-failback2-out: - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) jmx-failback2-out: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504) jmx-failback2-out: - locked java.lang.Object@607e79a2 jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) jmx-failback2-out: at 
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) jmx-failback2-out: jmx-failback2-out: Number of locked synchronizers = 1 jmx-failback2-out: - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8 {code} was: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in the thread pool. The core issue that cause the thread pool to be randomly blocked was present by long time, but the unavailability time window of JMX introduced by the mentioned JIRAs was that change that has triggered the bomb. This test check for 5 seconds the availability of backup JMX connection during a backup restart (on failback): given that the default thread pool await time is 10 seconds, a longer thread pool stop will make the test to fail. 
It seems, by a thread dump inspection that the pending task is: {code:java} jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at sun.misc.Unsafe.park(Native Method) jmx-failback2-out: - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) jmx-failback2-out: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504) jmx-failback2-out: - locked java.lang.Object@607e79a2 jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) jmx-failback2-out: at 
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) jmx-failback2-out: jmx-failback2-out: Number of locked synchronizers = 1 jmx-failback2-out: - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8 {code} was: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy of the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes in ARTEMIS-2823 moved the await of the AMQ thread pool on server stop to after {{callDeActiveCallbacks()}}, while the changes in ARTEMIS-2838 added {{server.getServer().getManagementService().unregisterHawtioSecurity()}} to {{callDeActiveCallbacks()}}, moving the HawtioSecurity de-registration to server stop. This means that on server restart, if the thread pool stops slowly, JMX won't be available until the next start. A slow thread pool stop can happen whenever a long task is still running or blocked in the pool: the broker's default strategy is to wait 10 seconds before forcing a shutdown of the pending tasks, i.e. JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation registers it again. This stealthy behaviour was exposed by random failures of {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to a randomly blocked task in the thread pool. The core issue that blocks the thread pool had been present for a long time, but the JMX unavailability window introduced by the JIRAs above is the change that triggered it. The test checks for 5 seconds that the backup JMX connection is available during a backup restart (on failback), i.e. {{Wait.assertTrue(() -> testConnection(url2, objectNameBuilder2), 5_000, 100);}}: given that the default thread pool wait time is 10 seconds, a slower thread pool stop makes the test fail.
A thread dump inspection suggests the pending task is:
{code:java}
jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1
jmx-failback2-out:	at sun.misc.Unsafe.park(Native Method)
jmx-failback2-out:	- waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1
jmx-failback2-out:	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
jmx-failback2-out:	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
jmx-failback2-out:	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504)
jmx-failback2-out:	- locked java.lang.Object@607e79a2
jmx-failback2-out:	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434)
jmx-failback2-out:	at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80)
jmx-failback2-out:	at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271)
jmx-failback2-out:	at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
jmx-failback2-out:	at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
jmx-failback2-out:	at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
jmx-failback2-out:	at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source)
jmx-failback2-out:	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
jmx-failback2-out:	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
jmx-failback2-out:	at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
jmx-failback2-out:
jmx-failback2-out:	Number of locked synchronizers = 1
jmx-failback2-out:	- java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8
{code}
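The "wait 10 seconds before forcing a shutdown" strategy described above is the standard graceful-then-forced executor stop. The sketch below is not Artemis code, just a minimal illustration of the mechanism (with a 200 ms grace period instead of the broker's 10 s): a task blocked the way {{ClusterControl.authorize()}} is in the dump keeps the pool from terminating within the grace period, and everything gated on the stop, such as JMX re-registration, stays unavailable until {{shutdownNow()}} interrupts it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolStopDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch neverReleased = new CountDownLatch(1);
        // A task that never completes on its own, standing in for the
        // blocked sendBlocking()/authorize() call in the thread dump.
        pool.execute(() -> {
            try {
                neverReleased.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // forced shutdown interrupts us
            }
        });
        pool.shutdown();
        // Grace period (200 ms here; the broker default is 10 s). While this
        // wait is pending, anything sequenced after the pool stop is delayed.
        boolean stoppedInTime = pool.awaitTermination(200, TimeUnit.MILLISECONDS);
        System.out.println("stopped within grace period: " + stoppedInTime);
        if (!stoppedInTime) {
            pool.shutdownNow(); // force: interrupts the blocked task
        }
        System.out.println("terminated after force: "
                + pool.awaitTermination(1, TimeUnit.SECONDS));
    }
}
```

Run as-is it prints `stopped within grace period: false`, then `terminated after force: true`.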
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy of the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. The issue that was causing the thread pool to be blocked randomly awaiting an executing task was present by long time, but the unavailability of JMX introduced by the mentioned JIRAs has caused some random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}. This test check for 5 seconds the availability of backup JMX connection during a backup restart (on failback)) ie {{Wait.assertTrue(() -> testConnection(url2, objectNameBuilder2), 5_000, 100);}}: given that the default thread pool await time is 10 seconds, a longer thread pool stop will make the test to fail. 
It seems, by a thread dump inspection that the pending task is: {code:java} jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at sun.misc.Unsafe.park(Native Method) jmx-failback2-out: - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) jmx-failback2-out: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504) jmx-failback2-out: - locked java.lang.Object@607e79a2 jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) jmx-failback2-out: at 
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) jmx-failback2-out: jmx-failback2-out: Number of locked synchronizers = 1 jmx-failback2-out: - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8 {code} was: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if any task is pending on the thread pool stop, the thread pool wouldn't let a start to activate JMX again for the default 10 seconds required to force a pool shutdown. The issue that was causing the thread pool to be blocked randomly awaiting an executing task was present by long time, but the unavailability of JMX introduced by the mentioned JIRAs has caused some random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}. This test check for 5 seconds the availability of backup JMX connection during a backup restart (on failback)) ie {{Wait.assertTrue(() -> testConnection(url2,
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Component/s: JMX, Broker -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
Francesco Nigro created ARTEMIS-2958: Summary: Timed out waiting for pool slow down backup restart on failback Key: ARTEMIS-2958 URL: https://issues.apache.org/jira/browse/ARTEMIS-2958 Project: ActiveMQ Artemis Issue Type: Bug Reporter: Francesco Nigro Assignee: Francesco Nigro The changes in ARTEMIS-2823 moved the await of the AMQ thread pool on server stop to after {{callDeActiveCallbacks()}}, and the changes in ARTEMIS-2838 added {{server.getServer().getManagementService().unregisterHawtioSecurity()}} to {{callDeActiveCallbacks()}}: this means that if any task is pending when the thread pool stops, JMX is left unavailable for at least the default 10 seconds required to force a pool shutdown. The issue that randomly blocks the thread pool awaiting an executing task had been present for a long time, but the JMX unavailability introduced by the mentioned JIRAs has caused random failures of {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, because {{Wait.assertTrue(() -> testConnection(url2, objectNameBuilder2), 5_000, 100);}} waits only 5 seconds for the JMX connection to become available on backup restart: given that the default thread pool wait time is 10 seconds, that was the primary cause of the failure. It's important to investigate what causes the global thread pool of the backup server to not stop immediately on failback. -- This message was sent by Atlassian Jira (v8.3.4#803005)
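The {{Wait.assertTrue(condition, 5_000, 100)}} call quoted above is a poll-until-true helper: retry the condition every 100 ms for up to 5 seconds. A minimal re-implementation (a hypothetical sketch, not the Artemis {{Wait}} utility itself) shows why a condition that needs the pool's 10-second force-shutdown delay cannot pass under a shorter budget:

```java
import java.util.function.BooleanSupplier;

public class WaitDemo {
    // Sketch of a Wait.assertTrue-style helper: poll `condition` every
    // `pauseMillis` until it holds or `timeoutMillis` elapses.
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pauseMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pauseMillis);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        // Condition flips to true after ~150 ms: well within a 5 s budget.
        long soon = System.currentTimeMillis() + 150;
        System.out.println("within budget: "
                + waitFor(() -> System.currentTimeMillis() >= soon, 5_000, 100));
        // A condition that needs ~10 s (the pool's force-shutdown delay)
        // cannot be met with a 300 ms budget: the shape of the test failure.
        long tooLate = System.currentTimeMillis() + 10_000;
        System.out.println("short budget: "
                + waitFor(() -> System.currentTimeMillis() >= tooLate, 300, 100));
    }
}
```

This prints `within budget: true` and `short budget: false`, mirroring a JMX connection that only becomes reachable after the 10-second pool stop while the test polls for just 5 seconds.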
[jira] [Updated] (ARTEMIS-2957) ManagementContext is started twice
[ https://issues.apache.org/jira/browse/ARTEMIS-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2957: - Description: ManagementContext doesn't guard against being started twice. A recent change in [ARTEMIS-2838|https://issues.apache.org/jira/browse/ARTEMIS-2838] introduced an {{ActivateCallback::preActivate}} that (re)starts the {{ManagementContext}}. Just guarding against consecutive starts fixes this, but we should probably start it only once. The test failing due to this was {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, with {code:java} jmx-failback1-out:2020-10-20 22:48:46,515 WARN [org.apache.activemq.artemis.core.server] AMQ97: Unable to start Management Context, RBAC not available: java.rmi.server.ExportException: internal error: ObjID already in use jmx-failback1-out: at sun.rmi.transport.ObjectTable.putTarget(ObjectTable.java:186) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.rmi.transport.Transport.exportObject(Transport.java:106) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:260) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.rmi.transport.LiveRef.exportObject(LiveRef.java:147) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.rmi.server.UnicastServerRef.exportObject(UnicastServerRef.java:208) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.rmi.registry.RegistryImpl.setup(RegistryImpl.java:152) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.rmi.registry.RegistryImpl.<init>(RegistryImpl.java:137) [rt.jar:1.8.0_66] jmx-failback1-out: at java.rmi.registry.LocateRegistry.createRegistry(LocateRegistry.java:203) [rt.jar:1.8.0_66] jmx-failback1-out: at org.apache.activemq.artemis.core.server.management.RmiRegistryFactory.init(RmiRegistryFactory.java:48) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] 
jmx-failback1-out: at org.apache.activemq.artemis.core.server.management.ManagementConnector.start(ManagementConnector.java:54) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.core.server.management.ManagementContext.start(ManagementContext.java:50) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.cli.commands.Run$1.preActivate(Run.java:88) [artemis-cli-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.callPreActiveCallbacks(ActiveMQServerImpl.java:2840) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart1(ActiveMQServerImpl.java:2964) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.core.server.impl.SharedStoreLiveActivation.run(SharedStoreLiveActivation.java:66) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.internalStart(ActiveMQServerImpl.java:626) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.start(ActiveMQServerImpl.java:550) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.integration.FileBroker.start(FileBroker.java:64) [artemis-cli-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.cli.commands.Run.execute(Run.java:116) [artemis-cli-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.cli.Artemis.internalExecute(Artemis.java:153) [artemis-cli-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.cli.Artemis.execute(Artemis.java:101) [artemis-cli-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] 
jmx-failback1-out: at org.apache.activemq.artemis.cli.Artemis.execute(Artemis.java:128) [artemis-cli-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_66] jmx-failback1-out: at java.lang.reflect.Method.invoke(Method.java:497) [rt.jar:1.8.0_66] jmx-failback1-out: at org.apache.activemq.artemis.boot.Artemis.execute(Artemis.java:134) [artemis-boot.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at
[jira] [Updated] (ARTEMIS-2957) ManagementContext is started twice
[ https://issues.apache.org/jira/browse/ARTEMIS-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2957: - Description: ManagementContext doesn't guard against being started twice. A recent change in [ARTEMIS-2838|https://issues.apache.org/jira/browse/ARTEMIS-2838] introduced an ActivateCallback::preActivate that (re)starts the managementContext. Just guarding against consecutive starts fixes this, but we should probably start it only once. The test failing due to this was org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX was: ManagementContext isn't guarding from being started twice. A recent change on [ARTEMIS-2838|https://issues.apache.org/jira/browse/ARTEMIS-2838] has introduced a ActivateCallback::preActivate that's (re)starting the managementContext. Just guarding against consecutive starts is fixing this, but probably we should start it just once. > ManagementContext is started twice > -- > > Key: ARTEMIS-2957 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2957 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: JMX >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > ManagementContext doesn't guard against being started twice. > A recent change in > [ARTEMIS-2838|https://issues.apache.org/jira/browse/ARTEMIS-2838] > introduced an ActivateCallback::preActivate that (re)starts the > managementContext. > Just guarding against consecutive starts fixes this, but we should > probably start it only once. > The test failing due to this was > org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (ARTEMIS-2957) ManagementContext is started twice
[ https://issues.apache.org/jira/browse/ARTEMIS-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on ARTEMIS-2957 started by Francesco Nigro. > ManagementContext is started twice > -- > > Key: ARTEMIS-2957 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2957 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: JMX >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > ManagementContext doesn't guard against being started twice. > A recent change in > [ARTEMIS-2838|https://issues.apache.org/jira/browse/ARTEMIS-2838] > introduced an ActivateCallback::preActivate that (re)starts the > managementContext. > Just guarding against consecutive starts fixes this, but we should > probably start it only once. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-2957) ManagementContext is started twice
Francesco Nigro created ARTEMIS-2957: Summary: ManagementContext is started twice Key: ARTEMIS-2957 URL: https://issues.apache.org/jira/browse/ARTEMIS-2957 Project: ActiveMQ Artemis Issue Type: Bug Components: JMX Affects Versions: 2.16.0 Reporter: Francesco Nigro Assignee: Francesco Nigro ManagementContext doesn't guard against being started twice. A recent change in [ARTEMIS-2838|https://issues.apache.org/jira/browse/ARTEMIS-2838] introduced an ActivateCallback::preActivate that (re)starts the managementContext. Just guarding against consecutive starts fixes this, but we should probably start it only once. -- This message was sent by Atlassian Jira (v8.3.4#803005)
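The guard this issue proposes can be sketched with an AtomicBoolean. This is a hypothetical illustration, not the actual ManagementContext fix: making start() idempotent means a second preActivate call becomes a no-op instead of re-exporting the RMI registry (the "ObjID already in use" failure in the stack trace above).

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class StartOnceSketch {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private int startCount = 0;   // visible side effect, for the demo only

    public void start() {
        // compareAndSet succeeds only for the first caller; later calls return early.
        if (!started.compareAndSet(false, true)) {
            return;
        }
        startCount++;             // stands in for exporting the RMI registry etc.
    }

    public void stop() {
        started.set(false);       // a clean stop allows a later restart
    }

    public static void main(String[] args) {
        StartOnceSketch ctx = new StartOnceSketch();
        ctx.start();
        ctx.start();              // second start is silently ignored, not an error
        System.out.println(ctx.startCount);
    }
}
```

The atomic compare-and-set also makes the guard safe if two activation callbacks race to start the context from different threads.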
[jira] [Resolved] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro resolved ARTEMIS-2955. -- Fix Version/s: 2.16.0 Resolution: Fixed > commons-dbcp2 performance issue with Derby Embedded DBMS > > > Key: ARTEMIS-2955 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2955 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker, Tests >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Fix For: 2.16.0 > > Attachments: image-2020-10-20-09-08-45-390.png, > image-2020-10-20-09-10-10-644.png, pooling_off.svg, pooling_on.svg, > screenshot-1.png, screenshot-2.png > > Time Spent: 1h 20m > Remaining Estimate: 0h > > The test suite has shown an increase in duration of 30 minutes, going from > 2:30 to 3 hours: it seems related to integration paging tests running on > Embedded Derby with the changes of > [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. > After some profiling sessions on > org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll > it seems that using commons-dbcp2 with Embedded Derby isn't working as > expected: > !image-2020-10-20-09-08-45-390.png! > while, if we switch to > [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] > we have > !image-2020-10-20-09-10-10-644.png! > By not using commons-dbcp2 we get roughly a ~10X improvement in performance: > it seems that commons-dbcp2 forces the prepared statement to be set up from > scratch each time, while EmbeddedDataSource does not. Specifically it seems > related to Derby GenericActivationHolder. > I suggest disabling commons-dbcp2 for Derby and investigating whether it could > happen in a real broker too. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217404#comment-17217404 ] Francesco Nigro edited comment on ARTEMIS-2955 at 10/20/20, 8:27 AM: - [~gtully] has noted that poolPreparedStatements is false by default on commons-dbcp2: let me check if this can be improved by setting it to true by default (that makes sense to me for a real broker too). was (Author: nigrofranz): [~gtully] hos noted that poolPreparedStatements is false by default on commons-dbcp2: let me check if this can be improved by setting it as true by default (that makes sense to me for a real broker too). > commons-dbcp2 performance issue with Derby Embedded DBMS > > > Key: ARTEMIS-2955 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2955 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker, Tests >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Attachments: image-2020-10-20-09-08-45-390.png, > image-2020-10-20-09-10-10-644.png, pooling_off.svg, pooling_on.svg, > screenshot-1.png, screenshot-2.png > > > The test suite has shown an increase in duration of 30 minutes, going from > 2:30 to 3 hours: it seems related to integration paging tests running on > Embedded Derby with the changes of > [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. > After some profiling sessions on > org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll > it seems that using commons-dbcp2 with Embedded Derby isn't working as > expected: > !image-2020-10-20-09-08-45-390.png! > while, if we switch to > [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] > we have > !image-2020-10-20-09-10-10-644.png! 
> By not using commons-dbcp2 we get roughly a ~10X improvement in performance: > it seems that commons-dbcp2 forces the prepared statement to be set up from > scratch each time, while EmbeddedDataSource does not. Specifically it seems > related to Derby GenericActivationHolder. > I suggest disabling commons-dbcp2 for Derby and investigating whether it could > happen in a real broker too. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217432#comment-17217432 ] Francesco Nigro edited comment on ARTEMIS-2955 at 10/20/20, 8:35 AM: - Thanks to the great suggestion from [~gtully], this is going to fix a "likely" performance issue on the broker, not just in the test: going to update the JIRA. See the resulting flamegraph !screenshot-2.png! There are no more traces of reflective access or prepared statement creation, and performance is even better than with EmbeddedDataSource, given that we no longer allocate a new connection each time. Please [~mikkommku] take a look at my PR (will send it soon) so I know I'm not impacting anything for you as well: it's very likely I'm going to turn on poolPreparedStatements by default, and users can change it to false manually if they prefer. One question about {quote}The reason I did not enable it on my pull request was that it's not something universal that's available on alternative datasources, and couldn't quite figure out whether implementation changes were needed to prepare for use of other datasources.{quote} I'm just using {code:java} addDataSourceProperty("poolPreparedStatements", "true"); {code} if dataSourceProperties.isEmpty() but I see a {code:java} addDataSourceProperty("maxTotal", "-1"); {code} Probably we should check whether the data source class used is the default one before setting both of these defaults; I don't think "maxTotal" is available in other data source solutions either. was (Author: nigrofranz): Thanks to the great suggestion of [~gtully], this is going to fix a "likely" performance issue on the broker, not just on the test: going to update the JIRA. See the resulting flamegraph !screenshot-2.png! 
There is no more traces of reflective access nor creation of prepared statements and perfs are even better then using EmbeddedDataSource given that we don't allocate anymore a new connection each time. Please [~mikkommku] take a look to my PR (will send it soon) so I know I'm not impacting anything for you as well: it's very likely I'm going to turn on poolPreparedStatements by default and users can choose if they prefer to change it to false by manually setting it, in case. > commons-dbcp2 performance issue with Derby Embedded DBMS > > > Key: ARTEMIS-2955 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2955 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker, Tests >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Attachments: image-2020-10-20-09-08-45-390.png, > image-2020-10-20-09-10-10-644.png, pooling_off.svg, pooling_on.svg, > screenshot-1.png, screenshot-2.png > > > The test suite has shown an increase in duration of 30 minutes, going from > 2:30 to 3 hours: it seems related to integration paging tests running on > Embedded Derby with the changes of > [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. > After some profiling sessions on > org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll > it seems that using commons-dbcp2 with Embedded Derby isn't working as > expected: > !image-2020-10-20-09-08-45-390.png! > while, if we switch to > [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] > we have > !image-2020-10-20-09-10-10-644.png! > By not using commons-dbcp2 we get roughly a ~10X improvement in performance: > it seems that commons-dbcp2 forces the prepared statement to be set up from > scratch each time, while EmbeddedDataSource does not. Specifically it seems > related to Derby GenericActivationHolder. > I suggest disabling commons-dbcp2 for Derby and investigating whether it could > happen in a real broker too. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
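The gain from poolPreparedStatements can be illustrated with a tiny stand-in. This is an illustrative sketch, not dbcp2 internals: statement pooling amounts to a per-connection cache keyed by SQL text, so each statement is "prepared" (parsed and planned) once instead of being rebuilt on every execution, which is the cost the flamegraphs attribute to Derby's GenericActivationHolder.

```java
import java.util.HashMap;
import java.util.Map;

public class StatementCacheSketch {
    static int prepareCalls = 0;   // counts how often the expensive path runs

    static final Map<String, String> cache = new HashMap<>();

    // Stand-in for Connection.prepareStatement: costly on a miss, free on a hit.
    static String prepare(String sql) {
        return cache.computeIfAbsent(sql, s -> {
            prepareCalls++;        // parse/plan work would happen here
            return "compiled:" + s;
        });
    }

    public static void main(String[] args) {
        // The same statement executed 1000 times, as a paging test loop might do.
        for (int i = 0; i < 1_000; i++) {
            prepare("SELECT * FROM MESSAGES WHERE ID = ?");
        }
        System.out.println(prepareCalls);
    }
}
```

With the cache the statement is prepared exactly once across all 1000 executions; without it, the prepare cost is paid every iteration, which matches the ~10X difference observed in the profiles.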
[jira] [Commented] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217432#comment-17217432 ] Francesco Nigro commented on ARTEMIS-2955: -- Thanks to the great suggestion from [~gtully], this is going to fix a "likely" performance issue on the broker, not just in the test: going to update the JIRA. See the resulting flamegraph !screenshot-2.png! There are no more traces of reflective access or prepared statement creation, and performance is even better than with EmbeddedDataSource, given that we no longer allocate a new connection each time. Please [~mikkommku] take a look at my PR (will send it soon) so I know I'm not impacting anything for you as well: it's very likely I'm going to turn on poolPreparedStatements by default, and users can change it to false manually if they prefer. > commons-dbcp2 performance issue with Derby Embedded DBMS > > > Key: ARTEMIS-2955 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2955 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker, Tests >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Attachments: image-2020-10-20-09-08-45-390.png, > image-2020-10-20-09-10-10-644.png, pooling_off.svg, pooling_on.svg, > screenshot-1.png, screenshot-2.png > > > The test suite has shown an increase in duration of 30 minutes, going from > 2:30 to 3 hours: it seems related to integration paging tests running on > Embedded Derby with the changes of > [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. > After some profiling sessions on > org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll > it seems that using commons-dbcp2 with Embedded Derby isn't working as > expected: > !image-2020-10-20-09-08-45-390.png! 
> while, if we switch to > [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] > we have > !image-2020-10-20-09-10-10-644.png! > By not using commons-dbcp2 we get roughly a ~10X improvement in performance: > it seems that commons-dbcp2 forces the prepared statement to be set up from > scratch each time, while EmbeddedDataSource does not. Specifically it seems > related to Derby GenericActivationHolder. > I suggest disabling commons-dbcp2 for Derby and investigating whether it could > happen in a real broker too. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Attachment: screenshot-2.png > commons-dbcp2 performance issue with Derby Embedded DBMS > > > Key: ARTEMIS-2955 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2955 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker, Tests >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Attachments: image-2020-10-20-09-08-45-390.png, > image-2020-10-20-09-10-10-644.png, pooling_off.svg, pooling_on.svg, > screenshot-1.png, screenshot-2.png > > > The test suite has shown an increase in duration of 30 minutes, going from > 2:30 to 3 hours: it seems related to integration paging tests running on > Embedded Derby with the changes of > [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. > After some profiling sessions on > org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll > it seems that using commons-dbcp2 with Embedded Derby isn't working as > expected: > !image-2020-10-20-09-08-45-390.png! > while, if we switch to > [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] > we have > !image-2020-10-20-09-10-10-644.png! > By not using commons-dbcp2 we get roughly a ~10X improvement in performance: > it seems that commons-dbcp2 forces the prepared statement to be set up from > scratch each time, while EmbeddedDataSource does not. Specifically it seems > related to Derby GenericActivationHolder. > I suggest disabling commons-dbcp2 for Derby and investigating whether it could > happen in a real broker too. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217404#comment-17217404 ] Francesco Nigro edited comment on ARTEMIS-2955 at 10/20/20, 8:05 AM: - [~gtully] has noted that poolPreparedStatements is false by default on commons-dbcp2: let me check if this can be improved by setting it to true by default (that makes sense to me for a real broker too). was (Author: nigrofranz): [~gtully] Node that poolPreparedStatements is false by default on commons-dbcp2: let me check if this can be improved by setting it as true by default (that makes sense to me for a real broker too). > commons-dbcp2 performance issue with Derby Embedded DBMS > > > Key: ARTEMIS-2955 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2955 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker, Tests >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Attachments: image-2020-10-20-09-08-45-390.png, > image-2020-10-20-09-10-10-644.png, pooling_off.svg, pooling_on.svg, > screenshot-1.png > > > The test suite has shown an increase in duration of 30 minutes, going from > 2:30 to 3 hours: it seems related to integration paging tests running on > Embedded Derby with the changes of > [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. > After some profiling sessions on > org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll > it seems that using commons-dbcp2 with Embedded Derby isn't working as > expected: > !image-2020-10-20-09-08-45-390.png! > while, if we switch to > [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] > we have > !image-2020-10-20-09-10-10-644.png! 
[jira] [Commented] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217404#comment-17217404 ] Francesco Nigro commented on ARTEMIS-2955: -- [~gtully] Noted that poolPreparedStatements is false by default on commons-dbcp2: let me check whether this can be improved by setting it to true by default (that makes sense to me for a real broker too).
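What poolPreparedStatements buys can be sketched with a toy cache (this is an illustration of the idea only, not commons-dbcp2's internals; the class and field names below are made up): with statement pooling on, a statement is prepared once per SQL string and reused, instead of being set up from scratch on every execution as profiled above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of prepared-statement pooling: the cache stands in for what
// poolPreparedStatements=true enables, while prepareCount models the
// expensive per-statement setup (plan compilation, Derby activation, ...).
class StatementCache {
    int prepareCount = 0;
    private final Map<String, String> cache = new HashMap<>();

    String prepare(String sql) {
        return cache.computeIfAbsent(sql, s -> {
            prepareCount++;                 // expensive work, only paid on a cache miss
            return "prepared:" + s;
        });
    }
}
```

With commons-dbcp2 itself the corresponding switch is BasicDataSource#setPoolPreparedStatements(true), which is what the comment above proposes enabling by default.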
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Description: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. Specifically, it seems related to Derby's GenericActivationHolder. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too. was: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. Just for reference, in violet, that's the amount of "heavy" work performed by EmbeddedDataSource if we use commons-dbcp2: !screenshot-1.png! It's clear that the update row is performing a huge additional amount of work... I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too.
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Description: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. Just for reference, in violet, that's the amount of "heavy" work performed by EmbeddedDataSource if we use commons-dbcp2: !screenshot-1.png! It's clear that the update row is performing a huge additional amount of work... I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too. was: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too.
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Attachment: screenshot-1.png
[jira] [Commented] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217377#comment-17217377 ] Francesco Nigro commented on ARTEMIS-2955: -- [~mikkommku] Is this something you've noted in the test environment while using commons-dbcp2?
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Description: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823 Improve JDBC connection management|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too. was: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823 Improve JDBC connection management|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource.|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too.
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Description: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too. was: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823 Improve JDBC connection management|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too.
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Description: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too. was: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too.
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Description: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823 Improve JDBC connection management|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource.|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too. was: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823 Improve JDBC connection management|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too.
[jira] [Created] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
Francesco Nigro created ARTEMIS-2955: Summary: commons-dbcp2 performance issue with Derby Embedded DBMS Key: ARTEMIS-2955 URL: https://issues.apache.org/jira/browse/ARTEMIS-2955 Project: ActiveMQ Artemis Issue Type: Bug Components: Broker, Tests Affects Versions: 2.16.0 Reporter: Francesco Nigro Assignee: Francesco Nigro Attachments: image-2020-10-20-09-08-45-390.png, image-2020-10-20-09-10-10-644.png, pooling_off.svg, pooling_on.svg The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823 Improve JDBC connection management|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too.
[jira] [Created] (ARTEMIS-2949) Reduce GC on OperationContext::checkTasks
Francesco Nigro created ARTEMIS-2949: Summary: Reduce GC on OperationContext::checkTasks Key: ARTEMIS-2949 URL: https://issues.apache.org/jira/browse/ARTEMIS-2949 Project: ActiveMQ Artemis Issue Type: Improvement Reporter: Francesco Nigro Assignee: Francesco Nigro OperationContext::checkTasks is allocating Iterators that do not seem to be scalar-replaced; this could be avoided by using the Queue API of LinkedList. Similarly, store-only tasks can use a reduced (in terms of footprint) task holder to lower the allocation pressure.
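The idea can be sketched as follows (hypothetical names, not Artemis's actual checkTasks code): iterating a LinkedList allocates an Iterator per call, while the Queue API (peek/poll) inspects and removes the head without allocating anything.

```java
import java.util.LinkedList;

// Hypothetical stand-in for a store task holder, not the real Artemis class.
class StoreTask {
    final Runnable action;
    boolean done;
    StoreTask(Runnable action, boolean done) { this.action = action; this.done = done; }
}

class CheckTasks {
    // An Iterator-based scan would allocate an Iterator on every call; using
    // the LinkedList Queue API (peek/poll) walks completed head tasks with
    // zero allocation.
    static int drainCompleted(LinkedList<StoreTask> tasks) {
        int executed = 0;
        StoreTask t;
        while ((t = tasks.peek()) != null && t.done) {
            tasks.poll();          // drop the completed head, no Iterator needed
            t.action.run();
            executed++;
        }
        return executed;
    }
}
```

Since tasks complete in order, only the head of the queue ever needs checking, which is what makes the peek/poll loop equivalent to the iterator scan here.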
[jira] [Closed] (ARTEMIS-2941) Improve JDBC HA connection resiliency
[ https://issues.apache.org/jira/browse/ARTEMIS-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro closed ARTEMIS-2941. Resolution: Won't Fix > Improve JDBC HA connection resiliency > - > > Key: ARTEMIS-2941 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2941 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This aims to replace the restart enhancement feature of > https://issues.apache.org/jira/browse/ARTEMIS-2918, because that feature is > too dangerous due to the numerous potential leaks that a server in production > could hit by allowing it to restart while keeping the Java process around. > Currently, JDBC HA uses an expiration time on locks that marks the time by > which a server instance is allowed to keep a specific role, dependent on the > owned lock (live or backup). > Right now, the first failed attempt to renew such an expiration time forces a > broker to shut down immediately, while it could be more "relaxed" and just > keep retrying until the very end, i.e. when the expiration time is about to > elapse. > > The only concern with this feature is the relation between the broker > wall-clock time and the DBMS one, which is used to set the expiration time > and should stay within certain margins. > For this last part I'm aware that classic ActiveMQ lease locks use some > configuration parameters to set the magnitude of the allowed difference (and > to compute some base offset too). > > Right now this feature seems more risk-free and appealing than > https://issues.apache.org/jira/browse/ARTEMIS-2918, given it narrows its > scope to the very core issue, i.e. a more resilient behaviour on lost JDBC > connectivity. 
> > To understand the implications of such a change, consider a shared store HA > pair configured with 60 seconds of expiration time: > # DBMS goes down > # an in-flight persistent operation on the live data store causes the live > broker to kill itself immediately, because no reliable storage is connected > # backup is unable to renew its backup lease lock > # DBMS comes back up in time, before the backup lock's local expiration time > has elapsed > # backup is able to renew its backup lease lock, retrieve the very last > state of the live (that was live) and, if no script is configured to restart > the live, to fail over and take its role > # backup is now live and able to serve clients > > > There are two legitimate questions regarding potential improvements here: > # why can't the live keep retrying I/O (on the journal, paging or large > messages) until its local expiration time ends? > # why isn't the live just returning an I/O error to the clients? > > The former is complex: the main problem I see is from the resource > utilization point of view; keeping an accumulating backlog of pending > requests, blocked awaiting the last one for an arbitrarily long time, will > probably cause the broker memory to blow up, not to mention that clients > will time out too. > The latter seems more appealing, because it would allow clients to fail > fast, but it would affect the current semantics we use on the broker storage > operations and I need more investigation to understand how to implement it. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
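The retry-until-local-expiration behaviour proposed in the issue can be sketched as follows. This is a minimal sketch under stated assumptions: the method name, the injected clock, and the renewal callback are all hypothetical, not the broker's actual lock API.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

public class LeaseRenewSketch {
    // Instead of shutting down on the first failed renewal, keep retrying the
    // renewal until the locally tracked expiration instant is reached.
    // clock and renewAttempt are injected so the logic is deterministic to test;
    // a real implementation would also back off between attempts.
    public static boolean renewUntilExpired(BooleanSupplier renewAttempt,
                                            LongSupplier clock,
                                            long localExpirationMillis) {
        while (clock.getAsLong() < localExpirationMillis) {
            if (renewAttempt.getAsBoolean()) {
                return true; // lease renewed: the broker keeps its role
            }
            // renewal failed (e.g. DBMS temporarily down): retry instead of
            // killing the broker immediately
        }
        return false; // local expiration reached with no successful renewal
    }
}
```

The key design point is that the decision to give up is tied to the lease's own expiration time, not to a single failed round-trip to the DBMS.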
[jira] [Reopened] (ARTEMIS-2941) Improve JDBC HA connection resiliency
[ https://issues.apache.org/jira/browse/ARTEMIS-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro reopened ARTEMIS-2941: -- > Improve JDBC HA connection resiliency > - > > Key: ARTEMIS-2941 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2941 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This is aiming to replace the restart enhancement feature of > https://issues.apache.org/jira/browse/ARTEMIS-2918 because this last one is > too dangerous due to the numerous potential leaks that a server in production > could hit by allowing it to restart while keeping the Java process around. > Currently, JDBC HA uses an expiration time on locks that mark the time by > which a server instance is allowed to keep a specific role, dependent by the > owned lock (live or backup). > Right now, the first failed attempt to renew such expiration time force a > broker to shutdown immediately, while it could be more "relaxed" and just > keep retry until the very end ie when the expiration time is approaching to > end. > > The only concern of this feature is related to the relation between the > broker wall-clock time and the DBMS one, that's used to set the expiration > time and that should be within certain margins. > For this last part I'm aware that classic ActiveMQ lease locks use some > configuration parameter to set the magnitude of the allowed difference (and > to compute some base offset too). > > Right now this feature seems more risk-free and appealing then > https://issues.apache.org/jira/browse/ARTEMIS-2918, given it narrows the > scope of it to what's the very core issue ie a more resilient behaviour on > JDBC lost connectivity. 
> > To understand the implications of such change, consider a shared store HA > pair with configured 60 seconds of expiration time: > # DBMS goes down > # an in-flight persistent operation on the live data store cause the live > broker to kill itself immediately, because no reliable storage is connected > # backup is unable to renew its backup lease lock > # DBMS goes up in time, before the backup lock local expiration time is ended > # backup is able to renew its backup lease lock and retrieve the very last > state of live (that was live) and, if no script is configured to restart the > live, to failover and take its role > # backup is now live and able to serve clients > > > There are 2 legit questions re potential improvements on this: > # why the live cannot keep re-trying I/O (on the journal, paging or large > messages) until its local expiration time end? > # why the live isn't just returning back an I/O error to the clients? > > The former is complex: the main problem I see is from the resource > utilization point of view; keeping an accumulating backlog of pending > requests, blocked awaiting the last one for an arbitrary long time will > probably cause the broker memory to blown up, to not mention that clients > will timed out too. > The latter seems more appealing, because will allow clients to fail fast, but > it would affect the current semantic we use on the broker storage operations > and I need more investigation to understand how to implement it. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2945) Artemis native JNI code can be replaced by Java
[ https://issues.apache.org/jira/browse/ARTEMIS-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2945: - Summary: Artemis native JNI code can be replaced by Java (was: Artemis native JNI code code be replaced by Java) > Artemis native JNI code can be replaced by Java > --- > > Key: ARTEMIS-2945 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2945 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LibAIO JNI code could be rewritten in Java, while keeping the LibaioContext > and LibaioFile APIs the same. > > There are a few benefits from this: > # simplification of the C code logic, easing its maintenance > # quicker development process (implement-try-test-debug cycle) for non-C > programmers, including simpler integration with Java test suites > # easier monitoring/telemetry integration > > As demonstrations/proofs of such benefits I would introduce several changes > into the Java version: > # using the libaio async fdatasync feature to allow the LibaioContext duty > cycle loop to free CPU resources in order to handle compaction reads without > being slowed down by an in-progress fdatasync > # use a lock-free high performance data structure to reuse iocbs instead of > a locked (mutex-based) one > # expose in-flight callbacks to allow future PRs to introduce Java-only > latency telemetry per-request or just error check/kill of "slow" in-flight > requests > > The possible drawbacks are: > # slower performance due to GC barriers cost (see the notes on the PR code) > # slower performance in case of a JVM without Unsafe (which means very few) > The latter issue could be addressed by using the new/proper VarHandle > features when the Artemis minimum supported version moves on from Java 8, or > using the same approach as other projects relying on it, e.g. Netty. 
> A note about how to correctly benchmark this, due to how I've implemented > async fdatasync: in order to avoid having both LibaioContext and TimedBuffer > perform fdatasync batching of writes, I've preferred to simplify the > LibaioContext. It means that the buffer timeout to be used in broker.xml > should be obtained by ./artemis perf-journal --sync-writes, i.e. batching > writes at the speed of the measured fdatasync RTT latency. > This last behaviour could be changed using a more apples-to-apples approach, > although I still think that the beauty of using Java is exactly to bring > new features/logic in with a shorter development cycle :) > We're not in a hurry to get this done, so, performance-wise, this feature > could even improve on the original version, if possible. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
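Point 2 of the proposed changes (a lock-free structure to reuse iocbs instead of a mutex-guarded one) could look roughly like this minimal Java sketch; the pool type, class name, and the fixed iocb size are assumptions for illustration, not the actual PR code:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

public class IocbPoolSketch {
    // sizeof(struct iocb) is 64 bytes on Linux x86-64; illustrative here
    private static final int IOCB_SIZE = 64;
    private final ConcurrentLinkedQueue<ByteBuffer> pool = new ConcurrentLinkedQueue<>();

    // Borrow a pooled iocb buffer, allocating only on a pool miss.
    // poll() is lock-free: no mutex is taken on the submission path.
    public ByteBuffer acquire() {
        ByteBuffer iocb = pool.poll();
        return iocb != null ? iocb : ByteBuffer.allocateDirect(IOCB_SIZE);
    }

    // Return the buffer for reuse; offer() is likewise lock-free, so the
    // completion (poller) thread never contends on a mutex with submitters.
    public void release(ByteBuffer iocb) {
        iocb.clear();
        pool.offer(iocb);
    }
}
```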
[jira] [Comment Edited] (ARTEMIS-2926) Scheduled task executions are skipped randomly
[ https://issues.apache.org/jira/browse/ARTEMIS-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212950#comment-17212950 ] Francesco Nigro edited comment on ARTEMIS-2926 at 10/13/20, 9:25 AM: - That's a good point, similar to what [~robbie] has exposed on https://github.com/apache/activemq-artemis/pull/3287#issuecomment-707028383: agree, we should not measure time on the executor in order to drop executions and, better, probably we shouldn't drop executions according to that metric... And related to the original issue with JDBC: the idea would be to allow starting as soon as possible without dropping a "too early" execution, instead allowing duplicate ones too. Regardless of the bug, probably ActiveMQScheduledComponent isn't a good fit for that feature as it is. was (Author: nigrofranz): That's a good point similar to what [~robbie] has exposed on https://github.com/apache/activemq-artemis/pull/3287#issuecomment-707028383: agree, we should not measure time on the executor in order to drop executions and, better, probably we shouldn't drop execution according to that metric... > Scheduled task executions are skipped randomly > -- > > Key: ARTEMIS-2926 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2926 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker >Affects Versions: 2.13.0 >Reporter: Apache Dev >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Scheduled tasks extending {{ActiveMQScheduledComponent}} could randomly skip > an execution, logging: > {code} > Execution ignored due to too many simultaneous executions, probably a > previous delayed execution > {code} > The problem is in the "ActiveMQScheduledComponent#runForExecutor" Runnable. > Times to be compared ({{currentTimeMillis()}} and {{lastTime}}) are taken > inside the runnable execution itself. So, depending on relative execution > times, it could happen that the difference is less than the given period > (e.g. 
1 ms), resulting in a skipped execution. -- This message was sent by Atlassian Jira (v8.3.4#803005)
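One way to avoid the race described in the issue is to sample the timestamp at the scheduled instant, before handing the task to the executor, rather than inside the runnable, so executor latency cannot make two runs look "too close". A minimal sketch under stated assumptions: the class name and the injected timestamps are hypothetical, not the actual ActiveMQScheduledComponent fix.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ScheduledRunSketch {
    private final long periodMillis;
    private final AtomicLong lastScheduledAt = new AtomicLong(Long.MIN_VALUE);

    public ScheduledRunSketch(long periodMillis) {
        this.periodMillis = periodMillis;
    }

    // Called by the timer thread at the scheduled instant. Because the
    // comparison uses schedule times (not times sampled inside the runnable),
    // a delayed execution can no longer shrink the observed interval below
    // the period and cause a spurious skip.
    public boolean shouldRun(long scheduledAtMillis) {
        long last = lastScheduledAt.get();
        if (last != Long.MIN_VALUE && scheduledAtMillis - last < periodMillis) {
            return false; // a genuinely early duplicate firing
        }
        lastScheduledAt.set(scheduledAtMillis);
        return true;
    }
}
```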
[jira] [Comment Edited] (ARTEMIS-2926) Scheduled task executions are skipped randomly
[ https://issues.apache.org/jira/browse/ARTEMIS-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212950#comment-17212950 ] Francesco Nigro edited comment on ARTEMIS-2926 at 10/13/20, 9:14 AM: - That's a good point similar to what [~robbie] has exposed on https://github.com/apache/activemq-artemis/pull/3287#issuecomment-707028383: agree, we should not measure time on the executor in order to drop executions and, better, probably we shouldn't drop execution according to that metric... was (Author: nigrofranz): That's a good point similar to what [~robbie] has exposed on https://github.com/apache/activemq-artemis/pull/3287#issuecomment-707028383: agree, we should not measure time on the executor but still, that's the only one that guarantees program order aka single threaded-like execution. Let me think about it a bit more if there is a better choice here or do you already have a solution?:) > Scheduled task executions are skipped randomly > -- > > Key: ARTEMIS-2926 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2926 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker >Affects Versions: 2.13.0 >Reporter: Apache Dev >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Scheduled tasks extending {{ActiveMQScheduledComponent}} could randomly skip > an execution, logging: > {code} > Execution ignored due to too many simultaneous executions, probably a > previous delayed execution > {code} > The problem is in the "ActiveMQScheduledComponent#runForExecutor" Runnable. > Times to be compared ({{currentTimeMillis()}} and {{lastTime}}) are taken > inside the runnable execution itself. So, depending on relative execution > times, it could happen that the difference is less than the given period > (e.g. 1 ms), resulting in a skipped execution. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARTEMIS-2926) Scheduled task executions are skipped randomly
[ https://issues.apache.org/jira/browse/ARTEMIS-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212950#comment-17212950 ] Francesco Nigro commented on ARTEMIS-2926: -- That's a good point similar to what [~robbie] has exposed on https://github.com/apache/activemq-artemis/pull/3287#issuecomment-707028383: agree, we should not measure time on the executor but still, that's the only one that guarantees program order aka single threaded-like execution. Let me think about it a bit more if there is a better choice here or do you already have a solution?:) > Scheduled task executions are skipped randomly > -- > > Key: ARTEMIS-2926 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2926 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker >Affects Versions: 2.13.0 >Reporter: Apache Dev >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Scheduled tasks extending {{ActiveMQScheduledComponent}} could randomly skip > an execution, logging: > {code} > Execution ignored due to too many simultaneous executions, probably a > previous delayed execution > {code} > The problem is in the "ActiveMQScheduledComponent#runForExecutor" Runnable. > Times to be compared ({{currentTimeMillis()}} and {{lastTime}}) are taken > inside the runnable execution itself. So, depending on relative execution > times, it could happen that the difference is less than the given period > (e.g. 1 ms), resulting in a skipped execution. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2941) Improve JDBC HA connection resiliency
[ https://issues.apache.org/jira/browse/ARTEMIS-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2941: - Summary: Improve JDBC HA connection resiliency (was: Improve JDBC connection resiliency) > Improve JDBC HA connection resiliency > - > > Key: ARTEMIS-2941 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2941 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > This is aiming to replace the restart enhancement feature of > https://issues.apache.org/jira/browse/ARTEMIS-2918 because this last one is > too dangerous due to the numerous potential leaks that a server in production > could hit by allowing it to restart while keeping the Java process around. > Currently, JDBC HA uses an expiration time on locks that mark the time by > which a server instance is allowed to keep a specific role, dependent by the > owned lock (live or backup). > Right now, the first failed attempt to renew such expiration time force a > broker to shutdown immediately, while it could be more "relaxed" and just > keep retry until the very end ie when the expiration time is approaching to > end. > > The only concern of this feature is related to the relation between the > broker wall-clock time and the DBMS one, that's used to set the expiration > time and that should be within certain margins. > For this last part I'm aware that classic ActiveMQ lease locks use some > configuration parameter to set the magnitude of the allowed difference (and > to compute some base offset too). > > Right now this feature seems more risk-free and appealing then > https://issues.apache.org/jira/browse/ARTEMIS-2918, given it narrows the > scope of it to what's the very core issue ie a more resilient behaviour on > JDBC lost connectivity. 
> > To understand the implications of such change, consider a shared store HA > pair with configured 60 seconds of expiration time: > # DBMS goes down > # an in-flight persistent operation on the live data store cause the live > broker to kill itself immediately, because no reliable storage is connected > # backup is unable to renew its backup lease lock > # DBMS goes up in time, before the backup lock local expiration time is ended > # backup is able to renew its backup lease lock and retrieve the very last > state of live (that was live) and, if no script is configured to restart the > live, to failover and take its role > # backup is now live and able to serve clients > > > There are 2 legit questions re potential improvements on this: > # why the live cannot keep re-trying I/O (on the journal, paging or large > messages) until its local expiration time end? > # why the live isn't just returning back an I/O error to the clients? > > The former is complex: the main problem I see is from the resource > utilization point of view; keeping an accumulating backlog of pending > requests, blocked awaiting the last one for an arbitrary long time will > probably cause the broker memory to blown up, to not mention that clients > will timed out too. > The latter seems more appealing, because will allow clients to fail fast, but > it would affect the current semantic we use on the broker storage operations > and I need more investigation to understand how to implement it. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARTEMIS-2945) Artemis native JNI code code be replaced by Java
[ https://issues.apache.org/jira/browse/ARTEMIS-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212640#comment-17212640 ] Francesco Nigro commented on ARTEMIS-2945: -- The async fdatasync change isn't mandatory; given how simple the Java code is to change, it could easily be changed into
{code:java}
if (res >= 0) {
    final int fd = IoCb.aioFildes(pooledIOCB.bytes);
    if (fd != dumbFD) {
        // issue a blocking fdatasync only when the completed request
        // targets a different file from the previous one
        if (useFdatasync) {
            if (lastFile != fd) {
                lastFile = fd;
                fdatasync(fd);
            }
        }
    } else {
        stop = true;
    }
}
{code}
using the fdatasync blocking behaviour to backpressure and batch bursts of writes. The downside of this approach would be preventing read operations from being scheduled while the poller thread is kept busy awaiting an in-flight fdatasync. > Artemis native JNI code code be replaced by Java > > > Key: ARTEMIS-2945 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2945 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LibAIO JNI code could be rewritten in Java, while keeping the LibaioContext > and LibaioFile APIs the same. 
> > There are few benefits from this: > # simplification of C code logic to ease maintain it > # quicker development process (implement-try-test-debug cycle) for non-C > programmers, including simpler integration with Java test suites > # easier monitoring/telemetry integration > > As demonstrations/proofs of such benefits I would introduce several changes > into the Java version: > # using the libaio async fdatasync feature to allow the LibaioContext duty > cycle loop to free CPU resources in order to handle compaction reads without > being slowed down by an in-progress fdatasync > # use a lock-free high performance data structure to reuse iocbs instead of > a locked (using a mutex) one > # expose in-flight callbacks to allow future PRs to introduce Java-only > latency telemetry per-request or just error check/kill of "slow" in-flight > requests > > The possible drawbacks are: > # slower performance due to GC barriers cost (see the notes on the PR code) > # slower performance in case of JVM without Unsafe (that means very few) > The latter issue could be addressed by using the new/proper VarHandle > features when the Artemis min supported version will move from Java 8 or > using the same approach on other projects relying on it eg Netty. > A note about how to correctly benchmark this due to how I've implemented > async fdatasync: in order to save both LibaioContext and TimedBuffer to > perform fdatasync batching of writes, I've preferred to simplify the > LibaioContext: it means that the buffer timeout to be used on broker.xml > should be obtained by ./artemis perf-journal --sync-writes ie batching writes > at the speed of the measured fdatasync RTT latency. 
> This last behaviour could be changed used a more Apple-to-Apple approach > although I still think that the beauty of using is Java is exactly to bring > new features/logics in with a shorter development cycle :) > We're not in hurry to get this done so that perf-wise this feature could be > implemented improving performance over the original version too, if possible. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2945) Artemis native JNI code code be replaced by Java
[ https://issues.apache.org/jira/browse/ARTEMIS-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2945: - Attachment: (was: new_1000_bis.svg) > Artemis native JNI code code be replaced by Java > > > Key: ARTEMIS-2945 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2945 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LibAIO JNI code could be rewritten in Java, while keeping the LibaioContext > and LibaioFile APIs the same. > > There are few benefits from this: > # simplification of C code logic to ease maintain it > # quicker development process (implement-try-test-debug cycle) for non-C > programmers, including simpler integration with Java test suites > # easier monitoring/telemetry integration > > As demonstrations/proofs of such benefits I would introduce several changes > into the Java version: > # using the libaio async fdatasync feature to allow the LibaioContext duty > cycle loop to free CPU resources in order to handle compaction reads without > being slowed down by an in-progress fdatasync > # use a lock-free high performance data structure to reuse iocbs instead of > a locked (using a mutex) one > # expose in-flight callbacks to allow future PRs to introduce Java-only > latency telemetry per-request or just error check/kill of "slow" in-flight > requests > > The possible drawbacks are: > # slower performance due to GC barriers cost (see the notes on the PR code) > # slower performance in case of JVM without Unsafe (that means very few) > The latter issue could be addressed by using the new/proper VarHandle > features when the Artemis min supported version will move from Java 8 or > using the same approach on other projects relying on it eg Netty. 
> A note about how to correctly benchmark this due to how I've implemented > async fdatasync: in order to save both LibaioContext and TimedBuffer to > perform fdatasync batching of writes, I've preferred to simplify the > LibaioContext: it means that the buffer timeout to be used on broker.xml > should be obtained by ./artemis perf-journal --sync-writes ie batching writes > at the speed of the measured fdatasync RTT latency. > This last behaviour could be changed used a more Apple-to-Apple approach > although I still think that the beauty of using is Java is exactly to bring > new features/logics in with a shorter development cycle :) > We're not in hurry to get this done so that perf-wise this feature could be > implemented improving performance over the original version too, if possible. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2945) Artemis native JNI code code be replaced by Java
[ https://issues.apache.org/jira/browse/ARTEMIS-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2945: - Attachment: (was: old_1000_bis.svg) > Artemis native JNI code code be replaced by Java > > > Key: ARTEMIS-2945 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2945 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LibAIO JNI code could be rewritten in Java, while keeping the LibaioContext > and LibaioFile APIs the same. > > There are few benefits from this: > # simplification of C code logic to ease maintain it > # quicker development process (implement-try-test-debug cycle) for non-C > programmers, including simpler integration with Java test suites > # easier monitoring/telemetry integration > > As demonstrations/proofs of such benefits I would introduce several changes > into the Java version: > # using the libaio async fdatasync feature to allow the LibaioContext duty > cycle loop to free CPU resources in order to handle compaction reads without > being slowed down by an in-progress fdatasync > # use a lock-free high performance data structure to reuse iocbs instead of > a locked (using a mutex) one > # expose in-flight callbacks to allow future PRs to introduce Java-only > latency telemetry per-request or just error check/kill of "slow" in-flight > requests > > The possible drawbacks are: > # slower performance due to GC barriers cost (see the notes on the PR code) > # slower performance in case of JVM without Unsafe (that means very few) > The latter issue could be addressed by using the new/proper VarHandle > features when the Artemis min supported version will move from Java 8 or > using the same approach on other projects relying on it eg Netty. 
> A note about how to correctly benchmark this due to how I've implemented > async fdatasync: in order to save both LibaioContext and TimedBuffer to > perform fdatasync batching of writes, I've preferred to simplify the > LibaioContext: it means that the buffer timeout to be used on broker.xml > should be obtained by ./artemis perf-journal --sync-writes ie batching writes > at the speed of the measured fdatasync RTT latency. > This last behaviour could be changed used a more Apple-to-Apple approach > although I still think that the beauty of using is Java is exactly to bring > new features/logics in with a shorter development cycle :) > We're not in hurry to get this done so that perf-wise this feature could be > implemented improving performance over the original version too, if possible. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2945) Artemis native JNI code code be replaced by Java
[ https://issues.apache.org/jira/browse/ARTEMIS-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2945: - Attachment: new_1000_bis.svg old_1000_bis.svg > Artemis native JNI code code be replaced by Java > > > Key: ARTEMIS-2945 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2945 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Attachments: new_1000_bis.svg, old_1000_bis.svg > > > LibAIO JNI code could be rewritten in Java, while keeping the LibaioContext > and LibaioFile APIs the same. > > There are few benefits from this: > # simplification of C code logic to ease maintain it > # quicker development process (implement-try-test-debug cycle) for non-C > programmers, including simpler integration with Java test suites > # easier monitoring/telemetry integration > > As demonstrations/proofs of such benefits I would introduce several changes > into the Java version: > # using the libaio async fdatasync feature to allow the LibaioContext duty > cycle loop to free CPU resources in order to handle compaction reads without > being slowed down by an in-progress fdatasync > # use a lock-free high performance data structure to reuse iocbs instead of > a locked (using a mutex) one > # expose in-flight callbacks to allow future PRs to introduce Java-only > latency telemetry per-request or just error check/kill of "slow" in-flight > requests > > The possible drawbacks are: > # slower performance due to GC barriers cost (see the notes on the PR code) > # slower performance in case of JVM without Unsafe (that means very few) > The latter issue could be addressed by using the new/proper VarHandle > features when the Artemis min supported version will move from Java 8 or > using the same approach on other projects relying on it eg Netty. 
> A note about how to correctly benchmark this due to how I've implemented > async fdatasync: in order to save both LibaioContext and TimedBuffer to > perform fdatasync batching of writes, I've preferred to simplify the > LibaioContext: it means that the buffer timeout to be used on broker.xml > should be obtained by ./artemis perf-journal --sync-writes ie batching writes > at the speed of the measured fdatasync RTT latency. > This last behaviour could be changed used a more Apple-to-Apple approach > although I still think that the beauty of using is Java is exactly to bring > new features/logics in with a shorter development cycle :) > We're not in hurry to get this done so that perf-wise this feature could be > implemented improving performance over the original version too, if possible. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2946) Increase MaxInlineLevel to 15 on JVM configuration
[ https://issues.apache.org/jira/browse/ARTEMIS-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2946: - Summary: Increase MaxInlineLevel to 15 on JVM configuration (was: Increase MaxInlineLevel on JVM configuration) > Increase MaxInlineLevel to 15 on JVM configuration > -- > > Key: ARTEMIS-2946 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2946 > Project: ActiveMQ Artemis > Issue Type: Improvement >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > According to [https://bugs.openjdk.java.net/browse/JDK-8234863] there are > applications that can benefit from an increased MaxInlineLevel: many > Netty-based ones have shown a clear improvement in performance, especially > ones with long stack traces and/or wide hierarchies. > > See [https://github.com/netty/netty/pull/10368#issuecomment-648174201] for > more information about the potential improvements on encoding/decoding. -- This message was sent by Atlassian Jira (v8.3.4#803005)
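For reference, one way the flag could be applied to a broker instance is by extending the JVM arguments in the instance profile; this is a sketch assuming the usual etc/artemis.profile layout with a JAVA_ARGS variable (adjust path and variable to your install). Note that JDK-8234863 raised the default to 15 in newer JDKs, so the flag mainly matters on older releases.

```shell
# Append to the existing JVM arguments in etc/artemis.profile
# (path/variable assumed; check your broker instance layout).
JAVA_ARGS="$JAVA_ARGS -XX:MaxInlineLevel=15"
```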
[jira] [Updated] (ARTEMIS-2946) Increase MaxInlineLevel on JVM configuration
[ https://issues.apache.org/jira/browse/ARTEMIS-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2946: - Description: According to [https://bugs.openjdk.java.net/browse/JDK-8234863] there are applications that can benefit from an increased MaxInlineLevel: many Netty-based ones have shown a clear improvement in performance, especially ones with deep call stacks and/or wide hierarchies. See [https://github.com/netty/netty/pull/10368#issuecomment-648174201] for more information about the potential improvements on encoding/decoding. was: According to [https://bugs.openjdk.java.net/browse/JDK-8234863] there are applications that can benefit from an increased MaxInlineLevel: many Netty-based ones have shown a clear improvement in performance, especially ones with deep call stacks and/or wide hierarchies. > Increase MaxInlineLevel on JVM configuration > > > Key: ARTEMIS-2946 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2946 > Project: ActiveMQ Artemis > Issue Type: Improvement >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > According to [https://bugs.openjdk.java.net/browse/JDK-8234863] there are > applications that can benefit from an increased MaxInlineLevel: many > Netty-based ones have shown a clear improvement in performance, especially ones > with deep call stacks and/or wide hierarchies. > > See [https://github.com/netty/netty/pull/10368#issuecomment-648174201] for > more information about the potential improvements on encoding/decoding. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-2946) Increase MaxInlineLevel on JVM configuration
Francesco Nigro created ARTEMIS-2946: Summary: Increase MaxInlineLevel on JVM configuration Key: ARTEMIS-2946 URL: https://issues.apache.org/jira/browse/ARTEMIS-2946 Project: ActiveMQ Artemis Issue Type: Improvement Reporter: Francesco Nigro Assignee: Francesco Nigro According to [https://bugs.openjdk.java.net/browse/JDK-8234863] there are applications that can benefit from an increased MaxInlineLevel: many Netty-based ones have shown a clear improvement in performance, especially ones with deep call stacks and/or wide hierarchies. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARTEMIS-2945) Artemis native JNI code could be replaced by Java
[ https://issues.apache.org/jira/browse/ARTEMIS-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212551#comment-17212551 ] Francesco Nigro commented on ARTEMIS-2945: -- I've avoided changing the API in order to keep the Artemis broker code happy, but there are a lot of improvements that could be made if we allow it, e.g. pooling ByteBuffers to perform read/write. This would avoid referencing objects belonging to the old generation or to different heap regions (for region-based GCs like Shenandoah/G1/ZGC). > Artemis native JNI code could be replaced by Java > > > Key: ARTEMIS-2945 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2945 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > The LibAIO JNI code could be rewritten in Java, while keeping the LibaioContext > and LibaioFile APIs the same. > > There are a few benefits from this: > # simplification of the C code logic, to ease maintaining it > # a quicker development process (the implement-try-test-debug cycle) for non-C > programmers, including simpler integration with Java test suites > # easier monitoring/telemetry integration > > As demonstrations/proofs of such benefits I would introduce several changes > into the Java version: > # using the libaio async fdatasync feature to allow the LibaioContext duty > cycle loop to free CPU resources in order to handle compaction reads without > being slowed down by an in-progress fdatasync > # using a lock-free, high-performance data structure to reuse iocbs instead of > a locked (mutex-guarded) one > # exposing in-flight callbacks to allow future PRs to introduce Java-only > per-request latency telemetry, or just error checking/killing of "slow" in-flight > requests > > The possible drawbacks are: > # slower performance due to the cost of GC barriers (see the notes on the PR code) > # slower performance in the case of a JVM without Unsafe (which means very few) > The latter issue could be addressed by using the new/proper VarHandle > features once the Artemis minimum supported version moves past Java 8, or by > using the same approach as other projects relying on it, e.g. Netty. > A note about how to correctly benchmark this, due to how I've implemented > async fdatasync: to avoid having both LibaioContext and TimedBuffer > perform fdatasync batching of writes, I've preferred to simplify the > LibaioContext: it means that the buffer timeout to be used in broker.xml > should be obtained via ./artemis perf-journal --sync-writes, i.e. batching writes > at the speed of the measured fdatasync RTT latency. > This last behaviour could be changed using a more apples-to-apples approach, > although I still think that the beauty of using Java is exactly that it brings > new features/logic in with a shorter development cycle :) > We're not in a hurry to get this done, so perf-wise this feature could also be > implemented to improve performance over the original version, if possible. -- This message was sent by Atlassian Jira (v8.3.4#803005)
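The ByteBuffer-pooling idea from the comment can be sketched as follows. This is a minimal sketch under assumptions, not Artemis code: `ThreadLocalBufferPool` and the 64 KiB capacity are invented for illustration. The point is that a thread-confined, pre-allocated direct buffer is off-heap, so reusing it avoids creating references from long-lived pool objects into short-lived heap buffers, which is what costs GC-barrier work on region-based collectors.

```java
import java.nio.ByteBuffer;

// Illustrative sketch: reuse one direct ByteBuffer per thread for
// read/write calls instead of allocating a fresh buffer per call.
final class ThreadLocalBufferPool {
    // Hypothetical fixed capacity; a real pool would size this to the
    // journal's maximum I/O size.
    private static final int CAPACITY = 64 * 1024;

    private static final ThreadLocal<ByteBuffer> POOL =
            ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(CAPACITY));

    // Returns the calling thread's buffer, cleared and ready for use.
    // The buffer's storage is off-heap, so the GC never has to trace
    // old-gen-to-young-gen (or cross-region) references through it.
    static ByteBuffer acquire() {
        ByteBuffer buf = POOL.get();
        buf.clear(); // reset position/limit; does not zero the contents
        return buf;
    }
}
```

A caller would `acquire()` the buffer, fill it, hand it to the read/write path, and never retain it past the call, keeping the buffer strictly thread-confined.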