[GitHub] activemq-artemis pull request #1929: ARTEMIS-1728 Reclaim memory when page c...
GitHub user shoukunhuai opened a pull request: https://github.com/apache/activemq-artemis/pull/1929 ARTEMIS-1728 Reclaim memory when page cursor complete Free hash set used to hold page position for acks and removed refs. The two set is cleared, but they still hold a big array. It is safe to replace the old one with empty set. You can merge this pull request into a Git repository by running: $ git pull https://github.com/shoukunhuai/activemq-artemis reclaim_hashmap Alternatively you can review and apply these changes as the patch at: https://github.com/apache/activemq-artemis/pull/1929.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1929 commit 6d445c5d6485ed5ca65ea5718905a84ca6d5eb9b Author: huaishk <shoukunhuai@...> Date: 2018-03-05T09:32:29Z ARTEMIS-1728 Reclaim memory when page cursor complete Free hash set used to hold page position for acks and removed refs. The two set is cleared, but they still hold a big array. It is safe to replace the old one with empty set. ---
[GitHub] activemq-artemis issue #1899: ARTEMIS-1700 Fixed deadlock in paging state
Github user shoukunhuai commented on the issue: https://github.com/apache/activemq-artemis/pull/1899 So it is a mistake to use global thread pool instead of io thread pool for page cursor. But this does not fix our problem, as you can see ``` "Thread-274672 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5@4e91d63f)" Id=274703 TIMED_WAITING on java.util.concurrent.CountDownLatch$Sync@5c416651 at sun.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.CountDownLatch$Sync@5c416651 at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277) at org.apache.activemq.artemis.core.journal.impl.SimpleWaitIOCallback.waitCompletion(SimpleWaitIOCallback.java:73) at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl.waitCompletion(OperationContextImpl.java:313) at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.waitOnOperations(AbstractJournalStorageManager.java:294) at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.storeBookmark(PageCursorProviderImpl.java:539) at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.cleanupComplete(PageCursorProviderImpl.java:431) at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.cleanup(PageCursorProviderImpl.java:383) - locked org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl@4f8d6d9a at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl$1.run(PageCursorProviderImpl.java:291) at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) at org.apache.activemq.artemis.utils.actors.ProcessorBase$ExecutorTask.run(ProcessorBase.java:53) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Number of locked synchronizers = 2 - java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@613d010 - java.util.concurrent.ThreadPoolExecutor$Worker@4c03ae59 ``` When exit paging state, we will store bookmark for each page subscription and wait until all callbacks done. I believe this may happen even running in io thread as long as singleThreadExecutor in AbstractJournalStrorageManager use thread from global server thread pool. ---
[GitHub] activemq-artemis pull request #1894: ARTEMIS-1700 Fixed deadlock in paging s...
Github user shoukunhuai commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1894#discussion_r170407785 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/persistence/impl/journal/AbstractJournalStorageManager.java --- @@ -1488,7 +1494,13 @@ public synchronized void start() throws Exception { beforeStart(); - singleThreadExecutor = executorFactory.getExecutor(); + ThreadFactory tFactory = AccessController.doPrivileged(new PrivilegedAction() { + @Override + public ThreadFactory run() { +return new ActiveMQThreadFactory("ActiveMQ-journal-server-" + this.toString(), true, ClientSessionFactoryImpl.class.getClassLoader()); + } + }); + singleThreadExecutor = Executors.newSingleThreadExecutor(tFactory); --- End diff -- https://github.com/apache/activemq-artemis/blob/d6d895c558cc104475188d942473771418b5e3e6/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/group/impl/LocalGroupingHandler.java#L109-L117 @clebertsuconic the comment said we need an executor out side the pool ---
[GitHub] activemq-artemis pull request #1894: ARTEMIS-1700 Fixed deadlock in paging s...
Github user shoukunhuai commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1894#discussion_r170405747 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/persistence/impl/journal/AbstractJournalStorageManager.java --- @@ -1488,7 +1494,13 @@ public synchronized void start() throws Exception { beforeStart(); - singleThreadExecutor = executorFactory.getExecutor(); + ThreadFactory tFactory = AccessController.doPrivileged(new PrivilegedAction() { + @Override + public ThreadFactory run() { +return new ActiveMQThreadFactory("ActiveMQ-journal-server-" + this.toString(), true, ClientSessionFactoryImpl.class.getClassLoader()); + } + }); + singleThreadExecutor = Executors.newSingleThreadExecutor(tFactory); --- End diff -- We are running 2.4.0 See https://issues.apache.org/jira/browse/ARTEMIS-1700 There is a artemis.log attached. ---
[GitHub] activemq-artemis pull request #1894: ARTEMIS-1700 Fixed deadlock in paging s...
Github user shoukunhuai commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1894#discussion_r170405300 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/persistence/impl/journal/AbstractJournalStorageManager.java --- @@ -1488,7 +1494,13 @@ public synchronized void start() throws Exception { beforeStart(); - singleThreadExecutor = executorFactory.getExecutor(); + ThreadFactory tFactory = AccessController.doPrivileged(new PrivilegedAction() { + @Override + public ThreadFactory run() { +return new ActiveMQThreadFactory("ActiveMQ-journal-server-" + this.toString(), true, ClientSessionFactoryImpl.class.getClassLoader()); + } + }); + singleThreadExecutor = Executors.newSingleThreadExecutor(tFactory); --- End diff -- We believe this happens when - using fixed thread pool which is the default, and - an address has many producers, more then thread pool's size, and - the address is about to exit paging state ---
[GitHub] activemq-artemis pull request #1894: ARTEMIS-1700 Fixed deadlock in paging s...
Github user shoukunhuai commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1894#discussion_r170405082 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/persistence/impl/journal/AbstractJournalStorageManager.java --- @@ -1488,7 +1494,13 @@ public synchronized void start() throws Exception { beforeStart(); - singleThreadExecutor = executorFactory.getExecutor(); + ThreadFactory tFactory = AccessController.doPrivileged(new PrivilegedAction() { + @Override + public ThreadFactory run() { +return new ActiveMQThreadFactory("ActiveMQ-journal-server-" + this.toString(), true, ClientSessionFactoryImpl.class.getClassLoader()); + } + }); + singleThreadExecutor = Executors.newSingleThreadExecutor(tFactory); --- End diff -- What if the pool is full? In our case, the pool is a 60 thread fixed pool. One of the thread is doing page cleanup, and try to exit paging state, it holds the lock in paging store. All other 59 threads is blocked on the lock, trying to page. While cleanup, we need to store bookmark in journal for each page subscription, then wait until completed. In log, stored equals to storeLineUp, but there are pending tasks(there are going to count down latch cleanup thread is waiting on), the deadlock happened. ``` 16:44:28,930 AMQ222024: Could not complete operations on IO context OperationContextImpl [1251391301] [minimalStore=1, storeLineUp=2, stored=2, minimalReplicated=0, replicationLineUp=0, replicated=0, paged=0, minimalPage=0, pageLineUp=0, errorCode=-1, errorMessage=null, executorsPending=3, executor=OrderedExecutor(tasks=[org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1@4d09259, org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1@54b73dc4, org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1@640495d4])] ``` ---
[GitHub] activemq-artemis pull request #1827: ARTEMIS-1650 Improve paged message ackn...
Github user shoukunhuai commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1827#discussion_r165955807 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/RefsOperation.java --- @@ -152,14 +153,14 @@ protected void rollbackRedelivery(Transaction tx, MessageReference ref, long tim @Override public void afterCommit(final Transaction tx) { for (MessageReference ref : refsToAck) { - synchronized (ref.getQueue()) { --- End diff -- hi @michaelandrepearce I get the sync back, i think we can discuss this later, may be another PR. ---
[GitHub] activemq-artemis pull request #1827: ARTEMIS-1650 Improve paged message ackn...
Github user shoukunhuai commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1827#discussion_r165955062 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/server/MessageReference.java --- @@ -38,6 +38,10 @@ public static MessageReference createReference(Message encode, final Queue queue Message getMessage(); + default long getMessageID() { --- End diff -- I have removed default method. ---
[GitHub] activemq-artemis pull request #1827: ARTEMIS-1650 Improve paged message ackn...
Github user shoukunhuai commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1827#discussion_r16574 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/RefsOperation.java --- @@ -152,14 +153,14 @@ protected void rollbackRedelivery(Transaction tx, MessageReference ref, long tim @Override public void afterCommit(final Transaction tx) { for (MessageReference ref : refsToAck) { - synchronized (ref.getQueue()) { --- End diff -- I do this because in QueueImpl::postAcknowledge, the only thing that changed queue's state is decrement queue's delivering count, which i believe is thread safe. While QueueImpl:postRollback is different, it will not return messages if purge on no consumers and there is no consumer at that time, since add/remove consumer will lock the queue. So we must lock the queue when do post rollback. If no body is 100% sure about this, i will revert this change, since other change has improved a lot. ---
[GitHub] activemq-artemis pull request #1827: Improve paged message acknowledge
Github user shoukunhuai commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1827#discussion_r165284853 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/server/MessageReference.java --- @@ -38,6 +38,10 @@ public static MessageReference createReference(Message encode, final Queue queue Message getMessage(); + default long getMessageID() { --- End diff -- Thanks, will change it later ---
[GitHub] activemq-artemis pull request #1827: Improve paged message acknowledge
Github user shoukunhuai commented on a diff in the pull request: https://github.com/apache/activemq-artemis/pull/1827#discussion_r165050956 --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/RefsOperation.java --- @@ -152,14 +153,14 @@ protected void rollbackRedelivery(Transaction tx, MessageReference ref, long tim @Override public void afterCommit(final Transaction tx) { for (MessageReference ref : refsToAck) { - synchronized (ref.getQueue()) { -queue.postAcknowledge(ref); - } + queue.postAcknowledge(ref); } if (pagedMessagesToPostACK != null) { for (MessageReference refmsg : pagedMessagesToPostACK) { -decrementRefCount(refmsg); --- End diff -- @cshannon thanks. ---
[GitHub] activemq-artemis pull request #1827: Improve paged message acknowledge
GitHub user shoukunhuai opened a pull request: https://github.com/apache/activemq-artemis/pull/1827 Improve paged message acknowledge Cache `messageID`, `transactionID` and `isLargeMessage` in PagedReference, so that when acknowledge, we do not have to get PagedMessage which may be GCed and cause re-read entire page. Also, in RefsOperation::afterCommit do not lock queue when do postAcknowledge. I think this method is thread safe. And last, decrement ref count after commit only if it is a paged large message. Large message depends on this operation to delete file. Other implements does nothing in this method. All those change is to avoid get PagedMessage when acknowledge. Let me know if there is anything wrong. @clebertsuconic would you please help review this PR? I will create jira issue when necessary. Attached is thread dump when doing stress test, we found most of server thread is blocked on page operation. We use Artemis 2.4.0, run 10 jmeter engine, for each engine started 100 publisher and 50 subscriber. We changed jmeter to use shared subscription, and blockOnAcknowledge is set true. [jstack.txt](https://github.com/apache/activemq-artemis/files/1680132/jstack.txt) You can merge this pull request into a Git repository by running: $ git pull https://github.com/shoukunhuai/activemq-artemis enhance-paged-ack Alternatively you can review and apply these changes as the patch at: https://github.com/apache/activemq-artemis/pull/1827.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1827 commit 52d63a5802353966236f1f9eec85d577e1941a12 Author: huaishk <shoukunhuai@...> Date: 2018-01-31T01:48:43Z Improve paged message acknowledge Cache `messageID`, `transactionID` and `isLargeMessage` in PagedReference, so that when acknowledge, we do not have to get PagedMessage which may be GCed and cause re-read entire page. Also, in RefsOperation::afterCommit do not lock queue when do postAcknowledge. I think this method is thread safe. ---
[GitHub] activemq-artemis issue #1742: ARTEMIS-1570 Flush appendExecutor before take ...
Github user shoukunhuai commented on the issue: https://github.com/apache/activemq-artemis/pull/1742 @clebertsuconic sorry, almost forgot this, being busy on something else. Looks like this issue was resolved? ---
[GitHub] activemq-artemis issue #1747: [WIP] NO-JIRA DISCUSS Replica use operationcon...
Github user shoukunhuai commented on the issue: https://github.com/apache/activemq-artemis/pull/1747 @jbertram sure, i will close it, and thanks all. ---
[GitHub] activemq-artemis pull request #1747: [WIP] NO-JIRA DISCUSS Replica use opera...
Github user shoukunhuai closed the pull request at: https://github.com/apache/activemq-artemis/pull/1747 ---