[GitHub] activemq-artemis pull request #1929: ARTEMIS-1728 Reclaim memory when page c...

2018-03-05 Thread shoukunhuai
GitHub user shoukunhuai opened a pull request:

https://github.com/apache/activemq-artemis/pull/1929

ARTEMIS-1728 Reclaim memory when page cursor complete

Free hash set used to hold page position for acks and removed refs.
The two set is cleared, but they still hold a big array.

It is safe to replace the old one with empty set.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shoukunhuai/activemq-artemis reclaim_hashmap

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/activemq-artemis/pull/1929.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1929


commit 6d445c5d6485ed5ca65ea5718905a84ca6d5eb9b
Author: huaishk <shoukunhuai@...>
Date:   2018-03-05T09:32:29Z

ARTEMIS-1728 Reclaim memory when page cursor complete

Free hash set used to hold page position for acks and removed refs.
The two set is cleared, but they still hold a big array.

It is safe to replace the old one with empty set.




---


[GitHub] activemq-artemis issue #1899: ARTEMIS-1700 Fixed deadlock in paging state

2018-02-26 Thread shoukunhuai
Github user shoukunhuai commented on the issue:

https://github.com/apache/activemq-artemis/pull/1899
  
So it is a mistake to use global thread pool instead of io thread pool for 
page cursor.

But this does not fix our problem, as you can see 
```
"Thread-274672 
(ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5@4e91d63f)"
 Id=274703 TIMED_WAITING on java.util.concurrent.CountDownLatch$Sync@5c416651
at sun.misc.Unsafe.park(Native Method)
-  waiting on java.util.concurrent.CountDownLatch$Sync@5c416651
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
at 
org.apache.activemq.artemis.core.journal.impl.SimpleWaitIOCallback.waitCompletion(SimpleWaitIOCallback.java:73)
at 
org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl.waitCompletion(OperationContextImpl.java:313)
at 
org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.waitOnOperations(AbstractJournalStorageManager.java:294)
at 
org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.storeBookmark(PageCursorProviderImpl.java:539)
at 
org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.cleanupComplete(PageCursorProviderImpl.java:431)
at 
org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.cleanup(PageCursorProviderImpl.java:383)
-  locked 
org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl@4f8d6d9a
at 
org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl$1.run(PageCursorProviderImpl.java:291)
at 
org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
at 
org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
at 
org.apache.activemq.artemis.utils.actors.ProcessorBase$ExecutorTask.run(ProcessorBase.java:53)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Number of locked synchronizers = 2
- java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@613d010
- java.util.concurrent.ThreadPoolExecutor$Worker@4c03ae59
```
When exit paging state, we will store bookmark for each page subscription 
and wait until all callbacks done.
I believe this may happen even running in io thread as long as 
singleThreadExecutor in AbstractJournalStrorageManager use thread from global 
server thread pool.


---


[GitHub] activemq-artemis pull request #1894: ARTEMIS-1700 Fixed deadlock in paging s...

2018-02-23 Thread shoukunhuai
Github user shoukunhuai commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1894#discussion_r170407785
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/persistence/impl/journal/AbstractJournalStorageManager.java
 ---
@@ -1488,7 +1494,13 @@ public synchronized void start() throws Exception {
 
   beforeStart();
 
-  singleThreadExecutor = executorFactory.getExecutor();
+  ThreadFactory tFactory = AccessController.doPrivileged(new 
PrivilegedAction() {
+ @Override
+ public ThreadFactory run() {
+return new ActiveMQThreadFactory("ActiveMQ-journal-server-" + 
this.toString(), true, ClientSessionFactoryImpl.class.getClassLoader());
+ }
+  });
+  singleThreadExecutor = Executors.newSingleThreadExecutor(tFactory);
--- End diff --


https://github.com/apache/activemq-artemis/blob/d6d895c558cc104475188d942473771418b5e3e6/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/group/impl/LocalGroupingHandler.java#L109-L117

@clebertsuconic the comment said we need an executor out side the pool


---


[GitHub] activemq-artemis pull request #1894: ARTEMIS-1700 Fixed deadlock in paging s...

2018-02-23 Thread shoukunhuai
Github user shoukunhuai commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1894#discussion_r170405747
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/persistence/impl/journal/AbstractJournalStorageManager.java
 ---
@@ -1488,7 +1494,13 @@ public synchronized void start() throws Exception {
 
   beforeStart();
 
-  singleThreadExecutor = executorFactory.getExecutor();
+  ThreadFactory tFactory = AccessController.doPrivileged(new 
PrivilegedAction() {
+ @Override
+ public ThreadFactory run() {
+return new ActiveMQThreadFactory("ActiveMQ-journal-server-" + 
this.toString(), true, ClientSessionFactoryImpl.class.getClassLoader());
+ }
+  });
+  singleThreadExecutor = Executors.newSingleThreadExecutor(tFactory);
--- End diff --

We are running 2.4.0
See https://issues.apache.org/jira/browse/ARTEMIS-1700
There is a artemis.log attached.


---


[GitHub] activemq-artemis pull request #1894: ARTEMIS-1700 Fixed deadlock in paging s...

2018-02-23 Thread shoukunhuai
Github user shoukunhuai commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1894#discussion_r170405300
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/persistence/impl/journal/AbstractJournalStorageManager.java
 ---
@@ -1488,7 +1494,13 @@ public synchronized void start() throws Exception {
 
   beforeStart();
 
-  singleThreadExecutor = executorFactory.getExecutor();
+  ThreadFactory tFactory = AccessController.doPrivileged(new 
PrivilegedAction() {
+ @Override
+ public ThreadFactory run() {
+return new ActiveMQThreadFactory("ActiveMQ-journal-server-" + 
this.toString(), true, ClientSessionFactoryImpl.class.getClassLoader());
+ }
+  });
+  singleThreadExecutor = Executors.newSingleThreadExecutor(tFactory);
--- End diff --

We believe this happens when
- using fixed thread pool which is the default, and
- an address has many producers, more then thread pool's size, and
- the address is about to exit paging state


---


[GitHub] activemq-artemis pull request #1894: ARTEMIS-1700 Fixed deadlock in paging s...

2018-02-23 Thread shoukunhuai
Github user shoukunhuai commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1894#discussion_r170405082
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/persistence/impl/journal/AbstractJournalStorageManager.java
 ---
@@ -1488,7 +1494,13 @@ public synchronized void start() throws Exception {
 
   beforeStart();
 
-  singleThreadExecutor = executorFactory.getExecutor();
+  ThreadFactory tFactory = AccessController.doPrivileged(new 
PrivilegedAction() {
+ @Override
+ public ThreadFactory run() {
+return new ActiveMQThreadFactory("ActiveMQ-journal-server-" + 
this.toString(), true, ClientSessionFactoryImpl.class.getClassLoader());
+ }
+  });
+  singleThreadExecutor = Executors.newSingleThreadExecutor(tFactory);
--- End diff --

What if the pool is full?
In our case, the pool is a 60 thread fixed pool.
One of the thread is doing page cleanup, and try to exit paging state, it 
holds the lock in paging store. All other 59 threads is blocked on the lock, 
trying to page.
While cleanup, we need to store bookmark in journal for each page 
subscription, then wait until completed.
In log, stored equals to storeLineUp, but there are pending tasks(there are 
going to count down latch cleanup thread is waiting on), the deadlock happened.
```
16:44:28,930 AMQ222024: Could not complete operations on IO context 
OperationContextImpl [1251391301] [minimalStore=1, storeLineUp=2, stored=2, 
minimalReplicated=0, replicationLineUp=0, replicated=0, paged=0, minimalPage=0, 
pageLineUp=0, errorCode=-1, errorMessage=null, executorsPending=3, 
executor=OrderedExecutor(tasks=[org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1@4d09259,
 
org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1@54b73dc4,
 
org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1@640495d4])]
```


---


[GitHub] activemq-artemis pull request #1827: ARTEMIS-1650 Improve paged message ackn...

2018-02-05 Thread shoukunhuai
Github user shoukunhuai commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1827#discussion_r165955807
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/RefsOperation.java
 ---
@@ -152,14 +153,14 @@ protected void rollbackRedelivery(Transaction tx, 
MessageReference ref, long tim
@Override
public void afterCommit(final Transaction tx) {
   for (MessageReference ref : refsToAck) {
- synchronized (ref.getQueue()) {
--- End diff --

hi @michaelandrepearce I get the sync back, i think we can discuss this 
later, may be another PR.


---


[GitHub] activemq-artemis pull request #1827: ARTEMIS-1650 Improve paged message ackn...

2018-02-05 Thread shoukunhuai
Github user shoukunhuai commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1827#discussion_r165955062
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/server/MessageReference.java
 ---
@@ -38,6 +38,10 @@ public static MessageReference createReference(Message 
encode, final Queue queue
 
Message getMessage();
 
+   default long getMessageID() {
--- End diff --

I have removed default method.


---


[GitHub] activemq-artemis pull request #1827: ARTEMIS-1650 Improve paged message ackn...

2018-02-01 Thread shoukunhuai
Github user shoukunhuai commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1827#discussion_r16574
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/RefsOperation.java
 ---
@@ -152,14 +153,14 @@ protected void rollbackRedelivery(Transaction tx, 
MessageReference ref, long tim
@Override
public void afterCommit(final Transaction tx) {
   for (MessageReference ref : refsToAck) {
- synchronized (ref.getQueue()) {
--- End diff --

I do this because in QueueImpl::postAcknowledge, the only thing that 
changed queue's state is decrement queue's delivering count, which i believe is 
thread safe.

While QueueImpl:postRollback is different, it will not return messages if 
purge on no consumers and there is no consumer at that time, since add/remove 
consumer will lock the queue. So we must lock the queue when do post rollback.

If no body is 100% sure about this, i will revert this change, since other 
change has improved a lot.


---


[GitHub] activemq-artemis pull request #1827: Improve paged message acknowledge

2018-02-01 Thread shoukunhuai
Github user shoukunhuai commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1827#discussion_r165284853
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/server/MessageReference.java
 ---
@@ -38,6 +38,10 @@ public static MessageReference createReference(Message 
encode, final Queue queue
 
Message getMessage();
 
+   default long getMessageID() {
--- End diff --

Thanks, will change it later


---


[GitHub] activemq-artemis pull request #1827: Improve paged message acknowledge

2018-01-31 Thread shoukunhuai
Github user shoukunhuai commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/1827#discussion_r165050956
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/RefsOperation.java
 ---
@@ -152,14 +153,14 @@ protected void rollbackRedelivery(Transaction tx, 
MessageReference ref, long tim
@Override
public void afterCommit(final Transaction tx) {
   for (MessageReference ref : refsToAck) {
- synchronized (ref.getQueue()) {
-queue.postAcknowledge(ref);
- }
+ queue.postAcknowledge(ref);
   }
 
   if (pagedMessagesToPostACK != null) {
  for (MessageReference refmsg : pagedMessagesToPostACK) {
-decrementRefCount(refmsg);
--- End diff --

@cshannon thanks.


---


[GitHub] activemq-artemis pull request #1827: Improve paged message acknowledge

2018-01-30 Thread shoukunhuai
GitHub user shoukunhuai opened a pull request:

https://github.com/apache/activemq-artemis/pull/1827

Improve paged message acknowledge



Cache `messageID`, `transactionID` and `isLargeMessage`
in PagedReference, so that when acknowledge, we do not have to
get PagedMessage which may be GCed and cause re-read entire page.

Also, in RefsOperation::afterCommit do not lock queue when do
postAcknowledge. I think this method is thread safe.

And last, decrement ref count after commit only if it is a paged large 
message.
Large message depends on this operation to delete file. Other implements 
does nothing in this method.

All those change is to avoid get PagedMessage when acknowledge.

Let me know if there is anything wrong.

@clebertsuconic would you please help review this PR? I will create jira 
issue when necessary.

Attached is thread dump when doing stress test, we found most of server 
thread is blocked on page operation.
We use Artemis 2.4.0, run 10 jmeter engine, for each engine started 100 
publisher and 50 subscriber.
We changed jmeter to use shared subscription, and blockOnAcknowledge is set 
true.


[jstack.txt](https://github.com/apache/activemq-artemis/files/1680132/jstack.txt)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shoukunhuai/activemq-artemis enhance-paged-ack

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/activemq-artemis/pull/1827.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1827


commit 52d63a5802353966236f1f9eec85d577e1941a12
Author: huaishk <shoukunhuai@...>
Date:   2018-01-31T01:48:43Z

Improve paged message acknowledge

Cache `messageID`, `transactionID` and `isLargeMessage`
in PagedReference, so that when acknowledge, we do not have to
get PagedMessage which may be GCed and cause re-read entire page.

Also, in RefsOperation::afterCommit do not lock queue when do
postAcknowledge. I think this method is thread safe.




---


[GitHub] activemq-artemis issue #1742: ARTEMIS-1570 Flush appendExecutor before take ...

2018-01-24 Thread shoukunhuai
Github user shoukunhuai commented on the issue:

https://github.com/apache/activemq-artemis/pull/1742
  
@clebertsuconic sorry, almost forgot this, being busy on something else.
Looks like this issue was resolved?


---


[GitHub] activemq-artemis issue #1747: [WIP] NO-JIRA DISCUSS Replica use operationcon...

2018-01-11 Thread shoukunhuai
Github user shoukunhuai commented on the issue:

https://github.com/apache/activemq-artemis/pull/1747
  
@jbertram sure, i will close it, and thanks all.


---


[GitHub] activemq-artemis pull request #1747: [WIP] NO-JIRA DISCUSS Replica use opera...

2018-01-11 Thread shoukunhuai
Github user shoukunhuai closed the pull request at:

https://github.com/apache/activemq-artemis/pull/1747


---