subject:"\[jira\] \[Work logged\] \(ARTEMIS\-3045\) ReplicationManager can batch sent replicated packets"

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-03-24 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=571134=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571134
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 24/Mar/21 13:08
Start Date: 24/Mar/21 13:08
Worklog Time Spent: 10m 
  Work Description: clebertsuconic merged pull request #3514:
URL: https://github.com/apache/activemq-artemis/pull/3514


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 571134)
Time Spent: 14.5h  (was: 14h 20m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
> Attachments: 2.12.png, 2.17.png, ARTEMIS-3045.png
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-03-23 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=570767=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-570767
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 23/Mar/21 20:37
Start Date: 23/Mar/21 20:37
Worklog Time Spent: 10m 
  Work Description: franz1981 opened a new pull request #3514:
URL: https://github.com/apache/activemq-artemis/pull/3514


   This has been introduced by 
https://github.com/apache/activemq-artemis/pull/3392


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 570767)
Time Spent: 14h 20m  (was: 14h 10m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
> Attachments: 2.12.png, 2.17.png, ARTEMIS-3045.png
>
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554910=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554910
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 20:24
Start Date: 19/Feb/21 20:24
Worklog Time Spent: 10m 
  Work Description: asfgit closed pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554910)
Time Spent: 14h 10m  (was: 14h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 14h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554873=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554873
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 18:59
Start Date: 19/Feb/21 18:59
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-782275758


   @clebertsuconic @michaelandrepearce thanks folks for your help: this is now 
ready to be merged and a follow up commit will arrive next ;)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554873)
Time Spent: 14h  (was: 13h 50m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 14h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554800=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554800
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 16:04
Start Date: 19/Feb/21 16:04
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-782170503


   ok going to fix the last bits now so can be merged: we then we can improve 
it later ;)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554800)
Time Spent: 13h 50m  (was: 13h 40m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554788=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554788
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 14:59
Start Date: 19/Feb/21 14:59
Worklog Time Spent: 10m 
  Work Description: michaelandrepearce commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-782127793


   I agree with @clebertsuconic best to merge this now. Follow ups can be 
future prs else it be a never ending endeavour. Great work 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554788)
Time Spent: 13h 40m  (was: 13.5h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554784=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554784
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 14:48
Start Date: 19/Feb/21 14:48
Worklog Time Spent: 10m 
  Work Description: clebertsuconic commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-782120992


   Can you do the following changes:
   
   - not use System.currentTimeMillis() on the ReplicaitonManager
   - use a separate boolean instead of time = -1
   - Squash everything in a single commit
   
   Then we can merge this, as we had done a lot of tests.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554784)
Time Spent: 13.5h  (was: 13h 20m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554702=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554702
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 10:35
Start Date: 19/Feb/21 10:35
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-781987267


   Just few thoughts to improve this:
   
   - allocation of Runnable could be reduced by not submitting new ones if 
we're awaiting to be resumed OR because we're still batching on the event loop 
(and we're already picking packets to be batched)
   - we can limit the time budget to perform batching while allowing to 
continue it after a flush ie `writev` socket call, 'cause very fast networks 
are mostly empty so they could keep on batching right after
   
   This should reduce higher percentiles of latencies due to less GC and some 
more fairness (batch cannot exceed time budget and pending reads could be 
picked)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554702)
Time Spent: 13h 10m  (was: 13h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554703=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554703
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 10:35
Start Date: 19/Feb/21 10:35
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-781987267


   Just few thoughts to improve this:
   
   - allocation of Runnable could be reduced by not submitting new ones if 
we're awaiting to be resumed OR because we're still batching on the event loop 
(and we're already picking packets to be batched)
   - we can limit the time budget to perform batching while allowing to 
continue it after a successful flush ie `writev` socket call, 'cause very fast 
networks are mostly empty so they could keep on batching right after
   
   This should reduce higher percentiles of latencies due to less GC and some 
more fairness (batch cannot exceed time budget and pending reads could be 
picked)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554703)
Time Spent: 13h 20m  (was: 13h 10m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554700=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554700
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 10:31
Start Date: 19/Feb/21 10:31
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-781987267


   Just few thoughts to improve this:
   
   - allocation of Runnable could be reduced by not submitting new ones if the 
channel is not writable and we're awaiting to be resumed OR because we're 
batching on the event loop and we're already picking packets to be batched
   - we can limit the time budget to perform batching while allowing to 
continue it after a flush ie `writev` socket call, 'cause very fast networks 
are mostly empty so they could keep on batching right after
   
   This should reduce higher percentiles of latencies due to less GC and some 
more fairness (batch cannot exceed time budget and pending reads could be 
picked)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554700)
Time Spent: 13h  (was: 12h 50m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 13h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554652=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554652
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 08:38
Start Date: 19/Feb/21 08:38
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-781920540


   @michaelandrepearce 
   
   This part of my same answer at the end of the comment explain the latter 
point:
   > I see that ChannelImpl.CHANNEL_ID.REPLICATION Artemis Channel is always 
created with a confWindowSize == -1 so it seems it won't have any blocking 
behaviour caused by the resendCache or responseAsyncCache and given that it 
cannot failover nor is using sendBlocking, its ChannelImpl::lock isn't used, 
but is still brittle and I would like to enforce it.
   
   So, in short, it shouldn't block, but I would like to enforce it somehow to 
save in the future that it could ever happen...
   
   While re the former point
   > what happen to the Netty channel writability with racing sends on a 
different artemis Channel (eg PING)?
   
   It means we can get:
   -  premature stop while batching (by the PING packet size, so we won't loose 
that much really) 
   -  `sendReplicatedPackets(true)` ie resume, believing to be writable again 
without being able to send nothing, that will end by resetting the 
`notWritableFrom` counter - VERY unlikely to happen, given that PINGs are 
pretty small and there is plenty of room between low watermark and high 
watermarks of Netty writability - 
   
   Both cases won't harm correctness, but introduce just un-necessary checks...
   The ideal thing would be to send the `PING` from inside the event loop using 
the scheduled executor API exposed by it: it would save sending pings or 
whatever heartbeat mechanism if we're already sending replicated packets, 
because unnecessary really...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554652)
Time Spent: 12h 50m  (was: 12h 40m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554651=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554651
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 08:37
Start Date: 19/Feb/21 08:37
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-781920540


   @michaelandrepearce 
   
   This part of my same answer at the end of the comment explain the latter 
point:
   > I see that ChannelImpl.CHANNEL_ID.REPLICATION Artemis Channel is always 
created with a confWindowSize == -1 so it seems it won't have any blocking 
behaviour caused by the resendCache or responseAsyncCache and given that it 
cannot failover nor is using sendBlocking, its ChannelImpl::lock isn't used, 
but is still brittle and I would like to enforce it.
   
   So, in short, it shouldn't block, but I would like to enforce it somehow to 
save in the future that it could ever happen...
   
   While re the former point
   > what happen to the Netty channel writability with racing sends on a 
different artemis Channel (eg PING)?
   
   It means we can get:
   -  premature stop while batching (by the PING packet size, so we won't loose 
that much really) 
   -  `sendReplicatedPackets(true)` ie resume, believing to be writable again 
without being able to send nothing, that will end by resetting the 
`notWritableFrom` counter despite being not writable from some time - VERY 
unlikely to happen, given that PINGs are pretty small and there is plenty of 
room between low watermark and high watermarks of Netty writability - 
   
   Both cases won't harm correctness, but introduce just un-necessary checks...
   The ideal thing would be to send the `PING` from inside the event loop using 
the scheduled executor API exposed by it: it would save sending pings or 
whatever heartbeat mechanism if we're already sending replicated packets, 
because unnecessary really...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554651)
Time Spent: 12h 40m  (was: 12.5h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554650=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554650
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 08:32
Start Date: 19/Feb/21 08:32
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-781920540


   @michaelandrepearce 
   
   This part of my same answer at the end of the comment explain the latter 
point:
   > I see that ChannelImpl.CHANNEL_ID.REPLICATION Artemis Channel is always 
created with a confWindowSize == -1 so it seems it won't have any blocking 
behaviour caused by the resendCache or responseAsyncCache and given that it 
cannot failover nor is using sendBlocking, its ChannelImpl::lock isn't used, 
but is still brittle and I would like to enforce it.
   
   So, in short, it shouldn't block, but I would like to enforce it somehow to 
save in the future that it could ever happen...
   
   While re the former point
   > what happen to the Netty channel writability with racing sends on a 
different artemis Channel (eg PING)?
   
   It means we can get:
   -  premature stop while batching (by the PING packet size, so we won't loose 
that much really) 
   -  `sendReplicatedPackets(true)` ie resume, believing to be writable again, 
that will end by resetting the `notWritableFrom` counter despite being not 
writable from some time - VERY unlikely to happen, given that PINGs are pretty 
small and there is plenty of room between low watermark and high watermarks of 
Netty writability - 
   
   Both cases won't harm correctness, but introduce just un-necessary checks...
   The ideal thing would be to send the `PING` from inside the event loop using 
the scheduled executor API exposed by it: it would save sending pings or 
whatever heartbeat mechanism if we're already sending replicated packets, 
because unnecessary really...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554650)
Time Spent: 12.5h  (was: 12h 20m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554648=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554648
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 08:31
Start Date: 19/Feb/21 08:31
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-781920540


   @michaelandrepearce 
   
   This part of my same answer at the end of the comment explain the latter 
point:
   > I see that ChannelImpl.CHANNEL_ID.REPLICATION Artemis Channel is always 
created with a confWindowSize == -1 so it seems it won't have any blocking 
behaviour caused by the resendCache or responseAsyncCache and given that it 
cannot failover nor is using sendBlocking, its ChannelImpl::lock isn't used, 
but is still brittle and I would like to enforce it.
   
   So, in short, it shouldn't block, but I would like to enforce it somehow to 
save in the future that it could ever happen...
   
   While re the former point
   > what happen to the Netty channel writability with racing sends on a 
different artemis Channel (eg PING)?
   
   It means we can get:
   -  premature stop while batching (by the PING packet size, so we won't loose 
that much really) 
   -  `sendReplicatedPackets(true)` ie resume, believing to be writable again, 
that will end by resetting the `notWritableFrom` counter despite being not 
writable from some time
   
   Both cases won't harm correctness, but introduce just un-necessary checks...
   The ideal thing would be to send the `PING` from inside the event loop using 
the scheduled executor API exposed by it: it would save sending pings or 
whatever heartbeat mechanism if we're already sending replicated packets, 
because unnecessary really...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554648)
Time Spent: 12h 20m  (was: 12h 10m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554636=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554636
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 08:03
Start Date: 19/Feb/21 08:03
Worklog Time Spent: 10m 
  Work Description: michaelandrepearce commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-781906121


   Hi @franz1981 i guess the link from comment doesnt work. my question sorry 
was around what was the resolution for the below as it isnt clear to me. -->
   
   
   > The change seems ok CI-wise, but, given that's running its logic on the 
Netty event loop we need a couple of questions to be answered:
   > 
   > * what happen to the Netty channel writability with racing sends on a 
different artemis `Channel` (eg `PING`)?
   >   The `REPLICATION` channel isn't the only one using the underlying 
`Netty` connection and that means that being awaken that a Netty channel is 
writable again can make it not writable while attempting to write the 
replication packets, because a concurrent write has filled it again!
   > * `ChannelImpl::send` can block? If yes, it can cause some trouble because 
the cluster connection isn't the only citizen of the Netty event loop thread 
and this can cause other connections to starve (to not mention the same 
connection responses to be read!)
   > 
   > @clebertsuconic @jbertram @gtully
   > I believe that answering to these questions is key to be sure that's a 
safe change...
   > 
   > I see that `ChannelImpl.CHANNEL_ID.REPLICATION` Artemis `Channel` is 
always created with a `confWindowSize == -1` so it seems it won't have any 
blocking behaviour caused by the `resendCache` or `responseAsyncCache` and 
given that it cannot failover nor is using `sendBlocking`, its 
`ChannelImpl::lock` isn't used, but is still brittle and I would like to 
enforce it.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554636)
Time Spent: 12h 10m  (was: 12h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-18 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554627
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 07:37
Start Date: 19/Feb/21 07:37
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-781893787


   Re the status of this PR, is done but I would like to:
   - enforce not blocking ops on Artemis channel on Netty event loop
   - save too many Runnable instances to be created (now > of the packets)
   - [maybe] stop batching depending on time spent on the event loop (eg 100 ms 
at max, although dangerous due to GC!)
   The last 2 points could be the reason of the increased tail latency, because:
   - bigger GC pauses due to the backlog of Runnables
   - bigger replicated batches can delay reading backup responses and answering 
back to clients (both share the same event loop)
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554627)
Time Spent: 11h 50m  (was: 11h 40m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-18 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554628=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554628
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 07:37
Start Date: 19/Feb/21 07:37
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-781893787


   Re the status of this PR, is done but I would like to:
   - enforce not blocking ops on Artemis channel on Netty event loop
   - save too many Runnable instances to be created (now > of the packets)
   - [maybe] stop batching depending on time spent on the event loop (eg 100 ms 
at max, although dangerous due to GC!)
   - 
   The last 2 points could responsible of the increased tail latency, because:
   - bigger GC pauses due to the backlog of Runnables
   - bigger replicated batches can delay reading backup responses and answering 
back to clients (both share the same event loop)
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554628)
Time Spent: 12h  (was: 11h 50m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 12h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-18 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554602=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554602
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 06:16
Start Date: 19/Feb/21 06:16
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-781852241


   Many thanks @michaelandrepearce to have tried it and happy that work as 
designed yay!
   
   > Just one other note to add, i wasn't able to get enough throughput, to 
stress the network enough
   
   But in term of throughput you've got some improvement if compared to the 
original version? Or the test was trying to keep a stable throughput while 
measuring end to end latencies?
   If both cases I am still happy, because the primary purpose was to save 
resources and possibly be faster too.
   
   > A quick question reading through the review chat, its not clear to me what 
the resolution to #3392 (comment) is? Was there one? Or is it still pending...
   
   The regression was probably caused by using a different topology on the 
benchmark (vs the 2.2.0 baseline), and I am addressing this by chatting 
directly with the bench author.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554602)
Time Spent: 11h 40m  (was: 11.5h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-18 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554598
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 06:11
Start Date: 19/Feb/21 06:11
Worklog Time Spent: 10m 
  Work Description: franz1981 closed pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554598)
Time Spent: 11h 10m  (was: 11h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-18 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554600=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554600
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 06:11
Start Date: 19/Feb/21 06:11
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-781852241


   > Just one other note to add, i wasn't able to get enough throughput, to 
stress the network enough
   
   But in term of throughput you've got some improvement if compared to the 
original version? Or the test was trying to keep a stable throughput while 
measuring end to end latencies?
   If both cases I am still happy, because the primary purpose was to save 
resources and possibly be faster too.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554600)
Time Spent: 11.5h  (was: 11h 20m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-18 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554599
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 19/Feb/21 06:11
Start Date: 19/Feb/21 06:11
Worklog Time Spent: 10m 
  Work Description: franz1981 opened a new pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392


   https://issues.apache.org/jira/browse/ARTEMIS-3045



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554599)
Time Spent: 11h 20m  (was: 11h 10m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-18 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554498
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 18/Feb/21 22:15
Start Date: 18/Feb/21 22:15
Worklog Time Spent: 10m 
  Work Description: michaelandrepearce edited a comment on pull request 
#3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-78154


   Hi @franz1981, i finally got a few hours free, to build and deploy this on 
some hardware i have available to me. I ran some of the scale up test suites 
and confirm that comparing against 2.16.0 (the last upstream version i had to 
hand on those servers) there is a significant shift left on the latency curve 
(aka improvement), a bit of a tail bump increase though, but more importantly 
we did see the improved resource utilisation that you mentioned, with cpu and 
network utilisation being lower, you're an eco warrior reducing our power 
consumption in the DC! every little helps!
   
   A quick question reading through the review chat, its not clear to me what 
the resolution to 
https://github.com/apache/activemq-artemis/pull/3392#issuecomment-778040106 is? 
Was there one? Or is it still pending...
   
   Just one other note to add, i wasn't able to get enough throughput, to 
stress the network enough, to see what happend if there was network back 
pressure, the links between our the servers i had are in some of our up-rated 
core switches (e.g. network capacity is over provisioned), so in my setup its 
unlikely to really happen. (famous last words, i can now seeing that as a 
reason for my next prod outage)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554498)
Time Spent: 11h  (was: 10h 50m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 11h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-18 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=554493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554493
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 18/Feb/21 22:08
Start Date: 18/Feb/21 22:08
Worklog Time Spent: 10m 
  Work Description: michaelandrepearce commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-78154


   Hi @franz1981, i finally got a few hours free, to build and deploy this on 
some hardware i have available to me. I ran some of the scale up test suites 
and confirm that comparing against 2.16.0 (the last upstream version i had to 
hand on those servers) there is a significant shift left on the latency curve 
(aka improvement), a bit of a tail bump increase though, but more importantly 
we did see the improved resource utilisation that you mentioned, with cpu and 
network utilisation being lower, you're an eco warrior reducing our power 
consumption in the DC! every little helps!
   
   A quick question reading through the review chat, its not clear to me what 
the resolution to 
https://github.com/apache/activemq-artemis/pull/3392#issuecomment-778040106 is? 
Was there one? Or is it still pending...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554493)
Time Spent: 10h 50m  (was: 10h 40m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=553514=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553514
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 17/Feb/21 10:16
Start Date: 17/Feb/21 10:16
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-780451492


   Re the performance regression, I have commented on their PR (see 
https://github.com/softwaremill/mqperf/pull/47#issuecomment-778280388) that 
"probably" (it's still under investigation in my company TBH) it was due to 
excessive notifications among cluster nodes or just unhappy distribution of 
consumers, causing messages redistribution.
   
   This change, instead, has been developed for 3 reasons, at the beginning:
   
   1. to address what was looking like a perf regression vs 2.2.0 (that should 
be a wrong assumption, as said)
   2. bad network bandwidth utilization with small replication packets (or with 
high latency networks)
   3. to allow measuring the backlog of pending packets in case of slow 
network/high latency network
   
   The last 2 points are still valid, to not mention that I'me getting a huge 
improvement in CPU utilization, that means that can be used elsewhere in the 
broker :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553514)
Time Spent: 10h 40m  (was: 10.5h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=553513=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553513
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 17/Feb/21 10:15
Start Date: 17/Feb/21 10:15
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-780451492


   Re the performance regression, I have commented on their PR (see 
https://github.com/softwaremill/mqperf/pull/47#issuecomment-778280388) that 
"probably" (it's still under investigation in my company TBH) it was due to 
excessive notifications among cluster nodes or just unhappy distribution of 
consumers, causing messages redistribution.
   
   This change, instead, has been developed for 3 reasons, at the beginning:
   
   1. to address what was looking like a perf regression vs 2.2.0 (that should 
be a wrong assumption, as said)
   2. an issue of network bandwidth usage with small replication packets (or 
with high latency networks)
   3. to allow measuring the backlog of pending packets in case of slow 
network/high latency network
   
   The last 2 points are still valid, to not mention that I'me getting a huge 
improvement in CPU utilization, that means that can be used elsewhere in the 
broker :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553513)
Time Spent: 10.5h  (was: 10h 20m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=553512=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553512
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 17/Feb/21 10:15
Start Date: 17/Feb/21 10:15
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-780451492


   Re the performance regression, I have commented on their PR (see 
https://github.com/softwaremill/mqperf/pull/47#issuecomment-778280388) that 
"probably" (it's still under investigation on my company TBH) it was due to 
excessive notifications among cluster nodes or just unhappy distribution of 
consumers, causing messages redistribution.
   
   This change, instead, has been developed for 3 reasons, at the beginning:
   
   1. to address what was looking like a perf regression vs 2.2.0 (that should 
be a wrong assumption, as said)
   2. an issue of network bandwidth usage with small replication packets (or 
with high latency networks)
   3. to allow measuring the backlog of pending packets in case of slow 
network/high latency network
   
   The last 2 points are still valid, to not mention that I'me getting a huge 
improvement in CPU utilization, that means that can be used elsewhere in the 
broker :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553512)
Time Spent: 10h 20m  (was: 10h 10m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=553501=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553501
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 17/Feb/21 09:57
Start Date: 17/Feb/21 09:57
Worklog Time Spent: 10m 
  Work Description: michaelandrepearce commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-780440917


   +1 with the idea of using the external softwaremill benchmark, as the real 
validation we're resolved the regression in perf that was seen in their recent 
re-run, where Artemis dropped perf compared to previous year



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553501)
Time Spent: 10h 10m  (was: 10h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-17 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=553494=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-553494
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 17/Feb/21 09:29
Start Date: 17/Feb/21 09:29
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-780424636


   The additional thinking time before merging this is beneficial :)
   
   There is a test scenario I would like to run this PR against and is similar 
to https://github.com/softwaremill/mqperf: with a big amount of clients 
connected, enough to cause some of these connections to be distributed in the 
same event loop of the replication, to check how the clients or the replication 
will be impacted.
   Given that we submit replication packets into Netty outbound buffer until 
!writable, it shouldn't affect too much the scalability of the system, because 
it will occupy the event loop for a limited amount of time, but still, would be 
nice to measure it and maybe put a reasonable limit to it eg 100 ms max



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 553494)
Time Spent: 10h  (was: 9h 50m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 10h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-16 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=552954=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-552954
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 16/Feb/21 13:12
Start Date: 16/Feb/21 13:12
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-779826480


   Yep, and seems that we should do something like this 
https://github.com/netty/netty/pull/9687/files#diff-27a399de973ee2c525680c87ae344f68fac760db0d7155764b0454c045e16e04R35
 to enable it in our tests (maybe!)
   
   The only problem is that maybe it just check that no `synchronized` or other 
blocking calls are called, and it won't check if these are contended or 
not...mmm



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 552954)
Time Spent: 9h 50m  (was: 9h 40m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-16 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=552953=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-552953
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 16/Feb/21 13:11
Start Date: 16/Feb/21 13:11
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-779826480


   Yep, and seems that we should do something like this 
https://github.com/netty/netty/pull/9687/files#diff-27a399de973ee2c525680c87ae344f68fac760db0d7155764b0454c045e16e04R35
 to enable it in our tests (maybe!)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 552953)
Time Spent: 9h 40m  (was: 9.5h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-16 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=552950=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-552950
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 16/Feb/21 13:09
Start Date: 16/Feb/21 13:09
Worklog Time Spent: 10m 
  Work Description: gtully commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-779825632


   ok, sure, any time we can "ask the computer" for an answer we should, that 
looks like a nice tool.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 552950)
Time Spent: 9.5h  (was: 9h 20m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-16 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=552946=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-552946
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 16/Feb/21 13:03
Start Date: 16/Feb/21 13:03
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-779822056


   @gtully 
   > maybe a little socket proxy 
   
   I'm more concerned about Artemis `ChannelImpl` blocking ops more then what 
Netty would do...
   In this case maybe would worth to try 
https://github.com/netty/netty/pull/9687 ie BlockHound to check if the Netty 
event loop won't block, wdyt?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 552946)
Time Spent: 9h 20m  (was: 9h 10m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-16 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=552944=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-552944
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 16/Feb/21 13:02
Start Date: 16/Feb/21 13:02
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-779822056


   @gtully 
   > maybe a little socket proxy 
   I'm more concerned about `ChannelImpl` blocking ops more then what Netty 
would do...
   In this case maybe would worth to try 
https://github.com/netty/netty/pull/9687 ie BlockHound to check if the Netty 
event loop won't block, wdyt?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 552944)
Time Spent: 9h  (was: 8h 50m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-16 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=552945=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-552945
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 16/Feb/21 13:02
Start Date: 16/Feb/21 13:02
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-779822056


   @gtully 
   > maybe a little socket proxy 
   
   I'm more concerned about `ChannelImpl` blocking ops more then what Netty 
would do...
   In this case maybe would worth to try 
https://github.com/netty/netty/pull/9687 ie BlockHound to check if the Netty 
event loop won't block, wdyt?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 552945)
Time Spent: 9h 10m  (was: 9h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-16 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=552942=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-552942
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 16/Feb/21 12:59
Start Date: 16/Feb/21 12:59
Worklog Time Spent: 10m 
  Work Description: gtully commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-779820507


   The best way to be sure is to write a test; maybe a little socket proxy can 
be used to block the write side of the replication channel to force blocking at 
some point and ask the computer how it behaves. There is a SocketProxy in the 
5.x test code base that could work in this cast I think.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 552942)
Time Spent: 8h 50m  (was: 8h 40m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-16 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=552922=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-552922
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 16/Feb/21 12:39
Start Date: 16/Feb/21 12:39
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-778040106


   The change seems ok CI-wise, but, given that's running its logic on the 
Netty event loop we need a couple of questions to be answered:
   - what happen to the Netty channel writability with racing sends on a 
different artemis `Channel` (eg `PING`)? 
   The `REPLICATION` channel isn't the only one using the underlying `Netty` 
connection and that means that being awaken that a Netty channel is writable 
again can make it not writable while attempting to write the replication 
packets, because a concurrent write has filled it again!
   - `ChannelImpl::send` can block? If yes, it can cause some trouble because 
the cluster connection isn't the only citizen of the Netty event loop thread 
and this can cause other connections to starve (to not mention the same 
connection responses to be read!)
   
   @clebertsuconic @jbertram @gtully 
   I believe that answering to these questions is key to be sure that's a safe 
change...
   
   I see that `ChannelImpl.CHANNEL_ID.REPLICATION` Artemis `Channel` is always 
created with a `confWindowSize == -1` so it seems it won't have any blocking 
behaviour caused by the `resendCache` or `responseAsyncCache` and given that it 
cannot failover nor is using `sendBlocking`, its `ChannelImpl::lock`  isn't 
used, but is still brittle and I would like to enforce it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 552922)
Time Spent: 8h 40m  (was: 8.5h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=551815=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-551815
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 12/Feb/21 13:48
Start Date: 12/Feb/21 13:48
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-778040106


   The change seems ok CI-wise, but, given that's running its logic on the 
Netty event loop we need a couple of questions to be answered:
   - what happen to the Netty channel writability with racing sends on a 
different artemis `Channel` (eg `PING`)? 
   The `REPLICATION` channel isn't the only one using the underlying `Netty` 
connection and that means that being awaken that a Netty channel is writable 
again can make it not writable while attempting to write the replication 
packets, because a concurrent write has filled it again!
   - `ChannelImpl::send` can block? If yes, it can cause some trouble because 
the cluster connection isn't the only citizen of the Netty event loop thread 
and this can cause other connections to starve (to not mention the same 
connection responses to be read!)
   
   @clebertsuconic @jbertram @gtully 
   I believe that answering to this questions is key to be sure that's a safe 
change...
   
   I see that `ChannelImpl.CHANNEL_ID.REPLICATION` Artemis `Channel` is always 
created with a `confWindowSize == -1` so it seems it won't have any blocking 
behaviour caused by the `resendCache` or `responseAsyncCache` and given that it 
cannot failover nor is using `sendBlocking`, its `ChannelImpl::lock`  isn't 
used, but is still brittle and I would like to enforce it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 551815)
Time Spent: 8.5h  (was: 8h 20m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=551813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-551813
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 12/Feb/21 13:47
Start Date: 12/Feb/21 13:47
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-778040106


   The change seems ok CI-wise, but, given that's running its logic on the 
Netty event loop we need a couple of pre-conditions to be verified:
   - what happen to the Netty channel writability with racing sends on a 
different artemis `Channel` (eg `PING`)? 
   The `REPLICATION` channel isn't the only one using the underlying `Netty` 
connection and that means that being awaken that a Netty channel is writable 
again can make it not writable while attempting to write the replication 
packets, because a concurrent write has filled it again!
   - `ChannelImpl::send` can block? If yes, it can cause some trouble because 
the cluster connection isn't the only citizen of the Netty event loop thread 
and this can cause other connections to starve (to not mention the same 
connection responses to be read!)
   
   @clebertsuconic @jbertram @gtully 
   I believe that answering to this questions is key to understand how to make 
this change safely or to verify that's already ok as it is...
   
   I see that `ChannelImpl.CHANNEL_ID.REPLICATION` Artemis `Channel` is always 
created with a `confWindowSize == -1` so it seems it won't have any blocking 
behaviour caused by the `resendCache` or `responseAsyncCache` and given that it 
cannot failover nor is using `sendBlocking`, its `ChannelImpl::lock`  isn't 
used, but is still brittle and I would like to enforce it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 551813)
Time Spent: 8h 10m  (was: 8h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=551814=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-551814
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 12/Feb/21 13:47
Start Date: 12/Feb/21 13:47
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-778040106


   The change seems ok CI-wise, but, given that's running its logic on the 
Netty event loop we need a couple of questions to be answered:
   - what happen to the Netty channel writability with racing sends on a 
different artemis `Channel` (eg `PING`)? 
   The `REPLICATION` channel isn't the only one using the underlying `Netty` 
connection and that means that being awaken that a Netty channel is writable 
again can make it not writable while attempting to write the replication 
packets, because a concurrent write has filled it again!
   - `ChannelImpl::send` can block? If yes, it can cause some trouble because 
the cluster connection isn't the only citizen of the Netty event loop thread 
and this can cause other connections to starve (to not mention the same 
connection responses to be read!)
   
   @clebertsuconic @jbertram @gtully 
   I believe that answering to this questions is key to understand how to make 
this change safely or to verify that's already ok as it is...
   
   I see that `ChannelImpl.CHANNEL_ID.REPLICATION` Artemis `Channel` is always 
created with a `confWindowSize == -1` so it seems it won't have any blocking 
behaviour caused by the `resendCache` or `responseAsyncCache` and given that it 
cannot failover nor is using `sendBlocking`, its `ChannelImpl::lock`  isn't 
used, but is still brittle and I would like to enforce it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 551814)
Time Spent: 8h 20m  (was: 8h 10m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=551728=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-551728
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 12/Feb/21 08:37
Start Date: 12/Feb/21 08:37
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-778040106







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 551728)
Time Spent: 7h 50m  (was: 7h 40m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=551729=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-551729
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 12/Feb/21 08:37
Start Date: 12/Feb/21 08:37
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-778040106


   The change seems ok if that that running on the Netty event loop need a 
couple of pre-conditions to be verified:
   - what happen to the Netty channel writability with racing sends on a 
different artemis `Channel` (eg `PING`)? 
   The `REPLICATION` channel isn't the only one using the underlying `Netty` 
connection and that means that being awaken that a Netty channel is writable 
again can make it not writable while attempting to write the replication 
packets, because a concurrent write has filled it again!
   - `ChannelImpl::send` can block? If yes, it can cause some trouble because 
the cluster connection isn't the only citizen of the Netty event loop thread 
and this can cause other connections to starve (to not mention the same 
connection responses to be read!)
   
   @clebertsuconic @jbertram @gtully 
   I believe that answering to this questions is key to understand how to make 
this change safely or to verify that's already ok as it is...
   
   I see that `ChannelImpl.CHANNEL_ID.REPLICATION` Artemis `Channel` is always 
created with a `confWindowSize == -1` so it seems it won't have any blocking 
behaviour caused by the `resendCache` or `responseAsyncCache` and given that it 
cannot failover nor is using `sendBlocking`, its `ChannelImpl::lock`  isn't 
used, but is still brittle and I would like to enforce it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 551729)
Time Spent: 8h  (was: 7h 50m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=551723=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-551723
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 12/Feb/21 08:25
Start Date: 12/Feb/21 08:25
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-778040106


   The change seems ok if that that running on the Netty event loop need a 
couple of pre-conditions to be verified:
   - what happen to the Netty channel writability with racing sends on a 
different artemis `Channel` (eg `PING`)? 
   The `REPLICATION` channel isn't the only one using the underlying `Netty` 
connection and that means that being awaken that a Netty channel is writable 
again can make it not writable while attempting to write the replication 
packets, because a concurrent write has filled it again!
   - `ChannelImpl::send` can block? If yes, it can cause some trouble because 
the cluster connection isn't the only citizen of the Netty event loop thread 
and this can cause other connections to starve (to not mention the same 
connection responses to be read!)
   
   @clebertsuconic @jbertram @gtully 
   I believe that answering to this questions is key to understand how to make 
this change safely or to verify that's already ok as it is...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 551723)
Time Spent: 7h 40m  (was: 7.5h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=551716=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-551716
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 12/Feb/21 07:58
Start Date: 12/Feb/21 07:58
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-778040106


   The change seems ok if that that running on the Netty event loop need a 
couple of pre-conditions to be verified:
   - what happen to the Netty channel writability with racing sends on a 
different artemis `Channel` (eg `PING`)? 
   The `REPLICATION` channel isn't the only one using the underlying `Netty` 
connection and that means that being awaken that a Netty channel is writable 
again can make it not writable while attempting to write the replication 
packets, because the concurrent write has filled it again!
   - `ChannelImpl::send` can block? If yes, it can cause some trouble because 
the cluster connection isn't the only citizen of the Netty event loop thread 
and this can cause other connections to starve (to not mention the same 
connection responses to be read!)
   
   @clebertsuconic @jbertram @gtully 
   I believe that answering to this questions is key to understand how to make 
this change safely or to verify that's already ok as it is...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 551716)
Time Spent: 7.5h  (was: 7h 20m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=551715=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-551715
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 12/Feb/21 07:57
Start Date: 12/Feb/21 07:57
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-778040106


   The change seems ok if that that running on the Netty event loop need a 
couple of pre-conditions to be verified:
   - what happen to the Netty channel writability with racing sends on a 
different artemis `Channel` (eg `PING`)? 
   The `REPLICATION` channel isn't the only one using the underlying `Netty` 
connection and that means that being awaken that a Netty channel is writable 
again can make it not writable while attempting to write, because the 
concurrent write has filled it again!
   - `ChannelImpl::send` can block? If yes, it can cause some trouble because 
the cluster connection isn't the only citizen of the Netty event loop thread 
and this can cause other connections to starve (to not mention the same 
connection responses to be read!)
   
   @clebertsuconic @jbertram @gtully 
   I believe that answering to this questions is key to understand how to make 
this change safely or to verify that's already ok as it is...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 551715)
Time Spent: 7h 20m  (was: 7h 10m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=551714=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-551714
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 12/Feb/21 07:57
Start Date: 12/Feb/21 07:57
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-778040106


   Just a thought here on this change: the change seems ok if that that running 
on the Netty event loop need a couple of pre-conditions to be verified:
   - what happen to the Netty channel writability with racing sends on a 
different artemis `Channel` (eg `PING`)? 
   The `REPLICATION` channel isn't the only one using the underlying `Netty` 
connection and that means that being awaken that a Netty channel is writable 
again can make it not writable while attempting to write, because the 
concurrent write has filled it again!
   - `ChannelImpl::send` can block? If yes, it can cause some trouble because 
the cluster connection isn't the only citizen of the Netty event loop thread 
and this can cause other connections to starve (to not mention the same 
connection responses to be read!)
   
   @clebertsuconic @jbertram @gtully 
   I believe that answering to this questions is key to understand how to make 
this change safely or to verify that's already ok as it is...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 551714)
Time Spent: 7h 10m  (was: 7h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-09 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=550228=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-550228
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 09/Feb/21 13:16
Start Date: 09/Feb/21 13:16
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-775927511


   CI seems fine with this as well :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 550228)
Time Spent: 7h  (was: 6h 50m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-08 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=549733=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549733
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 08/Feb/21 17:45
Start Date: 08/Feb/21 17:45
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-775322995


   Just fixed the Ci test failure: still need to add some metrics and simplify 
the code, as you rightly said, I don't think I will be able to do it for this 
release :+1: 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549733)
Time Spent: 6h 50m  (was: 6h 40m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-08 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=549710=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549710
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 08/Feb/21 17:06
Start Date: 08/Feb/21 17:06
Worklog Time Spent: 10m 
  Work Description: clebertsuconic commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-775295471


   this is a great improvement. However I will wait the next release given that 
this might be some risk now.
   
   We can release very soon including this commit, like 2 weeks from now.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549710)
Time Spent: 6h 40m  (was: 6.5h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-08 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=549572=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549572
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 08/Feb/21 13:18
Start Date: 08/Feb/21 13:18
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-775135889


   @clebertsuconic I would like to include some metrics/counters re the amount 
of pending bytes to be sent vs the written on the wire, to help setting TCP 
buffers and Netty watermarks 
(https://netty.io/4.1/api/io/netty/channel/WriteBufferWaterMark.html)
   
   The CI seems good with it: the only real failure seems on 
`org.apache.activemq.artemis.tests.integration.cluster.failover.PageCleanupWhileReplicaCatchupTest.testPageCleanup`
 
   and I suppose that's due to the connection failure handling



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549572)
Time Spent: 6.5h  (was: 6h 20m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-08 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=549557=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549557
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 08/Feb/21 13:04
Start Date: 08/Feb/21 13:04
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-775135889


   @clebertsuconic I would like to include some metrics/counters re the amount 
of pending bytes to be sent vs the written on the wire, to help setting TCP 
buffers and Netty watermarks 
(https://netty.io/4.1/api/io/netty/channel/WriteBufferWaterMark.html)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549557)
Time Spent: 6h 20m  (was: 6h 10m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-08 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=549550=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549550
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 08/Feb/21 12:41
Start Date: 08/Feb/21 12:41
Worklog Time Spent: 10m 
  Work Description: franz1981 removed a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-775058524


   @clebertsuconic 
   My only concern here, despite the huge performance increase, is related to 
`ReplicationPageWriteMessage`: I'm not sure if we are unlucky and it could 
cause a page message to be read from the disk/cache in order to encode it for 
the replica..



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549550)
Time Spent: 6h 10m  (was: 6h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-08 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=549509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549509
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 08/Feb/21 10:57
Start Date: 08/Feb/21 10:57
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-775058524


   @clebertsuconic 
   My only concern here, despite the huge performance increase, is related to 
`ReplicationPageWriteMessage`: I'm not sure if we are unlucky and it could 
cause a page message to be read from the disk/cache in order to encode it for 
the replica..



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549509)
Time Spent: 6h  (was: 5h 50m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-08 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=549491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549491
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 08/Feb/21 10:25
Start Date: 08/Feb/21 10:25
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-775039853


   @michaelandrepearce Just a note on the very last version with 32 clients and 
100 bytes messages:
   ```
   **
   RUN 3EndToEnd Throughput: 30063 ops/sec
   **
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean   1090.92
   min 179.20
   50.00%  983.04
   90.00% 1482.75
   99.00% 3473.41
   99.90% 6553.60
   99.99% 9764.86
   max   16187.39
   count   32
   ```
   TLDR:
   20447 ops/sec vs 30063 ops/sec with much better latencies
   A reduction of network bandwidth of ~40%
   A reduction of Netty CPU core usage of 50%
   An overall reduction of CPU of ~1 core
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549491)
Time Spent: 5h 50m  (was: 5h 40m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-06 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=549078=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549078
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 06/Feb/21 13:54
Start Date: 06/Feb/21 13:54
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-774405162


   @clebertsuconic @michaelandrepearce @jbertram @brusdev @gtully 
   This change seems to perform best, but I cannot say I am satisfied, because 
I see a problem both on this and the original implementation ie we are not 
handling back-pressure back to the replicated journal caller threads.
   This remind me of 
https://cassandra.apache.org/blog/2020/09/03/improving-resiliency.html for who 
is interested..
   
   TLDR:
   - in the master implementation we can make the broker to go OOM by adding 
too many Runnables to the replication stream because it can block awaiting 
Netty writability for 30 seconds
   - in this PR first implementation we can have a JCTools q of the outstanding 
packet requests that can grow unbounded for the same reason and makes the 
broker to go OOM again
   - in this PR last implementation we can have the Netty internal outbound 
(off-heap) buffer that can grow unbounded
   
   Despite the first 2 solutions seems better at first look, because they wait 
for enough room in Netty buffer, we can still get OOM under the same 
circumstances ie while awaiting the backup to catch-up.
   
   We should:
   - track the amount of pending work for monitoring
   - propagate back-pressure
   
   Similarly to Cassandra, we could setAutoread false on client connections, 
but that means:
   - clients should handle themselves from OOM 
   - they're stopped to make progress in whatever operation they're performing 
vs the broker (not just ones that involves replication)
   
   Sadly this doesn't seem to be a viable solution, for the latter effect.
   Another solution would be to start diminishing the credits for the addresses 
that involves large msg/paging and journalling, but it's probably more feasible 
per-protocol, not a unified solution...
   
   I cannot say what's the best option here and I don't see if we are already 
using some form of end-to-end protection. 
   I see we just let the journal replication request to be enqueued, leaving 
the `OperationalContextImpl` in charge of sending back acks to clients only 
when both the local durable write and the replication has succeeded: this seem 
to prevent the same "user" to ask for a new request while awaiting the previous 
one (thinking about `MessageProducer::produce` with a durable write), but just 
increasing the amount of clients can increase the amount of enqueued requests 
to cause OOM, under load.
   
   Any thought?
   
   IMO solving this correctly can bring a huge performance increase with an 
improved stability too.
   
   Note:
   An "easy" solution to this problem would be to pause the replication 
request(s) (it should happen on an I/O thread) unblocking it when needed (or if 
the slow replication connection is closed) if the amount of pending requests 
get too high, but the risk is to have a stealthy performance killer due to the 
cost of blocking/unblocking in case of misconfigured Netty watermarks or TCP 
buffers...
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549078)
Time Spent: 5h 40m  (was: 5.5h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-06 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=549076=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549076
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 06/Feb/21 13:54
Start Date: 06/Feb/21 13:54
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-774405162


   @clebertsuconic @michaelandrepearce @jbertram @brusdev @gtully 
   This change seems to perform best, but I cannot say I am satisfied, because 
I see a problem both on this and the original implementation ie we are not 
handling back-pressure back to the replicated journal caller threads.
   This remind me of 
https://cassandra.apache.org/blog/2020/09/03/improving-resiliency.html for who 
is interested..
   
   TLDR:
   - in the master implementation we can make the broker to go OOM by adding 
too many Runnables to the replication stream because it can block awaiting 
Netty writability for 30 seconds
   - in this PR first implementation we can have a JCTools q of the outstanding 
packet requests that can grow unbounded for the same reason and makes the 
broker to go OOM again
   - in this PR last implementation we can have the Netty internal outbound 
(off-heap) buffer that can grow unbounded
   
   Despite the first 2 solutions seems better at first look, because they wait 
for enough room in Netty buffer, we can still get OOM under the same 
circumstances ie while awaiting the backup to catch-up.
   
   We should:
   - track the amount of pending work for monitoring
   - propagate back-pressure
   
   Similarly to Cassandra, we could setAutoread false on client connections, 
but that means:
   - clients should handle themselves from OOM 
   - they're stopped to make progress in whatever operation they're performing 
vs the broker (not just ones that involves replication)
   
   Sadly this doesn't seem to be a viable solution, for the latter effect.
   Another solution would be to start diminishing the credits for the addresses 
that involves large msg/paging and journalling, but it's probably more feasible 
per-protocol, not a unified solution...
   
   I cannot say what's the best option here and I don't see if we are already 
using some form of end-to-end protection. 
   I see we just let the journal replication request to be enqueued, leaving 
the 'OperationalContextImpl` in charge of sending back acks to clients only 
when both the local durable write and the replication has succeeded: this seem 
to prevent the same "user" to ask for a new request while awaiting the previous 
one (thinking about `MessageProducer::produce` with a durable write), but just 
increasing the amount of clients can increase the amount of enqueued requests 
to cause OOM, under load.
   
   Any thought?
   
   IMO solving this correctly can bring a huge performance increase with an 
improved stability too.
   
   Note:
   An "easy" solution to this problem would be to pause the replication 
request(s) (it should happen on an I/O thread) unblocking it when needed (or if 
the slow replication connection is closed) if the amount of pending requests 
get too high, but the risk is to have a stealthy performance killer due to the 
cost of blocking/unblocking in case of misconfigured Netty watermarks or TCP 
buffers...
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549076)
Time Spent: 5.5h  (was: 5h 20m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-06 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=549074=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549074
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 06/Feb/21 13:44
Start Date: 06/Feb/21 13:44
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-774405162


   @clebertsuconic @michaelandrepearce @jbertram @brusdev @gtully 
   This change seems to perform best, but I cannot say I am satisfied, because 
I see a problem both on this and the original implementation ie we are not 
handling back-pressure back to the replicated journal caller threads.
   This remind me of 
https://cassandra.apache.org/blog/2020/09/03/improving-resiliency.html for who 
is interested..
   
   TLDR:
   - in the master implementation we can make the broker to go OOM by adding 
too many Runnables to the replication stream because it can block awaiting 
Netty writability for 30 seconds
   - in this PR first implementation we can have a JCTools q of the outstanding 
packet requests that can grow unbounded for the same reason and makes the 
broker to go OOM again
   - in this PR last implementation we can have the Netty internal outbound 
(off-heap) buffer that can grow unbounded
   
   Despite the first 2 solutions seems better at first look, because they wait 
for enough room in Netty buffer, we can still get OOM under the same 
circumstances ie while awaiting the backup to catch-up.
   
   We should:
   - track the amount of pending work for monitoring
   - propagate back-pressure
   
   But given that Artemis clients are of very different types, probably we 
should (similar to Cassandra) setAutoread false to client connections, but that 
means that we rely on client to save themselves from OOM.
   Or, depending by client type, we can stop sending credits back to clients to 
slow'em.
   At worst we should stop accepting connecting clients too (but is too 
drastic, because maybe they won't make the broker to replicate anything).
   
   I cannot say what's the best option here and if we already use some form of 
end-to-end protection I cannot see here, but it doesn't seem the case, given 
that many parallel client can still cause overloading much before receiving 
back notification of a durable local write + backup notification.
   Any thought?
   
   IMO solving this correctly can bring a huge performance increase with an 
improved stability too.
   
   Note:
   An "easy" solution to this problem would be to pause the replication 
request(s) (it should happen on an I/O thread) unblocking it when needed (or if 
the slow replication connection is closed) if the amount of pending requests 
get too high, but the risk is to have a stealthy performance killer due to the 
cost of blocking/unblocking in case of misconfigured Netty watermarks or TCP 
buffers...
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549074)
Time Spent: 5h 20m  (was: 5h 10m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-06 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=549073=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549073
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 06/Feb/21 13:43
Start Date: 06/Feb/21 13:43
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-774405162


   @clebertsuconic @michaelandrepearce @jbertram @brusdev @gtully 
   This change seems to perform best, but I cannot say I am satisfied, because 
I see a problem both on this and the original implementation ie we are not 
handling back-pressure back to the replicated journal caller threads.
   This remind me of 
https://cassandra.apache.org/blog/2020/09/03/improving-resiliency.html for who 
is interested..
   
   TLDR:
   - in the master implementation we can make the broker to go OOM by adding 
too many Runnables to the replication stream because it can block awaiting 
Netty writability for 30 seconds
   - in this PR first implementation we can have a JCTools q of the outstanding 
packet requests that can grow unbounded for the same reason and makes the 
broker to go OOM again
   - in this PR last implementation we can have the Netty internal outbound 
(off-heap) buffer that can grow unbounded
   
   Despite the first 2 solutions seems better at first look, because they wait 
for enough room in Netty buffer, we can still get OOM under the same 
circumstances ie while awaiting the backup to catch-up.
   
   We should:
   - track the amount of pending work for monitoring
   - try to propagate back-pressure until clients
   
   But given that Artemis clients are of very different types, probably we 
should (similar to Cassandra) setAutoread false to client connections, but that 
means that we rely on client to save themselves from OOM.
   Or, depending by client type, we can stop sending credits back to clients to 
slow'em.
   At worst we should stop accepting connecting clients too (but is too 
drastic, because maybe they won't make the broker to replicate anything).
   
   I cannot say what's the best option here and if we already use some form of 
end-to-end protection I cannot see here, but it doesn't seem the case, given 
that many parallel client can still cause overloading much before receiving 
back notification of a durable local write + backup notification.
   Any thought?
   
   IMO solving this correctly can bring a huge performance increase with an 
improved stability too.
   
   Note:
   An "easy" solution to this problem would be to pause the replication 
request(s) (it should happen on an I/O thread) unblocking it when needed (or if 
the slow replication connection is closed) if the amount of pending requests 
get too high, but the risk is to have a stealthy performance killer due to the 
cost of blocking/unblocking in case of misconfigured Netty watermarks or TCP 
buffers...
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549073)
Time Spent: 5h 10m  (was: 5h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-06 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=549072=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549072
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 06/Feb/21 13:42
Start Date: 06/Feb/21 13:42
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-774405162


   @clebertsuconic @michaelandrepearce @jbertram @brusdev @gtully 
   This change seems to perform best, but I cannot say I am satisfied, because 
I see a problem both on this and the original implementation ie we are not 
handling back-pressure back to the replicated journal caller threads.
   This remind me of 
https://cassandra.apache.org/blog/2020/09/03/improving-resiliency.html for who 
is interested..
   
   TLDR:
   - in the master implementation we can make the broker to go OOM by adding 
too many Runnables to the replication stream because it can block awaiting 
Netty writability for 30 seconds and stop consuming the tasks
   - in this PR first implementation we can have a JCTools q of the outstanding 
packet requests that can grow unbounded for the same reason and makes the 
broker to go OOM again
   - in this PR last implementation we can have the Netty internal outbound 
(off-heap) buffer that can grow unbounded
   
   Despite the first 2 solutions seems better at first look, because they wait 
for enough room in Netty buffer, we can still get OOM under the same 
circumstances ie while awaiting the backup to catch-up.
   
   We should:
   - track the amount of pending work for monitoring
   - try to propagate back-pressure until clients
   
   But given that Artemis clients are of very different types, probably we 
should (similar to Cassandra) setAutoread false to client connections, but that 
means that we rely on client to save themselves from OOM.
   Or, depending by client type, we can stop sending credits back to clients to 
slow'em.
   At worst we should stop accepting connecting clients too (but is too 
drastic, because maybe they won't make the broker to replicate anything).
   
   I cannot say what's the best option here and if we already use some form of 
end-to-end protection I cannot see here, but it doesn't seem the case, given 
that many parallel client can still cause overloading much before receiving 
back notification of a durable local write + backup notification.
   Any thought?
   
   IMO solving this correctly can bring a huge performance increase with an 
improved stability too.
   
   Note:
   An "easy" solution to this problem would be to pause the replication 
request(s) (it should happen on an I/O thread) unblocking it when needed (or if 
the slow replication connection is closed) if the amount of pending requests 
get too high, but the risk is to have a stealthy performance killer due to the 
cost of blocking/unblocking in case of misconfigured Netty watermarks or TCP 
buffers...
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 549072)
Time Spent: 5h  (was: 4h 50m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-06 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=548997=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-548997
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 06/Feb/21 08:19
Start Date: 06/Feb/21 08:19
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-774405162


   @clebertsuconic @michaelandrepearce @jbertram @brusdev @gtully 
   This change seems to perform best, but I cannot say I am satisfied, because 
I see a problem both on this and the original implementation ie we are not 
handling back-pressure back to the replicated journal and beyond.
   This remind me of 
https://cassandra.apache.org/blog/2020/09/03/improving-resiliency.html for who 
is interested..
   
   TLDR:
   - in the master implementation we can make the broker to go OOM by adding 
too many Runnables to the replication stream because it can block awaiting 
Netty writability for 30 seconds and stop consuming the tasks
   - in this PR first implementation we can have a JCTools q of the outstanding 
packet requests that can grow unbounded for the same reason and makes the 
broker to go OOM again
   - in this PR last implementation we can have the Netty internal outbound 
(off-heap) buffer that can grow unbounded
   
   Despite the first 2 solutions seems better at first look, because they wait 
for enough room in Netty buffer, we can still get OOM under the same 
circumstances ie while awaiting the backup to catch-up.
   
   We should:
   - track the amount of pending work for monitoring
   - try to propagate back-pressure until clients
   
   But given that Artemis clients are of very different types, probably we 
should (similar to Cassandra) setAutoread false to client connections, but that 
means that we rely on client to save themselves from OOM.
   Or, depending by client type, we can stop sending credits back to clients to 
slow'em.
   At worst we should stop accepting connecting clients too (but is too 
drastic, because maybe they won't make the broker to replicate anything).
   
   I cannot say what's the best option here and if we already use some form of 
end-to-end protection I cannot see here, but it doesn't seem the case, given 
that many parallel client can still cause overloading much before receiving 
back notification of a durable local write + backup notification.
   Any thought?
   
   IMO solving this correctly can bring a huge performance increase with an 
improved stability too.
   
   Note:
   An "easy" solution to this problem would be to pause the replication 
request(s) (it should happen on an I/O thread) unblocking it when needed (or if 
the slow replication connection is closed) if the amount of pending requests 
get too high, but the risk is to have a stealthy performance killer due to the 
cost of blocking/unblocking in case of misconfigured Netty watermarks or TCP 
buffers...
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 548997)
Time Spent: 4h 50m  (was: 4h 40m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-05 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=548981=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-548981
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 06/Feb/21 05:46
Start Date: 06/Feb/21 05:46
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-774405162


   @clebertsuconic @michaelandrepearce @jbertram @brusdev @gtully 
   This change seems to perform best, but I cannot say I am satisfied, because 
I see a problem both on this and the original implementation ie we are not 
handling back-pressure back to the replicated journal and beyond.
   This remind me of 
https://cassandra.apache.org/blog/2020/09/03/improving-resiliency.html for who 
is interested..
   
   TLDR:
   - in the master implementation we can make the broker to go OOM by adding 
too many Runnables to the replication stream because it can block awaiting 
Netty writability for 30 seconds and stop consuming the tasks
   - in this PR first implementation we can have a JCTools q of the outstanding 
packet requests that can grow unbounded for the same reason and makes the 
broker to go OOM again
   - in this PR last implementation we can have the Netty internal outbound 
(off-heap) buffer that can grow unbounded
   
   Despite the first 2 solutions seems better at first look, because they wait 
for enough room in Netty buffer, we can still get OOM under the same 
circumstances ie while awaiting the backup to catch-up.
   
   We should:
   - track the amount of pending work for monitoring
   - try to propagate back-pressure until clients
   
   But given that Artemis clients are of very different types, probably we 
should (similar to Cassandra) setAutoread false to client connections, but that 
means that we rely on client to save themselves from OOM.
   Or, depending by client type, we can stop sending credits back to clients to 
slow'em.
   At worst we should stop accepting connecting clients too (but is too 
drastic, because maybe they won't make the broker to replicate anything).
   
   I cannot say what's the best option here and if we already use some form of 
end-to-end protection I cannot see here, but it doesn't seem the case, given 
that many parallel client can still cause overloading much before receiving 
back notification of a durable local write + backup notification.
   Any thought?
   
   IMO solving this correctly can bring a huge performance increase with an 
improved stability too.
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 548981)
Time Spent: 4h 40m  (was: 4.5h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-05 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=548980=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-548980
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 06/Feb/21 05:46
Start Date: 06/Feb/21 05:46
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-774405162


   @clebertsuconic @michaelandrepearce @jbertram @brusdev @gtully 
   This change seems to perform best, but I cannot say I am satisfied, because 
I see a problem both on this and the original implementation ie we are not 
handling back-pressure back to the replicated journal and beyond.
   This remind me of 
https://cassandra.apache.org/blog/2020/09/03/improving-resiliency.html for who 
is interested..
   
   TLDR:
   - in the master implementation we can make the broker to go OOM by adding 
too many Runnables to the replication stream because it can block awaiting 
Netty writability for 30 seconds and stop consuming the tasks
   - in this PR first implementation we can have a JCTools q of the outstanding 
packet requests that can grow unbounded for the same reason and makes the 
broker to go OOM again
   - in this PR last implementation we can have the Netty internal outbound 
(off-heap) buffer thet can grow unbounded
   
   Despite the first 2 solutions seems better at first look, because they wait 
for enough room in Netty buffer, we can still get OOM under the same 
circumstances ie while awaiting the backup to catch-up.
   
   We should:
   - track the amount of pending work for monitoring
   - try to propagate back-pressure until clients
   
   But given that Artemis clients are of very different types, probably we 
should (similar to Cassandra) setAutoread false to client connections, but that 
means that we rely on client to save themselves from OOM.
   Or, depending by client type, we can stop sending credits back to clients to 
slow'em.
   At worst we should stop accepting connecting clients too (but is too 
drastic, because maybe they won't make the broker to replicate anything).
   
   I cannot say what's the best option here and if we already use some form of 
end-to-end protection I cannot see here, but it doesn't seem the case, given 
that many parallel client can still cause overloading much before receiving 
back notification of a durable local write + backup notification.
   Any thought?
   
   IMO solving this correctly can bring a huge performance increase with an 
improved stability too.
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 548980)
Time Spent: 4.5h  (was: 4h 20m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-05 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=548979=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-548979
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 06/Feb/21 05:44
Start Date: 06/Feb/21 05:44
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-774405162


   @clebertsuconic @michaelandrepearce @jbertram @brusdev 
   This change seems to perform best, but I cannot say I am satisfied, because 
I see a problem both on this and the original implementation ie we are not 
handling back-pressure back to the replicated journal and beyond.
   This remind me of 
https://cassandra.apache.org/blog/2020/09/03/improving-resiliency.html for who 
is interested..
   
   TLDR:
   - in the master implementation we can make the broker to go OOM by adding 
too many Runnables to the replication stream because it can block awaiting 
Netty writability for 30 seconds and stop consuming the tasks
   - in this PR first implementation we can have a JCTools q of the outstanding 
packet requests that can grow unbounded for the same reason and makes the 
broker to go OOM again
   - in this PR last implementation we can have the Netty internal outbound 
(off-heap) buffer thet can grow unbounded
   
   Despite the first 2 solutions seems better at first look, because they wait 
for enough room in Netty buffer, we can still get OOM under the same 
circumstances ie while awaiting the backup to catch-up.
   
   We should:
   - track the amount of pending work for monitoring
   - try to propagate back-pressure until clients
   
   But given that Artemis clients are of very different types, probably we 
should (similar to Cassandra) setAutoread false to client connections, but that 
means that we rely on client to save themselves from OOM.
   Or, depending by client type, we can stop sending credits back to clients to 
slow'em.
   At worst we should stop accepting connecting clients too (but is too 
drastic, because maybe they won't make the broker to replicate anything).
   
   I cannot say what's the best option here and if we already use some form of 
end-to-end protection I cannot see here, but it doesn't seem the case, given 
that many parallel client can still cause overloading much before receiving 
back notification of a durable local write + backup notification.
   Any thought?
   
   IMO solving this correctly can bring a huge performance increase with an 
improved stability too.
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 548979)
Time Spent: 4h 20m  (was: 4h 10m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-05 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=548473=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-548473
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 05/Feb/21 10:25
Start Date: 05/Feb/21 10:25
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-773873107


   @clebertsuconic @michaelandrepearce 
   Plese don't merge it yet: I see that if I use the Netty connection event 
loop used for the replication I'm getting a much better performance, both 
troughput and latency-wise ie
   ```
   **
   RUN 4EndToEnd Throughput: 29681 ops/sec
   **
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean   1114.38
   min 174.08
   50.00%  987.14
   90.00% 1548.29
   99.00% 3702.78
   99.90% 7110.66
   99.99%11599.87
   max   15007.74
   count   32
   ```
   But it means that I should handle network backpressure differently
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 548473)
Time Spent: 4h 10m  (was: 4h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-05 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=548082=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-548082
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 05/Feb/21 08:17
Start Date: 05/Feb/21 08:17
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-773873107


   @clebertsuconic @michaelandrepearce 
   Plese don't merge it yet: I see that if I use the Netty connection event 
loop used for the replication I'm getting a much better performance, both 
troughput and latency-wise ie
   ```
   **
   RUN 4EndToEnd Throughput: 29681 ops/sec
   **
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean   1114.38
   min 174.08
   50.00%  987.14
   90.00% 1548.29
   99.00% 3702.78
   99.90% 7110.66
   99.99%11599.87
   max   15007.74
   count   32
   ```
   But it means that I should handle network backpressure differently
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 548082)
Time Spent: 4h  (was: 3h 50m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-01 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=545137=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545137
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 01/Feb/21 08:08
Start Date: 01/Feb/21 08:08
Worklog Time Spent: 10m 
  Work Description: michaelandrepearce commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-770658391


   You can find it here @franz1981 https://github.com/softwaremill/mqperf



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545137)
Time Spent: 3h 50m  (was: 3h 40m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-02-01 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=545136=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545136
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 01/Feb/21 08:03
Start Date: 01/Feb/21 08:03
Worklog Time Spent: 10m 
  Work Description: michaelandrepearce commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-770655449


   Next week i may get a chance. Booked out this week. Btw have you thought 
about just spinning up and running the software mill test platform? They had 
all their thing done as ansible and just spun stuff up in aws. Would be best 
anyhow as thats where the perf drop was noticed



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 545136)
Time Spent: 3h 40m  (was: 3.5h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-01-29 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=544158=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-544158
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 29/Jan/21 09:32
Start Date: 29/Jan/21 09:32
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-769689978


   @michaelandrepearce There is any chance that this PR could be tried on a 
real hardware? 
   I'm having some problem to get it ATM.
   
   That should improve replication performance (especially with small sized 
messages ie < MTU size):  this PR try to batch records appends on the live 
broker before sending it to the backup broker, maximizing the usage of the 
outbound buffer provided by Netty.
   That should help submitting the send requests saving potential signaling to 
wake-up the event loop on each submitted packet.
   In turn, this could help to better use the network bandwidth, saving from 
sending near empty TCP packets.
   
   Despite batching seems to trade-off latency vs troughput perf, here it 
shouldn't happen, because:
   - we send a batch of packets from the same thread, improving cache misses 
while reading the batch of packets
   - reducing the amount of syscalls per packet means amortizing the latency 
cost per-packet, given that syscalls vectored writes (that Netty should use to 
send the batches AFAIK) tends to have near fixed costs
   
   My only concern is that impl-wise it could be implemented much better, as 
mentioned on 
https://github.com/apache/activemq-artemis/pull/3392#issuecomment-757663864 and 
that batching could have some upper limit in the amount of bytes other then 
Netty buffering that could be maybe too big.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 544158)
Time Spent: 3.5h  (was: 3h 20m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-01-29 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=544157=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-544157
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 29/Jan/21 09:32
Start Date: 29/Jan/21 09:32
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-769689978


   @michaelandrepearce There is any chance that this PR could be tried on a 
real hardware? 
   I'm having some problem to get it ATM.
   
   That should improve replication performance (especially with small sized 
messages ie < MTU size):  this PR try to batch records appends on the live 
broker before sending it to the backup broker, maximizing the usage of the 
outbound buffer provided by Netty.
   That should help submitting the send requests saving potential signaling to 
wake-up the event loop on each submitted packet.
   In turn, this could help to better use the network bandwidth, saving from 
sending near empty TCP packets.
   
   Despite batching seems to trade-off latency vs troughput perf, here it 
shouldn't happen, because:
   - we send a batch of packets from the same thread, improving cache misses 
while reading the batch of packets
   - reducing the amount of syscalls per packet means amortizing the latency 
cost per-packet, given that syscalls vectored writes tends to have near fixed 
costs
   
   My only concern is that impl-wise it could be implemented much better, as 
mentioned on 
https://github.com/apache/activemq-artemis/pull/3392#issuecomment-757663864 and 
that batching could have some upper limit in the amount of bytes other then 
Netty buffering that could be maybe too big.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 544157)
Time Spent: 3h 20m  (was: 3h 10m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-01-29 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=544156=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-544156
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 29/Jan/21 09:31
Start Date: 29/Jan/21 09:31
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-769689978


   @michaelandrepearce There is any chance that this PR could be tried on a 
real hardware? 
   I'm having some problem to get it ATM.
   
   That should improve replication performance (especially with small sized 
messages ie < MTU size):  this PR try to batch records appends on the live 
broker before sending it to the backup broker, maximizing the usage of the 
outbound buffer provided by Netty.
   That should help submitting the send requests saving potential signaling to 
wake-up the event loop on each submitted packet.
   In turn, this could help to better use the network bandwidth, saving from 
sending near empty TCP packets.
   
   Despite batching seems to trade-off latency vs troughput perf, here it 
shouldn't happen, because:
   - we send a batch of packets from the same thread, improving cache misses 
while reading the batch of packets
   - reducing the amount of syscalls per packet means amortizing the latency 
cost per-packet, given that syscalls live vecored writes tends to have near 
fixed costs
   
   My only concern is that impl-wise it could be implemented much better, as 
mentioned on 
https://github.com/apache/activemq-artemis/pull/3392#issuecomment-757663864 and 
that batching could have some upper limit in the amount of bytes other then 
Netty buffering that could be maybe too big.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 544156)
Time Spent: 3h 10m  (was: 3h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-01-29 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=544155=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-544155
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 29/Jan/21 09:30
Start Date: 29/Jan/21 09:30
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-769689978


   @michaelandrepearce There is any chance that this PR could be tried on a 
real hardware? 
   I'm having some problem to get it ATM.
   
   That should improve replication performance (especially with small sized 
messages ie < MTU size:  this PR try to batch records appends on the live 
broker before sending it to the backup broker, maximizing the usage of the 
outbound buffer provided by Netty.
   That should help submitting the send requests saving potential signaling to 
wake-up the event loop on each submitted packet.
   In turn, this could help to better use the network bandwidth, saving from 
sending near empty TCP packets.
   
   Despite batching seems to trade-off latency vs troughput perf, here it 
shouldn't happen, because:
   - we send a batch of packets from the same thread, improving cache misses 
while reading the batch of packets
   - reducing the amount of syscalls per packet means amortizing the latency 
cost per-packet, given that syscalls live vecored writes tends to have near 
fixed costs
   
   My only concern is that impl-wise it could be implemented much better, as 
mentioned on 
https://github.com/apache/activemq-artemis/pull/3392#issuecomment-757663864 and 
that batching could have some upper limit in the amount of bytes other then 
Netty buffering that could be maybe too big.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 544155)
Time Spent: 3h  (was: 2h 50m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-01-25 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=541897=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-541897
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 26/Jan/21 04:30
Start Date: 26/Jan/21 04:30
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-766614766


   I haven't yet tried this one on a real decent hardware yet..will do it as 
soon as possible.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 541897)
Time Spent: 2h 50m  (was: 2h 40m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-01-24 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=540914=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-540914
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 25/Jan/21 07:39
Start Date: 25/Jan/21 07:39
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-766614766


   I haven't yet tried this one on a real decent hardware yet..will do it as 
soon as possible.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 540914)
Time Spent: 2h 40m  (was: 2.5h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=535002=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-535002
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 12/Jan/21 19:55
Start Date: 12/Jan/21 19:55
Worklog Time Spent: 10m 
  Work Description: clebertsuconic commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-758898224


   @franz1981 is this ready to merge then?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 535002)
Time Spent: 2.5h  (was: 2h 20m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-01-10 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=534134=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-534134
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 11/Jan/21 07:01
Start Date: 11/Jan/21 07:01
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-757663864


   @clebertsuconic The tests are fixed now.
   TBH I would like to improve this a bit more for 3 reasons:
   
   1. would like to try its effects on some decent HW 
   2. ArrayDeque could be saved
   3. [OPTIONAL] instead of using an ordered executor we could use an `Actor` 
and expose batching of packets on it instead, allowing other placed to benefit 
from it
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 534134)
Time Spent: 2h 20m  (was: 2h 10m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-01-05 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=531719=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531719
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 06/Jan/21 06:48
Start Date: 06/Jan/21 06:48
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-755118772


   Thanks @clebertsuconic I see the error: I shouldn't use
PlatformDependent.newMpscQueue(1024) because it's limiting the queue 
capacity (my bad - holidays 易): I should have used an unbounded queue.
   
   This is making me think that probably it worths to limit the amount of 
batching as well :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531719)
Time Spent: 2h 10m  (was: 2h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2021-01-05 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=531492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-531492
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 05/Jan/21 21:07
Start Date: 05/Jan/21 21:07
Worklog Time Spent: 10m 
  Work Description: clebertsuconic commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-754899798


   There are two tests failing:
   
   
   Test Name | Duration | Age
   -- | -- | --
   
org.apache.activemq.artemis.tests.smoke.replicationflow.ReplicationFlowControlTest.testPageWhileSyncFailover
 | 2 min 15 sec | 1
   
org.apache.activemq.artemis.tests.smoke.replicationflow.ReplicationFlowControlTest.testPageWhileSynchronizingReplica
 | 38 sec | 1
   
   I know you're off in vacation this week.. just leaving this for when you're 
back
   
   
   I see these messages on the logs by simply running the tests:
   
   
   ```
   replicated-static0-out:2021-01-05 16:04:17,789 ERROR 
[org.apache.activemq.artemis.core.server] AMQ224016: Caught exception: 
java.lang.IllegalStateException: Queue full
   replicated-static0-out:  at 
java.util.AbstractQueue.add(AbstractQueue.java:98) [rt.jar:1.8.0_261]
   replicated-static0-out:  at 
org.apache.activemq.artemis.core.replication.ReplicationManager.sendReplicatePacket(ReplicationManager.java:399)
 [artemis-server-2.17.0-SNAPSHOT.jar:2.17.0-SNAPSHOT]
   replicated-static0-out:  at 
org.apache.activemq.artemis.core.replication.ReplicationManager.sendReplicatePacket(ReplicationManager.java:385)
 [artemis-server-2.17.0-SNAPSHOT.jar:2.17.0-SNAPSHOT]
   replicated-static0-out:  at 
org.apache.activemq.artemis.core.replication.ReplicationManager.pageClosed(ReplicationManager.java:251)
 [artemis-server-2.17.0-SNAPSHOT.jar:2.17.0-SNAPSHOT]
   
   ```
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 531492)
Time Spent: 2h  (was: 1h 50m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2020-12-25 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=528462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528462
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 25/Dec/20 22:05
Start Date: 25/Dec/20 22:05
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-751294656


   That's correct and we were doing already that for clients (and not only) 
IIRC: check 
https://github.com/apache/activemq-artemis/blob/52263663c48082227916cc3477f8892d9f10134b/artemis-core-client/src/main/java/org/apache/activemq/artemis/core/remoting/impl/netty/NettyConnection.java#L269
   
   But replication wasn't doing it AFAIK.
   It won't help if the replicated packets are big ie big journal records, but 
ack/rollbacks and other yep because there are very small and frequent.
   The solution I have written here save using a background thread to perform 
the flush using 
https://github.com/apache/activemq-artemis/blob/52263663c48082227916cc3477f8892d9f10134b/artemis-core-client/src/main/java/org/apache/activemq/artemis/core/remoting/impl/netty/NettyConnection.java#L269
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 528462)
Time Spent: 1h 50m  (was: 1h 40m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2020-12-25 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=528461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528461
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 25/Dec/20 22:05
Start Date: 25/Dec/20 22:05
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-751294656


   That's correct and we were doing already that for clients (and not only) 
IIRC: check 
https://github.com/apache/activemq-artemis/blob/52263663c48082227916cc3477f8892d9f10134b/artemis-core-client/src/main/java/org/apache/activemq/artemis/core/remoting/impl/netty/NettyConnection.java#L269
   
   But replication wasn't doing it AFAIK.
   It won't help if the replicated packets are big ie bit journal records, but 
ack/rollbacks and other yep because there are very small and frequent.
   The solution I have written here save using a background thread to perform 
the flush using 
https://github.com/apache/activemq-artemis/blob/52263663c48082227916cc3477f8892d9f10134b/artemis-core-client/src/main/java/org/apache/activemq/artemis/core/remoting/impl/netty/NettyConnection.java#L269
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 528461)
Time Spent: 1h 40m  (was: 1.5h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2020-12-25 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=528459=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528459
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 25/Dec/20 21:27
Start Date: 25/Dec/20 21:27
Worklog Time Spent: 10m 
  Work Description: michaelandrepearce commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-751292346


   Just reading through, surely leaving netty to batch stuff seems better, e.g. 
this solution just does replication, but i assume and correct me here, as its 
an assumption, but netty would have been prior also batching all not just 
replication, e.g. clients also would slightly benefit.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 528459)
Time Spent: 1.5h  (was: 1h 20m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2020-12-24 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=528120=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528120
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 24/Dec/20 12:48
Start Date: 24/Dec/20 12:48
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-750873244


   @michaelandrepearce 
   
   >  That said this wasn't in 2.2.0 so its not addressing what change caused 
the slowdown.
   
   Correct and there are good reasons for that: as mentioned on 
https://issues.apache.org/jira/browse/ARTEMIS-2877 2.2.0 was using Netty very 
differently and the way it was batching writes "stealthy" has been replaced 
with a more aware/idiomatic way to do it although at the cost of getting some 
regressions.
   I'm investigating if such batching was present OOTB on 2.2.0 to be sure of 
it. 
   Looking at the network activity I believe so, because the amount of sent TCP 
packets on 2.2.0 is much less then master, while with this PR they're on par 
(not correct: this PR is a bit faster then 2.2.0 due to other recent 
improvements on MAPPED journal).




This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 528120)
Time Spent: 1h 20m  (was: 1h 10m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2020-12-24 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=528119=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528119
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 24/Dec/20 12:47
Start Date: 24/Dec/20 12:47
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-750873244


   @michaelandrepearce 
   
   >  That said this wasn't in 2.2.0 so its not addressing what change caused 
the slowdown.
   
   Correct and there are good reasons for that: as mentioned on 
https://issues.apache.org/jira/browse/ARTEMIS-2877 2.2.0 was using Netty very 
differently and the way it was batching writes "stealthy" has been replaced 
with a more aware/idiomatic way to do it although at the cost of getting some 
regressions.
   I'm investigating if such batching was present OOTB on 2.2.0 to be sure of 
it. 
   Looking at the network activity I believe so, because the amount of sent TCP 
packets on 2.2.0 is much less then master, while with this PR they're on par.




This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 528119)
Time Spent: 1h 10m  (was: 1h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2020-12-24 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=528118=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528118
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 24/Dec/20 12:47
Start Date: 24/Dec/20 12:47
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-750873244


   @michaelandrepearce 
   
   >  That said this wasn't in 2.2.0 so its not addressing what change caused 
the slowdown.
   
   Correct and there are good reasons for that: as mentioned on 
https://issues.apache.org/jira/browse/ARTEMIS-2877 2.2.0 was using Netty very 
differently and the way it was batching writes "stealthy" has been replaced 
with a more aware/idiomatic way to do it although at the cost of getting some 
regressions.
   I'm investigating if such batching was present OOTB on 2.2.0 to be sure of 
it. 
   Looking at the network activity I believe so, because the amount of sent TCP 
packets on 2.2.0 is much less then master, while with PR they're on par.




This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 528118)
Time Spent: 1h  (was: 50m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2020-12-24 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=528114=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528114
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 24/Dec/20 12:39
Start Date: 24/Dec/20 12:39
Worklog Time Spent: 10m 
  Work Description: michaelandrepearce commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-750871619


   This is some promising speed up and def +1 in this area. That said this 
wasn't in 2.2.0 so its not addressing what change caused the slowdown.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 528114)
Time Spent: 50m  (was: 40m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2020-12-24 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=528111=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528111
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 24/Dec/20 12:29
Start Date: 24/Dec/20 12:29
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-750867139


   @michaelandrepearce that's a PR that aim to "fix" the regression seen on 
https://softwaremill.com/mqperf/
   TBH I just reproduced the scalability issue with small message size given 
that we tends to underutilized the network because we sent replicated packets 
ASAP without batching them (that means that a single standard MTU of 1500 bytes 
could contains x10 tiny packets eg ack, small records...
   
   This PR batch replicated packets by filling the Netty pending buffer and 
it's getting a nice speed up if compared with master (and 2.2.0, surprisingly):
   master:
   ``` 
   **
   RUN 3EndToEnd Throughput: 20447 ops/sec
   **
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean   1635.00
   min 180.22
   50.00% 1523.71
   90.00% 2088.96
   99.00% 4521.98
   99.90% 8028.16
   99.99%13959.17
   max   17825.79
   count   32
   ```
   This PR:
   ```
   **
   RUN 4EndToEnd Throughput: 24524 ops/sec
   **
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean   1310.83
   min 182.27
   50.00% 1204.22
   90.00% 1736.70
   99.00% 3850.24
   99.90% 7995.39
   99.99%12713.98
   max   19267.58
   count   32
   ```
   
   this is still a DRAF PR given that we need further tests to see that's not 
regressing perf for other cases and I'm open to experiment other options.
   The work on this PR is directly related with 
https://issues.apache.org/jira/browse/ARTEMIS-2877



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 528111)
Time Spent: 40m  (was: 0.5h)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2020-12-24 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=528110=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528110
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 24/Dec/20 12:20
Start Date: 24/Dec/20 12:20
Worklog Time Spent: 10m 
  Work Description: franz1981 edited a comment on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-750867139


   @michaelandrepearce that's a PR that aim to "fix" the regression seen on 
https://softwaremill.com/mqperf/
   TBH I just reproduced the scalability issue with small message size given 
that we tends to underutilized the network because we sent replicated packets 
ASAP without batching them (that means that a single standard MTU of 1500 bytes 
could contains x10 tiny packets eg ack, small records..).
   
   This PR batch replicated packets by filling the Netty pending buffer and 
it's getting a nice speed up if compared with master (and 2.2.0, surprisingly):
   master:
   ``` 
   **
   RUN 3EndToEnd Throughput: 20447 ops/sec
   **
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean   1635.00
   min 180.22
   50.00% 1523.71
   90.00% 2088.96
   99.00% 4521.98
   99.90% 8028.16
   99.99%13959.17
   max   17825.79
   count   32
   ```
   This PR:
   ```
   **
   RUN 4EndToEnd Throughput: 24524 ops/sec
   **
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean   1310.83
   min 182.27
   50.00% 1204.22
   90.00% 1736.70
   99.00% 3850.24
   99.90% 7995.39
   99.99%12713.98
   max   19267.58
   count   32
   ```
   
   this is still a DRAF PR given that we need further tests to see that's not 
regressing perf for other cases and I'm open to experiment other options.
   The work on this PR is directly related with 
https://issues.apache.org/jira/browse/ARTEMIS-2877



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 528110)
Time Spent: 0.5h  (was: 20m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2020-12-24 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=528108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528108
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 24/Dec/20 12:19
Start Date: 24/Dec/20 12:19
Worklog Time Spent: 10m 
  Work Description: franz1981 commented on pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392#issuecomment-750867139


   @michaelandrepearce that's a PR that aim to "fix" the regression seen on 
https://softwaremill.com/mqperf/
   TBH I just reproduced the scalability issue with small message size given 
that we tends to underutilized the network because we sent replicated packets 
ASAP without batching them (that means that a single standard MTU of 1500 bytes 
could contains x10 tiny packets eg ack, small records..).
   
   This PR try to batch replicated packets trying to fill the Netty pending 
buffer and it's getting a nice speed up if compared with master (and 2.2.0, 
surprisingly):
   master:
   ``` 
   **
   RUN 3EndToEnd Throughput: 20447 ops/sec
   **
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean   1635.00
   min 180.22
   50.00% 1523.71
   90.00% 2088.96
   99.00% 4521.98
   99.90% 8028.16
   99.99%13959.17
   max   17825.79
   count   32
   ```
   This PR:
   ```
   **
   RUN 4EndToEnd Throughput: 24524 ops/sec
   **
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean   1310.83
   min 182.27
   50.00% 1204.22
   90.00% 1736.70
   99.00% 3850.24
   99.90% 7995.39
   99.99%12713.98
   max   19267.58
   count   32
   ```
   
   this is still a DRAF PR given that we need further tests to see that's not 
regressing perf for other cases and I'm open to experiment other options.
   The work on this PR is directly related with 
https://issues.apache.org/jira/browse/ARTEMIS-2877



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 528108)
Time Spent: 20m  (was: 10m)

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets

2020-12-24 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARTEMIS-3045?focusedWorklogId=528105=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528105
 ]

ASF GitHub Bot logged work on ARTEMIS-3045:
---

Author: ASF GitHub Bot
Created on: 24/Dec/20 12:13
Start Date: 24/Dec/20 12:13
Worklog Time Spent: 10m 
  Work Description: franz1981 opened a new pull request #3392:
URL: https://github.com/apache/activemq-artemis/pull/3392


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 528105)
Remaining Estimate: 0h
Time Spent: 10m

> ReplicationManager can batch sent replicated packets
> 
>
> Key: ARTEMIS-3045
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3045
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

87 matches

Mail list logo