[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535397765

@wy96f It's strange that the back-pressure propagation fix isn't working: have you used the latest version that includes that fix? Flow control on Netty for chunked NIO files should increase/decrease outbound pending bytes as with "normal" ByteBuf writes... can you check whether the writability changes are correctly propagated to ChunkedWriteHandler?

I will be on vacation from today, so I won't have access to my computer for about 1 month: I will do my best to help on my return, but we are close to fixing this :)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services
[GitHub] [activemq-artemis] franz1981 edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535397765

@wy96f It's strange that the back-pressure (writability) propagation fix isn't working: have you used the latest version that includes that fix? Flow control on Netty for chunked NIO files should increase/decrease outbound pending bytes as with "normal" ByteBuf writes... can you check whether the writability changes are correctly propagated to ChunkedWriteHandler?

I will be on vacation from today, so I won't have access to my computer for about 1 month: I will do my best to help on my return, but we are close to fixing this :)
[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535399659

Maybe on ActiveMQChannelHandler there are other events that are not correctly propagated through the pipeline and that would wake up the chunk writer?
[GitHub] [activemq-artemis] wy96f commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535440070

@franz1981 Have fun on vacation :) There is no problem with writability propagation; it works very well. When I set initial-replication-sync-timeout to a big value (e.g. 7 minutes), all of the queued-up messages were sent and replication succeeded.

The packet sending process with -Dio.netty.file.region=true or master is:
1. channel.writeAndFlush (artemis thread)
2. add ByteBuf to outboundbuffer -- increase size, flush it -- decrease size (netty thread)

The message sending process with -Dio.netty.file.region=false is:
1. channel.writeAndFlush (artemis thread)
2. add message to the queue in ChunkedWriteHandler (netty thread)
3. if the channel is writable, add ByteBuf to outboundbuffer -- increase size and flush it -- decrease size (netty thread)
4. if the channel state changes from unwritable to writable, repeat step 3 (netty thread)

For -Dio.netty.file.region=false, given that a message is first put into the queue and then put into `outboundbuffer` only when the channel is writable, the size of `outboundbuffer` is limited to highWaterMark (default 128k). When the flush proceeds and the size drops to lowWaterMark (default 32k), the channel becomes writable again, over and over. I guess `flowControl` often sees the channel as writable (while lots of messages are actually queued up in ChunkedWriteHandler's queue), so it is not limiting well. In the end, the sync done message is not delivered in time because too many messages are queued up.

For -Dio.netty.file.region=true or master, the size of `outboundbuffer` keeps growing (the netty thread is running all the time). When it exceeds 128k, meaning the channel is not writable and `flowControl` triggers, the broker (artemis thread) does not send packets until the data in `outboundbuffer` is flushed. So there is not much data queued up.

Make sense?
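The writable/unwritable cycle described above can be illustrated with a small toy model. This is only a sketch of the watermark hysteresis (unwritable once pending bytes exceed the high mark, writable again once they fall below the low mark); the class `WatermarkModel` and its methods are hypothetical names for illustration and are not Netty's or Artemis' implementation.

```java
/** Toy model of watermark-based writability; hypothetical, not Netty code. */
final class WatermarkModel {
    private final long lowWaterMark;
    private final long highWaterMark;
    private long pendingBytes;
    private boolean writable = true;

    WatermarkModel(long lowWaterMark, long highWaterMark) {
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
    }

    /** Accounts bytes added to the outbound buffer. */
    void add(long bytes) {
        pendingBytes += bytes;
        if (pendingBytes > highWaterMark) {
            writable = false; // above the high mark: stop accepting writes
        }
    }

    /** Accounts bytes flushed to the socket. */
    void flushed(long bytes) {
        pendingBytes = Math.max(0, pendingBytes - bytes);
        if (pendingBytes < lowWaterMark) {
            writable = true; // below the low mark: writes may resume
        }
    }

    boolean isWritable() {
        return writable;
    }

    long pendingBytes() {
        return pendingBytes;
    }
}
```

With the 32k/128k values quoted in the comment, a sender that only checks `isWritable()` sees the channel flip back to writable every time the buffer drains below 32k, regardless of how much is still waiting upstream of the buffer.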
[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535443555

@wy96f Good analysis :) Yep, so the possible solutions I see are:
- tune `lowWaterMark` and `highWaterMark` differently (> 1 MB)
- split the file into smaller chunks (i.e. 1 MB now; probably 32K is better; configurable would be even better) when `-Dio.netty.file.region=false`
- flow control sync file sends by using the received sync file responses instead of the Netty one (which, as you have noticed, requires more configuration/tuning and behaves differently depending on whether the file region is used or not...)
[GitHub] [activemq-artemis] brusdev commented on a change in pull request #2851: ARTEMIS-2503 Improve wildcards for the roles access match
URL: https://github.com/apache/activemq-artemis/pull/2851#discussion_r328550740

## File path: artemis-server/src/main/java/org/apache/activemq/artemis/core/server/management/JMXAccessControlList.java ##
@@ -25,12 +25,22 @@ import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;

+import org.apache.activemq.artemis.core.config.WildcardConfiguration;
+import org.apache.activemq.artemis.core.settings.HierarchicalRepository;
+import org.apache.activemq.artemis.core.settings.impl.HierarchicalObjectRepository;
+
 public class JMXAccessControlList {

    private Access defaultAccess = new Access("*");
-   private Map domainAccess = new HashMap<>();
+   private HierarchicalRepository domainAccess;
    private ConcurrentHashMap> whitelist = new ConcurrentHashMap<>();

+   public JMXAccessControlList() {
+      WildcardConfiguration domainAccessWildcardConfiguration = new WildcardConfiguration();

Review comment: I like your solution. Where would you put the flag: broker.xml or management.xml?
[GitHub] [activemq-artemis] franz1981 edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535443555

@wy96f Good analysis :) Yep, so the possible solutions I see are:
- tune `lowWaterMark` and `highWaterMark` differently (> 1 MB) to allow the chunk writer queue to be drained faster (hopefully)
- split the file into smaller chunks (i.e. 1 MB now; probably 32K is better; configurable would be even better) when `-Dio.netty.file.region=false`: not sure it would work TBH
- flow control sync file sends by using the received sync file responses instead of the Netty one (which, as you have noticed, requires more configuration/tuning and behaves differently depending on whether the file region is used or not...)
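The last option, flow controlling sends by the responses received from the backup rather than by Netty's writability, could look roughly like the sketch below. It is a generic in-flight window built on `java.util.concurrent.Semaphore`; the class `AckWindow`, its methods, and the idea of one permit per file chunk are assumptions made for illustration and are not taken from the Artemis replication code.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical sender-side window: at most maxInFlight chunks await a response. */
final class AckWindow {
    private final Semaphore permits;

    AckWindow(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /**
     * Call before sending a chunk. Waits up to timeoutMillis for a free slot
     * and returns false if none became available (or the thread was interrupted).
     */
    boolean beforeSend(long timeoutMillis) {
        try {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    /** Call when the response for one previously sent chunk arrives. */
    void onResponse() {
        permits.release();
    }

    int availableSlots() {
        return permits.availablePermits();
    }
}
```

The appeal of this shape is that the amount of unacknowledged data is bounded the same way whether chunks travel as file regions or as chunked input, since the limit no longer depends on how Netty accounts pending bytes.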
[GitHub] [activemq-artemis] franz1981 edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535443555

@wy96f Good analysis :) Yep, so the possible solution I see is to tune `highWaterMark` differently (> 1 MB) to allow the chunk writer queue to be drained faster and make the flow control more accurate (effectively backpressured by the TCP buffer, instead of being limited by Netty *before* it).
[GitHub] [activemq-artemis] franz1981 edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535443555

@wy96f Good analysis :) Yep, so the possible solution I see is to tune `lowWaterMark` and `highWaterMark` differently (> 1 MB) to allow the chunk writer queue to be drained faster and make the flow control more accurate (effectively backpressured by the TCP buffer, instead of being limited by Netty *before* it).
[GitHub] [activemq-artemis] franz1981 edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535443555

@wy96f Good analysis :) Yep, so the possible solution I see is to tune `highWaterMark` differently (> 1 MB) to allow the chunk writer queue to be drained faster and make the flow control more accurate (effectively backpressured by the TCP buffer, instead of being limited by Netty *before* it).
[GitHub] [activemq-artemis] michaelandrepearce commented on a change in pull request #2851: ARTEMIS-2503 Improve wildcards for the roles access match
URL: https://github.com/apache/activemq-artemis/pull/2851#discussion_r328571326

## File path: artemis-server/src/main/java/org/apache/activemq/artemis/core/server/management/JMXAccessControlList.java ##
@@ -25,12 +25,22 @@ import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;

+import org.apache.activemq.artemis.core.config.WildcardConfiguration;
+import org.apache.activemq.artemis.core.settings.HierarchicalRepository;
+import org.apache.activemq.artemis.core.settings.impl.HierarchicalObjectRepository;
+
 public class JMXAccessControlList {

    private Access defaultAccess = new Access("*");
-   private Map domainAccess = new HashMap<>();
+   private HierarchicalRepository domainAccess;
    private ConcurrentHashMap> whitelist = new ConcurrentHashMap<>();

+   public JMXAccessControlList() {
+      WildcardConfiguration domainAccessWildcardConfiguration = new WildcardConfiguration();

Review comment: As the deviation was in management.xml, probably there.
[GitHub] [activemq-artemis] michaelandrepearce commented on a change in pull request #2851: ARTEMIS-2503 Improve wildcards for the roles access match
URL: https://github.com/apache/activemq-artemis/pull/2851#discussion_r328571326

## File path: artemis-server/src/main/java/org/apache/activemq/artemis/core/server/management/JMXAccessControlList.java ##
@@ -25,12 +25,22 @@ import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;

+import org.apache.activemq.artemis.core.config.WildcardConfiguration;
+import org.apache.activemq.artemis.core.settings.HierarchicalRepository;
+import org.apache.activemq.artemis.core.settings.impl.HierarchicalObjectRepository;
+
 public class JMXAccessControlList {

    private Access defaultAccess = new Access("*");
-   private Map domainAccess = new HashMap<>();
+   private HierarchicalRepository domainAccess;
    private ConcurrentHashMap> whitelist = new ConcurrentHashMap<>();

+   public JMXAccessControlList() {
+      WildcardConfiguration domainAccessWildcardConfiguration = new WildcardConfiguration();

Review comment: As the deviation was in management.xml, probably there. Others may have other opinions. I'm not heavily opinionated on where.
[GitHub] [activemq-artemis] k-wall commented on issue #2852: ARTEMIS-2505: Fix wiring of the max-size-bytes-reject-threshold address-setting
URL: https://github.com/apache/activemq-artemis/pull/2852#issuecomment-535501875

@michaelandrepearce commits squashed, thanks for looking at this.
[GitHub] [activemq-artemis] franz1981 edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535443555

@wy96f Good analysis :) Yep, so the possible solution I see is to tune `highWaterMark` differently (> 1 MB) to allow the chunk writer queue to be drained faster.

Both ChunkedInput and FileRegion lack (in Netty) a correct size estimation on ChannelOutboundBuffer, and this lets senders push many of them in bursts. What makes them behave differently is that FileRegion is backpressured only by TCP, while ChunkedInputs start to be backpressured by Netty itself *inside* ChunkedWriteHandler, given that any read ByteBuf is accounted in ChannelOutboundBuffer, preventing other pending writes from proceeding due to the small high watermark (compared to the chunk size).
[GitHub] [activemq-artemis] franz1981 edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535443555

@wy96f Good analysis :) Yep, so the possible solution I see is to tune `highWaterMark` differently (> 1 MB) to allow the chunk writer queue to be drained faster.

Both ChunkedInput and FileRegion lack (in Netty) a correct size estimation on ChannelOutboundBuffer, and this lets senders push many of them in bursts. What makes them behave differently is that FileRegion is backpressured only by TCP, while ChunkedInputs start to be backpressured by Netty itself *inside* ChunkedWriteHandler, given that any read ByteBuf is accounted in ChannelOutboundBuffer, thus preventing subsequent pending writes from proceeding due to the small high watermark (compared to the chunk size).
[GitHub] [activemq-artemis] franz1981 edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535443555

@wy96f Good analysis :) Yep, so the possible solution I see is to tune `highWaterMark` differently (> 1 MB) to allow the chunk writer queue to be drained faster.

Both ChunkedInput and FileRegion lack (in Netty) a correct size estimation on ChannelOutboundBuffer, and this lets senders push many of them in bursts. What makes them behave differently is that FileRegion is backpressured only by TCP, while ChunkedInputs start to be backpressured by Netty itself *inside* ChunkedWriteHandler, given that any read ByteBuf is accounted in ChannelOutboundBuffer, thus preventing subsequent pending writes from proceeding due to the small high watermark limit (compared to the chunk size).
[GitHub] [activemq-artemis] franz1981 edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535443555

@wy96f Good analysis :) Yep, so the possible solution I see is to tune `highWaterMark` differently (> 1 MB) to allow the chunk writer queue to be drained faster.

Both ChunkedInput and FileRegion lack (in Netty) a correct size estimation on ChannelOutboundBuffer, and this lets senders push many of them in bursts. What makes them behave differently is that FileRegion is backpressured only by TCP, while ChunkedInputs start to be backpressured by Netty itself *inside* ChunkedWriteHandler, given that any read ByteBuf is accounted in ChannelOutboundBuffer, thus preventing subsequent pending writes from proceeding due to the small high watermark (compared to the chunk size).
[GitHub] [activemq-artemis] franz1981 edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535443555

@wy96f Good analysis :) Yep, so the simplest possible solution I see is to tune `highWaterMark` differently (> 1 MB) to allow the chunk writer queue to continue to be drained.

Both ChunkedInput and FileRegion lack (in Netty) a correct size estimation on ChannelOutboundBuffer, and this lets senders push many of them in bursts. What makes them behave differently is that FileRegion is backpressured only by TCP, while ChunkedInputs start to be backpressured by Netty itself *inside* ChunkedWriteHandler, given that any read ByteBuf is accounted in ChannelOutboundBuffer, thus preventing subsequent pending writes from proceeding due to the small high watermark limit (compared to the chunk size).
[GitHub] [activemq-artemis] clebertsuconic commented on a change in pull request #2853: ARTEMIS-2506 MQTT doesn't cleanup underlying connection for bad clients
URL: https://github.com/apache/activemq-artemis/pull/2853#discussion_r328834293

## File path: tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/mqtt/MQTTConnnectionCleanupTest.java ##
@@ -0,0 +1,84 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.activemq.artemis.tests.integration.mqtt;
+
+import org.apache.activemq.artemis.api.core.ActiveMQException;
+import org.apache.activemq.artemis.api.core.TransportConfiguration;
+import org.apache.activemq.artemis.core.protocol.mqtt.MQTTConnection;
+import org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor;
+import org.apache.activemq.artemis.core.remoting.impl.netty.TransportConstants;
+import org.apache.activemq.artemis.spi.core.protocol.RemotingConnection;
+import org.apache.activemq.artemis.tests.integration.mqtt.imported.MQTTTestSupport;
+import org.fusesource.mqtt.client.BlockingConnection;
+import org.fusesource.mqtt.client.MQTT;
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+
+public class MQTTConnnectionCleanupTest extends MQTTTestSupport {
+
+   @Override
+   protected void addMQTTConnector() {
+
+      Map params = new HashMap<>();
+      params.put(TransportConstants.PORT_PROP_NAME, "" + port);
+      params.put(TransportConstants.PROTOCOLS_PROP_NAME, "MQTT");
+      params.put(TransportConstants.CONNECTIONS_ALLOWED, 1);
+      params.put(TransportConstants.HOST_PROP_NAME, "localhost");
+
+      TransportConfiguration mqtt = new TransportConfiguration(NETTY_ACCEPTOR_FACTORY, params, "MQTT");
+
+      server.getConfiguration().addAcceptorConfiguration(mqtt);
+   }
+
+   @Test(timeout = 30 * 1000)
+   public void testBadClient() throws Exception {
+      MQTT mqtt = createMQTTConnection();
+      mqtt.setClientId("");
+      mqtt.setCleanSession(true);
+      BlockingConnection connection = mqtt.blockingConnection();
+      connection.connect();
+
+      try {
+         connection = mqtt.blockingConnection();
+         connection.connect();
+         fail("second connection shouldn't be allowed");
+      } catch (Exception e) {
+         //ignore.
+      }
+
+      NettyAcceptor acceptor = (NettyAcceptor) server.getRemotingService().getAcceptor("MQTT");
+      assertEquals(1, acceptor.getConnections().size());
+
+      //now simulate a bad client by manually fail the server connection
+      RemotingConnection conn = server.getRemotingService().getConnections().iterator().next();
+
+      assertTrue(conn instanceof MQTTConnection);
+
+      conn.fail(new ActiveMQException("testBadClient"));
+
+      assertEquals("Server connection not cleaned up!", 0, acceptor.getConnections().size());

Review comment: Can you use Wait.assertEquals here, please? This is most likely an asynchronous operation that may eventually fail unless you use Wait here.
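The reviewer's point is that the connection count may only reach 0 some time after `conn.fail(...)` returns, so a one-shot assertion can fail intermittently. The general polling technique is sketched below; `PollingAssert.waitForEquals` is a hypothetical stand-in written for illustration, and its signature is not that of the `Wait` utility the reviewer refers to.

```java
import java.util.function.IntSupplier;

/** Hypothetical polling helper; illustrates the technique, not the Artemis Wait API. */
final class PollingAssert {
    private PollingAssert() {
    }

    /**
     * Re-reads the value until it equals the expected one or the timeout elapses.
     * Returns true if the expected value was observed in time.
     */
    static boolean waitForEquals(int expected, IntSupplier actual, long timeoutMillis, long sleepMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (actual.getAsInt() == expected) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing the expected value
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }
}
```

In the test above, the final assertion would then poll `acceptor.getConnections().size()` for a bounded time instead of reading it once.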
[GitHub] [activemq-artemis] wy96f commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535777939

@franz1981 Hi, I ran tests with writeBufferHighWaterMark=2MB, 10MB, 100MB and 200MB; the replication still failed (shocked). After some analysis, I think the results might be reasonable. Whatever writeBufferHighWaterMark value we choose, the total time with the channel writable is similar (with a big value, it takes longer to saturate the channel; with a small value, the channel is saturated more often, though for a shorter time each). Considering that the netty thread has to read the chunked file to add ByteBufs to the outboundbuffer while the artemis thread just puts the packets into the netty executor (which is much faster), packets will definitely build up in the chunk writer queue.
[GitHub] [activemq-artemis] wy96f edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535777939

@franz1981 Hi, I ran tests with writeBufferHighWaterMark=2MB, 10MB, 100MB and 200MB; the replication still failed (shocked). After some analysis, I think the results might be reasonable. Whatever writeBufferHighWaterMark value we choose, the total time with the channel writable is similar (with a big value, it takes longer to saturate the channel; with a small value, the channel is saturated more often, though for a shorter time each). Considering that the netty thread has to read the chunked file to add ByteBufs to the outboundbuffer (size will be added) while the artemis thread just puts the packets into the netty executor (which is much faster), packets will definitely build up in the chunk writer queue.
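The argument that a larger high watermark does not stop the queue from growing can be checked with a deliberately simplified model. The sketch below assumes a fixed number of bytes enqueued per tick, a fixed number drained to the network per tick, and ignores the low watermark entirely; `BacklogModel` and its parameters are hypothetical and only illustrate the reasoning, not the real thread interaction.

```java
/** Toy tick-based model of the chunk writer queue; hypothetical, for illustration only. */
final class BacklogModel {
    private BacklogModel() {
    }

    /** Returns the bytes left in the chunk writer queue after the given number of ticks. */
    static long queuedAfter(long ticks, long producedPerTick, long drainedPerTick, long highWaterMark) {
        long queue = 0;    // bytes waiting in the chunk writer queue
        long outbound = 0; // bytes accounted in the outbound buffer
        for (long i = 0; i < ticks; i++) {
            queue += producedPerTick;
            // Move queued bytes into the outbound buffer up to the high watermark.
            long moved = Math.min(queue, Math.max(0, highWaterMark - outbound));
            queue -= moved;
            outbound += moved;
            // The network drains at most drainedPerTick bytes per tick.
            outbound -= Math.min(outbound, drainedPerTick);
        }
        return queue;
    }
}
```

Under these assumptions, `queuedAfter(100, 10, 4, 8)` leaves 596 bytes queued and `queuedAfter(100, 10, 4, 80)` leaves 524: raising the watermark tenfold only moves a bounded amount of data from the queue into the outbound buffer, while the backlog keeps growing at the same rate (produced minus drained) either way.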
[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535786097

@wy96f thanks for trying! It sounds strange to me: I was thinking that the reason it was taking longer was that it was continuously being stopped/awakened and sending short chunks to the network (i.e. more syscalls with less data). I have a strong feeling that maybe the point is that sendFile will send a 1 MB chunk *directly* without using the TCP buffer at all: if that is the case, it means we should increase the chunkSize too (1 MB or at least 100K) and the TCP buffer accordingly...

Did you observe that the network was saturated in both cases? If you use async-profiler you can check what the kernel does and where most of the cost is in both cases.
[GitHub] [activemq-artemis] franz1981 edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535786097

@wy96f thanks for trying! It sounds strange to me: I was thinking that the reason it was taking longer was that it was continuously being stopped/awakened and sending short chunks to the network (i.e. more syscalls with less data). I have a strong feeling that sendFile sends a 1 MB chunk *directly* without using the TCP buffer at all: if that is the case, it means we should increase the chunkSize (1 MB or at least 100K) and the TCP buffer accordingly (that's very small by default, afaik).

Did you observe that the network was saturated in both cases? If you use async-profiler you can check what the kernel does and where most of the cost is in both cases.
[GitHub] [activemq-artemis] wy96f commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535797340

```
2019-09-27 13:47:01,943 DEBUG [org.apache.activemq.artemis.core.replication.ReplicationManager] sending 1048576 bytes on file 02541.page
2019-09-27 13:47:01,943 DEBUG [org.apache.activemq.artemis.core.replication.ReplicationManager] sending 1048576 bytes on file 02541.page
2019-09-27 13:47:01,943 DEBUG [org.apache.activemq.artemis.core.replication.ReplicationManager] sending 1048496 bytes on file 02541.page
2019-09-27 13:47:01,945 DEBUG [org.apache.activemq.artemis.core.replication.ReplicationManager] sending 0 bytes on file 02541.page
^C
[artemis@windqpstdb05 bin]$ sar -n DEV 1
Linux 2.6.32-279.19.1.el6_sn.12.x86_64 (windqpstdb05)   09/27/2019   _x86_64_   (8 CPU)

01:47:37 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
01:47:38 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:47:38 PM      eth0   4548.00   2576.00    294.39 108552.23      0.00      0.00      0.00

01:47:38 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
01:47:39 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:47:39 PM      eth0   4520.00   2528.00    292.59 108009.43      0.00      0.00      0.00

01:47:39 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
01:47:40 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:47:40 PM      eth0   4497.00   2588.00    291.05 107394.71      0.00      0.00      0.00

01:47:40 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
01:47:41 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:47:41 PM      eth0   4483.00   2561.00    290.18 106670.38      0.00      0.00      0.00

01:47:41 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
01:47:42 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
01:47:42 PM      eth0   4494.00   2584.00    290.85 107486.57      0.00      0.00      0.00
```

Yes, I saw that the network was saturated (initial-replication-sync-timeout was set to 30, otherwise replication failed due to a timeout).

BTW, I used this in broker.xml: `tcp://10.244.201.200:61616?;tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300`
[GitHub] [activemq-artemis] wy96f edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
wy96f edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535797340

@franz1981

```
2019-09-27 13:47:01,943 DEBUG [org.apache.activemq.artemis.core.replication.ReplicationManager] sending 1048576 bytes on file 02541.page
2019-09-27 13:47:01,943 DEBUG [org.apache.activemq.artemis.core.replication.ReplicationManager] sending 1048576 bytes on file 02541.page
2019-09-27 13:47:01,943 DEBUG [org.apache.activemq.artemis.core.replication.ReplicationManager] sending 1048496 bytes on file 02541.page
2019-09-27 13:47:01,945 DEBUG [org.apache.activemq.artemis.core.replication.ReplicationManager] sending 0 bytes on file 02541.page
^C
[artemis@windqpstdb05 bin]$ sar -n DEV 1
Linux 2.6.32-279.19.1.el6_sn.12.x86_64 (windqpstdb05)  09/27/2019  _x86_64_  (8 CPU)

01:47:37 PM     IFACE   rxpck/s   txpck/s    rxkB/s     txkB/s   rxcmp/s   txcmp/s  rxmcst/s
01:47:38 PM        lo      0.00      0.00      0.00       0.00      0.00      0.00      0.00
01:47:38 PM      eth0   4548.00   2576.00    294.39  108552.23      0.00      0.00      0.00

01:47:38 PM     IFACE   rxpck/s   txpck/s    rxkB/s     txkB/s   rxcmp/s   txcmp/s  rxmcst/s
01:47:39 PM        lo      0.00      0.00      0.00       0.00      0.00      0.00      0.00
01:47:39 PM      eth0   4520.00   2528.00    292.59  108009.43      0.00      0.00      0.00

01:47:39 PM     IFACE   rxpck/s   txpck/s    rxkB/s     txkB/s   rxcmp/s   txcmp/s  rxmcst/s
01:47:40 PM        lo      0.00      0.00      0.00       0.00      0.00      0.00      0.00
01:47:40 PM      eth0   4497.00   2588.00    291.05  107394.71      0.00      0.00      0.00

01:47:40 PM     IFACE   rxpck/s   txpck/s    rxkB/s     txkB/s   rxcmp/s   txcmp/s  rxmcst/s
01:47:41 PM        lo      0.00      0.00      0.00       0.00      0.00      0.00      0.00
01:47:41 PM      eth0   4483.00   2561.00    290.18  106670.38      0.00      0.00      0.00

01:47:41 PM     IFACE   rxpck/s   txpck/s    rxkB/s     txkB/s   rxcmp/s   txcmp/s  rxmcst/s
01:47:42 PM        lo      0.00      0.00      0.00       0.00      0.00      0.00      0.00
01:47:42 PM      eth0   4494.00   2584.00    290.85  107486.57      0.00      0.00      0.00
```

Yes, I saw that the network was saturated (initial-replication-sync-timeout was set to 30; otherwise replication failed due to a timeout).

Note that the log showed the last page being sent at `13:47:01,945`, while sar at `01:47:37 PM` showed queued-up data still being transferred and saturating the network.

BTW, `tcp://10.244.201.200:61616?;tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300` is what I used in broker.xml.
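A rough reading of the numbers in the comment above, as a minimal sketch. It assumes that sar's `kB` means 1024 bytes (this depends on the sysstat version) and that the link is 1 GbE (not stated in the thread). The class and method names are hypothetical. It converts the reported `txkB/s` to Mbit/s and computes the gap between the last log line and the first sar timestamp.

```java
import java.time.Duration;
import java.time.LocalTime;

final class SarReading {
    // Assumption: sar reports kB as 1024 bytes; the result is in decimal megabits per second.
    static double kBPerSecToMbit(double kBPerSec) {
        return kBPerSec * 1024 * 8 / 1_000_000.0;
    }

    // Whole seconds elapsed between two ISO-style times of the same day.
    static long secondsBetween(String from, String to) {
        return Duration.between(LocalTime.parse(from), LocalTime.parse(to)).getSeconds();
    }

    public static void main(String[] args) {
        // eth0 txkB/s from the first sar sample in the comment
        System.out.println(kBPerSecToMbit(108552.23) + " Mbit/s");
        // last "sending 0 bytes" log line vs. the first sar timestamp
        System.out.println(secondsBetween("13:47:01.945", "13:47:37") + " s");
    }
}
```

Under those assumptions the transmit rate sits close to what a 1 Gbit/s link can carry, and the link is still busy more than half a minute after the last page was logged as sent.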
[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535803060

If both cases (with/without file region) are saturating the network, why would the latter take more time? The total amount of data sent should be the same...

Have you tried master as well?

I'm starting to think that the pipelining that happens when a non-Netty thread copies the data and a separate Netty thread sends it across the network improves the overall throughput, because we can do something else (i.e. read the file) while Netty takes care of sending the data across the network. But that means that if we don't use file regions, I expect the network not to be saturated 100% of the time: sometimes it waits for ChunkInput to finish reading data before being saturated again...

I don't know if your acceptor configuration of the TCP buffers is working; I need to inspect the logs... Profiling could be helpful as well...
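The pipelining idea in the comment above can be illustrated in plain Java, without Netty. This is only a sketch of the general producer/consumer pattern, not the Artemis or Netty code: `PipelinedCopy`, the bounded queue, and the in-memory "file" are all hypothetical stand-ins. One thread cuts the source into chunks (standing in for file reads) while a second thread consumes them (standing in for network writes), so the two activities can overlap.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class PipelinedCopy {
    // Sentinel compared by identity; real chunks are never empty.
    private static final byte[] EOF = new byte[0];

    // Returns the number of bytes the "sender" thread consumed.
    static long transfer(byte[] source, int chunkSize, int queueDepth) {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(queueDepth);
        long[] sent = {0};
        Thread sender = new Thread(() -> {
            try {
                for (byte[] chunk = queue.take(); chunk != EOF; chunk = queue.take()) {
                    sent[0] += chunk.length; // stand-in for the network write
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.start();
        try {
            // "Reader" side: put() blocks when the queue is full, which bounds memory use.
            for (int off = 0; off < source.length; off += chunkSize) {
                int len = Math.min(chunkSize, source.length - off);
                queue.put(Arrays.copyOfRange(source, off, off + len));
            }
            queue.put(EOF);
            sender.join(); // join() makes the sender's writes to sent[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while transferring", e);
        }
        return sent[0];
    }

    public static void main(String[] args) {
        System.out.println(transfer(new byte[2_500_000], 1_048_576, 2) + " bytes handed to the sender");
    }
}
```

The bounded queue is the design choice that matters here: the reader can run ahead of the sender by at most `queueDepth` chunks, so reading and sending overlap without unbounded buffering.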
[GitHub] [activemq-artemis] franz1981 edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
franz1981 edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535803060

If both cases (with/without file region) are saturating the network, why would the latter take more time? The total amount of data sent should be the same...

Have you tried master as well?

I'm starting to think that the pipelining that happens when a non-Netty thread copies the data and a separate Netty thread sends it across the network improves the overall throughput, because we can do something else (i.e. read the file) while Netty takes care of sending the data across the network. But that means that if we don't use file regions, I expect the network not to be saturated 100% of the time: sometimes it waits for ChunkInput to finish reading data before being saturated again. sar shows averages over its sampling intervals, so I believe we can't spot those spikes...

I don't know if your acceptor configuration of the TCP buffers is working; I need to inspect the logs... Profiling could be helpful as well...
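The point about sar averaging over its sampling interval can be shown with a small sketch. The utilisation figures below are made up for illustration, not measured: a link that is idle for one 100 ms slice out of ten still reports a high one-second average, so the short stall is invisible at `sar -n DEV 1` resolution.

```java
final class IntervalAverage {
    // Arithmetic mean of the samples.
    static double average(double[] samples) {
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    // Smallest sample, i.e. the worst slice inside the interval.
    static double min(double[] samples) {
        double m = Double.POSITIVE_INFINITY;
        for (double s : samples) {
            m = Math.min(m, s);
        }
        return m;
    }

    public static void main(String[] args) {
        // Hypothetical link utilisation (%) for ten 100 ms slices of one second.
        double[] slices = {100, 100, 100, 0, 100, 100, 100, 100, 100, 100};
        System.out.println("1 s average: " + average(slices) + " %");
        System.out.println("worst slice: " + min(slices) + " %");
    }
}
```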
[GitHub] [activemq-artemis] wy96f commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
wy96f commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535807466

@franz1981

> If both cases (with/without file region) are saturating the network, why the latter will take more time? The total amount of data sent should be the same...

They take the same time (~7 mins). As I said:

> In the case of -Dio.netty.file.region=true and master, the log (something like `2019-09-26 11:02:49,348 DEBUG [org.apache.activemq.artemis.core.replication.ReplicationManager] sending 1048576 bytes on file`) showed it took about 7 minutes to transfer the files, then the synchronization done message was sent. However, in the case of -Dio.netty.file.region=false, the log showed it took about 40 seconds to transfer the files, then the sync done message was sent.

> Yes, i saw the network was saturated(initial-replication-sync-timeout was set to 30, otherwise replication failed due to timeout). Note that log showed 13:47:01,945 last page sent, and sar showed 01:47:37 PM queued up data still transferring and saturating network.

Saturation lasted for ~7 mins.

> Do you have tried master as well?

The same.
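One way to read the 40 seconds versus ~7 minutes gap is that, without effective back-pressure, the sender finishes enqueueing long before the data has left the machine. The sketch below is a simplified model of watermark-based writability, loosely following the semantics of Netty's write-buffer watermarks. It is not Netty's implementation, the class is hypothetical, and the 32 KiB / 64 KiB values in the usage example are used purely for illustration.

```java
final class WatermarkModel {
    private final long lowWaterMark;
    private final long highWaterMark;
    private long pendingBytes;
    private boolean writable = true;

    WatermarkModel(long lowWaterMark, long highWaterMark) {
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
    }

    // Called when the application queues data for sending.
    void enqueue(long bytes) {
        pendingBytes += bytes;
        if (pendingBytes > highWaterMark) {
            writable = false; // a well-behaved producer should pause here
        }
    }

    // Called when data has actually been written to the socket.
    void flushed(long bytes) {
        pendingBytes -= bytes;
        if (pendingBytes < lowWaterMark) {
            writable = true; // the producer may resume
        }
    }

    boolean isWritable() {
        return writable;
    }

    long pending() {
        return pendingBytes;
    }

    public static void main(String[] args) {
        WatermarkModel channel = new WatermarkModel(32 * 1024, 64 * 1024);
        channel.enqueue(1_048_576); // one 1 MiB page, as in the log above
        System.out.println("writable after enqueue: " + channel.isWritable());
        channel.flushed(1_048_576 - 16 * 1024);
        System.out.println("writable after flush:   " + channel.isWritable());
    }
}
```

A producer that ignores `isWritable()` keeps enqueueing, so its log reports "sent" long before the pending bytes drain. A producer that honours it advances roughly at the pace of the network.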
[GitHub] [activemq-artemis] wy96f commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
wy96f commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535813732

@franz1981

profiling without file region: https://filebin.net/r9o4bupoym9zxwk9/netty_false.svg?t=ld5f197t
Note I profiled after 40s(after log showed last page sent) so most of samples were about netty.

profiling with file region: https://filebin.net/r9o4bupoym9zxwk9/netty_true.svg?t=mlott712

profiling master: https://filebin.net/r9o4bupoym9zxwk9/profiler_master.svg?t=mlott712
[GitHub] [activemq-artemis] wy96f edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
wy96f edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535813732

@franz1981

profiling without file region: https://filebin.net/r9o4bupoym9zxwk9/netty_false.svg?t=ld5f197t
Note I profiled after 40s(after log showed last page sent) so most of samples were about netty.

profiling with file region: https://filebin.net/r9o4bupoym9zxwk9/netty_true.svg?t=mlott712

profiling master: https://filebin.net/r9o4bupoym9zxwk9/profiler_master.svg?t=mlott712
[GitHub] [activemq-artemis] wy96f edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
wy96f edited a comment on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535813732

@franz1981

profiling without file region: https://filebin.net/r9o4bupoym9zxwk9/netty_false.svg?t=ld5f197t
Note I profiled after 40s(after log showed last page sent) so most of samples were about netty.

profiling with file region: https://filebin.net/r9o4bupoym9zxwk9/netty_true.svg?t=mo3tho33

profiling master: https://filebin.net/r9o4bupoym9zxwk9/profiler_master.svg?t=mo3tho33