[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2020-01-22 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-577389284
 
 
   @michaelandrepearce Please wait a bit more: I've an opened issue on Netty 
related to `ChunkedWriteHandler` that could simplify this one, when merged. 
sadly I haven't had much time to work on the Netty side of it and it has 
blocked this one by consequence :(


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-11-06 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-550951027
 
 
   @clebertsuconic I believe this is getting in good shape for a review, 
according to the test results. :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-11-04 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-549300079
 
 
   @wy96f I've added an additional commit to control the backlog of unflushed 
Netty writes, to improve the precision of the replication done timeout 
performed right after "sending" all the replicated files: due to how Netty 
works (the previous comment has the relevant info about it), probably is the 
safer way to fail fast due to slow network/dead connections on replication. Let 
me know if your test won't be too slow due to this (with both zero copy or not) 
and if it seems correct now (without changing the default timeout on initial 
replication). 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-10-31 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-548529114
 
 
   @wy96f I've tried to understand what's happening for ChunkedNioFile and 
probably it needs to be fixed Netty-side (see 
https://github.com/netty/netty/issues/3413#issuecomment-548501688), let's see.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-10-29 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-547275684
 
 
   Ok, now it is more clear, thanks! 
   So ByteBufs start to be accumulated due to FileRegions time to send.
   I will take another look to the file region = false case, but probably 
accounting correctly with a custom msg estimator could make the back-pressure 
to be triggered before, but i'm not sure...


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-10-28 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-547052727
 
 
   @wy96f 
   One quick question: on 
https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535440070 I 
see that you've said:
   > with -Dio.netty.file.region=true or master
   
   But actually master and  `-Dio.netty.file.region=true` are very different!
   master has the feature with zero copy not enabled and `ByteBuf` are 
correctly estimated, while  `-Dio.netty.file.region=true` on this PR is using 
custom `FileRegion`s that doesn't seems correctly estimated according to 
https://github.com/netty/netty/blob/ff7a9fa091a8bf2e10020f83fc4df1c44098/transport/src/main/java/io/netty/channel/DefaultMessageSizeEstimator.java#L45.
   In theory `-Dio.netty.file.region=true` should time out due to the wrong 
estimation: do you have tested `-Dio.netty.file.region=true` on this PR with 
your long replication backlog test?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-10-13 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-541495785
 
 
   @wy96f sadly the rate limiter algorithm in guava is not very good fore fast 
paced networks and if with small token size, despite what the doc says about it 
:)
   
   Netty is considering 0 sized the FileRegion sent on the outbound buffer 
afaik and just "unknown" the one based on chunked files, but the point is that 
the latter will be accumulated and won't back-pressure the channel.becauee are 
sent through a background accumulator (the additional outbound channel in the 
pipeline) ie the solution to provide a custom estimator makes sense imho, but I 
need my computer at hand to try it.
   
   My point on the existing sync done is that is not actually correct because 
it assume that the previous files are being sent before sending that sync done 
while is not in any case, because Netty is asynchronous...the chunked file case 
just make it more evident..
   My 2 cents :)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-10-12 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-541386793
 
 
   @wy96f I'll look better into this on my back, but please take a look 
yourself if you have any idea.
   From what I've understood we can:
   - fix 
https://github.com/netty/netty/blob/ff7a9fa091a8bf2e10020f83fc4df1c44098/transport/src/main/java/io/netty/channel/DefaultMessageSizeEstimator.java#L45
 in order to account for chunked files correctly and flow control them while 
being enqueued into chunk writer 
   - change how sync done timeout is being calculated and consider the total 
time to replicate (from the first file region/chunk sent) instead of the time 
from when the sync done is being enqueued in Netty 
   
   wdyt?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-09-27 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535862167
 
 
   @wy96f probably the size estimator of Netty or/and the pending writes count 
is not working as I was expecting with file regions...


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-09-27 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535837012
 
 
   To make profiling traces comparable in number of samples you should need to 
limit the duration and make them equals eg -d 10 (seconds) 
   Ok to do it after 40 seconds depending which part of the catching up we are 
interested in...
   Re the timeout I need to re-read your previous answers: you said that both 
the approaches (file region/chunked files) max out network and take the same 
time, but only the latter will fail on replication timeout?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-09-27 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535824970
 
 
   Sorry I have missed that the file region=true version was failing on timeout 
as well: tbh I'm not sure it makes sense to have to have such small timeout in 
that case, given that depends by disk speed/network bandwidth and total 
transferred size
   Just one advice: try collecting samples with -t..I wasn't expecting 
blockUntiWritable to be that costy


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-09-27 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535803060
 
 
   If both cases (with/without file region) are saturating the network, why the 
latter will take more time? The total amount of data sent should be the same...
   Do you have tried master as well? 
   I'm start to think that the pipelining happening while copying data in a 
non-netty thread and a separate Netty thread sending them across network is 
beneficial to improve the overall throughput, because we can do something (ie 
reading the file) while Netty is taking care to send data across network. But 
that means that if we don't use file regions I expect that network is not 
saturated 100% of the time and sometime wait ChunkInput to finish reading data 
and saturate it again...
   I don't know if your acceptor configuration of TCP buffer is working,need to 
inspect the logs...
   And profiling could be helpful as well...


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-09-26 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535786097
 
 
   @wy96f thanks to have tried! It sounds strange to me: I was thinking the 
reason why was taking more was due to being continuosly stopped/being awaken 
and sending short chunks to the network (ie more syscalls with less data). I 
have a strong feeling that maybe the point is that sendFile with send a 1 MB 
chunk *directly* without using the TCP buffer at allif is the case, it 
means we should increase the chunkSize too (1 MB or at least 100K) and the TCP 
buffer accordingly...
   Did you observe that the network was saturated in both cases?
   If you use async-profiler you can check what the kernel does and where most 
the cost is for both cases


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-09-26 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535443555
 
 
   @wy96f 
   Good analysis :)
   Yep, so the possible solutions I see are are:
   - tune differently `lowWaterMark` and `highWaterMark`  > 1 MB
   - split the file in smaller chunks (ie 1 MB now, probably 32K is better, 
configurable, even better) when `-Dio.netty.file.region=false`
   - flow control sync file sends by using the received sync file responses 
instead of using the Netty one (that as u have noticed require more 
configuration/tuning and behave differently depending if using the file region 
or not...)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-09-26 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535399659
 
 
   Maybe on ActiveMQChannelHandler there are others events that are not 
correctly propagated through the pipeline and that would wake up the chunk 
writer?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-09-26 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-535397765
 
 
   @wy96f
   It's strange that the back-pressure propagation fix isn't working: have you 
used the last version that include that fix? 
   Flow control on Netty for chunked nio files should increase/decrease 
outbound pending bytes as with "normal" ByteBuf writes...can you check if the 
writeability changes are correctly propagated to ChunkedWriteHandler?
   I will be in vacation from today so don't have access for about 1 month to 
my computer: I will do my best to help as I can on my return, but we are near 
to fix this :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-09-25 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-534999649
 
 
   @wy96f 
   
   > . So only non file packet accounts for the pending write bytes in channel, 
and flowControl is not working very well in this case, will this have a 
negative impact?
   
   I see that `DefaultMessageSizeEstimator` for `FileRegion` isn't handled nor 
`ChannelOutboundBuffer::decrementPendingOutboundBytes` is called afted 
`sendFile` succeed: it means that our `lazy` backpressure is quite limiting 
while using `FileRegion`s.
   For `ChunkedNioFile` is a different story, but equally interesting: 
`DefaultMessageSizeEstimator` is not able to recognize `ChunkedNioFile` , but 
`ChannelOutboundBuffer::decrementPendingOutboundBytes` is correctly handling it 
because `ChunkedWriteHandler` is transparently handling the files as `ByteBuf`s.
   I think that is time to simplify `blockUntilWritable` to make it more lazy 
ie to just leverage on Netty `isWritable` instead of trying to calculate 
exactly the pending bytes :) 
   I'm adding a commit to address that :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-09-22 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-533885763
 
 
   @clebertsuconic yes, agree and I would like to run a soak test + wait @wy96f 
results as well :+1
   The wildfly tests seems pretty stable with this, but we have many of them to 
be tried first 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-09-22 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-533881207
 
 
   @wy96f I remember you've provided some numbers for ARTEMIS-2336 using one of 
yours load generator: it would be nice to compare the results with this pr 
using `-Dio.netty.file.region=true|false` and vs master to verify that 
`-Dio.netty.file.region=false` is not worst then master.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [activemq-artemis] franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate journal/page/large message file (AGAIN)

2019-09-22 Thread GitBox
franz1981 commented on issue #2845: ARTEMIS-2336 Use zero copy to replicate 
journal/page/large message file (AGAIN)
URL: https://github.com/apache/activemq-artemis/pull/2845#issuecomment-533874887
 
 
   @ehsavoie @clebertsuconic This version is not failing anymore the wildfly 
tests on https://issues.apache.org/jira/browse/ARTEMIS-2496


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services