[jira] [Commented] (LOG4J2-1855) Add an optional random delay in TimeBasedTriggeringPolicy
[ https://issues.apache.org/jira/browse/LOG4J2-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938357#comment-15938357 ] Anthony Maire commented on LOG4J2-1855: --- Pull Request : https://github.com/apache/logging-log4j2/pull/68 > Add an optional random delay in TimeBasedTriggeringPolicy > - > > Key: LOG4J2-1855 > URL: https://issues.apache.org/jira/browse/LOG4J2-1855 > Project: Log4j 2 > Issue Type: New Feature > Components: Appenders >Affects Versions: 2.8.1 >Reporter: Anthony Maire > > When there are lots of JVMs on the same physical host configured to roll > their log files with a TimeBasedTriggeringPolicy, it creates a temporary > load spike that can impact the behavior of the applications, since all JVMs > will roll and compress the old file at the same time. > The purpose of this enhancement is to add an additional property on > TimeBasedTriggeringPolicy to delay the rolling by a random amount of time, to > avoid that massive simultaneous rolling. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: log4j-dev-unsubscr...@logging.apache.org For additional commands, e-mail: log4j-dev-h...@logging.apache.org
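The proposed behavior can be sketched in a few lines; the method and the `maxRandomDelaySeconds` attribute below are hypothetical illustrations of the idea, not the actual Log4j API or the pull request's code:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Sketch of the proposal: spread rollover times across JVMs on the same host
// by adding a bounded random delay to the computed rollover instant.
// "maxRandomDelaySeconds" is a hypothetical configuration attribute.
public class RandomDelaySketch {
    static long nextRolloverMillis(long scheduledRolloverMillis, int maxRandomDelaySeconds) {
        if (maxRandomDelaySeconds <= 0) {
            return scheduledRolloverMillis; // feature disabled: roll exactly on time
        }
        long jitter = ThreadLocalRandom.current()
                .nextLong(TimeUnit.SECONDS.toMillis(maxRandomDelaySeconds) + 1);
        return scheduledRolloverMillis + jitter;
    }

    public static void main(String[] args) {
        long scheduled = 1_000_000L;
        long delayed = nextRolloverMillis(scheduled, 60);
        // The jitter must stay within [0, 60s] of the scheduled rollover.
        if (delayed < scheduled || delayed > scheduled + 60_000L) {
            throw new AssertionError("jitter out of bounds: " + delayed);
        }
        if (nextRolloverMillis(scheduled, 0) != scheduled) {
            throw new AssertionError("delay should be disabled when the attribute is 0");
        }
        System.out.println("ok");
    }
}
```

Each JVM draws its own jitter, so a host full of processes rolling at midnight spreads its compression load over the configured window.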
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397718#comment-15397718 ] Anthony Maire commented on LOG4J2-1430: --- Running up to 10 times more spinning threads than available cores can NEVER be relevant; it's like trying to measure the speed of a car without releasing the handbrake. For the "simulating a desktop system" and the "perf enthusiasts use taskset" parts, the issue is that you're using cores from different CPUs: in a desktop system, cores will never communicate through a QPI link, and perf enthusiasts will try to exploit NUMA locality as much as possible, so the taskset setting you used is a counter-productive tuning since it causes lots of cross-node memory accesses. However, as I said in another comment, these throughput tests are not really relevant; I will try to run some other tests in the near future. > Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender > --- > > Key: LOG4J2-1430 > URL: https://issues.apache.org/jira/browse/LOG4J2-1430 > Project: Log4j 2 > Issue Type: New Feature > Components: Appenders >Affects Versions: 2.6.1 >Reporter: Matt Sicker >Assignee: Matt Sicker > Fix For: 2.7 > > Attachments: AsyncAppenderPerf01.txt, AsyncLogBenchmarks.log, > conversantvsjctoolsnumthreads.jpg, jctools-vs-conversant-service-time.png, > log4j2-1430-jctools-tmp-patch.txt, log4jHaswell2cpu2core.jpg, > log4jHaswell2cpu4core.jpg, log4jrafile.log, log4jthread2cpu2core.log, > log4jthread2cpu4core.log > > > [Conversant Disruptor|https://github.com/conversant/disruptor] works as an > implementation of BlockingQueue that is much faster than ArrayBlockingQueue. > I did some benchmarks earlier and found it to be a bit faster: > h3. 
AsyncAppender/ArrayBlockingQueue > {code} > Benchmark Mode Samples >Score Error Units > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Paramsthrpt 20 > 1101267.173 ± 17583.204 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Paramsthrpt 20 > 1128269.255 ± 12188.910 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param thrpt 20 > 1525470.805 ± 56515.933 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params thrpt 20 > 1789434.196 ± 42733.475 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params thrpt 20 > 1803276.278 ± 34938.176 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params thrpt 20 > 1468550.776 ± 26402.286 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params thrpt 20 > 1322304.349 ± 22417.997 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params thrpt 20 > 1179756.489 ± 16502.276 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params thrpt 20 > 1324660.677 ± 18893.944 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params thrpt 20 > 1309365.962 ± 19602.489 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params thrpt 20 > 1422144.180 ± 20815.042 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughputSimple thrpt 20 > 1247862.372 ± 18300.764 ops/s > {code} > h3. 
AsyncAppender/DisruptorBlockingQueue > {code} > Benchmark Mode Samples >ScoreError Units > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Paramsthrpt 20 > 3704735.586 ± 59766.253 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Paramsthrpt 20 > 3622175.410 ± 31975.353 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param thrpt 20 > 6862480.428 ± 121473.276 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params thrpt 20 > 6193288.988 ± 93545.144 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params thrpt 20 > 5715621.712 ± 131878.581 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params thrpt 20 > 5745187.005 ± 213854.016 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params thrpt 20 > 5307137.396 ± 88135.709 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params thrpt 20 > 4953015.419 ± 72100.403 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params thrpt 20 > 4833836.418 ± 52919.314 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params thrpt 20 > 4353791.507 ± 79047.812 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params thrpt 20 > 4136761.624 ± 67804.253 ops/s >
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397704#comment-15397704 ] Anthony Maire commented on LOG4J2-1430: --- Hi Remko I had a look at the performance page; the tests included there are far more relevant than the JMH throughput test, which measures something that will never exist in the real world. I will definitely try to run this one with the various queue implementations when I have some time (but I'm currently pretty busy). However, I'm very surprised by the figures for AsyncLogger: how can the throughput still double when the thread count doubles, when there are already as many threads as available cores/hyperthreads? I suspect there is something wrong there. Did you do any investigation on that point? About the tests with the RAF appender: I made some further tests using the rolling variant, with a maximum of 1 file, all writing to /dev/shm to avoid real fs/disk jitter. The maximum rate I got was around 900K msg/s, very far from the 3 to 7M msg/s I got with a no-op appender with custom queues. Since the JMH test only measures how fast the appender thread can dequeue (cf. the huge increase with the JCTools queue when I changed the offer method to avoid false sharing between producer and consumer), doing comparisons on it is really pointless. 
The degree of contention in the real world will be much lower, even when it is not blocking on disk.
[jira] [Comment Edited] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387588#comment-15387588 ] Anthony Maire edited comment on LOG4J2-1430 at 7/21/16 3:02 PM: Hello John Thank you for all these benchmarks, but I'm afraid they are not really relevant. Can you explain why you constrained the JVM to run on only 2 or 4 cores, with QPI transfers involved? I cannot see the point. As soon as the number of threads is high enough, we expect the queue to be mostly full, so for the non-JDK queues we will have a lot of threads spinning on very few cores, which is really, really bad. Can you either: - use a park-based strategy for the queues (i.e. WAITING for Conversant and the default PARK for JCTools) if this kind of taskset setup matches your main use cases: I cannot imagine an application with 10 times more active threads than available cores that wants to use a spin- or yield-based strategy - not constrain the JVM, letting it use the 24 cores available on your box - constrain the JVM to a whole NUMA node (using numactl -N 1 -m 1 java -jar ) but not run more than 11 spinning producer threads. Moreover, in such a heavily contended scenario there is usually a throughput vs latency tradeoff (if some threads are frozen, contention is released and overall throughput increases); I'm pretty sure that running the same benchmark with the JMH "sample time" mode will show that the JCTools queue latency is lower and more stable than the Conversant one. That's why in the 2-core scenario the JCTools queue performs worse than ABQ when the thread count increases: this kind of thing cannot occur with realistic loads (i.e. with no more spinning threads than available cores, or with a park-based waiting strategy) Kind regards,
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387588#comment-15387588 ] Anthony Maire commented on LOG4J2-1430: --- Hello John Thank you for all these benchmarks, but I'm afraid they are not really relevant. Can you explain why you constrained the JVM to run on only 2 or 4 cores, with QPI transfers involved? I cannot see the point. As soon as the number of threads is high enough, we expect the queue to be mostly full, so for the non-JDK queues we will have a lot of threads spinning on very few cores, which is really, really bad. Can you either: - use a park-based strategy for the queues (i.e. WAITING for Conversant and the default PARK for JCTools) if this kind of taskset setup matches your main use cases: I cannot imagine an application with 10 times more active threads than available cores that wants to use a spin- or yield-based strategy - not constrain the JVM, letting it use the 24 cores available on your box - constrain the JVM to a whole NUMA node (using numactl -N 1 -m 1 java -jar ) but not run more than 11 spinning producer threads. Moreover, in such a heavily contended scenario there is usually a throughput vs latency tradeoff (if some threads are frozen, contention is released and overall throughput increases); I'm pretty sure that running the same benchmark with the JMH "sample time" mode will show that the JCTools queue latency is lower and more stable than the Conversant one. That's why in the 2-core scenario the JCTools queue performs worse than ABQ when the thread count increases: this kind of thing cannot occur with realistic loads (i.e. with no more spinning threads than available cores, or with a park-based waiting strategy) Kind regards,
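The spin-versus-park distinction the comment above hinges on can be illustrated with plain JDK primitives. This is an illustrative sketch of the general technique, not the Conversant or JCTools implementation:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// A spinning wait strategy burns a whole CPU core while waiting, which is
// disastrous when there are far more threads than cores; a park-based
// strategy yields the core back to the scheduler between checks.
public class WaitStrategySketch {
    interface WaitStrategy { void idle(); }

    static final WaitStrategy SPIN = () -> Thread.onSpinWait();           // busy-wait hint
    static final WaitStrategy PARK = () -> LockSupport.parkNanos(1_000L); // sleep ~1 microsecond

    // Wait until the flag becomes true, idling with the given strategy.
    static void awaitFlag(AtomicBoolean flag, WaitStrategy strategy) {
        while (!flag.get()) {
            strategy.idle();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean flag = new AtomicBoolean(false);
        Thread setter = new Thread(() -> {
            LockSupport.parkNanos(5_000_000L); // let the waiter idle for ~5 ms
            flag.set(true);
        });
        setter.start();
        // With more runnable threads than cores, PARK is the safe choice;
        // SPIN only pays off when a core is dedicated to the waiter.
        awaitFlag(flag, PARK);
        setter.join();
        System.out.println("ok");
    }
}
```

With 10x oversubscription, every SPIN waiter steals timeslices from the producers it is waiting on, which is exactly the distortion the benchmark critique describes.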
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376524#comment-15376524 ] Anthony Maire commented on LOG4J2-1430: --- LinkedBlockingQueue should be considered, I think: it has better concurrency properties than ABQ, but it can still be bounded. It is basically a good tradeoff between ABQ and LTQ
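The suggestion is straightforward to try, since `LinkedBlockingQueue` accepts a capacity in its constructor just like `ArrayBlockingQueue`:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Both queues can be bounded. LBQ uses separate put and take locks, which
// reduces producer/consumer contention compared to ABQ's single lock, at the
// cost of allocating a node per element.
public class BoundedQueues {
    public static void main(String[] args) {
        BlockingQueue<String> abq = new ArrayBlockingQueue<>(1024);
        BlockingQueue<String> lbq = new LinkedBlockingQueue<>(1024); // bounded, unlike the default LBQ

        lbq.offer("event");
        if (lbq.remainingCapacity() != 1023) {
            throw new AssertionError("expected one slot used in the bounded LBQ");
        }
        if (abq.remainingCapacity() != 1024) {
            throw new AssertionError("ABQ should still be empty");
        }
        System.out.println("ok");
    }
}
```

The no-argument `LinkedBlockingQueue` constructor is effectively unbounded (`Integer.MAX_VALUE`), so passing an explicit capacity is what makes it comparable to ABQ for an async appender buffer.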
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376500#comment-15376500 ] Anthony Maire commented on LOG4J2-1430: --- If you have enough time (I won't have access to the Haswell box I was using earlier for a while), can you test the maximum throughput that a real appender can sustain? Can you please change the NoOpAppender used in the test to a RandomAccessFile appender with immediateFlush=false (I think that is the appender that offers the best throughput currently), with 8 threads and either the JCTools or Conversant queue. Basically, if all queue implementations can deliver more throughput than a real underlying appender can sustain, comparing the throughput of the queues does not make any sense, and only other characteristics such as offer latency and GC pressure really matter.
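A configuration for the requested setup might look roughly like the following sketch; the file path and pattern are placeholders, not taken from the actual benchmark:

```
<!-- Sketch of a log4j2.xml using the RandomAccessFile appender with
     immediateFlush disabled; file path and layout are placeholders. -->
<Configuration status="warn">
  <Appenders>
    <RandomAccessFile name="RAF" fileName="logs/perf-test.log" immediateFlush="false">
      <PatternLayout pattern="%d %p %c{1.} [%t] %m%n"/>
    </RandomAccessFile>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="RAF"/>
    </Root>
  </Loggers>
</Configuration>
```

With `immediateFlush="false"` the appender buffers writes and only flushes when its internal buffer fills, which is what makes it the fastest file-backed option for a throughput test.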
[jira] [Commented] (LOG4J2-1439) Add optional support for JCTools in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357521#comment-15357521 ] Anthony Maire commented on LOG4J2-1439: --- Sorry, I forgot to include the pom when generating the patch. You need version 1.2.1 > Add optional support for JCTools in AsyncAppender > - > > Key: LOG4J2-1439 > URL: https://issues.apache.org/jira/browse/LOG4J2-1439 > Project: Log4j 2 > Issue Type: New Feature > Components: Appenders >Affects Versions: 2.6.1 >Reporter: Matt Sicker >Assignee: Matt Sicker > Fix For: 2.7 > > Attachments: LOG4J2-1439.patch.txt > > > Spawning from the discussion in LOG4J2-1430, this feature request is to add > support for using the JCTools MPSC bounded lock-free queue in AsyncAppender. > As the work for this issue stems from the work in 1430, the code is all > contained in the same branch.
[jira] [Updated] (LOG4J2-1439) Add optional support for JCTools in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Maire updated LOG4J2-1439: -- Attachment: LOG4J2-1439.patch.txt - Support for several wait strategies - Always keep some empty space in the queue to avoid the producer thread suffering from false sharing when the queue is full - Add a queue latency test to check performance without using AsyncAppender, which often hides the queue's performance issues.
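The "keep some empty space in the queue" idea from the patch notes can be sketched as a capacity-restricting wrapper; the slack value and wrapper below are illustrative assumptions, not the patch's actual code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative technique: report the queue "full" a few slots early, so the
// producer and consumer indices never converge on adjacent array slots,
// where they would contend on the same cache line (false sharing).
public class SlackQueueSketch {
    static final int SLACK = 32; // illustrative slack, roughly several 64-byte cache lines of refs

    static <E> boolean offerWithSlack(BlockingQueue<E> queue, E element) {
        if (queue.remainingCapacity() <= SLACK) {
            return false; // treat as full before the producer catches up to the consumer
        }
        return queue.offer(element);
    }

    public static void main(String[] args) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(64);
        int accepted = 0;
        for (int i = 0; i < 64; i++) {
            if (offerWithSlack(queue, i)) {
                accepted++;
            }
        }
        // Capacity 64 minus 32 slack slots leaves room for 32 elements.
        if (accepted != 32) {
            throw new AssertionError("expected 32 accepted offers, got " + accepted);
        }
        System.out.println("ok");
    }
}
```

The tradeoff is a slightly smaller effective capacity in exchange for keeping the hot producer and consumer slots on different cache lines when the queue saturates.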
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356953#comment-15356953 ] Anthony Maire commented on LOG4J2-1430: --- the patch has been attached to LOG4J2-1439

> Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
> ---
>
>                 Key: LOG4J2-1430
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-1430
>             Project: Log4j 2
>          Issue Type: New Feature
>          Components: Appenders
>    Affects Versions: 2.6.1
>            Reporter: Matt Sicker
>            Assignee: Matt Sicker
>             Fix For: 2.7
>
>         Attachments: AsyncAppenderPerf01.txt, jctools-vs-conversant-service-time.png, log4j2-1430-jctools-tmp-patch.txt
>
> [Conversant Disruptor|https://github.com/conversant/disruptor] works as an implementation of BlockingQueue that is much faster than ArrayBlockingQueue. I did some benchmarks earlier and found it to be a bit faster:
>
> h3. AsyncAppender/ArrayBlockingQueue
> {code}
> Benchmark                                                     Mode  Samples        Score       Error  Units
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Params  thrpt       20  1101267.173 ±  17583.204  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Params  thrpt       20  1128269.255 ±  12188.910  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param    thrpt       20  1525470.805 ±  56515.933  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params   thrpt       20  1789434.196 ±  42733.475  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params   thrpt       20  1803276.278 ±  34938.176  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params   thrpt       20  1468550.776 ±  26402.286  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params   thrpt       20  1322304.349 ±  22417.997  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params   thrpt       20  1179756.489 ±  16502.276  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params   thrpt       20  1324660.677 ±  18893.944  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params   thrpt       20  1309365.962 ±  19602.489  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params   thrpt       20  1422144.180 ±  20815.042  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughputSimple    thrpt       20  1247862.372 ±  18300.764  ops/s
> {code}
> h3. AsyncAppender/DisruptorBlockingQueue
> {code}
> Benchmark                                                     Mode  Samples        Score       Error  Units
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Params  thrpt       20  3704735.586 ±  59766.253  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Params  thrpt       20  3622175.410 ±  31975.353  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param    thrpt       20  6862480.428 ± 121473.276  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params   thrpt       20  6193288.988 ±  93545.144  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params   thrpt       20  5715621.712 ± 131878.581  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params   thrpt       20  5745187.005 ± 213854.016  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params   thrpt       20  5307137.396 ±  88135.709  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params   thrpt       20  4953015.419 ±  72100.403  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params   thrpt       20  4833836.418 ±  52919.314  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params   thrpt       20  4353791.507 ±  79047.812  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params   thrpt       20  4136761.624 ±  67804.253  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughputSimple    thrpt       20  6719456.722 ± 187433.301  ops/s
> {code}
> h3. AsyncLogger
> {code}
> Benchmark                                             Mode  Samples         Score       Error  Units
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput10Params  thrpt      20   5075883.371 ± 180465.316  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput11Params  thrpt      20   4867362.030 ± 193909.465  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput1Param    thrpt      20  10294733.024 ± 226536.965  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput2Params   thrpt      20   9021650.667 ± 351102.255  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput3Params   thrpt      20   8079337.905 ± 115824.975  ops/s
> o.a.l.l.p.j.AsyncLoggersBenchmark.throughput4Params
> {code}
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351150#comment-15351150 ] Anthony Maire commented on LOG4J2-1430: --- I will finish the JCTools wrapper implementation (there is a new JCTools release with a bug fix that I needed) and publish a cleaner patch in the very near future.

> Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350921#comment-15350921 ] Anthony Maire commented on LOG4J2-1430: --- I made some additional tests on a dual-Xeon server (24 hyperthreads) to see how both queues behave under heavy contention (running lots of threads on a 4-core machine does not show contention for this kind of queue, since the CAS operations will never be contended by a sleeping thread):
- I first tried to run the original ResponseTimeTest with the AsyncAppender configuration files. It was pretty hard to find differences between the two queues, because both perform pretty well when the throughput is well below the async thread's maximum throughput, and the small differences between them are not significant: their amplitude is orders of magnitude below the GC pauses caused by the appender itself.
- I then ran 10 threads at 100K msg/s while in the background another process was spinning on 18 threads to simulate contention from other processes (without tuning, another JVM will run GC on 18 threads on this machine). As expected, the test with the Conversant queue performed a little worse than the JCTools queue, but the difference is not that impressive.

I made additional tests on the queues by themselves, outside an AsyncAppender (by writing custom RunXXX classes in the ResponseTimeTest package):
- 20 producer threads and 1 consumer thread, all with a spinning wait policy, so I expect to see 2100% CPU usage in top
- the pacer uses a yield wait strategy (service time was worse with spinning, since there were more threads than physical cores)
- 20-minute test (with 1 minute of warmup before)

Since the async thread was able to sustain a throughput of nearly 5 million msg/s on this machine, I tried to run the producer threads at 250K ops/s each (so roughly 6 billion operations for the whole test):
- The JCTools queue was able to complete the test, with a *maximum* response time of 11.8 ms; the service time *99.99th percentile* is 2.8µs.
- The Conversant queue failed: only 2.82 billion ops were performed, the *median* response time is 10.4ms and the *median* service time is 3.1µs. Moreover, CPU usage never reached 2100% but was often between 1600% and 1800%, because of the LockSupport.parkNanos() call in Condition.progressiveYield() in its offer method when producers contend with each other (this queue is not lock-free, although the contention impact is much lighter than in a lock-based implementation such as ABQ).

I made another test with 50K ops/s per thread (1M ops/s total, well below the async thread's maximum throughput). Both queues were able to complete the test, but the Conversant queue's service time is worse than the JCTools one (which has nearly the same service time in the two tests). Look at the vertical line in the attached screenshot for the Conversant curve (blue): this 50µs gap matches the minimal pause achievable with LockSupport.parkNanos() on this machine. It shows the same producer contention issue that prevented this queue from succeeding in the 5M ops/s test.

=> The JCTools queue provides more predictable latencies and a higher maximum throughput because of its lock-free nature.

By the way, I wonder whether it really makes sense to provide support for specific alternative queues in the async appender, since that support implicitly means "we tested this queue, it rocks hard". Being able to inject a custom factory is maybe enough. Currently the ABQ is probably enough for 99% of the users of asynchronous logging, and "performance hardcore users" should use AsyncLogger, since it is garbage-free and more efficient (and using the original LMAX Disruptor offers some possibilities that will never be possible with a queue, such as LOG4J2-1424)...

> Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[jira] [Updated] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Maire updated LOG4J2-1430: -- Attachment: jctools-vs-conversant-service-time.png

> Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346328#comment-15346328 ] Anthony Maire commented on LOG4J2-1430: --- I had a look at the ResponseTimeTest; it's exactly what I was thinking about for a good latency benchmark. I will try to find some time to play with it.

> Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345211#comment-15345211 ] Anthony Maire commented on LOG4J2-1430: --- The approach of using a no-op appender for testing the AsyncAppender isn't a bad idea at all; the issue is that it's easy to misinterpret the results. We naturally expect the queue to be the bottleneck (and it really is the bottleneck in a lot of the ABQ tests), but with these high-performance queues, the single-threaded code between queue.take() and appender.append() is the bottleneck. With 8 producer threads, the consumer thread was 100% busy but the producers were not. So we are measuring something that doesn't reflect the real world (in a real application we don't expect the queue to always be full), and this benchmark is not a good way to show which queue implementation is best.

I'm probably biased because I work for a low-latency company, but the best metric to evaluate the "performance" of the AsyncAppender is latency, and latency is really damn hard to measure (cf. this excellent talk by Gil Tene on the subject: https://www.infoq.com/presentations/latency-response-time ). For such cases I usually measure latency percentiles at a load that is below the maximal throughput (since all systems suck from a latency point of view when running near their saturation point), recording all individual results in a HdrHistogram. I don't think that is something that can be done with JMH; I usually write a custom load generator to do this.

A major caveat (among lots of others) with measuring latency is that the production environment may be very different from the idealized environment you have when doing benchmarks. In particular, the system might be overloaded (i.e. there are more active threads on the whole machine, including kernel threads, than available cores) and/or virtualized. In this situation the kind of concurrency (wait-free, lock-free, ...) matters a lot, since threads can be suspended for several milliseconds.

Showing this effect in a benchmark is hard, because it occurs at a very low frequency under a "production-like" load, and there are other causes of low-frequency outliers, so it may require a lot of analysis. The usual way to show that an algorithm has poorer concurrency properties is to increase the throughput and/or the number of threads of the load generator until some candidates behave poorly: lock-free algorithms will scale better than blocking algorithms. But with the current AsyncAppender code, doing this kind of test is not possible, because the async thread isn't performant enough. To compare queues, the best thing is to compare them without integrating them into the AsyncAppender. Otherwise we would need to optimise the AsyncAppender itself a lot to show differences between queues, and that might be a pretty useless effort, since the current code is probably efficient enough with a real underlying appender.

> Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
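The custom load generator idea described above can be sketched as follows (a minimal, hypothetical pacer with invented names; in practice each result would go into a HdrHistogram rather than a sorted array). The key point from Gil Tene's talk is that response time is measured from each operation's *intended* start time, so queueing delay behind a slow operation is not silently dropped (coordinated omission).

```java
import java.util.Arrays;

// Hedged sketch of a constant-rate load generator that avoids coordinated
// omission by timing each call against its intended (scheduled) start.
public final class PacedLatencySketch {

    /** Runs `ops` operations at one operation per `periodNanos` and returns
     *  the sorted response times, each measured from the intended start. */
    static long[] measure(int ops, long periodNanos, Runnable op) {
        long[] responseTimes = new long[ops];
        long start = System.nanoTime();
        for (int i = 0; i < ops; i++) {
            long intended = start + i * periodNanos;
            while (System.nanoTime() < intended) Thread.yield(); // yield wait strategy
            op.run();                                            // e.g. logger.info(...)
            responseTimes[i] = System.nanoTime() - intended;     // intended, not actual, start
        }
        Arrays.sort(responseTimes);
        return responseTimes;
    }

    public static void main(String[] args) {
        // 50K ops at 10µs intervals = a 100K ops/s load against a no-op "appender".
        long[] r = measure(50_000, 10_000, () -> {});
        System.out.printf("p50=%dns p99=%dns max=%dns%n",
                r[r.length / 2], r[(int) (r.length * 0.99)], r[r.length - 1]);
    }
}
```

Running this below and near the saturation rate of the system under test shows the latency degradation the throughput benchmarks hide.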
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344294#comment-15344294 ] Anthony Maire commented on LOG4J2-1430: --- I made another test with the JCTools queue and the "yield then park" wait strategy that caused the throughput to drop earlier. I changed the implementation of the put() method to use offerIfBelowThreshold() instead of offer() (this requires a snapshot version of JCTools: the method is broken in 1.2, and the fix was merged earlier today). This way I made sure that at least 32 slots in the underlying buffer are free, so producer threads no longer cause a cache miss for the consumer thread. Now the throughput is roughly the same as with the "pure park" wait strategy. That confirms the feeling I wrote down in the previous post: *for any good enough (i.e. non-blocking) implementation of the queue, the AsyncAppender throughput performance test only measures how badly the consumer thread is impacted by the wait strategy when the queue is full*

> Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
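The headroom idea behind offerIfBelowThreshold() can be illustrated with a hypothetical helper (this is not the JCTools API; the names and the check-then-act logic are invented for illustration, and JCTools performs the threshold check atomically against the ring-buffer indices):

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.locks.LockSupport;

// Hedged sketch: producers refuse to enqueue while fewer than `headroom`
// slots are free, so they never write into cache lines adjacent to the slot
// the consumer is currently reading.
public final class HeadroomPut {

    /** Non-blocking offer that keeps roughly `headroom` slots free. */
    public static <E> boolean offerWithHeadroom(Queue<E> q, int capacity, int headroom, E e) {
        if (q.size() > capacity - headroom) return false; // not enough headroom
        return q.offer(e);
    }

    /** Blocking variant using the "pure park" wait strategy from the test above. */
    public static <E> void putWithHeadroom(Queue<E> q, int capacity, int headroom, E e) {
        while (!offerWithHeadroom(q, capacity, headroom, e)) {
            LockSupport.parkNanos(1L);
        }
    }

    public static void main(String[] args) {
        Queue<Integer> q = new ArrayBlockingQueue<>(64); // stand-in bounded queue
        int accepted = 0;
        for (int i = 0; i < 64; i++) if (offerWithHeadroom(q, 64, 32, i)) accepted++;
        System.out.println("accepted " + accepted + " of 64"); // stops once headroom is gone
    }
}
```

With the headroom in place, a nearly-full queue no longer forces the false-sharing cache misses on the consumer that the throughput numbers above were really measuring.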
[jira] [Comment Edited] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340369#comment-15340369 ] Anthony Maire edited comment on LOG4J2-1430 at 6/20/16 8:52 PM: I made further tests today, and while I still believe that the Conversant Disruptor is not the best alternative because of the coupling between the claiming slot order an the data publishing order, the main issue is not there. These throughput benchmarks only mesure one thing: how much is the consumer thread impacted by the wait strategy. And as soon as the strategy does not use condition variables, the more the consumer threads are paused, the better the throughput will be As I said in a previous comment, even with 24 or 36 threads, only 12 hyperthreads were busy with the JCTools queue with a LockSupport.parkNanos wait strategy. So most of the time, the queue is full. I tried to play with the implementation of the wait strategy in my JCTools blocking wrapper to use something that is more like the one used in LMAX or Conversant disruptor with some spin/yield before falling back to park. The result is not what I expected : the yield loop made the JCTool queue throughput lower !! I think it can be explained by false sharing : when the queue is nearly full, the producers will write to slots that are adjacent to the one that is read by the consumer. Every time an event is inserted in the queue, that cache line is evicted from consumer core L1 cache, causing a cache miss. If producers are paused for a longer time, then the consumer can append events without these cache misses for a longer time, and the throughput is better That's why the Conversant Disruptor was performing way better under window than under Linux : the pause time of parkNanos is 20 times longer on windows and it was enough to avoid the spin/yield loop negative impact in the wait strategy. 
The main question with these benchmarks is: what do we want to measure, and how can we say that one implementation is more "performant" than another? Throughput with a no-op appender is maybe not the best metric: in real applications there will be real filters and appenders that cannot sustain such a throughput, so we are measuring something that does not exist in the real world. In my opinion, there are 2 use cases where an async appender is useful:
- when we need to increase the maximal throughput of the underlying appender thanks to batching (cf. the RandomAccessFile appender with immediateFlush = false). The queue implementation has little impact on this
- when we need to keep latency low in the application thread for a realistic load (i.e. we don't expect the queue to be always nearly full; if that does happen, then another part of the system needs improvement)

As Remko said, Log4j is used in a lot of situations that we cannot control. And saying something like "you shouldn't have more active threads than cores in your app" is not a solution; if the server is used for other apps it will not be enough. Being lock-free on the producer side (as long as the queue is not full) is a mandatory characteristic in my opinion. The intrinsic latency of the queue is often not the main characteristic: there are lots of things done in the main thread before enqueuing, especially with parameterized messages. The call to the logger can take a few microseconds, whereas any reasonable high-performance collection will enqueue in something like 100 ns under low contention (and the saturation rate of the underlying appender will often be reached way before the queue sees high contention).
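The spin/yield-then-park tiering described in the comment above can be sketched as follows. This is an illustrative, hypothetical snippet (names and thresholds are made up), not code from Log4j or from either disruptor:

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

final class TieredWait {
    private static final int SPIN_TRIES = 100;
    private static final int YIELD_TRIES = 100;

    // Busy-spin, then yield, then fall back to parkNanos until the condition
    // becomes true. Returns how many times the condition was found false.
    static long awaitTrue(BooleanSupplier condition) {
        long iterations = 0;
        int tries = 0;
        while (!condition.getAsBoolean()) {
            iterations++;
            if (tries < SPIN_TRIES) {
                tries++;                    // cheapest: burn a few cycles
            } else if (tries < SPIN_TRIES + YIELD_TRIES) {
                tries++;
                Thread.yield();             // give up the CPU briefly
            } else {
                // OS-dependent pause: roughly 50 microseconds on the Linux
                // box from the comment, around 1 ms on Windows.
                LockSupport.parkNanos(1L);
            }
        }
        return iterations;
    }
}
```

The point of the comment is that a longer pause in the last tier can, counter-intuitively, raise throughput when the queue is nearly full, by reducing false sharing between producers and the consumer.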
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340369#comment-15340369 ] Anthony Maire commented on LOG4J2-1430: ---

I made further tests today, and while I still believe that the Conversant Disruptor is not the best alternative because of the coupling between the slot claiming order and the data publishing order, the main issue is not there. These throughput benchmarks only measure one thing: how much the consumer thread is impacted by the wait strategy. And as soon as the strategy does not use condition variables, the more the producer threads are paused, the better the throughput will be.

As I said in a previous comment, even with 24 or 36 threads, only 12 hyperthreads were busy with the JCTools queue with a LockSupport.parkNanos wait strategy. So most of the time, the queue is full. I tried to play with the implementation of the wait strategy in my JCTools blocking wrapper, to use something closer to the one used in the LMAX or Conversant disruptor, with some spin/yield before falling back to park. The result is not what I expected: the yield loop made the JCTools queue throughput lower!

I think it can be explained by false sharing: when the queue is nearly full, the producers write to slots that are adjacent to the one being read by the consumer. Every time an event is inserted in the queue, that cache line is evicted from the consumer core's L1 cache, causing a cache miss. If producers are paused for a longer time, then the consumer can append events without these cache misses for a longer time, and the throughput is better. That's why the Conversant Disruptor was performing way better under Windows than under Linux: the pause time of parkNanos is 20 times longer on Windows, and it was enough to avoid the negative impact of the spin/yield loop in the wait strategy.
The main question with these benchmarks is: what do we want to measure, and how can we say that one implementation is more "performant" than another? Throughput with a no-op appender is maybe not the best metric: in real applications there will be real filters and appenders that cannot sustain such a throughput, so we are measuring something that does not exist in the real world. In my opinion, there are 2 use cases where an async appender is useful:
- when we need to increase the maximal throughput of the underlying appender thanks to batching (cf. the RandomAccessFile appender with immediateFlush = false). The queue implementation has little impact on this
- when we need to keep latency low in the application thread for a realistic load (i.e. we don't expect the queue to be always nearly full; if that does happen, then another part of the system needs improvement)

As Remko said, Log4j is used in a lot of situations that we cannot control. And saying something like "you shouldn't have more active threads than cores in your app" is not a solution; if the server is used for other apps it will not be enough. Being lock-free on the producer side (as long as the queue is not full) is a mandatory characteristic. The intrinsic latency of the queue is often not the main characteristic: there are lots of things done in the main thread before enqueuing, especially with parameterized messages. The call to the logger can take a few microseconds, whereas any reasonable high-performance collection will enqueue in something like 100 ns under low contention (and the saturation rate of the underlying appender will often be reached way before the queue sees high contention).
> Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender > --- > > Key: LOG4J2-1430 > URL: https://issues.apache.org/jira/browse/LOG4J2-1430 > Project: Log4j 2 > Issue Type: New Feature > Components: Appenders >Affects Versions: 2.6.1 >Reporter: Matt Sicker >Assignee: Matt Sicker > Fix For: 2.7 > > Attachments: AsyncAppenderPerf01.txt, > log4j2-1430-jctools-tmp-patch.txt > > > [Conversant Disruptor|https://github.com/conversant/disruptor] works as an > implementation of BlockingQueue that is much faster than ArrayBlockingQueue. > I did some benchmarks earlier and found it to be a bit faster: > h3. AsyncAppender/ArrayBlockingQueue > {code} > Benchmark Mode Samples >Score Error Units > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Paramsthrpt 20 > 1101267.173 ± 17583.204 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Paramsthrpt 20 > 1128269.255 ± 12188.910 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param thrpt 20 > 1525470.805 ± 56515.933 ops/s >
[jira] [Updated] (LOG4J2-1441) Asynchronous logging + immediateFlush=false appenders : potential data retention on end of batch event
[ https://issues.apache.org/jira/browse/LOG4J2-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Maire updated LOG4J2-1441: -- Attachment: example_LOG4J2-1441.zip flush_fail_1441.zip > Asynchronous logging + immediateFlush=false appenders : potential data > retention on end of batch event > -- > > Key: LOG4J2-1441 > URL: https://issues.apache.org/jira/browse/LOG4J2-1441 > Project: Log4j 2 > Issue Type: Bug >Affects Versions: 2.6 >Reporter: Anthony Maire > Attachments: example_LOG4J2-1441.zip, flush_fail_1441.zip > > > When using asynchronous logging, suggestion is made to use > immediateFlush=false for file-based appenders and rely on the end of batch > detection to perform the flush, which is supposed to happen in a timely manner > As per AsyncLogger JavaDoc > {quote} > For best performance, use AsyncLogger with the RandomAccessFileAppender or > RollingRandomAccessFileAppender, with immediateFlush=false > {quote} > As per File/RandomAccessFile appenders doc on the site > {quote} > Flushing after every write is only useful when using this appender with > synchronous loggers. Asynchronous loggers and appenders will automatically > flush at the end of a batch of events, even if immediateFlush is set to > false. This also guarantees the data is written to disk but is more efficient. > {quote} > However if there are multiple appenders for the same asynchronous mechanism > (either multiple appenderRef for an AsyncAppender or multiple AsyncLoggers > since they share the same disruptor instance), the last event of the batch > will only flush the appenders that are actually logging it, and data on other > appenders won't be flushed. > I made a small example using 2 asynchronous loggers, same kind of issue > should occur with several appenders linked to the same AsyncAppender when the > last event is filtered out because of it's level or a custom filter. 
> - 10 seconds after the start, all events are processed but the main thread > will not exit because of an infinite loop > - 2 events were processed by the root logger and correctly logged in the console > appender > - only one event has been flushed to the file (the file appender is attached > to the root logger too) > - I made a heap dump on the JVM, it shows that the RandomAccessFileManager > internal byte buffer has pending data (byteBuffer.position=61) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: log4j-dev-unsubscr...@logging.apache.org For additional commands, e-mail: log4j-dev-h...@logging.apache.org
[jira] [Commented] (LOG4J2-1441) Asynchronous logging + immediateFlush=false appenders : potential data retention on end of batch event
[ https://issues.apache.org/jira/browse/LOG4J2-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15339598#comment-15339598 ] Anthony Maire commented on LOG4J2-1441: --- I'm not sure I know the code base well enough to provide a clean solution to fix this, but I can try :) > Asynchronous logging + immediateFlush=false appenders : potential data > retention on end of batch event > -- > > Key: LOG4J2-1441 > URL: https://issues.apache.org/jira/browse/LOG4J2-1441 > Project: Log4j 2 > Issue Type: Bug >Affects Versions: 2.6 >Reporter: Anthony Maire > Attachments: example_LOG4J2-1441.zip, flush_fail_1441.zip > > > When using asynchronous logging, suggestion is made to use > immediateFlush=false for file-based appenders and rely on the end of batch > detection to perform the flush, which is supposed to happen in a timely manner > As per AsyncLogger JavaDoc > {quote} > For best performance, use AsyncLogger with the RandomAccessFileAppender or > RollingRandomAccessFileAppender, with immediateFlush=false > {quote} > As per File/RandomAccessFile appenders doc on the site > {quote} > Flushing after every write is only useful when using this appender with > synchronous loggers. Asynchronous loggers and appenders will automatically > flush at the end of a batch of events, even if immediateFlush is set to > false. This also guarantees the data is written to disk but is more efficient. > {quote} > However if there are multiple appenders for the same asynchronous mechanism > (either multiple appenderRef for an AsyncAppender or multiple AsyncLoggers > since they share the same disruptor instance), the last event of the batch > will only flush the appenders that are actually logging it, and data on other > appenders won't be flushed. > I made a small example using 2 asynchronous loggers, same kind of issue > should occur with several appenders linked to the same AsyncAppender when the > last event is filtered out because of it's level or a custom filter. 
> - 10 seconds after the start, all events are processed but the main thread > will not exit because of an infinite loop > - 2 events were processed by the root logger and correctly logged in the console > appender > - only one event has been flushed to the file (the file appender is attached > to the root logger too) > - I made a heap dump on the JVM, it shows that the RandomAccessFileManager > internal byte buffer has pending data (byteBuffer.position=61) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: log4j-dev-unsubscr...@logging.apache.org For additional commands, e-mail: log4j-dev-h...@logging.apache.org
[jira] [Created] (LOG4J2-1441) Asynchronous logging + immediateFlush=false appenders : potential data retention on end of batch event
Anthony Maire created LOG4J2-1441: - Summary: Asynchronous logging + immediateFlush=false appenders : potential data retention on end of batch event Key: LOG4J2-1441 URL: https://issues.apache.org/jira/browse/LOG4J2-1441 Project: Log4j 2 Issue Type: Bug Affects Versions: 2.6 Reporter: Anthony Maire

When using asynchronous logging, the suggestion is to use immediateFlush=false for file-based appenders and rely on the end-of-batch detection to perform the flush, which is supposed to happen in a timely manner.

As per the AsyncLogger JavaDoc: {quote} For best performance, use AsyncLogger with the RandomAccessFileAppender or RollingRandomAccessFileAppender, with immediateFlush=false {quote}

As per the File/RandomAccessFile appender docs on the site: {quote} Flushing after every write is only useful when using this appender with synchronous loggers. Asynchronous loggers and appenders will automatically flush at the end of a batch of events, even if immediateFlush is set to false. This also guarantees the data is written to disk but is more efficient. {quote}

However, if there are multiple appenders behind the same asynchronous mechanism (either multiple appenderRefs for an AsyncAppender, or multiple AsyncLoggers, since they share the same disruptor instance), the last event of the batch will only flush the appenders that actually log it, and data on the other appenders won't be flushed. I made a small example using 2 asynchronous loggers; the same kind of issue should occur with several appenders linked to the same AsyncAppender when the last event is filtered out because of its level or a custom filter.
- 10 seconds after the start, all events are processed but the main thread will not exit because of an infinite loop - 2 events were processed by the root logger and correctly logged in the console appender - only one event has been flushed to the file (the file appender is attached to the root logger too) - I made a heap dump on the JVM, it shows that the RandomAccessFileManager internal byte buffer has pending data (byteBuffer.position=61) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: log4j-dev-unsubscr...@logging.apache.org For additional commands, e-mail: log4j-dev-h...@logging.apache.org
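A minimal configuration of the kind described in the report can look like the following. This is an illustrative sketch only: the logger name, appender names, patterns, and file name are made up, not taken from the attached example:

```xml
<Configuration status="warn">
  <Appenders>
    <Console name="Console">
      <PatternLayout pattern="%d %p %c %m%n"/>
    </Console>
    <RandomAccessFile name="File" fileName="app.log" immediateFlush="false">
      <PatternLayout pattern="%d %p %c %m%n"/>
    </RandomAccessFile>
  </Appenders>
  <Loggers>
    <!-- Both async loggers share the same disruptor instance. If the last
         event of a batch is logged only through "com.example.quiet", the
         end-of-batch flush never reaches the root logger's file appender,
         so its buffer can retain data indefinitely. -->
    <AsyncLogger name="com.example.quiet" level="info" additivity="false">
      <AppenderRef ref="Console"/>
    </AsyncLogger>
    <AsyncRoot level="info">
      <AppenderRef ref="Console"/>
      <AppenderRef ref="File"/>
    </AsyncRoot>
  </Loggers>
</Configuration>
```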
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338834#comment-15338834 ] Anthony Maire commented on LOG4J2-1430: ---

In theory you are perfectly right, but:
- Most of the JVMs currently on the market do support Unsafe, especially all the JVMs that are usually used in the high-performance community
- The Unsafe features that are required should become standard in JDK 9 (http://openjdk.java.net/jeps/193), so it will be easy to make this queue implementation standard in the near future
- If you would like the JCTools queue to become the standard for AsyncAppender but the dependency on Unsafe is a blocking issue, I can write a portable version without Unsafe so that you can test it and make a comparison. It should be a little slower than the original version, but it should keep the same kind of characteristics under heavy contention. Just let me know if you're interested in this :)

> Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender > --- > > Key: LOG4J2-1430 > URL: https://issues.apache.org/jira/browse/LOG4J2-1430 > Project: Log4j 2 > Issue Type: New Feature > Components: Appenders >Affects Versions: 2.6.1 >Reporter: Matt Sicker >Assignee: Matt Sicker > Fix For: 2.7 > > Attachments: AsyncAppenderPerf01.txt, > log4j2-1430-jctools-tmp-patch.txt > > > [Conversant Disruptor|https://github.com/conversant/disruptor] works as an > implementation of BlockingQueue that is much faster than ArrayBlockingQueue. > I did some benchmarks earlier and found it to be a bit faster: > h3. 
AsyncAppender/ArrayBlockingQueue > {code} > Benchmark Mode Samples >Score Error Units > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Paramsthrpt 20 > 1101267.173 ± 17583.204 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Paramsthrpt 20 > 1128269.255 ± 12188.910 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param thrpt 20 > 1525470.805 ± 56515.933 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params thrpt 20 > 1789434.196 ± 42733.475 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params thrpt 20 > 1803276.278 ± 34938.176 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params thrpt 20 > 1468550.776 ± 26402.286 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params thrpt 20 > 1322304.349 ± 22417.997 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params thrpt 20 > 1179756.489 ± 16502.276 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params thrpt 20 > 1324660.677 ± 18893.944 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params thrpt 20 > 1309365.962 ± 19602.489 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params thrpt 20 > 1422144.180 ± 20815.042 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughputSimple thrpt 20 > 1247862.372 ± 18300.764 ops/s > {code} > h3. 
AsyncAppender/DisruptorBlockingQueue > {code} > Benchmark Mode Samples >ScoreError Units > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Paramsthrpt 20 > 3704735.586 ± 59766.253 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Paramsthrpt 20 > 3622175.410 ± 31975.353 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param thrpt 20 > 6862480.428 ± 121473.276 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params thrpt 20 > 6193288.988 ± 93545.144 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params thrpt 20 > 5715621.712 ± 131878.581 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params thrpt 20 > 5745187.005 ± 213854.016 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params thrpt 20 > 5307137.396 ± 88135.709 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params thrpt 20 > 4953015.419 ± 72100.403 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params thrpt 20 > 4833836.418 ± 52919.314 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params thrpt 20 > 4353791.507 ± 79047.812 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params thrpt 20 > 4136761.624 ± 67804.253 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughputSimple thrpt 20 > 6719456.722 ± 187433.301 ops/s > {code} > h3. AsyncLogger > {code} > Benchmark
[jira] [Updated] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Maire updated LOG4J2-1430: -- Attachment: log4j2-1430-jctools-tmp-patch.txt > Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender > --- > > Key: LOG4J2-1430 > URL: https://issues.apache.org/jira/browse/LOG4J2-1430 > Project: Log4j 2 > Issue Type: New Feature > Components: Appenders >Affects Versions: 2.6.1 >Reporter: Matt Sicker >Assignee: Matt Sicker > Fix For: 2.7 > > Attachments: AsyncAppenderPerf01.txt, > log4j2-1430-jctools-tmp-patch.txt > > > [Conversant Disruptor|https://github.com/conversant/disruptor] works as an > implementation of BlockingQueue that is much faster than ArrayBlockingQueue. > I did some benchmarks earlier and found it to be a bit faster: > h3. AsyncAppender/ArrayBlockingQueue > {code} > Benchmark Mode Samples >Score Error Units > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Paramsthrpt 20 > 1101267.173 ± 17583.204 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Paramsthrpt 20 > 1128269.255 ± 12188.910 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param thrpt 20 > 1525470.805 ± 56515.933 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params thrpt 20 > 1789434.196 ± 42733.475 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params thrpt 20 > 1803276.278 ± 34938.176 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params thrpt 20 > 1468550.776 ± 26402.286 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params thrpt 20 > 1322304.349 ± 22417.997 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params thrpt 20 > 1179756.489 ± 16502.276 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params thrpt 20 > 1324660.677 ± 18893.944 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params thrpt 20 > 1309365.962 ± 19602.489 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params 
thrpt 20 > 1422144.180 ± 20815.042 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughputSimple thrpt 20 > 1247862.372 ± 18300.764 ops/s > {code} > h3. AsyncAppender/DisruptorBlockingQueue > {code} > Benchmark Mode Samples >ScoreError Units > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Paramsthrpt 20 > 3704735.586 ± 59766.253 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Paramsthrpt 20 > 3622175.410 ± 31975.353 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param thrpt 20 > 6862480.428 ± 121473.276 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params thrpt 20 > 6193288.988 ± 93545.144 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params thrpt 20 > 5715621.712 ± 131878.581 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params thrpt 20 > 5745187.005 ± 213854.016 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params thrpt 20 > 5307137.396 ± 88135.709 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params thrpt 20 > 4953015.419 ± 72100.403 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params thrpt 20 > 4833836.418 ± 52919.314 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params thrpt 20 > 4353791.507 ± 79047.812 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params thrpt 20 > 4136761.624 ± 67804.253 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughputSimple thrpt 20 > 6719456.722 ± 187433.301 ops/s > {code} > h3. 
AsyncLogger > {code} > BenchmarkMode Samples > ScoreError Units > o.a.l.l.p.j.AsyncLoggersBenchmark.throughput10Paramsthrpt 20 > 5075883.371 ± 180465.316 ops/s > o.a.l.l.p.j.AsyncLoggersBenchmark.throughput11Paramsthrpt 20 > 4867362.030 ± 193909.465 ops/s > o.a.l.l.p.j.AsyncLoggersBenchmark.throughput1Param thrpt 20 > 10294733.024 ± 226536.965 ops/s > o.a.l.l.p.j.AsyncLoggersBenchmark.throughput2Params thrpt 20 > 9021650.667 ± 351102.255 ops/s > o.a.l.l.p.j.AsyncLoggersBenchmark.throughput3Params thrpt 20 > 8079337.905 ± 115824.975 ops/s > o.a.l.l.p.j.AsyncLoggersBenchmark.throughput4Params thrpt 20 > 7347356.788 ± 66598.738 ops/s >
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338648#comment-15338648 ] Anthony Maire commented on LOG4J2-1430: ---

I will attach my patch right after posting this comment. This is clearly a quick & dirty patch to show that this "DisruptorBlockingQueue" is not so performant compared to something that is closer to the current state of the art. The way the JCTools queue is converted to a BlockingQueue matters (just as the choice of the wait strategy matters for the "real" disruptor). I removed the LinkedTransferQueue when testing, since I had some OutOfMemoryErrors with it and didn't have the time to investigate why.

The JCTools queue is free of the caveats that I spotted with the Conversant Disruptor. The main issue when dealing with a multi-producer scenario is that you have 2 orders to consider:
- the order in which producers claim slots
- the order in which data becomes available

Both queues solve this issue differently:
- The Conversant disruptor has an AtomicLong for each order, and CASes one of them with the value of the other => it forces threads to publish data in the same order the slots were claimed, which doesn't scale at all with lots of producers. It is not a lock-free algorithm, since a frozen thread can prevent the whole system from making progress
- The JCTools queue has a single "atomic" variable (although it's a volatile long, Unsafe is used to add the semantics of an AtomicLong). Elements are stored using an ordered store, which is a pretty cheap operation, and the consumer will see published elements as soon as they are available.
Both orders are independent, so the only contention between producers is the CAS loop => if a producer is temporarily frozen, the other producers can make progress (until the queue is full, of course) => it scales way better.

About the other caveats:
- JCTools is safe on 32-bit architectures
- Padding is inheritance-based, so I'm sure it does work on current JVMs
- About the potential isEmpty issue, I have to think a little more about it to be sure, but I think it's OK for both the Conversant and JCTools queues, since isEmpty() will always be called by the single consumer thread, so there shouldn't be any issue

> Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender > --- > > Key: LOG4J2-1430 > URL: https://issues.apache.org/jira/browse/LOG4J2-1430 > Project: Log4j 2 > Issue Type: New Feature > Components: Appenders >Affects Versions: 2.6.1 >Reporter: Matt Sicker >Assignee: Matt Sicker > Fix For: 2.7 > > Attachments: AsyncAppenderPerf01.txt > > > [Conversant Disruptor|https://github.com/conversant/disruptor] works as an > implementation of BlockingQueue that is much faster than ArrayBlockingQueue. > I did some benchmarks earlier and found it to be a bit faster: > h3. 
AsyncAppender/ArrayBlockingQueue > {code} > Benchmark Mode Samples >Score Error Units > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Paramsthrpt 20 > 1101267.173 ± 17583.204 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Paramsthrpt 20 > 1128269.255 ± 12188.910 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param thrpt 20 > 1525470.805 ± 56515.933 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params thrpt 20 > 1789434.196 ± 42733.475 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params thrpt 20 > 1803276.278 ± 34938.176 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params thrpt 20 > 1468550.776 ± 26402.286 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params thrpt 20 > 1322304.349 ± 22417.997 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params thrpt 20 > 1179756.489 ± 16502.276 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params thrpt 20 > 1324660.677 ± 18893.944 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params thrpt 20 > 1309365.962 ± 19602.489 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params thrpt 20 > 1422144.180 ± 20815.042 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughputSimple thrpt 20 > 1247862.372 ± 18300.764 ops/s > {code} > h3. AsyncAppender/DisruptorBlockingQueue > {code} > Benchmark Mode Samples >ScoreError Units > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Paramsthrpt 20 > 3704735.586 ± 59766.253 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Paramsthrpt 20 > 3622175.410 ±
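The single-counter publication scheme described in the comment above can be sketched in a few lines. This is an illustrative toy, not the actual JCTools implementation: real implementations pad their fields against false sharing, cache the consumer index to avoid re-reading it, and use Unsafe or VarHandles rather than Atomic* wrappers:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Multi-producer single-consumer sketch: producers contend only on one CAS
// loop to claim a slot, then publish with an ordered store (lazySet). The
// consumer polls until the claimed slot actually holds an element.
final class MpscSketch<E> {
    private final AtomicReferenceArray<E> buffer;
    private final int mask;
    private final AtomicLong producerIndex = new AtomicLong();
    // Single consumer, so a plain field suffices in this single-threaded
    // sketch; a real queue would publish it for the producers' full check.
    private long consumerIndex;

    MpscSketch(int capacityPow2) {
        buffer = new AtomicReferenceArray<>(capacityPow2);
        mask = capacityPow2 - 1;
    }

    boolean offer(E e) {
        long p;
        do {
            p = producerIndex.get();
            if (p - consumerIndex >= buffer.length()) {
                return false; // full (approximate check in this sketch)
            }
        } while (!producerIndex.compareAndSet(p, p + 1)); // claim a slot
        buffer.lazySet((int) (p & mask), e); // ordered store: cheap publish
        return true;
    }

    E poll() {
        int idx = (int) (consumerIndex & mask);
        E e = buffer.get(idx);
        if (e == null) {
            return null; // slot claimed but not yet published, or empty
        }
        buffer.lazySet(idx, null); // release the slot
        consumerIndex++;
        return e;
    }
}
```

Note how a frozen producer that has claimed slot `p` only delays the consumer at that slot; it does not block other producers from claiming and publishing later slots, which is the scalability difference the comment describes.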
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338427#comment-15338427 ] Anthony Maire commented on LOG4J2-1430: ---

I made some tests with the branch to compare the Conversant DisruptorBlockingQueue and the MpscArrayQueue from JCTools ( https://github.com/JCTools/JCTools ). Since the JCTools queue isn't a BlockingQueue, I extended it to add the BlockingQueue methods, using LockSupport.parkNanos(1L) when waiting was needed.

I first ran the JMH benchmark on my machine (Windows 7 on a Core i5-2500K, 4 cores, no HT) with 8 and 16 threads, and there was no noticeable difference between the two alternative implementations. Then, since the LockSupport.parkNanos(1L) behavior is really dependent on the OS (cf. http://vanillajava.blogspot.fr/2012/04/yield-sleep0-wait01-and-parknanos1.html ), I made some tests on a Linux server: CentOS 6.7 with dual Xeon E5-2643 v3 (24 hyperthreads for the whole machine), where the parkNanos pause was around 50µs instead of 1ms on Windows.

There was no noticeable difference up to 8 threads. As the number of threads increased, the DisruptorBlockingQueue performed badly, and it had no clear advantage over ArrayBlockingQueue at 36 threads, while the MPSC queue had a constant throughput. Here are the rough figures I got, in the following order: ABQ, Conversant Disruptor, JCTools MPSC:
{noformat}
1 thread   :  925k   3M     3.15M
8 threads  :  600k   2.95M  3.15M
12 threads :  550k   2.2M   3.10M
16 threads :  500K   1.9M   3.6M
24 threads :  600K   650K   3.45M
36 threads :  530K   600K   3.5M
{noformat}
I assume that the high pause time from parkNanos on Windows (used by the Conversant wait strategy too) was artificially reducing contention between producers and allowed it to perform pretty well.
When I compare the latency profiles of both implementations (with JMH "SampleTime" mode) for 8 threads, JCTools performs better at the higher percentiles (the 99.9999% percentile cannot be trusted because the number of samples is way too low):
{noformat}
Conversant Disruptor results:
Result: 2761.833 ±(99.9%) 26.908 ns/op [Average]
Statistics: (min, avg, max) = (344.000, 2761.833, 16039936.000), stdev = 11613.072
Confidence interval (99.9%): [2734.925, 2788.741]
Samples, N = 2016821
mean =      2761.833 ±(99.9%) 26.908 ns/op
min =        344.000 ns/op
p(0.0000) =    344.000 ns/op
p(50.0000) =  2148.000 ns/op
p(90.0000) =  5024.000 ns/op
p(95.0000) =  6728.000 ns/op
p(99.0000) = 11440.000 ns/op
p(99.9000) = 18912.000 ns/op
p(99.9900) = 27328.000 ns/op
p(99.9990) = 35509.234 ns/op
p(99.9999) = 2346156.268 ns/op
max = 16039936.000 ns/op

JCTools MPSC results:
Result: 3115.571 ±(99.9%) 23.081 ns/op [Average]
Statistics: (min, avg, max) = (325.000, 3115.571, 6905856.000), stdev = 9438.165
Confidence interval (99.9%): [3092.490, 3138.651]
Samples, N = 1810522
mean =      3115.571 ±(99.9%) 23.081 ns/op
min =        325.000 ns/op
p(0.0000) =    325.000 ns/op
p(50.0000) =  2984.000 ns/op
p(90.0000) =  4352.000 ns/op
p(95.0000) =  4816.000 ns/op
p(99.0000) =  5736.000 ns/op
p(99.9000) =  6872.000 ns/op
p(99.9900) =  7976.000 ns/op
p(99.9990) = 10652.653 ns/op
p(99.9999) = 6872656.978 ns/op
max = 6905856.000 ns/op
{noformat}
Interesting fact: with 16 to 36 threads, top reported a CPU usage for the JCTools queue that didn't exceed 1200% (while the Conversant queue was using the full CPU), so half of the CPU cycles of the machine were not used. I think there is something that can be improved, probably around my quick and dirty BlockingQueue implementation. Anyway, I'm really not sure that the Conversant DisruptorBlockingQueue is a good alternative to ABQ. 
> Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender > --- > > Key: LOG4J2-1430 > URL: https://issues.apache.org/jira/browse/LOG4J2-1430 > Project: Log4j 2 > Issue Type: New Feature > Components: Appenders >Affects Versions: 2.6.1 >Reporter: Matt Sicker >Assignee: Matt Sicker > Fix For: 2.7 > > Attachments: AsyncAppenderPerf01.txt > > > [Conversant Disruptor|https://github.com/conversant/disruptor] works as an > implementation of BlockingQueue that is much faster than ArrayBlockingQueue. > I did some benchmarks earlier and found it to be a bit faster: > h3. AsyncAppender/ArrayBlockingQueue > {code} > Benchmark Mode Samples >Score Error Units > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Paramsthrpt 20 > 1101267.173 ± 17583.204 ops/s > o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Paramsthrpt 20 > 1128269.255 ± 12188.910 ops/s >
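The "quick and dirty" blocking wrapper mentioned above can be approximated like this. It is a sketch with hypothetical names: it ignores interruption and timeouts, which a real BlockingQueue must handle, and uses a plain `java.util.Queue` delegate so the snippet has no dependency on JCTools:

```java
import java.util.Queue;
import java.util.concurrent.locks.LockSupport;

// Adapts a non-blocking Queue to the two blocking operations an
// AsyncAppender needs, retrying with parkNanos(1L) between attempts.
// The actual pause length is highly OS-dependent: roughly 50 microseconds
// on the Linux server from the comment vs around 1 ms on Windows.
final class ParkingBlockingAdapter<E> {
    private final Queue<E> delegate;

    ParkingBlockingAdapter(Queue<E> delegate) {
        this.delegate = delegate;
    }

    // Blocking insert: spin on offer(), parking while the queue is full.
    void put(E e) {
        while (!delegate.offer(e)) {
            LockSupport.parkNanos(1L);
        }
    }

    // Blocking removal: spin on poll(), parking while the queue is empty.
    E take() {
        E e;
        while ((e = delegate.poll()) == null) {
            LockSupport.parkNanos(1L);
        }
        return e;
    }
}
```

As the benchmark discussion shows, the choice of pause here is not a detail: it shifts contention between producers and changes the measured throughput of the whole pipeline.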
[jira] [Commented] (LOG4J2-1438) Add method getParameter() to ObjectMessage (and ReusableObjectMessage)
[ https://issues.apache.org/jira/browse/LOG4J2-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338406#comment-15338406 ] Anthony Maire commented on LOG4J2-1438: --- I think that having getParametersCount() + getParameter(int) would be better: it may be useful to write some gc-free custom filters that need access to parameters for instance > Add method getParameter() to ObjectMessage (and ReusableObjectMessage) > -- > > Key: LOG4J2-1438 > URL: https://issues.apache.org/jira/browse/LOG4J2-1438 > Project: Log4j 2 > Issue Type: Improvement > Components: API >Affects Versions: 2.6, 2.6.1 >Reporter: Remko Popma > Fix For: 2.7 > > > While analysing LOG4J2-1397 I found it is currently not possible for client > code to get a reference to the Object in the ObjectMessage without creating a > temporary (single-element) array (returned from {{getParameters()}}. > This ticket proposes to add a method to the ObjectMessage and > ReusableObjectMessage interface that gives direct access to the object > parameter. > I'm flexible with regards to the name of this new method. I thought of > {{getParameter()}} but an alternative could be {{getObject()}} or something > else. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: log4j-dev-unsubscr...@logging.apache.org For additional commands, e-mail: log4j-dev-h...@logging.apache.org
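The accessor shape suggested in the comment can be sketched as below. These are hypothetical names for illustration, not the API that was eventually committed to Log4j 2; the point is that indexed access lets a custom filter inspect parameters without the temporary Object[] that getParameters() allocates:

```java
// Hypothetical interface: indexed, allocation-free parameter access.
interface IndexedParameterMessage {
    int getParameterCount();
    Object getParameter(int index);
}

// Minimal reusable holder in the spirit of the gc-free message classes:
// one fixed backing array is reused across events instead of allocating
// a fresh Object[] per call. (set() itself still allocates a varargs
// array here; a real gc-free API would take individual arguments.)
final class ReusableParams implements IndexedParameterMessage {
    private final Object[] params = new Object[10];
    private int count;

    void set(Object... values) {
        System.arraycopy(values, 0, params, 0, values.length);
        count = values.length;
    }

    @Override
    public int getParameterCount() {
        return count;
    }

    @Override
    public Object getParameter(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return params[index];
    }
}
```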
[jira] [Commented] (LOG4J2-1430) Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender
[ https://issues.apache.org/jira/browse/LOG4J2-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337649#comment-15337649 ] Anthony Maire commented on LOG4J2-1430: --- There was a thread on the mechanical sympathy group about this queue: https://groups.google.com/forum/#!searchin/mechanical-sympathy/conversant/mechanical-sympathy/c5x0c2Zsfpc/BmcnRihGFAAJ I had a very quick look at the current code, and the same issues seem to still be present:
- Contention between producers: it will not scale if there are more producers than cores. That's basically the difference between disruptor 3.x and earlier versions; more details in the second half of this talk by Michael Barker (LMAX): https://www.youtube.com/watch?v=VBnLW9mKMh4
- Potentially broken on 32-bit systems (writes to a non-volatile long are not guaranteed to be atomic)
- Not sure that the padding mechanism used in their code works, since the JVM can re-order fields. The most secure way to pad is to use inheritance.
Moreover, due to the asynchronous nature of this kind of queue, I'm not sure that we will always detect when the queue is empty, which is critical since the isEmpty() method is called to decide whether a flush is required when immediateFlush = false. Further investigation is needed on this subject. > Add optional support for Conversant DisruptorBlockingQueue in AsyncAppender > --- > > Key: LOG4J2-1430 > URL: https://issues.apache.org/jira/browse/LOG4J2-1430 > Project: Log4j 2 > Issue Type: New Feature > Components: Appenders >Affects Versions: 2.6.1 >Reporter: Matt Sicker >Assignee: Matt Sicker > Fix For: 2.7 > > > [Conversant Disruptor|https://github.com/conversant/disruptor] works as an > implementation of BlockingQueue that is much faster than ArrayBlockingQueue. > I did some benchmarks earlier and found it to be a bit faster: > h3.
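The padding-by-inheritance approach recommended in that comment is the pattern used by the LMAX Disruptor; a minimal sketch (class names are mine) looks like this. The JVM is free to reorder fields within a class, but on HotSpot superclass fields are laid out before subclass fields, so the padding keeps its position around the hot field:

```java
// Padding via inheritance: the p1..p7 / p9..p15 longs keep the hot
// 'value' field isolated on its own cache line, and the inheritance
// chain prevents the JVM from reordering the padding next to it.
class LhsPadding {
    protected long p1, p2, p3, p4, p5, p6, p7;
}

class Value extends LhsPadding {
    protected volatile long value;
}

class PaddedCounter extends Value {
    protected long p9, p10, p11, p12, p13, p14, p15;

    public long get() {
        return value;
    }

    public void set(final long v) {
        value = v;
    }
}
```

Padding declared as plain fields inside a single class can be eliminated or reordered by the JVM, which is exactly the doubt raised about the Conversant implementation.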
AsyncAppender/ArrayBlockingQueue
> {code}
> Benchmark                                                    Mode  Samples        Score       Error  Units
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Params  thrpt       20  1101267.173 ± 17583.204  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Params  thrpt       20  1128269.255 ± 12188.910  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param    thrpt       20  1525470.805 ± 56515.933  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params   thrpt       20  1789434.196 ± 42733.475  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params   thrpt       20  1803276.278 ± 34938.176  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params   thrpt       20  1468550.776 ± 26402.286  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params   thrpt       20  1322304.349 ± 22417.997  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params   thrpt       20  1179756.489 ± 16502.276  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params   thrpt       20  1324660.677 ± 18893.944  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params   thrpt       20  1309365.962 ± 19602.489  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput9Params   thrpt       20  1422144.180 ± 20815.042  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughputSimple    thrpt       20  1247862.372 ± 18300.764  ops/s
> {code}
> h3.
AsyncAppender/DisruptorBlockingQueue
> {code}
> Benchmark                                                    Mode  Samples        Score        Error  Units
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput10Params  thrpt       20  3704735.586 ±  59766.253  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput11Params  thrpt       20  3622175.410 ±  31975.353  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput1Param    thrpt       20  6862480.428 ± 121473.276  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput2Params   thrpt       20  6193288.988 ±  93545.144  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput3Params   thrpt       20  5715621.712 ± 131878.581  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput4Params   thrpt       20  5745187.005 ± 213854.016  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput5Params   thrpt       20  5307137.396 ±  88135.709  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput6Params   thrpt       20  4953015.419 ±  72100.403  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput7Params   thrpt       20  4833836.418 ±  52919.314  ops/s
> o.a.l.l.p.j.AsyncAppenderLog4j2Benchmark.throughput8Params   thrpt       20  4353791.507 ±  79047.812  ops/s
[jira] [Closed] (LOG4J2-1415) Avoid calling toString() on auto-boxed primitive message parameters
[ https://issues.apache.org/jira/browse/LOG4J2-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Maire closed LOG4J2-1415. - Tested on master; it works as expected. > Avoid calling toString() on auto-boxed primitive message parameters > --- > > Key: LOG4J2-1415 > URL: https://issues.apache.org/jira/browse/LOG4J2-1415 > Project: Log4j 2 > Issue Type: Improvement >Affects Versions: 2.6 >Reporter: Anthony Maire >Assignee: Remko Popma >Priority: Minor > Fix For: 2.6.2 > > > When using Log4j 2 through the SLF4J binding, the Unboxer mechanism is not > available and a parameterized message with a primitive type parameter will be > auto-boxed. > Then this boxed value will be formatted by > ParameterFormatter.recursiveDeepToString() and further allocations will happen. > To lower the allocation rate for SLF4J users, > ParameterFormatter.appendSpecialTypes() should handle boxed primitive types > too. > {code} > private static boolean appendSpecialTypes(final Object o, final StringBuilder > str) { > ... > } else if (o instanceof Long) { > str.append(((Long) o).longValue()); > return true; > } else if (o instanceof Integer) { > str.append(((Integer) o).intValue()); > return true; > } else if (o instanceof Double) { > str.append(((Double) o).doubleValue()); > return true; > } // similarly for float, short, boolean and char. > ... > } > {code}
[jira] [Updated] (LOG4J2-1421) Custom plugins packaged in jar files are not loaded by "packages" property mechanism
[ https://issues.apache.org/jira/browse/LOG4J2-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Maire updated LOG4J2-1421: -- Attachment: log4j-plugin-load-issue.zip Minimal example attached: when the DefaultFilter.java file is in the Eclipse project, the plugin does work. If it is not present and the jar file is there instead, it doesn't work (the application verifies that the DefaultFilter class is available on the classpath). > Custom plugins packaged in jar files are not loaded by "packages" property > mechanism > --- > > Key: LOG4J2-1421 > URL: https://issues.apache.org/jira/browse/LOG4J2-1421 > Project: Log4j 2 > Issue Type: Bug > Components: Plugins >Affects Versions: 2.6 >Reporter: Anthony Maire > Attachments: log4j-plugin-load-issue.zip > > > The "packages" attribute in the configuration file works properly to load custom > plugins from Eclipse. However, if the plugin class is packaged alone in a jar > file (without the Log4j2Plugins.dat file in it), the "packages" property load > mechanism is not working. > 2016-06-09 19:40:44,823 main DEBUG Took 0,001222 seconds to load 0 plugins > from package test
[jira] [Created] (LOG4J2-1421) Custom plugins packaged in jar files are not loaded by "packages" property mechanism
Anthony Maire created LOG4J2-1421: - Summary: Custom plugins packaged in jar files are not loaded by "packages" property mechanism Key: LOG4J2-1421 URL: https://issues.apache.org/jira/browse/LOG4J2-1421 Project: Log4j 2 Issue Type: Bug Components: Plugins Affects Versions: 2.6 Reporter: Anthony Maire The "packages" attribute in the configuration file works properly to load custom plugins from Eclipse. However, if the plugin class is packaged alone in a jar file (without the Log4j2Plugins.dat file in it), the "packages" property load mechanism is not working. 2016-06-09 19:40:44,823 main DEBUG Took 0,001222 seconds to load 0 plugins from package test
[jira] [Commented] (LOG4J2-1397) Support ByteBufferLayout
[ https://issues.apache.org/jira/browse/LOG4J2-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322196#comment-15322196 ] Anthony Maire commented on LOG4J2-1397: --- I agree that this is a narrow use case. Another possible approach is to let the user inject their own implementation of the ringbuffer slot factory (providing the methods to transfer data from the application thread to the ringbuffer slot, then from the slot to the background thread). Does that seem OK to you? Basically, all use cases where the user wants to access parameters at the appender level (background formatting, custom filters ...) are potentially broken with async loggers if those parameters are mutable. And they will almost always be narrow use cases. I think giving the user a way to copy exactly the piece of data that needs to be copied for their use case can be a good solution. > Support ByteBufferLayout > > > Key: LOG4J2-1397 > URL: https://issues.apache.org/jira/browse/LOG4J2-1397 > Project: Log4j 2 > Issue Type: New Feature > Components: Layouts >Affects Versions: 2.6 >Reporter: Remko Popma > > As requested by Kirk Pepperdine on the Mechanical Sympathy [mailing > list|https://groups.google.com/d/msg/mechanical-sympathy/klefjRqlpQE/-1WF59IGAwAJ]: > Support a new Layout that takes ObjectMessages containing a ByteBuffer > payload and writes them to the appender without modification.
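The injection point suggested in that comment could be shaped roughly like this; none of these names exist in Log4j2, they only illustrate letting the user control how data is copied into a preallocated ring-buffer slot:

```java
import java.nio.ByteBuffer;

// Hypothetical slot-factory contract sketched from this comment; the
// interface and method names do not exist in Log4j2.
interface SlotFactory<S> {
    S createSlot();                      // called once per slot at startup
    void copyIn(ByteBuffer src, S slot); // application thread -> slot
}

// One possible user implementation: each slot is a preallocated
// ByteBuffer, and copyIn() is a bulk byte copy with no allocation.
class ByteBufferSlotFactory implements SlotFactory<ByteBuffer> {
    private final int capacity;

    ByteBufferSlotFactory(final int capacity) {
        this.capacity = capacity;
    }

    @Override
    public ByteBuffer createSlot() {
        return ByteBuffer.allocateDirect(capacity);
    }

    @Override
    public void copyIn(final ByteBuffer src, final ByteBuffer slot) {
        slot.clear();
        slot.put(src); // bulk copy; no per-event garbage
        slot.flip();
    }
}
```

The async logger would call createSlot() once per ring-buffer entry at startup and copyIn() on the application thread, so the only per-event cost is the bulk byte copy.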
[jira] [Commented] (LOG4J2-1397) Support ByteBufferLayout
[ https://issues.apache.org/jira/browse/LOG4J2-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322161#comment-15322161 ] Anthony Maire commented on LOG4J2-1397: --- It just doesn't work, because of the different orderings in the preallocated array and the ringbuffer. Let's take a simple example: 2 threads and a ringbuffer with 2 slots, so we need a preallocated array with 4 buffers.
- Thread A gets sequence #1 from the atomic counter, then is interrupted by the system scheduler before doing the copy
- Thread B is still logging: it gets sequences #2, #3 and #4 from the counter, pushes 2 events to the async logger (so the ringbuffer is full) and is blocked trying to push the third
- The async logger processes the first event submitted by thread B
- Thread B can push another event (with preallocated buffer sequence #4)
- Thread B wants to log another event and gets sequence #5 for the preallocated array
- Thread A is resumed by the scheduler => data race: both threads will copy data into the same preallocated buffer
> Support ByteBufferLayout > > > Key: LOG4J2-1397 > URL: https://issues.apache.org/jira/browse/LOG4J2-1397 > Project: Log4j 2 > Issue Type: New Feature > Components: Layouts >Affects Versions: 2.6 >Reporter: Remko Popma > > As requested by Kirk Pepperdine on the Mechanical Sympathy [mailing > list|https://groups.google.com/d/msg/mechanical-sympathy/klefjRqlpQE/-1WF59IGAwAJ]: > Support a new Layout that takes ObjectMessages containing a ByteBuffer > payload and writes them to the appender without modification.
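The race in that scenario can be seen in the indexing scheme itself: a sketch of the claim logic (class and method names are mine) shows that two live sequences map to the same preallocated buffer as soon as the counter runs ahead of the consumers by more than the pool size:

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of the broken scheme described above: buffers are
// claimed with an atomic counter that is independent of the
// ring-buffer publish order, so a stalled claimer's buffer can be
// handed out again once the counter wraps around the pool.
class PreallocatedPool {
    private final AtomicLong counter = new AtomicLong();
    private final int size;

    PreallocatedPool(final int size) {
        this.size = size;
    }

    // Maps the next sequence onto a pool index. Nothing here knows
    // whether the buffer at that index is still being filled.
    int claimIndex() {
        return (int) (counter.incrementAndGet() % size);
    }
}
```

With a pool of 4 buffers, sequences #1 and #5 both map to index 1, which is exactly the collision between the stalled thread A and thread B in the scenario above.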
[jira] [Commented] (LOG4J2-1397) Support ByteBufferLayout
[ https://issues.apache.org/jira/browse/LOG4J2-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322038#comment-15322038 ] Anthony Maire commented on LOG4J2-1397: --- At first sight this approach may work correctly in a single-threaded context, but I think you need 1 more preallocated buffer than the ringbuffer size (and so need to use a modulo instead of a binary mask). However, in our application this code can be invoked by several threads. Since the order in which buffers are obtained from the preallocated pool might not be the same as their order in the ringbuffer, the application will then need a callback to know that a buffer has been released (i.e. logged to disk). > Support ByteBufferLayout > > > Key: LOG4J2-1397 > URL: https://issues.apache.org/jira/browse/LOG4J2-1397 > Project: Log4j 2 > Issue Type: New Feature > Components: Layouts >Affects Versions: 2.6 >Reporter: Remko Popma > > As requested by Kirk Pepperdine on the Mechanical Sympathy [mailing > list|https://groups.google.com/d/msg/mechanical-sympathy/klefjRqlpQE/-1WF59IGAwAJ]: > Support a new Layout that takes ObjectMessages containing a ByteBuffer > payload and writes them to the appender without modification.
[jira] [Commented] (LOG4J2-1397) Support ByteBufferLayout
[ https://issues.apache.org/jira/browse/LOG4J2-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321285#comment-15321285 ] Anthony Maire commented on LOG4J2-1397: --- Basically, a layout that has a dependency on a mutable object will not be compatible with asynchronous loggers/appenders: a copy of the content has to be made, and to avoid excessive allocation the copy should be reusable. If we want to safely use the buffer content without excessive allocation, we already have some solutions that use String (such as wrapping it in a StringBuilderFormattable that decodes the buffer content into a StringBuilder). But this is slow (it takes a few extra microseconds for a message of a few hundred bytes, compared to a bulk byte copy). If we want to use a ByteBufferLayout, that means those few microseconds matter, so it's nearly 100% certain that asynchronous logging will be used. Without some byte-copy support provided by the async logger, I'm afraid this layout can't be really useful. > Support ByteBufferLayout > > > Key: LOG4J2-1397 > URL: https://issues.apache.org/jira/browse/LOG4J2-1397 > Project: Log4j 2 > Issue Type: New Feature > Components: Layouts >Affects Versions: 2.6 >Reporter: Remko Popma > > As requested by Kirk Pepperdine on the Mechanical Sympathy [mailing > list|https://groups.google.com/d/msg/mechanical-sympathy/klefjRqlpQE/-1WF59IGAwAJ]: > Support a new Layout that takes ObjectMessages containing a ByteBuffer > payload and writes them to the appender without modification.
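The String-based workaround mentioned in that comment can be sketched like this. Log4j2's real interface is org.apache.logging.log4j.util.StringBuilderFormattable; it is redeclared locally here only so the snippet is self-contained, and the wrapper assumes a single-byte encoding:

```java
import java.nio.ByteBuffer;

// Log4j2's real interface lives in org.apache.logging.log4j.util;
// redeclared here so the sketch compiles on its own.
interface StringBuilderFormattable {
    void formatTo(StringBuilder buffer);
}

// A reusable wrapper that decodes the buffer's bytes into the
// StringBuilder inside formatTo(), avoiding a String allocation per
// event. The class name is illustrative; the per-byte append loop is
// the "few extra microseconds" cost compared to a bulk byte copy.
class ByteBufferFormattable implements StringBuilderFormattable {
    private ByteBuffer buffer;

    ByteBufferFormattable set(final ByteBuffer buffer) {
        this.buffer = buffer;
        return this;
    }

    @Override
    public void formatTo(final StringBuilder sb) {
        for (int i = buffer.position(); i < buffer.limit(); i++) {
            sb.append((char) (buffer.get(i) & 0xFF)); // per-byte decode: slow
        }
    }
}
```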
[jira] [Commented] (LOG4J2-1397) Support ByteBufferLayout
[ https://issues.apache.org/jira/browse/LOG4J2-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320175#comment-15320175 ] Anthony Maire commented on LOG4J2-1397: --- Here is my use case:
- The application receives FIX messages (an ASCII-based protocol used in financial services) at a rate of a few thousand to a hundred thousand messages per second
- The application needs to log every message received (mostly for legal and auditing reasons)
- Latency is critical, so logging needs to be asynchronous, since the jitter induced by kernel write-back can be very high at this rate
Our (simplified) architecture is something like this:
{code}
public class Connection {
    private SocketChannel channel;
    private ByteBuffer buffer;
    private MessageListener listener;
    ...
    // called when the main loop involving a NIO Selector told us
    // that there is data to process for this socket
    public void onDataReady() {
        channel.read(buffer);
        if (containsCompleteMessage(buffer)) {
            listener.onIncomingMessage(this, buffer);
        }
    }

    public String toString() {
        return "Connection [remoteAddress=" + channel.getRemoteAddress() + "]";
    }
}

public class MessageListener {
    private static final Logger messageLog = LoggerFactory.getLogger("message.logger");
    private static final Logger log = LoggerFactory.getLogger(MessageListener.class);
    private FIXMessageCodec codec;
    ...
    public void onIncomingMessage(Connection connection, ByteBuffer buffer) {
        messageLog.info("Message on {}: IN {}", connection, buffer);
        FIXMessage businessMessage = codec.decode(buffer);
        buffer.clear();
        log.debug("start business processing for {}", businessMessage);
        doBusinessProcessing(businessMessage);
    }
}
{code}
So we have a dedicated logger for these raw messages that we need to store on disk.
Currently we are using logback through SLF4J, and the first line of onIncomingMessage is replaced by something like this:
{code}
if (messageLog.isInfoEnabled()) {
    ByteBuffer copy = ByteBuffer.allocate(buffer.remaining());
    copy.put(buffer);
    buffer.rewind();
    copy.flip();
    messageLog.info("Message on {}: IN {}", connection, copy);
}
{code}
And we have a custom layout that processes the content of the ByteBuffer in the background thread of the AsyncAppender. Making this copy is not currently an issue, since logback already allocates roughly 20 times the buffer's size per event, but since we are now looking at log4j for its low allocation rate, we want to avoid that garbage. > Support ByteBufferLayout > > > Key: LOG4J2-1397 > URL: https://issues.apache.org/jira/browse/LOG4J2-1397 > Project: Log4j 2 > Issue Type: New Feature > Components: Layouts >Affects Versions: 2.6 >Reporter: Remko Popma > > As requested by Kirk Pepperdine on the Mechanical Sympathy [mailing > list|https://groups.google.com/d/msg/mechanical-sympathy/klefjRqlpQE/-1WF59IGAwAJ]: > Support a new Layout that takes ObjectMessages containing a ByteBuffer > payload and writes them to the appender without modification.
[jira] [Commented] (LOG4J2-1397) Support ByteBufferLayout
[ https://issues.apache.org/jira/browse/LOG4J2-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320112#comment-15320112 ] Anthony Maire commented on LOG4J2-1397: --- The issue with a ThreadLocal is that we need to pass the copy between 2 threads through the disruptor. That means there should be some support for swapping cached buffers in MutableLogEvent, for example (the same idea as the parameter swap that was done for LOG4J2-1342). Without this kind of support, we need either to create a defensive copy (and create garbage) or to have a shared pool of ByteBuffers (and pay a synchronization performance penalty). > Support ByteBufferLayout > > > Key: LOG4J2-1397 > URL: https://issues.apache.org/jira/browse/LOG4J2-1397 > Project: Log4j 2 > Issue Type: New Feature > Components: Layouts >Affects Versions: 2.6 >Reporter: Remko Popma > > As requested by Kirk Pepperdine on the Mechanical Sympathy [mailing > list|https://groups.google.com/d/msg/mechanical-sympathy/klefjRqlpQE/-1WF59IGAwAJ]: > Support a new Layout that takes ObjectMessages containing a ByteBuffer > payload and writes them to the appender without modification.
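The swap idea referenced in that comment (by analogy with the parameter-array swap of LOG4J2-1342) could look like this; the class and method names are illustrative, not the real MutableLogEvent API:

```java
import java.nio.ByteBuffer;

// Sketch of buffer swapping: instead of copying into a shared buffer,
// the event hands its cached buffer back to the caller and keeps the
// caller's filled one. Each thread always owns exactly one buffer, so
// no garbage is created and no pool synchronization is needed.
class ReusableEvent {
    private ByteBuffer cached = ByteBuffer.allocate(4096);

    /** Takes ownership of 'filled'; returns an empty buffer in exchange. */
    ByteBuffer swapBuffer(final ByteBuffer filled) {
        final ByteBuffer previous = cached;
        cached = filled;
        previous.clear(); // hand back an empty, reusable buffer
        return previous;
    }

    /** The payload the background thread will format. */
    ByteBuffer payload() {
        return cached;
    }
}
```

The caller logs with its thread-local buffer, swaps it into the event on publish, and keeps the returned empty buffer as its new thread-local copy.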
[jira] [Commented] (LOG4J2-1415) Avoid calling toString() on auto-boxed primitive message parameters
[ https://issues.apache.org/jira/browse/LOG4J2-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318186#comment-15318186 ] Anthony Maire commented on LOG4J2-1415: --- I don't think the way the "switch" is implemented has a major impact on performance. I made a small JMH benchmark, running on a Xeon E5-2643 v3 with JVM 1.8.0_92 (same kind of results on my desktop with a Core i7 2600), and the cost of instanceof is barely noticeable (and less than the class name switch):
{noformat}
Benchmark                              Mode  Cnt     Score    Error  Units
SwitchClassTest.referenceWithCast      avgt   25  1958.885 ±  7.773  ns/op
SwitchClassTest.referenceWithToString  avgt   25  2050.443 ± 10.347  ns/op
SwitchClassTest.useClassName           avgt   25  2015.931 ± 10.190  ns/op
SwitchClassTest.useInstanceOf          avgt   25  1978.622 ± 10.030  ns/op
{noformat}
Here is the benchmark code:
{code}
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@Fork(value = 5)
public class SwitchClassTest {
    private static final String STRING_CLASS_NAME = "java.lang.String";
    private static final String CHARACTER_CLASS_NAME = "java.lang.Character";
    private static final String LONG_CLASS_NAME = "java.lang.Long";
    private static final String INTEGER_CLASS_NAME = "java.lang.Integer";
    private static final String BOOLEAN_CLASS_NAME = "java.lang.Boolean";
    private static final String DOUBLE_CLASS_NAME = "java.lang.Double";
    private static final String FLOAT_CLASS_NAME = "java.lang.Float";

    private Object[] params = new Object[]{"abc", 'a', Long.MAX_VALUE, Integer.MAX_VALUE,
            Double.MAX_VALUE, Float.MAX_VALUE, Boolean.TRUE};
    private StringBuilder sb = new StringBuilder();

    @Benchmark
    public int referenceWithToString() {
        for (Object param : params) {
            sb.append(param);
        }
        int size = sb.length();
        sb.setLength(0);
        return size;
    }

    @Benchmark
    public int referenceWithCast() {
        sb.append(params[0]);
        sb.append(((Character) params[1]).charValue());
        sb.append(((Long) params[2]).longValue());
        sb.append(((Integer) params[3]).intValue());
        sb.append(((Double) params[4]).doubleValue());
        sb.append(((Float) params[5]).floatValue());
        sb.append(((Boolean) params[6]).booleanValue());
        int size = sb.length();
        sb.setLength(0);
        return size;
    }

    @Benchmark
    public int useInstanceOf() {
        for (Object param : params) {
            if (param instanceof String) {
                sb.append(param);
            } else if (param instanceof Character) {
                sb.append(((Character) param).charValue());
            } else if (param instanceof Long) {
                sb.append(((Long) param).longValue());
            } else if (param instanceof Integer) {
                sb.append(((Integer) param).intValue());
            } else if (param instanceof Double) {
                sb.append(((Double) param).doubleValue());
            } else if (param instanceof Float) {
                sb.append(((Float) param).floatValue());
            } else if (param instanceof Boolean) {
                sb.append(((Boolean) param).booleanValue());
            }
        }
        int size = sb.length();
        sb.setLength(0);
        return size;
    }

    @Benchmark
    public int useClassName() {
        for (Object param : params) {
            String className = param.getClass().getName();
            switch (className) {
                case STRING_CLASS_NAME:
                    sb.append(param);
                    break;
                case CHARACTER_CLASS_NAME:
                    sb.append(((Character) param).charValue());
                    break;
                case LONG_CLASS_NAME:
                    sb.append(((Long) param).longValue());
                    break;
                case INTEGER_CLASS_NAME:
                    sb.append(((Integer) param).intValue());
                    break;
                case DOUBLE_CLASS_NAME:
                    sb.append(((Double) param).doubleValue());
                    break;
                case FLOAT_CLASS_NAME:
                    sb.append(((Float) param).floatValue());
                    break;
                case BOOLEAN_CLASS_NAME:
                    sb.append(((Boolean) param).booleanValue());
                    break;
            }
        }
        int size = sb.length();
        sb.setLength(0);
        return size;
    }
}
{code}
[jira] [Created] (LOG4J2-1415) When running in garbage-free mode, boxed primitive types should be formatted without allocating
Anthony Maire created LOG4J2-1415: - Summary: When running in garbage-free mode, boxed primitive types should be formatted without allocating Key: LOG4J2-1415 URL: https://issues.apache.org/jira/browse/LOG4J2-1415 Project: Log4j 2 Issue Type: Improvement Affects Versions: 2.6 Reporter: Anthony Maire Priority: Minor When using Log4j 2 through the SLF4J binding, the Unboxer mechanism is not available and a parameterized message with a primitive type parameter will be auto-boxed. This boxed value will then be formatted by ParameterFormatter.recursiveDeepToString() and further allocations will happen. To lower the allocation rate for SLF4J users, ParameterFormatter.appendSpecialTypes() should handle boxed primitive types too.
{code}
private static boolean appendSpecialTypes(final Object o, final StringBuilder str) {
    ...
    } else if (o instanceof Long) {
        str.append(((Long) o).longValue());
        return true;
    } else if (o instanceof Integer) {
        str.append(((Integer) o).intValue());
        return true;
    } else if (o instanceof Double) {
        str.append(((Double) o).doubleValue());
        return true;
    } // similarly for float, short, boolean and char.
    ...
}
{code}