[jira] [Commented] (CASSANDRA-14318) Debug logging can create massive performance issues
[ https://issues.apache.org/jira/browse/CASSANDRA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415923#comment-16415923 ] Alexander Dejanovski commented on CASSANDRA-14318: -- For the record, the same tests on 3.11.2 didn't show any notable performance difference between debug on and off : Cassandra 3.11.2 debug on : {noformat} Results: Op rate : 18 777 op/s [read_event_1: 3 165 op/s, read_event_2: 3 109 op/s, read_event_3: 12 562 op/s] Partition rate : 6 215 pk/s [read_event_1: 3 165 pk/s, read_event_2: 3 109 pk/s, read_event_3: 0 pk/s] Row rate : 6 215 row/s [read_event_1: 3 165 row/s, read_event_2: 3 109 row/s, read_event_3: 0 row/s] Latency mean : 6,7 ms [read_event_1: 6,7 ms, read_event_2: 6,7 ms, read_event_3: 6,6 ms] Latency median : 5,0 ms [read_event_1: 5,0 ms, read_event_2: 5,0 ms, read_event_3: 4,9 ms] Latency 95th percentile : 15,6 ms [read_event_1: 15,5 ms, read_event_2: 15,9 ms, read_event_3: 15,5 ms] Latency 99th percentile : 43,3 ms [read_event_1: 42,7 ms, read_event_2: 44,2 ms, read_event_3: 43,2 ms] Latency 99.9th percentile : 82,0 ms [read_event_1: 80,3 ms, read_event_2: 82,4 ms, read_event_3: 82,1 ms] Latency max : 272,4 ms [read_event_1: 272,4 ms, read_event_2: 268,7 ms, read_event_3: 245,1 ms] Total partitions : 330 970 [read_event_1: 165 386, read_event_2: 165 584, read_event_3: 0] Total errors : 0 [read_event_1: 0, read_event_2: 0, read_event_3: 0] Total GC count : 42 Total GC memory : 13,102 GiB Total GC time : 1,8 seconds Avg GC time : 42,4 ms StdDev GC time : 1,3 ms Total operation time : 00:00:53{noformat} Cassandra 3.11.2 debug off : {noformat} Results: Op rate : 18 853 op/s [read_event_1: 3 138 op/s, read_event_2: 3 137 op/s, read_event_3: 12 578 op/s] Partition rate : 6 275 pk/s [read_event_1: 3 138 pk/s, read_event_2: 3 137 pk/s, read_event_3: 0 pk/s] Row rate : 6 275 row/s [read_event_1: 3 138 row/s, read_event_2: 3 137 row/s, read_event_3: 0 row/s] Latency mean : 6,7 ms [read_event_1: 6,7 ms, read_event_2: 6,7 ms, read_event_3: 6,7 ms] Latency median : 5,0 ms [read_event_1: 5,1 ms, read_event_2: 5,1 ms, read_event_3: 5,0 ms] Latency 95th percentile : 15,5 ms [read_event_1: 15,5 ms, read_event_2: 15,6 ms, read_event_3: 15,4 ms] Latency 99th percentile : 39,9 ms [read_event_1: 41,0 ms, read_event_2: 39,6 ms, read_event_3: 39,6 ms] Latency 99.9th percentile : 73,3 ms [read_event_1: 73,4 ms, read_event_2: 71,6 ms, read_event_3: 73,6 ms] Latency max : 367,0 ms [read_event_1: 240,5 ms, read_event_2: 250,3 ms, read_event_3: 367,0 ms] Total partitions : 332 852 [read_event_1: 166 447, read_event_2: 166 405, read_event_3: 0] Total errors : 0 [read_event_1: 0, read_event_2: 0, read_event_3: 0] Total GC count : 46 Total GC memory : 14,024 GiB Total GC time : 2,0 seconds Avg GC time : 42,7 ms StdDev GC time : 3,9 ms Total operation time : 00:00:53{noformat} The improvement over 2.2 is nice though :) > Debug logging can create massive performance issues > --- > > Key: CASSANDRA-14318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14318 > Project: Cassandra > Issue Type: Bug >Reporter: Alexander Dejanovski >Assignee: Alexander Dejanovski >Priority: Major > Labels: lhf, performance > Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x > > Attachments: cassandra-2.2-debug.yaml, debuglogging.png, flame22 > nodebug sjk svg.png, flame22-nodebug-sjk.svg, flame22-sjk.svg, > flame_graph_snapshot.png > > > Debug logging can involve in many cases (especially very low latency ones) a > very important overhead on the read path in 2.2 as we've seen when upgrading > clusters from 2.0 to 2.2. > The performance impact was especially noticeable on the client side metrics, > where p99 could go up to 10 times higher, while ClientRequest metrics > recorded by Cassandra didn't show any overhead. > Below shows latencies recorded on the client side with debug logging on > first, and then without it : > !debuglogging.png! > We generated a flame graph before turning off debug logging that shows the > read call stack is dominated by debug logging : > !flame_graph_snapshot.png! > I've attached the original flame graph for exploration. > Once disabled, the new flame graph shows that the read call stack gets > extremely thin, which is further confirmed by client recorded metrics : > !flame22 nodebug sjk svg.png! > The query pager code has been reworked since 3.0 and it looks like > log.debug() calls are gone there, but for 2.2 users and to prevent such > issues to appear with default settings, I really think debug logging should > be disabled by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-ma
[jira] [Commented] (CASSANDRA-14318) Debug logging can create massive performance issues
[ https://issues.apache.org/jira/browse/CASSANDRA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415846#comment-16415846 ] Alexander Dejanovski commented on CASSANDRA-14318: -- [~jjirsa]: apparently the ReadCallback class already logs at TRACE and not DEBUG on the latest 2.2. I've created the fix that downgrades debug logging to trace logging in the query pager classes, and here are the results : debug on - no fix : {noformat} Results: op rate : 6681 [read_event_1:1109, read_event_2:1119, read_event_3:4452] partition rate : 6681 [read_event_1:1109, read_event_2:1119, read_event_3:4452] row rate : 6681 [read_event_1:1109, read_event_2:1119, read_event_3:4452] latency mean : 19,1 [read_event_1:15,4, read_event_2:15,4, read_event_3:21,0] latency median : 15,6 [read_event_1:14,2, read_event_2:14,0, read_event_3:16,3] latency 95th percentile : 39,1 [read_event_1:28,4, read_event_2:28,6, read_event_3:44,2] latency 99th percentile : 75,6 [read_event_1:52,9, read_event_2:53,6, read_event_3:87,7] latency 99.9th percentile : 315,7 [read_event_1:101,0, read_event_2:110,1, read_event_3:361,1] latency max : 609,1 [read_event_1:319,6, read_event_2:315,9, read_event_3:609,1] Total partitions : 993050 [read_event_1:164882, read_event_2:166381, read_event_3:661787] Total errors : 0 [read_event_1:0, read_event_2:0, read_event_3:0] total gc count : 189 total gc mb : 56464 total gc time (s) : 7 avg gc time(ms) : 37 stdev gc time(ms) : 8 Total operation time : 00:02:28{noformat} debug off - no fix : {noformat} Results: op rate : 12655 [read_event_1:2141, read_event_2:2093, read_event_3:8422] partition rate : 12655 [read_event_1:2141, read_event_2:2093, read_event_3:8422] row rate : 12655 [read_event_1:2141, read_event_2:2093, read_event_3:8422] latency mean : 10,1 [read_event_1:10,1, read_event_2:10,1, read_event_3:10,1] latency median : 9,2 [read_event_1:9,2, read_event_2:9,2, read_event_3:9,3] latency 95th percentile : 15,2 [read_event_1:15,8, read_event_2:15,9, read_event_3:15,7] latency 99th percentile : 29,3 [read_event_1:44,5, read_event_2:45,1, read_event_3:41,3] latency 99.9th percentile : 52,7 [read_event_1:67,9, read_event_2:66,9, read_event_3:67,1] latency max : 268,0 [read_event_1:257,1, read_event_2:263,3, read_event_3:268,0] Total partitions : 983056 [read_event_1:166311, read_event_2:162570, read_event_3:654175] Total errors : 0 [read_event_1:0, read_event_2:0, read_event_3:0] total gc count : 100 total gc mb : 31529 total gc time (s) : 4 avg gc time(ms) : 37 stdev gc time(ms) : 5 Total operation time : 00:01:17{noformat} debug on - with fix : {noformat} Results: op rate : 12289 [read_event_1:2058, read_event_2:2051, read_event_3:8181] partition rate : 12289 [read_event_1:2058, read_event_2:2051, read_event_3:8181] row rate : 12289 [read_event_1:2058, read_event_2:2051, read_event_3:8181] latency mean : 10,4 [read_event_1:10,4, read_event_2:10,4, read_event_3:10,4] latency median : 9,4 [read_event_1:9,4, read_event_2:9,4, read_event_3:9,4] latency 95th percentile : 16,3 [read_event_1:16,8, read_event_2:17,3, read_event_3:16,2] latency 99th percentile : 36,6 [read_event_1:44,3, read_event_2:46,6, read_event_3:37,2] latency 99.9th percentile : 62,2 [read_event_1:78,0, read_event_2:77,1, read_event_3:80,8] latency max : 251,2 [read_event_1:246,9, read_event_2:249,9, read_event_3:251,2] Total partitions : 100 [read_event_1:167422, read_event_2:166861, read_event_3:665717] Total errors : 0 [read_event_1:0, read_event_2:0, read_event_3:0] total gc count : 102 total gc mb : 31843 total gc time (s) : 4 avg gc time(ms) : 38 stdev gc time(ms) : 6 Total operation time : 00:01:21{noformat} So we have similar performance with debug logging off and with the fix and debug on. The difference in throughput is pretty massive as we roughly get *twice the read throughput* with the fix. Latencies without the fix and with the fix : p95 : 35ms -> 16ms p99 : 75ms -> 36ms I've ran all tests several times, alternating with and without the fix to make sure caches were not making a difference, and results were consistent with what's pasted above. It's been running on a single node using an i3.xlarge instance for Cassandra and another i3.large for running cassandra-stress. *One pretty interesting thing to note* is that when I tested with the predefined mode of cassandra-stress, no paging occurred and the performance difference was not noticeable. This is due to the fact that the predefined mode generates COMPACT STORAGE tables, which involve a different read path (apparently). I think anyone performing benchmarks for Cassandra changes should be aware that the predefined mode isn't relevant and that a user defined test should be used (maybe we should create one that would be used as standard benchmark). Here's the one I used : [^cassandra-2.2-debug.yaml] With the following c
[jira] [Commented] (CASSANDRA-14318) Debug logging can create massive performance issues
[ https://issues.apache.org/jira/browse/CASSANDRA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405580#comment-16405580 ] Jeff Jirsa commented on CASSANDRA-14318: If you havent yet started the second run of benchmarks, you may also want to move the debug log statements from ReadCallback.java (one deals with digest mismatch exceptions, forgot what the other is) to trace as well. > Debug logging can create massive performance issues > --- > > Key: CASSANDRA-14318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14318 > Project: Cassandra > Issue Type: Bug >Reporter: Alexander Dejanovski >Assignee: Alexander Dejanovski >Priority: Major > Labels: lhf, performance > Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x > > Attachments: debuglogging.png, flame22 nodebug sjk svg.png, > flame22-nodebug-sjk.svg, flame22-sjk.svg, flame_graph_snapshot.png > > > Debug logging can involve in many cases (especially very low latency ones) a > very important overhead on the read path in 2.2 as we've seen when upgrading > clusters from 2.0 to 2.2. > The performance impact was especially noticeable on the client side metrics, > where p99 could go up to 10 times higher, while ClientRequest metrics > recorded by Cassandra didn't show any overhead. > Below shows latencies recorded on the client side with debug logging on > first, and then without it : > !debuglogging.png! > We generated a flame graph before turning off debug logging that shows the > read call stack is dominated by debug logging : > !flame_graph_snapshot.png! > I've attached the original flame graph for exploration. > Once disabled, the new flame graph shows that the read call stack gets > extremely thin, which is further confirmed by client recorded metrics : > !flame22 nodebug sjk svg.png! > The query pager code has been reworked since 3.0 and it looks like > log.debug() calls are gone there, but for 2.2 users and to prevent such > issues to appear with default settings, I really think debug logging should > be disabled by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14318) Debug logging can create massive performance issues
[ https://issues.apache.org/jira/browse/CASSANDRA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405051#comment-16405051 ] Alexander Dejanovski commented on CASSANDRA-14318: -- Fair point. I'll write the patch and run the benchmarks. > Debug logging can create massive performance issues > --- > > Key: CASSANDRA-14318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14318 > Project: Cassandra > Issue Type: Bug >Reporter: Alexander Dejanovski >Assignee: Alexander Dejanovski >Priority: Major > Labels: lhf, performance > Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x > > Attachments: debuglogging.png, flame22 nodebug sjk svg.png, > flame22-nodebug-sjk.svg, flame22-sjk.svg, flame_graph_snapshot.png > > > Debug logging can involve in many cases (especially very low latency ones) a > very important overhead on the read path in 2.2 as we've seen when upgrading > clusters from 2.0 to 2.2. > The performance impact was especially noticeable on the client side metrics, > where p99 could go up to 10 times higher, while ClientRequest metrics > recorded by Cassandra didn't show any overhead. > Below shows latencies recorded on the client side with debug logging on > first, and then without it : > !debuglogging.png! > We generated a flame graph before turning off debug logging that shows the > read call stack is dominated by debug logging : > !flame_graph_snapshot.png! > I've attached the original flame graph for exploration. > Once disabled, the new flame graph shows that the read call stack gets > extremely thin, which is further confirmed by client recorded metrics : > !flame22 nodebug sjk svg.png! > The query pager code has been reworked since 3.0 and it looks like > log.debug() calls are gone there, but for 2.2 users and to prevent such > issues to appear with default settings, I really think debug logging should > be disabled by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14318) Debug logging can create massive performance issues
[ https://issues.apache.org/jira/browse/CASSANDRA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404814#comment-16404814 ] Jeremiah Jordan commented on CASSANDRA-14318: - No matter what is decided we should move those log messages to TRACE. So I think we can proceed here no matter what. > Debug logging can create massive performance issues > --- > > Key: CASSANDRA-14318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14318 > Project: Cassandra > Issue Type: Bug >Reporter: Alexander Dejanovski >Assignee: Alexander Dejanovski >Priority: Major > Labels: lhf, performance > Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x > > Attachments: debuglogging.png, flame22 nodebug sjk svg.png, > flame22-nodebug-sjk.svg, flame22-sjk.svg, flame_graph_snapshot.png > > > Debug logging can involve in many cases (especially very low latency ones) a > very important overhead on the read path in 2.2 as we've seen when upgrading > clusters from 2.0 to 2.2. > The performance impact was especially noticeable on the client side metrics, > where p99 could go up to 10 times higher, while ClientRequest metrics > recorded by Cassandra didn't show any overhead. > Below shows latencies recorded on the client side with debug logging on > first, and then without it : > !debuglogging.png! > We generated a flame graph before turning off debug logging that shows the > read call stack is dominated by debug logging : > !flame_graph_snapshot.png! > I've attached the original flame graph for exploration. > Once disabled, the new flame graph shows that the read call stack gets > extremely thin, which is further confirmed by client recorded metrics : > !flame22 nodebug sjk svg.png! > The query pager code has been reworked since 3.0 and it looks like > log.debug() calls are gone there, but for 2.2 users and to prevent such > issues to appear with default settings, I really think debug logging should > be disabled by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14318) Debug logging can create massive performance issues
[ https://issues.apache.org/jira/browse/CASSANDRA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404797#comment-16404797 ] Alexander Dejanovski commented on CASSANDRA-14318: -- [~pauloricardomg], thanks for assigning the ticket to me. Whatever the implementation, I'll run new performance tests and generate flame graphs to verify the impact of the changes, no worries. I'll wait a bit for a consensus to come out of the discussion on the dev ML before moving on, as there seem to be some conflicting views on what should be done. > Debug logging can create massive performance issues > --- > > Key: CASSANDRA-14318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14318 > Project: Cassandra > Issue Type: Bug >Reporter: Alexander Dejanovski >Assignee: Alexander Dejanovski >Priority: Major > Labels: lhf, performance > Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x > > Attachments: debuglogging.png, flame22 nodebug sjk svg.png, > flame22-nodebug-sjk.svg, flame22-sjk.svg, flame_graph_snapshot.png > > > Debug logging can involve in many cases (especially very low latency ones) a > very important overhead on the read path in 2.2 as we've seen when upgrading > clusters from 2.0 to 2.2. > The performance impact was especially noticeable on the client side metrics, > where p99 could go up to 10 times higher, while ClientRequest metrics > recorded by Cassandra didn't show any overhead. > Below shows latencies recorded on the client side with debug logging on > first, and then without it : > !debuglogging.png! > We generated a flame graph before turning off debug logging that shows the > read call stack is dominated by debug logging : > !flame_graph_snapshot.png! > I've attached the original flame graph for exploration. > Once disabled, the new flame graph shows that the read call stack gets > extremely thin, which is further confirmed by client recorded metrics : > !flame22 nodebug sjk svg.png! > The query pager code has been reworked since 3.0 and it looks like > log.debug() calls are gone there, but for 2.2 users and to prevent such > issues to appear with default settings, I really think debug logging should > be disabled by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14318) Debug logging can create massive performance issues
[ https://issues.apache.org/jira/browse/CASSANDRA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404134#comment-16404134 ] Paulo Motta commented on CASSANDRA-14318: - Hey Alexander, would you care to re-run the experiments with the query pager DEBUG logs changed to TRACE level to check if debug.log logging overhead will still persist? Thanks! > Debug logging can create massive performance issues > --- > > Key: CASSANDRA-14318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14318 > Project: Cassandra > Issue Type: Bug >Reporter: Alexander Dejanovski >Assignee: Alexander Dejanovski >Priority: Major > Labels: lhf, performance > Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x > > Attachments: debuglogging.png, flame22 nodebug sjk svg.png, > flame22-nodebug-sjk.svg, flame22-sjk.svg, flame_graph_snapshot.png > > > Debug logging can involve in many cases (especially very low latency ones) a > very important overhead on the read path in 2.2 as we've seen when upgrading > clusters from 2.0 to 2.2. > The performance impact was especially noticeable on the client side metrics, > where p99 could go up to 10 times higher, while ClientRequest metrics > recorded by Cassandra didn't show any overhead. > Below shows latencies recorded on the client side with debug logging on > first, and then without it : > !debuglogging.png! > We generated a flame graph before turning off debug logging that shows the > read call stack is dominated by debug logging : > !flame_graph_snapshot.png! > I've attached the original flame graph for exploration. > Once disabled, the new flame graph shows that the read call stack gets > extremely thin, which is further confirmed by client recorded metrics : > !flame22 nodebug sjk svg.png! > The query pager code has been reworked since 3.0 and it looks like > log.debug() calls are gone there, but for 2.2 users and to prevent such > issues to appear with default settings, I really think debug logging should > be disabled by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14318) Debug logging can create massive performance issues
[ https://issues.apache.org/jira/browse/CASSANDRA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403694#comment-16403694 ] Alexander Dejanovski commented on CASSANDRA-14318: -- Fair enough, I'll go down the read and write path to see if there's debug logging in v3 and above, and switch debug to trace in the query pager in 2.2. > Debug logging can create massive performance issues > --- > > Key: CASSANDRA-14318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14318 > Project: Cassandra > Issue Type: Bug >Reporter: Alexander Dejanovski >Priority: Major > Labels: lhf, performance > Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x > > Attachments: debuglogging.png, flame22 nodebug sjk svg.png, > flame22-nodebug-sjk.svg, flame22-sjk.svg, flame_graph_snapshot.png > > > Debug logging can involve in many cases (especially very low latency ones) a > very important overhead on the read path in 2.2 as we've seen when upgrading > clusters from 2.0 to 2.2. > The performance impact was especially noticeable on the client side metrics, > where p99 could go up to 10 times higher, while ClientRequest metrics > recorded by Cassandra didn't show any overhead. > Below shows latencies recorded on the client side with debug logging on > first, and then without it : > !debuglogging.png! > We generated a flame graph before turning off debug logging that shows the > read call stack is dominated by debug logging : > !flame_graph_snapshot.png! > I've attached the original flame graph for exploration. > Once disabled, the new flame graph shows that the read call stack gets > extremely thin, which is further confirmed by client recorded metrics : > !flame22 nodebug sjk svg.png! > The query pager code has been reworked since 3.0 and it looks like > log.debug() calls are gone there, but for 2.2 users and to prevent such > issues to appear with default settings, I really think debug logging should > be disabled by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14318) Debug logging can create massive performance issues
[ https://issues.apache.org/jira/browse/CASSANDRA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403671#comment-16403671 ] Jeremiah Jordan commented on CASSANDRA-14318: - CASSANDRA-10241 creates that debug.log to be a “production” debug log. If there are things being logged at DEBUG which cause performance issues we should disable those or move them to TRACE, not turn off the debug.log. > Debug logging can create massive performance issues > --- > > Key: CASSANDRA-14318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14318 > Project: Cassandra > Issue Type: Bug >Reporter: Alexander Dejanovski >Priority: Major > Labels: lhf, performance > Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x > > Attachments: debuglogging.png, flame22 nodebug sjk svg.png, > flame22-nodebug-sjk.svg, flame22-sjk.svg, flame_graph_snapshot.png > > > Debug logging can involve in many cases (especially very low latency ones) a > very important overhead on the read path in 2.2 as we've seen when upgrading > clusters from 2.0 to 2.2. > The performance impact was especially noticeable on the client side metrics, > where p99 could go up to 10 times higher, while ClientRequest metrics > recorded by Cassandra didn't show any overhead. > Below shows latencies recorded on the client side with debug logging on > first, and then without it : > !debuglogging.png! > We generated a flame graph before turning off debug logging that shows the > read call stack is dominated by debug logging : > !flame_graph_snapshot.png! > I've attached the original flame graph for exploration. > Once disabled, the new flame graph shows that the read call stack gets > extremely thin, which is further confirmed by client recorded metrics : > !flame22 nodebug sjk svg.png! > The query pager code has been reworked since 3.0 and it looks like > log.debug() calls are gone there, but for 2.2 users and to prevent such > issues to appear with default settings, I really think debug logging should > be disabled by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14318) Debug logging can create massive performance issues
[ https://issues.apache.org/jira/browse/CASSANDRA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402526#comment-16402526 ] Vinay Chella commented on CASSANDRA-14318: -- +1 on your perf test results. Patch LGTM. > Debug logging can create massive performance issues > --- > > Key: CASSANDRA-14318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14318 > Project: Cassandra > Issue Type: Bug >Reporter: Alexander Dejanovski >Priority: Major > Labels: performance > Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x > > Attachments: debuglogging.png, flame22 nodebug sjk svg.png, > flame22-nodebug-sjk.svg, flame22-sjk.svg, flame_graph_snapshot.png > > > Debug logging can involve in many cases (especially very low latency ones) a > very important overhead on the read path in 2.2 as we've seen when upgrading > clusters from 2.0 to 2.2. > The performance impact was especially noticeable on the client side metrics, > where p99 could go up to 10 times higher, while ClientRequest metrics > recorded by Cassandra didn't show any overhead. > Below shows latencies recorded on the client side with debug logging on > first, and then without it : > !debuglogging.png! > We generated a flame graph before turning off debug logging that shows the > read call stack is dominated by debug logging : > !flame_graph_snapshot.png! > I've attached the original flame graph for exploration. > Once disabled, the new flame graph shows that the read call stack gets > extremely thin, which is further confirmed by client recorded metrics : > !flame22 nodebug sjk svg.png! > The query pager code has been reworked since 3.0 and it looks like > log.debug() calls are gone there, but for 2.2 users and to prevent such > issues to appear with default settings, I really think debug logging should > be disabled by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14318) Debug logging can create massive performance issues
[ https://issues.apache.org/jira/browse/CASSANDRA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401971#comment-16401971 ] Alexander Dejanovski commented on CASSANDRA-14318: -- Here's the [patch for 2.2|https://github.com/apache/cassandra/compare/cassandra-2.2...thelastpickle:disable-debug-logging-by-default] It should be mergeable in 3.0/3.11/4.0 without a problem. > Debug logging can create massive performance issues > --- > > Key: CASSANDRA-14318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14318 > Project: Cassandra > Issue Type: Bug >Reporter: Alexander Dejanovski >Priority: Major > Labels: performance > Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x > > Attachments: debuglogging.png, flame22 nodebug sjk svg.png, > flame22-nodebug-sjk.svg, flame22-sjk.svg, flame_graph_snapshot.png > > > Debug logging can involve in many cases (especially very low latency ones) a > very important overhead on the read path in 2.2 as we've seen when upgrading > clusters from 2.0 to 2.2. > The performance impact was especially noticeable on the client side metrics, > where p99 could go up to 10 times higher, while ClientRequest metrics > recorded by Cassandra didn't show any overhead. > Below shows latencies recorded on the client side with debug logging on > first, and then without it : > !debuglogging.png! > We generated a flame graph before turning off debug logging that shows the > read call stack is dominated by debug logging : > !flame_graph_snapshot.png! > I've attached the original flame graph for exploration. > Once disabled, the new flame graph shows that the read call stack gets > extremely thin, which is further confirmed by client recorded metrics : > !flame22 nodebug sjk svg.png! > The query pager code has been reworked since 3.0 and it looks like > log.debug() calls are gone there, but for 2.2 users and to prevent such > issues to appear with default settings, I really think debug logging should > be disabled by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org