[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883581#comment-16883581 ] Sumanth Pasupuleti commented on CASSANDRA-15013: Sure [~benedict]. Here are the patches:
*3.0*
Patch: [^15013-3.0.txt]
Passing UTs and DTests: https://circleci.com/workflow-run/c7889003-9c58-4099-9530-0439bf241238
Github branch: https://github.com/apache/cassandra/compare/cassandra-3.0...sumanth-pasupuleti:15013_3.0?expand=1
*3.11*
Patch: [^15013-3.11.txt]
Passing UTs and DTests: https://circleci.com/workflow-run/46de0958-850a-4531-a15f-fd1df0c65aac
Github branch: https://github.com/apache/cassandra/compare/cassandra-3.11...sumanth-pasupuleti:15013_3.11?expand=1
*trunk*
Patch: [^15013-trunk.txt]
Passing UTs and DTests: https://circleci.com/workflow-run/67e43b0b-7f13-4de2-8fbd-7cab3d72b607
Github branch: https://github.com/apache/cassandra/compare/trunk...sumanth-pasupuleti:15013_trunk?expand=1

> Message Flusher queue can grow unbounded, potentially running JVM out of > memory > --- > > Key: CASSANDRA-15013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15013 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Normal > Labels: pull-request-available > Fix For: 4.0, 3.0.x, 3.11.x > > Attachments: 15013-3.0.txt, 15013-3.11.txt, 15013-trunk.txt, > BlockedEpollEventLoopFromHeapDump.png, > BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap > dump showing each ImmediateFlusher taking upto 600MB.png, > perftest2_15013_base_flamegraph.svg, perftest2_15013_patch_flamegraph.svg, > perftest2_blocked_threadpool.png, perftest2_cpu_usage.png, > perftest2_heap.png, perftest2_read_latency_99th.png, > perftest2_read_latency_avg.png, perftest2_readops.png, > perftest2_write_latency_99th.png, perftest2_write_latency_avg.png, > perftest2_writeops.png, perftest_blockedthreads.png, >
perftest_connections_count.png, perftest_cpu_usage.png, > perftest_heap_usage.png, perftest_readlatency_99th.png, > perftest_readlatency_avg.png, perftest_readops.png, > perftest_writelatency_99th.png, perftest_writelatency_avg.png, > perftest_writeops.png > > > This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue > bounded, since, in the current state, items get added to the queue without > any checks on queue size, nor with any checks on netty outbound buffer to > check the isWritable state. > We are seeing this issue hit our production 3.0 clusters quite often. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882103#comment-16882103 ] Sumanth Pasupuleti commented on CASSANDRA-15013: Thanks [~benedict] and [~jjirsa]. I've re-run the perf test with throughput held equal across both clusters (I had to tone down the NDBench client pointing at the patched C* cluster by quite a lot to match trunk's throughput). I have attached the flame graphs; average CPU usage is a tad lower for the patch than for trunk. I've also attached all the metrics from this perf run (files starting with perftest2). Following is the summary of perf run #2:
* Very similar read and write ops
* Read latency (99th and avg) slightly better for patch vs trunk
* Write latency 99th similar between patch and trunk; write latency avg slightly better for patch vs trunk
* No blocked threadpool for patch
* CPU usage (avg) slightly better for patch vs trunk
* Heap usage pattern similar between patch and trunk
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881793#comment-16881793 ] Benedict commented on CASSANDRA-15013: [~jjirsa] has just pointed out that we are seeing _double_ the read throughput (I hadn't looked carefully at all the graphs), which very likely explains the increase in CPU. It would be nice to perform a run with throughput fixed at a rate trunk can manage, so we get directly comparable results. But this is a _huge_ win, nice work! (If we can grab flame graphs anyway, there'd be no harm, and it would be nice to take a look to confirm nothing unexpected.)
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881781#comment-16881781 ] Benedict commented on CASSANDRA-15013: Thanks for these [~sumanth.pasupuleti]! Just to log for watchers: I have had a brief chat with Sumanth, and we intend to capture flame graphs to see if we can explain the 10% (5 percentage point) bump in average CPU utilisation, which may well be down to competition on a single shared variable for every operation. This is a worst-case cost, given the formulation of this test (which was the whole point), but it's potentially still significant, so we might need to reduce friction by, e.g., assigning each connection its own share of the pie when it connects, so that we only have to compete for the shared resource infrequently (when we overshoot our share, or need to connect or disconnect). We'll see what the flame graphs show. We will also try to explain the different shape of the heap utilisation graph, which might be as simple as only one node coordinating instead of all three, for instance.
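A minimal sketch of the "share of the pie" idea, assuming a single global byte budget held in an AtomicLong. Each connection takes a chunk of the global budget and then serves allocations from that local share, touching the contended shared counter only when it overshoots its share or disconnects. All names here (GlobalBudget, ConnectionShare, CHUNK) are hypothetical illustrations, not Cassandra's API or the patch's code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical global budget of in-flight request bytes shared by every connection.
final class GlobalBudget {
    private final long limit;
    private final AtomicLong used = new AtomicLong();

    GlobalBudget(long limit) { this.limit = limit; }

    // The only contended operation: atomically reserve `bytes` if they fit under the limit.
    boolean tryReserve(long bytes) {
        while (true) {
            long current = used.get();
            if (current + bytes > limit) return false;
            if (used.compareAndSet(current, current + bytes)) return true;
        }
    }

    void release(long bytes) { used.addAndGet(-bytes); }

    long used() { return used.get(); }
}

// Hypothetical per-connection share of the global budget. Assumed to be used only from the
// connection's own event-loop thread, so its plain fields need no synchronization.
final class ConnectionShare {
    static final long CHUNK = 64 * 1024; // illustrative refill size, not a Cassandra setting

    private final GlobalBudget global;
    private long reserved; // bytes this connection has taken from the global budget
    private long inUse;    // bytes held by this connection's in-flight requests

    ConnectionShare(GlobalBudget global) { this.global = global; }

    // Serve from the local share; go back to the shared counter only when the share is overshot.
    boolean tryAllocate(long bytes) {
        long missing = inUse + bytes - reserved;
        if (missing > 0) {
            long grab = ((missing + CHUNK - 1) / CHUNK) * CHUNK; // round up to whole chunks
            if (!global.tryReserve(grab)) {
                if (!global.tryReserve(missing)) return false; // fall back to the exact shortfall
                grab = missing;
            }
            reserved += grab;
        }
        inUse += bytes;
        return true;
    }

    // Completed requests return bytes to the local share, not to the global counter.
    void release(long bytes) { inUse -= bytes; }

    // On disconnect, hand the whole share back to the global budget.
    void close() {
        global.release(reserved);
        reserved = 0;
        inUse = 0;
    }
}
```

The trade-off in this sketch is that a connection keeps its share until it disconnects, so an idle connection can hold budget other connections could use; a fuller version would presumably also hand back surplus share above some threshold.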
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881762#comment-16881762 ] Sumanth Pasupuleti commented on CASSANDRA-15013: Performance tests were run against two C* clusters, one running the latest trunk and one running the latest trunk plus the 15013 patch. Two NDBench clusters, configured similarly to emit similar traffic, were set up to throw load at each of the C* clusters. Each C* cluster is a single-region cluster of six i3.8xl nodes, and each NDBench cluster has 450 nodes. Following is the analysis of the perf run:
# No blocked threadpool in patch, vs blocked threadpool in trunk !perftest_blockedthreads.png|thumbnail!
# Similar writeops !perftest_writeops.png|thumbnail!
# Patch does more readops than trunk !perftest_readops.png|thumbnail!
# Comparable read and write latencies (99th and avg) !perftest_readlatency_99th.png|thumbnail! !perftest_readlatency_avg.png|thumbnail! !perftest_writelatency_99th.png|thumbnail! !perftest_writelatency_avg.png|thumbnail!
# Comparable CPU usage !perftest_cpu_usage.png|thumbnail!
# Comparable heap usage !perftest_heap_usage.png|thumbnail!
# Connections count (~1000 connections per C* node) !perftest_connections_count.png|thumbnail!
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874970#comment-16874970 ] Benedict commented on CASSANDRA-15013: Fantastic, that sounds great.
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874969#comment-16874969 ] Sumanth Pasupuleti commented on CASSANDRA-15013: Absolutely. I was thinking along similar lines as to how to performance test this patch: having two clusters with the same configuration, one with and one without this patch, and measuring their performance under similar load. As you said, we can employ LOCAL_ONE traffic and ensure that the data is small enough to fit into memory. I am thinking of at least a 3-node i3.8xl/i3.16xl cluster setup to even out any instance-related issues, and will ensure we have thousands of TCP connections per host and on the order of ~100k QPS per host (or at least as high as we can get). Will start these tests today.
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874927#comment-16874927 ] Benedict commented on CASSANDRA-15013: Thanks [~sumanth.pasupuleti]. I'm happy this resolves the issue, but it occurs to me it would be great to confirm that the worst-case performance impact of this change is manageable, particularly given your performance testing infrastructure. The worst-case behaviour is probably LOCAL_ONE in-memory reads on a high-core-count multi-socket machine, serving thousands of TCP connections from the same host at a total rate of hundreds of thousands of QPS. So a single i3.16xlarge system would be optimal. Does that sound reasonable to you?
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874366#comment-16874366 ] Sumanth Pasupuleti commented on CASSANDRA-15013: [~benedict] Makes sense. I pushed a [change|https://github.com/sumanth-pasupuleti/cassandra/commit/0c75ecf7b6f0824786b840c6cba167eb393b92ce] to [this|https://github.com/sumanth-pasupuleti/cassandra/commits/15013_trunk_2] branch. The per-node limit now defaults to 1/10th of the heap size, and the per-IP limit defaults to 1/40th of the heap size. Test [results|https://circleci.com/workflow-run/c61d7df2-c77a-4eab-a954-a59f6165f372] are similar to the previous run.
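The heap-relative defaults can be sketched as follows. This is a hypothetical helper, not the code in the linked change: it only applies the ratios stated in the comment (per node = heap / 10, per IP = heap / 40) to the JVM's maximum heap as reported by Runtime.maxMemory(). The rule that an explicitly configured positive value overrides the derived default is an assumption made for illustration.

```java
// Hypothetical helper, not the patch's code: derive default in-flight request byte limits
// from the JVM's maximum heap using the ratios described in the comment.
final class NativeTransportLimitDefaults {
    final long perNodeBytes; // heap / 10
    final long perIpBytes;   // heap / 40

    NativeTransportLimitDefaults(long maxHeapBytes) {
        this.perNodeBytes = maxHeapBytes / 10;
        this.perIpBytes = maxHeapBytes / 40;
    }

    // Runtime.maxMemory() reports the maximum heap the JVM will attempt to use (e.g. -Xmx).
    static NativeTransportLimitDefaults fromRuntime() {
        return new NativeTransportLimitDefaults(Runtime.getRuntime().maxMemory());
    }

    // Assumed convention: an explicitly configured positive value wins over the derived default.
    static long resolve(long configuredBytes, long derivedDefault) {
        return configuredBytes > 0 ? configuredBytes : derivedDefault;
    }
}
```

For example, with an 8 GiB heap these ratios give about 819 MiB per node and about 205 MiB per IP.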
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874184#comment-16874184 ] Benedict commented on CASSANDRA-15013: Yes, I guess 3GiB per IP is probably too high, as is 5GiB per node. I'm not really sure what a good default is; it should probably be a function of heap size, like most of our other limits.
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874183#comment-16874183 ] Sumanth Pasupuleti commented on CASSANDRA-15013: Thanks for catching the {{channelInactive}} case. LGTM. One thing remaining, I suppose, is to revisit the defaults: [https://github.com/apache/cassandra/commit/98126f5d887228f5e88eca66f007873b52a0aacf#diff-b66584c9ce7b64019b5db5a531deeda1R173]
{code:java}
// TODO: Revisit limit
public volatile long native_transport_max_concurrent_requests_in_bytes_per_ip = 30L;
public volatile long native_transport_max_concurrent_requests_in_bytes = 50L;
{code}
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874159#comment-16874159 ] Benedict commented on CASSANDRA-15013: Thanks [~sumanth.pasupuleti]. I realised I had forgotten to handle the case of {{channelInactive}} whilst paused, so I've pushed a tiny follow-up modification. If it looks good to you, I'll merge the lot into trunk.
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873870#comment-16873870 ] Sumanth Pasupuleti commented on CASSANDRA-15013: +1 on the suggestions [~benedict]. I have applied your commit to my branch and run the tests. UTs and JVM DTests pass. All DTests pass except for 6 failures, which seem unrelated: https://circleci.com/workflow-run/04b77dd7-7dca-49d4-8328-e55b357fcca6
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873477#comment-16873477 ] Benedict commented on CASSANDRA-15013: Thanks [~sumanth.pasupuleti]. I think this is very close to commit. I've pushed a small number of extra suggestions [here|https://github.com/belliottsmith/cassandra/tree/15013-suggestions]. They are mostly minor stylistic simplifications, as well as a modification of the back pressure deployed metric so that it simply reports the number of connections currently experiencing back pressure, since it's not entirely clear how an operator would meaningfully interpret the number of times back pressure was independently applied (it would be applied more often for small messages than for large ones). Let me know what you think, and we can hopefully see about merging this soon.
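The difference between counting back pressure events and counting connections currently under back pressure can be sketched with a gauge that each connection moves at most once per pause/resume transition. This is a minimal illustration with hypothetical names (BackpressureGauge, ConnectionPauseState), not the metric as implemented in the suggestions branch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical gauge: how many connections are under back pressure right now.
final class BackpressureGauge {
    private final AtomicInteger pausedConnections = new AtomicInteger();

    void connectionPaused() { pausedConnections.incrementAndGet(); }
    void connectionResumed() { pausedConnections.decrementAndGet(); }
    int value() { return pausedConnections.get(); }
}

// Hypothetical per-connection state. Assumed to be touched only from the connection's
// event-loop thread, so the plain boolean needs no synchronization.
final class ConnectionPauseState {
    private final BackpressureGauge gauge;
    private boolean paused;

    ConnectionPauseState(BackpressureGauge gauge) { this.gauge = gauge; }

    // Idempotent: many small messages hitting the limit while paused count only once.
    void pause() {
        if (!paused) {
            paused = true;
            gauge.connectionPaused();
        }
    }

    // Must also be called when a paused connection closes, or the gauge drifts upwards.
    void resume() {
        if (paused) {
            paused = false;
            gauge.connectionResumed();
        }
    }
}
```

Because pause() and resume() only act on state transitions, the gauge reads the same whether a connection hit the limit with one large message or with many small ones.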
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843339#comment-16843339 ] Sumanth Pasupuleti commented on CASSANDRA-15013: Thanks for the additional feedback. I've made the following further changes to the patch:
* Removed the requestsProcessed metric.
* Added a BackpressureDeployed metric to indicate how many times the server attempted to apply backpressure.
* EndpointPayloadTracker manages requestPayloadInFlightPerEndpoint, and an external consumer of EndpointPayloadTracker calls a static get that internally does tryRef.
* EndpointAndGlobal.release returns ABOVE_LIMIT or BELOW_LIMIT.

EndpointPayloadTracker is the only accessor of globalRequestPayloadInFlight; however, it did not seem to make semantic sense to move globalRequestPayloadInFlight into EndpointPayloadTracker as a member.

Regarding starting/stopping all channels for an endpoint at once, I agree it would make the system respect the limits better than letting each channel cross them once (in backpressure mode). However, since each channel's connection properties may differ (THROW_ON_OVERLOAD vs BACKPRESSURE), we would then have to check the property and decide accordingly. I am inclined to punt on this for now. Let me know what you think.

Regarding a per-client limit, I see it as a good logical extension to this patch, and I would like to tackle it in a separate ticket. Also, as an additional ticket, I would like to explore extending this concurrency-limiting patch beyond the current parameter of incoming payload size, which works well for writes but not necessarily for reads, perhaps by considering the response payload and/or the number of concurrent tasks in flight.
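A minimal sketch of a combined per-endpoint and global limit whose allocate and release report ABOVE_LIMIT or BELOW_LIMIT. Only the two outcome names come from the comment; the classes (ByteLimit, EndpointAndGlobalLimits) and the choice to account for bytes unconditionally (so that a request can push a channel over the limit once, after which the caller reacts) are assumptions for illustration, not the patch's code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Result of accounting for bytes against a limit (outcome names as described in the comment).
enum Outcome { ABOVE_LIMIT, BELOW_LIMIT }

// Hypothetical in-flight byte counter with a soft limit.
final class ByteLimit {
    private final long limit;
    private final AtomicLong inFlight = new AtomicLong();

    ByteLimit(long limit) { this.limit = limit; }

    // Always accounts for the bytes (the request has already been read off the wire);
    // the caller decides how to react to ABOVE_LIMIT, e.g. pause reads or reject.
    Outcome allocate(long bytes) { return classify(inFlight.addAndGet(bytes)); }

    Outcome release(long bytes) { return classify(inFlight.addAndGet(-bytes)); }

    long inFlight() { return inFlight.get(); }

    private Outcome classify(long total) {
        return total > limit ? Outcome.ABOVE_LIMIT : Outcome.BELOW_LIMIT;
    }
}

// Hypothetical pair of limits: one per endpoint (client IP) and one shared by all endpoints.
final class EndpointAndGlobalLimits {
    final ByteLimit endpoint;
    final ByteLimit global;

    EndpointAndGlobalLimits(ByteLimit endpoint, ByteLimit global) {
        this.endpoint = endpoint;
        this.global = global;
    }

    Outcome allocate(long bytes) { return combine(endpoint.allocate(bytes), global.allocate(bytes)); }

    Outcome release(long bytes) { return combine(endpoint.release(bytes), global.release(bytes)); }

    // Over the limit if either the endpoint or the global limit is exceeded.
    private static Outcome combine(Outcome a, Outcome b) {
        return (a == Outcome.ABOVE_LIMIT || b == Outcome.ABOVE_LIMIT)
                ? Outcome.ABOVE_LIMIT : Outcome.BELOW_LIMIT;
    }
}
```

In this shape a channel that sees ABOVE_LIMIT from allocate can stop reading, and a later BELOW_LIMIT from release tells it that it may resume.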
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16842121#comment-16842121 ]

Benedict commented on CASSANDRA-15013:

Thanks [~sumanth.pasupuleti], the patch is looking really good. Some remaining questions:
* Do we need the requestsProcessed metric? We already have regularStatementsExecuted and preparedStatementsExecuted, which should track closely for the traffic we care about.
* Conversely, do we want some metric to track back pressure being deployed? It’s not clear exactly what semantics we would want to maintain here, since we don’t _currently_ pause all channels for a given endpoint when the endpoint overflows, and it’s also unclear if we would want to track this per-client (probably not, although it would be really nice to do so).
* I think it would be nice to manage {{requestPayloadInFlightPerEndpoint}} entirely inside {{EndpointPayloadTracker}}; it's presently only accessed once outside, in an adjacent class, but it would be very simple to hide the map entirely, as well as {{tryRef}}, and simply offer a {{public static get}} method in {{EndpointPayloadTracker}}. WDYT?

What do you also think about starting/stopping all channels for an endpoint at once, when we cross the threshold? I don't think it is essential, but it is probably worth considering, as without it our limits are even less clearly defined (we're already permitted to cross them once per channel; it would be nice to tighten that to once per endpoint).
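One way to picture pausing every channel of an endpoint at once is a per-endpoint group that flips all of its members together. This is a hypothetical sketch: `ReadToggle` stands in for a Netty channel (whose real switch is `channel.config().setAutoRead(boolean)`), `EndpointChannelGroup` is an illustrative name, and it assumes every channel in the group uses backpressure, sidestepping the THROW_ON_OVERLOAD vs BACKPRESSURE mix raised elsewhere in this thread.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-in for a channel; with Netty this would wrap channel.config().setAutoRead(on). */
interface ReadToggle {
    void setAutoRead(boolean on);
}

/** Hypothetical group of all channels from one endpoint, paused and resumed together. */
final class EndpointChannelGroup {
    private final Set<ReadToggle> channels = ConcurrentHashMap.newKeySet();
    private volatile boolean paused;

    /** New channels join in the group's current state (a race with pauseAll() is ignored here). */
    void add(ReadToggle channel) {
        channels.add(channel);
        if (paused) channel.setAutoRead(false);
    }

    void remove(ReadToggle channel) { channels.remove(channel); }

    /** Called once when the endpoint crosses its limit, instead of once per channel. */
    void pauseAll() {
        paused = true;
        channels.forEach(c -> c.setAutoRead(false));
    }

    /** Called when the endpoint drops back below its limit. */
    void resumeAll() {
        paused = false;
        channels.forEach(c -> c.setAutoRead(true));
    }
}
```

With this shape the endpoint can overshoot its limit by at most the requests already being decoded when `pauseAll` runs, rather than by one request per channel.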
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841835#comment-16841835 ]

Sumanth Pasupuleti commented on CASSANDRA-15013:

[~benedict] I incorporated the feedback from your branch (naming and TODOs) and from the Jira comments. Here is the updated change: https://github.com/sumanth-pasupuleti/cassandra/commit/45e31829e839d7e74b08566d7e501a46ed818330.

A couple of major changes:
* Dispatcher never queries the map to get its EndpointPayloadTracker; instead it uses the reference it already holds.
* FlushItem gets a reference to the corresponding Dispatcher, so it calls releaseItem on the right Dispatcher.

All UTs and DTests pass: https://circleci.com/workflow-run/bb6b2eb6-daa6-41c1-9a3d-44b53bc7fb50
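The FlushItem change can be sketched as follows: each item carries the dispatcher that allocated it, so a flusher shared by many connections on one event loop never has to be constructed with a per-connection callback. These are simplified, single-threaded stand-ins written for illustration, not Cassandra's actual classes.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical per-connection dispatcher that tracks its own in-flight bytes. */
final class Dispatcher {
    private long inFlightBytes; // single-threaded in this sketch

    void allocate(long bytes) { inFlightBytes += bytes; }

    void releaseItem(FlushItem item) { inFlightBytes -= item.bytes; }

    long inFlightBytes() { return inFlightBytes; }
}

/** Each item remembers the dispatcher that allocated it. */
final class FlushItem {
    final Dispatcher owner;
    final long bytes;

    FlushItem(Dispatcher owner, long bytes) {
        this.owner = owner;
        this.bytes = bytes;
    }

    void release() { owner.releaseItem(this); }
}

/** Shared per-event-loop flusher: it never needs to know which dispatcher an item came from. */
final class Flusher {
    private final Queue<FlushItem> queue = new ArrayDeque<>();

    void enqueue(FlushItem item) { queue.add(item); }

    void flush() {
        FlushItem item;
        while ((item = queue.poll()) != null) {
            // a real flusher would write the response here before releasing
            item.release();
        }
    }
}
```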
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840301#comment-16840301 ]

Benedict commented on CASSANDRA-15013:

I've pushed some minor suggestions [here|https://github.com/belliottsmith/cassandra/tree/15013-suggestions] around naming:
# Tried to make the native_transport config parameters more consistent in naming with prior parameters; feel free to modify them further if you think you can improve them still
# {{forceAllocate}} -> {{allocate}}, which is usually the alternative to {{tryAllocate}}
# Shortened the THROW_ON_OVERLOAD parameter

There are three remaining bugs, and I've paused review until they can be addressed:
# {{this::releaseItem}} is unsafe to provide to the {{Flusher}} constructor, since these are unique to an address, and the {{Flusher}} is per-eventLoop. If we choose to hash all connections on an endpoint to a single eventLoop this would be easy to accommodate; otherwise {{FlushItem}} needs to be the implementor of {{release()}}.
# I don't think we can use {{Ref}} for management of the {{EndpointPayloadTracker}}. The {{Tidy}} implementation requires a reference to the object itself, and in any case logically deleting itself after release defeats the point of {{Ref}} (which is leak detection). It's impossible for it to detect a leak and clean up if the strong reference is cleaned up by this process: there will always be a strong reference until it invokes, and it requires there to be no strong references, so it will never invoke. Probably we should use a simple AtomicInteger to manage reference counts. I think it would be cleanest to encapsulate the map management inside a static method in {{EndpointPayloadTracker}} as well.
# I think we currently have a race condition around the release of a channel (and its {{EndpointPayloadTracker}}) and the attempt to release capacity from the {{EndpointPayloadTracker}} we have requests in flight for. Channels can be invalidated before we complete the requests issued by them, so we must be sure to release from the tracker we allocated from, so that we do not wrap into negative on release.
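A minimal sketch of the reference counting suggested in the second bug, assuming a plain `AtomicInteger` in place of `Ref`, together with the fix for the third bug: each in-flight request keeps a pointer to the tracker it allocated from and releases against that same object. This simplified `EndpointPayloadTracker` is illustrative only, and `InFlightRequest` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical tracker using a plain AtomicInteger reference count instead of Ref. */
final class EndpointPayloadTracker {
    private final AtomicInteger refCount = new AtomicInteger(1); // the creator holds the first reference
    private final AtomicLong bytesInFlight = new AtomicLong();

    /** Takes a reference unless the count has already reached zero (tracker retired). */
    boolean tryRef() {
        while (true) {
            int cur = refCount.get();
            if (cur <= 0) return false;
            if (refCount.compareAndSet(cur, cur + 1)) return true;
        }
    }

    /** Drops a reference; returns true if this call released the last one. */
    boolean unref() { return refCount.decrementAndGet() == 0; }

    void allocate(long bytes) { bytesInFlight.addAndGet(bytes); }

    void release(long bytes) { bytesInFlight.addAndGet(-bytes); }

    long bytesInFlight() { return bytesInFlight.get(); }
}

/** Each request remembers the tracker it allocated from, even if its channel has since closed. */
final class InFlightRequest {
    private final EndpointPayloadTracker tracker;
    private final long bytes;

    InFlightRequest(EndpointPayloadTracker tracker, long bytes) {
        this.tracker = tracker;
        this.bytes = bytes;
        tracker.allocate(bytes);
    }

    void complete() { tracker.release(bytes); }
}
```

Because `complete()` never looks the tracker up again, a replacement tracker created for the same endpoint after the channel closed cannot be driven negative.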
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840266#comment-16840266 ]

Benedict commented on CASSANDRA-15013:

So, thinking on it a bit more, I don't think this would actually be a very large change, but it also wouldn't simplify things as much as I might like; it might only save concurrency for endpoint resource allocation. So I'll review the patch as it is, and we can consider afterwards whether we want to make any further changes.

It looks like I also made an error in my first skim of the patch, or I was looking at a different version: there's no need to set the queue limit to -1; Integer.MAX_VALUE is fine. If we hit that limit we have bigger problems :)
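For reference, the JDK's own "unbounded" blocking queue works exactly this way: `LinkedBlockingQueue`'s no-argument constructor uses `Integer.MAX_VALUE` as its capacity. The sketch assumes a plain `ThreadPoolExecutor` as a stand-in for the native transport request executor, which in Cassandra may be a different executor type; `RequestExecutors` is a hypothetical helper.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class RequestExecutors {
    /** Effectively unbounded: the no-arg LinkedBlockingQueue has capacity Integer.MAX_VALUE. */
    static ThreadPoolExecutor unbounded(int threads) {
        return new ThreadPoolExecutor(threads, threads, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }
}
```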
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839326#comment-16839326 ]

Benedict commented on CASSANDRA-15013:

Thanks [~sumanth.pasupuleti]. I started reviewing your new changes, and partway through realised we could potentially simplify this all a great deal with a slightly different approach: hashing endpoints to a specific eventLoop when accepting the connection.

If we were to do this, we could have very simple per-thread accounting, and we could even aggregate all of the per-endpoint channels into a single flusher for stopping/starting them together once they exceed their limits. Everything would be single-threaded, so our logic would be much simpler to reason about.

This isn't without its tradeoffs: potentially users might have a setup with a single application node speaking to the cluster, but this would be a very peculiar system design to pair with Cassandra, and a single dedicated eventLoop for this node would still likely suffice for a majority of workloads. We also have the potential issue of endpoint collisions, but if we use a cryptographic hash function this should only be a problem for a very small number of nodes (and if we ever find it is a real problem, we can remedy it).

What do you think? I'm sorry for moving the goal posts suddenly; it just hadn't occurred to me until now. My goal is only the best patch, so I'm interested to hear your thoughts.
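A sketch of hashing each client address to a fixed event loop with a cryptographic hash, assuming SHA-256 over the raw address bytes and a simple modulo over a fixed number of loops. `EventLoopSelector` is hypothetical and not part of Netty or Cassandra.

```java
import java.net.InetAddress;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical selector that pins every connection from one client address to one event loop. */
final class EventLoopSelector {
    private final int loopCount;

    EventLoopSelector(int loopCount) { this.loopCount = loopCount; }

    int loopFor(InetAddress client) { return loopFor(client.getAddress()); }

    /** A cryptographic hash spreads addresses evenly, though distinct endpoints can still share a loop. */
    int loopFor(byte[] rawAddress) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(rawAddress);
            long h = ByteBuffer.wrap(digest).getLong(); // first 8 bytes of the digest
            return (int) Math.floorMod(h, (long) loopCount);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }
}
```

Since the mapping is deterministic, every connection from the same address lands on the same loop, which is what allows the per-endpoint accounting to stay single-threaded.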
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838232#comment-16838232 ]

Sumanth Pasupuleti commented on CASSANDRA-15013:

Thanks for the feedback [~benedict]. I have incorporated it in the branch https://github.com/sumanth-pasupuleti/cassandra/commits/15013_trunk_2

However, with this change we lose the ability to change the endpoint/global limits on a node on the fly, without restarting the C* process.

Added a forceAllocate method to ResourceLimits to accommodate the backpressure scenario.

There are a few DTest failures, seemingly from race conditions on allocate/release; I'm working on figuring out where the race is coming from. DTest results:
https://circleci.com/gh/sumanth-pasupuleti/cassandra/369#tests
https://circleci.com/gh/sumanth-pasupuleti/cassandra/368#tests
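If changing limits on the fly is worth preserving, one option (an assumption here, not something the patch or `ResourceLimits` is known to offer) is to read the cap through a `volatile` field on every allocation attempt, so that a setter exposed over JMX, for example, takes effect on the next request. `AdjustableLimit` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical limit whose cap can be changed while the node is running. */
final class AdjustableLimit {
    private volatile long limit;
    private final AtomicLong using = new AtomicLong();

    AdjustableLimit(long initialLimit) { this.limit = initialLimit; }

    /** Each attempt reads the cap afresh, so a new value takes effect on the next request. */
    boolean tryAllocate(long amount) {
        while (true) {
            long cur = using.get();
            if (cur + amount > limit) return false;
            if (using.compareAndSet(cur, cur + amount)) return true;
        }
    }

    void release(long amount) { using.addAndGet(-amount); }

    /** Lowering the cap does not revoke existing reservations; it only refuses new ones. */
    void setLimit(long newLimit) { this.limit = newLimit; }
}
```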
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831069#comment-16831069 ]

Benedict commented on CASSANDRA-15013:

Hi Sumanth,

Thanks for the updated patch. This is not a full review, just some initial feedback.

First, I think we could do with revisiting the naming of a couple of things. The config parameters should probably be prefixed with native_transport for consistency, and the connection parameter should be shorter (since we encode every byte) and perhaps convey the intent; maybe THROW_ON_OVERLOAD?

Secondly, the patch looks to have some data races when updating shared state. It might be sensible to reuse what we have already produced for this in CASSANDRA-15066: in [this branch|https://github.com/belliottsmith/cassandra/tree/messaging-improvements] you will find a class called {{ResourceLimits}} that we use to impose per-endpoint and global limits, which is exactly what you’re doing here. There’s not much sense in duplicating the work, so perhaps you can copy the class and use it in this patch; it shouldn’t change before we commit.

Similarly, the accesses of {{requestPayloadInFlightPerEndpoint}} need to be made atomic. In {{initChannel}} this means grabbing the result of {{computeIfAbsent}} and calling {{tryRef}} on it; if that fails, we need to immediately remove the object you fetched from the map and try again (not just loop, else we may block on another thread removing it). In {{tidy}} we need to remove the (key, value) pair to ensure we do not remove a newer object that has replaced us. Similarly, instead of invoking {{containsKey}}, we need to invoke {{get}} and check the result is not null. In general, in concurrent operation we need to access the shared state once up front, and only refresh it when necessary, as it can change underneath us at any time.
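The lookup and eviction pattern described above might look like this minimal sketch. It assumes a tracker's reference count starts at 0 (live, unowned) and is set to -1 once retired, so a retired instance can never be revived; `TrackerRegistry` and `Tracker` are hypothetical names. `acquire` evicts a stale (key, value) pair itself rather than spinning until another thread does, and `release` removes only the exact instance it retired.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-endpoint registry of reference-counted trackers. */
final class TrackerRegistry<K> {
    static final class Tracker {
        private final AtomicInteger refs = new AtomicInteger(); // 0 = live but unreferenced, -1 = retired

        boolean tryRef() {
            while (true) {
                int cur = refs.get();
                if (cur < 0) return false; // already retired
                if (refs.compareAndSet(cur, cur + 1)) return true;
            }
        }

        /** Returns true only if this call retired the tracker (a concurrent tryRef wins the race). */
        boolean unref() {
            return refs.decrementAndGet() == 0 && refs.compareAndSet(0, -1);
        }
    }

    private final ConcurrentMap<K, Tracker> trackers = new ConcurrentHashMap<>();

    /** Look up (or create) the tracker for a key and take a reference on it. */
    Tracker acquire(K key) {
        while (true) {
            Tracker t = trackers.computeIfAbsent(key, k -> new Tracker());
            if (t.tryRef()) return t;
            trackers.remove(key, t); // evict exactly the retired instance, then retry
        }
    }

    /** Drop a reference; remove the mapping only if it still points at this instance. */
    void release(K key, Tracker t) {
        if (t.unref()) trackers.remove(key, t);
    }

    int size() { return trackers.size(); }
}
```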
Finally, we should remove the task queue length limit from {{requestExecutor}}, else this patch won’t actually stop us blocking the event loop.

Look forward to giving the final patch a full review in the near future.
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822372#comment-16822372 ]

Sumanth Pasupuleti commented on CASSANDRA-15013:

Updated patch: [https://github.com/apache/cassandra/pull/313]

Thanks [~benedict]. I learnt from your suggestion that the {{Ref}} class is useful for getting around the race conditions I was initially worried about when evicting an endpoint from the map. The attached patch evicts the endpoint along the lines of your proposal, except that I used a new class, {{EndpointPayloadTracker}}, in place of the suggested class ({{Dispatcher}}). Mapping the Dispatcher against the endpoint would make it one Dispatcher per endpoint, whereas currently there is one Dispatcher per Channel, and I rely on that association to store the channel-level in-flight payload, which is then used to turn off backpressure on a channel (one of the conditions I check before calling {{setAutoRead(true)}} is that the channel-level in-flight payload has come down to zero).

A few other changes I have made as part of this updated patch:
* Removed the channel-level threshold, out of concern about too many config knobs (channel level, endpoint level, global level). So each time the endpoint/global thresholds are exceeded, backpressure is put on a channel, or an {{OverloadedException}} is thrown.
* In addition to the memory-based limit, added another tracker and limit check based on the number of requests in flight. This guards against a situation where too many incoming requests, each with a payload small enough to get around the memory limit checks, still end up blocking event loop threads.
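A hypothetical, single-channel sketch of the two limits described above (in-flight bytes and in-flight request count), plus the rule of resuming reads once the channel's own in-flight payload drains to zero. `InFlightLimits` and `ChannelInFlight` are illustrative names, and the `autoRead` flag stands in for Netty's `setAutoRead`.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical shared limit on both in-flight bytes and in-flight request count. */
final class InFlightLimits {
    private final long maxBytes, maxRequests;
    private final AtomicLong bytes = new AtomicLong(), requests = new AtomicLong();

    InFlightLimits(long maxBytes, long maxRequests) {
        this.maxBytes = maxBytes;
        this.maxRequests = maxRequests;
    }

    /** Records the request; returns true if either limit is now exceeded. */
    boolean add(long size) {
        long b = bytes.addAndGet(size);
        long r = requests.incrementAndGet();
        return b > maxBytes || r > maxRequests;
    }

    void remove(long size) {
        bytes.addAndGet(-size);
        requests.decrementAndGet();
    }
}

/** Per-channel view: a paused channel resumes once its own in-flight bytes drain to zero. */
final class ChannelInFlight {
    private final InFlightLimits shared;
    private long channelBytes;       // touched only from this channel's event loop
    private boolean autoRead = true; // stand-in for channel.config().setAutoRead(...)

    ChannelInFlight(InFlightLimits shared) { this.shared = shared; }

    void onRequest(long size) {
        channelBytes += size;
        if (shared.add(size)) autoRead = false; // over a limit: stop reading from this socket
    }

    void onResponse(long size) {
        channelBytes -= size;
        shared.remove(size);
        if (channelBytes == 0) autoRead = true; // the only resume condition in this sketch
    }

    boolean autoRead() { return autoRead; }
}
```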
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819331#comment-16819331 ]

Benedict commented on CASSANDRA-15013:

[~sumanth.pasupuleti] thanks for the update. As to addressing the slow leak, there are a number of similar approaches, but I would propose the following:
* Instead of a {{Map}} in {{Dispatcher}}, have a {{Map}} in {{Server.Initializer}}
* Add a reference count to the {{Dispatcher}}, as well as an atomic {{bytesInFlight}}
* In {{Server.Initializer.initChannel}}, look up the socket InetAddress:
*# If there is no {{Dispatcher}}, create one
*# If the {{Dispatcher}} cannot increment its reference count, remove it from the map and go to (1)
*# Otherwise we've taken ownership of the {{Dispatcher}} and can use it
* Then, in the {{Dispatcher}}, override {{channelInactive}} to decrement our reference count and remove ourselves from the map if we've been freed

This also marginally reduces the per-message cost of enforcing these constraints. WDYT?
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819299#comment-16819299 ]

Sumanth Pasupuleti commented on CASSANDRA-15013:

[~benedict] here is the PR that includes the per-endpoint limit: https://github.com/apache/cassandra/pull/311/commits/a03cd0550118e3c1dc98694c0dc0ed84824853d1

This patch has the drawback of a slow leak (we never remove an endpoint from the endpoint in-flight map); looking for advice.
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813902#comment-16813902 ]

Sumanth Pasupuleti commented on CASSANDRA-15013:

[~benedict] Regarding the per-endpoint limit, we have had multiple back-and-forth discussions within our team, and since the driver offers the maxConnectionsPerHost option, it is quite possible for one client instance to have more than one connection to a given C* instance. I will be working on changes to implement the per-endpoint limit, in addition to the per-connection and global limits.
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807994#comment-16807994 ]

Benedict commented on CASSANDRA-15013:

{quote}In other words, we would never discard a message if the client chose to go with backpressure option{quote}
+1

{quote}I propose cutting a separate ticket for that work, and keeping the scope limited for this current ticket{quote}
How about a middle ground: we implement the per-endpoint (IP address) limit (which could easily be generalised to incorporate an application identifier) in this patch, so that the logical behaviour of the message control flow isn't really revisited; we would just have to change the inputs and introduce any client API changes in the follow-up patch? I personally have a preference for trying to get all of the logical semantics settled in the first patch, though I'm not deeply wed to that.
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807988#comment-16807988 ]

Sumanth Pasupuleti commented on CASSANDRA-15013:

Thanks for the feedback [~benedict].

+1 on unconditionally enqueuing the message to the executor when we setAutoRead(false), and throwing OverloadedException each time a message is discarded. In other words, we would never discard a message if the client chose to go with backpressure option; rather, we just setAutoRead(false) and process the message.

Regarding in-flight limits per endpoint, and having an application identifier, I like the suggestion, as it offers better guarantees on throttling client instances; however, I propose cutting a separate ticket for that work, and keeping the scope limited for this current ticket.
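The agreed rule (never silently discard) can be summarised as a small decision function. This is a sketch under assumptions: the enum and class names are hypothetical (THROW_ON_OVERLOAD borrows the name proposed later in the thread), and a real server would also perform the enqueueing, pausing and error reporting that each action names.

```java
/** Hypothetical per-connection overload policy, chosen by the client. */
enum OverloadPolicy { BACKPRESSURE, THROW_ON_OVERLOAD }

/** What the server does with one incoming request. */
enum Action { PROCESS, PROCESS_AND_PAUSE_READS, REJECT_OVERLOADED }

final class OverloadDecision {
    /**
     * Never drop silently: in backpressure mode the request is still enqueued and reads are paused;
     * otherwise it is rejected and the client receives an OverloadedException.
     */
    static Action decide(boolean withinLimits, OverloadPolicy policy) {
        if (withinLimits) return Action.PROCESS;
        return policy == OverloadPolicy.BACKPRESSURE ? Action.PROCESS_AND_PAUSE_READS : Action.REJECT_OVERLOADED;
    }
}
```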
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806963#comment-16806963 ] Benedict commented on CASSANDRA-15013: -- Thanks [~sumanth.pasupuleti] for the patch. I've only taken a quick glance at it so far, but it looks pretty good, and I'm looking forward to integrating it. There's only one substantive comment I have for the moment, which is that I think when we {{setAutoread(false)}}, we need to not discard the message. We should perform only one of the two options; if we ever discard, I think we should always let the user know by throwing {{OverloadedException}}. If we want to use back pressure without throwing away any messages that are over our limit, I can think of two fairly straight forward mechanisms, with the best being unfortunately quite difficult given our current Netty pipeline. # Like we have done for CASSANDRA-15066, we would ideally leave the message unparsed in the buffer until we have capacity to process it. # Alternatively, as an easier solution, we can unconditionally enqueue the message to the executor, assuming that we must have a fairly limited quantity of bytes we can overshoot by Also, as a point of consideration only, we _might_ also want to limit the number of bytes we have in-flight per-endpoint, rather than per-channel, to avoid a given host spamming the database with many connections. It's perhaps not a very good unit to limit by, either, since an IP address may host many applications, but an application can also open an arbitrary number of connections, and crowd out all other hosts... Just something to consider, I'm not sure what the best constraints are here. 
Perhaps, similar to CASSANDRA-15066, we could have a very small per-connection limit that can always be consumed, to ensure progress on any given channel, with per-endpoint and global limits for when these are exceeded (though this is less obvious than for internode, as there could be many more clients connected, so it would be plausible for these low limits to consume a great deal in combination). It might be that we want to introduce the concept of an application identifier, so that users who want to run multiple applications per host can do so, while still ensuring QoS to other applications if one goes awry.
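The tiered idea (a small reserve only this connection can use, falling back to budgets shared per endpoint and globally) might look roughly like this. It is a hypothetical sketch, not the CASSANDRA-15066 implementation: `ByteBudget`, `TieredAllocator` and the byte figures in the usage are assumptions chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical byte budget that may be shared by several connections. */
class ByteBudget {
    private final long capacity;
    private final AtomicLong used = new AtomicLong();

    ByteBudget(long capacity) { this.capacity = capacity; }

    /** Atomically reserves bytes if they fit within the capacity; never blocks. */
    boolean tryAcquire(long bytes) {
        while (true) {
            long current = used.get();
            if (current + bytes > capacity) return false;
            if (used.compareAndSet(current, current + bytes)) return true;
        }
    }

    void release(long bytes) { used.addAndGet(-bytes); }

    long used() { return used.get(); }
}

/**
 * Hypothetical allocator: a small per-connection reserve guarantees progress on every channel,
 * and larger requests fall back to budgets shared per endpoint and across all clients.
 */
class TieredAllocator {
    enum Source { CONNECTION, SHARED, NONE }

    private final ByteBudget connection;
    private final ByteBudget endpoint;
    private final ByteBudget global;

    TieredAllocator(ByteBudget connection, ByteBudget endpoint, ByteBudget global) {
        this.connection = connection;
        this.endpoint = endpoint;
        this.global = global;
    }

    /** Returns where the bytes came from, or NONE if the caller must apply its overload policy. */
    Source tryAllocate(long bytes) {
        if (connection.tryAcquire(bytes)) return Source.CONNECTION;
        if (!endpoint.tryAcquire(bytes)) return Source.NONE;
        if (!global.tryAcquire(bytes)) {
            endpoint.release(bytes); // roll back the endpoint reservation
            return Source.NONE;
        }
        return Source.SHARED;
    }

    /** Returns bytes to the budget(s) they were allocated from. */
    void release(long bytes, Source source) {
        switch (source) {
            case CONNECTION:
                connection.release(bytes);
                break;
            case SHARED:
                endpoint.release(bytes);
                global.release(bytes);
                break;
            default:
                break;
        }
    }
}
```

A caller that gets `NONE` would then apply whichever overload behaviour the connection uses (pausing reads or replying with an overloaded error); two connections from one host share the same `endpoint` budget instance.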
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806412#comment-16806412 ]

Sumanth Pasupuleti commented on CASSANDRA-15013:

[~benedict]
Pull Request: https://github.com/apache/cassandra/pull/308
Passing UTs: https://circleci.com/gh/sumanth-pasupuleti/cassandra/261
Passing JVM DTests: https://circleci.com/gh/sumanth-pasupuleti/cassandra/260
Passing DTests with vnodes: https://circleci.com/gh/sumanth-pasupuleti/cassandra/262
DTests without vnodes with two failures: https://circleci.com/gh/sumanth-pasupuleti/cassandra/263 (looking into the two failures)
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769340#comment-16769340 ]

Jason Brown commented on CASSANDRA-15013:

Ahh, I just reread the {{doc/native_protocol_v5.spec}}, and the OPTIONS are an open map, basically. I thought they were a fixed listing (primarily because we only support a fixed set of compression types). OK, so any version works for me :).
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769331#comment-16769331 ]

Jason Brown commented on CASSANDRA-15013:

Yup, I agree the harder part, programming-wise, is the {{requestExecutor}} stuff, and let's plow through that first. The {{OptionsMessage}}/client protocol work is significantly easier, as I think we agree, but would that qualify as a change to the native protocol, for which we would need to wait for a major rev (as in, 4.0)? Or are additive changes acceptable for previous native protocol versions? We might have a policy or general advice around this, but I don't know.
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769335#comment-16769335 ]

Benedict commented on CASSANDRA-15013:

I would be OK with either approach. There's no strong reason not to add an optional feature to the protocol, though: it's not actually a protocol change, just a behavioural change to a message that is permitted on the protocol today. Since the riskiest change will be fixing the underlying bug, I'd be in favour of at least supporting this option for clients in 3.0, if we intend to fix this behaviour that far back. But I'm also comfortable with limiting the client option to 4.0.
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769310#comment-16769310 ]

Benedict commented on CASSANDRA-15013:

I guess maybe let's wait and see how much more complicated it would be? You're right that the {{OptionsMessage}} should be sufficient for supplying the option, and I think the complicated bit is going to be negotiating safely with the {{requestExecutor}} when we should start and stop reading. If it turns out to be super challenging, by all means let's make it a 4.0-only follow-up, but if (as I suspect) it's nominally extra work, I think it's better to tie up the work while there's pressure to do so.
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769292#comment-16769292 ]

Jason Brown commented on CASSANDRA-15013:

[~benedict] Ah, I see now that's what you intended by {{connection-configurable option}}. I'm fine with that. I'm not sure if specifying the 'backpressure type' would require a change to the native protocol. I think it would be most appropriate in the OPTIONS section (and thus {{OptionsMessage}}), but I might be mistaken. However, I wonder if we should break that work out into a separate ticket to unblock the other work here, so that it can be backported and fixed in production. wdyt?
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769041#comment-16769041 ]

Benedict commented on CASSANDRA-15013:

[~jasobrown]: FWIW, I was proposing a _client-configurable_ option. So operators shouldn't need to do anything; in fact, perhaps only client _authors_ would ever specify this, though some might make it available to developers using their library if their application semantics prefer one or the other. I don't mind which we pick as the default, and don't mind if this has a user-configurable option, but while TCP back pressure should be the preferred mechanism, some clients probably don't behave well in the face of it, and for these clients specifying {{OverloadedException}} behaviour is probably useful.

[~sumanth.pasupuleti]: Also, if you do the work of implementing back pressure, I am happy to make the change to monitor bytes instead of queued items. I don't think it should be significantly more challenging, and it would permit us to more tightly bound system resource consumption.
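Since the connection options form an open string map, a client-chosen overload behaviour could be read from it when the connection is set up. A hedged sketch: the `THROW_ON_OVERLOAD` key, the accepted values and the back pressure default are assumptions for illustration, not confirmed protocol details.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical overload behaviour a client can request for its connection. */
enum ClientOverloadBehaviour { TCP_BACK_PRESSURE, OVERLOADED_EXCEPTION }

final class ConnectionOptions {
    /** Hypothetical option key; the name is illustrative only. */
    static final String THROW_ON_OVERLOAD = "THROW_ON_OVERLOAD";

    private ConnectionOptions() {}

    /** Maps the client-supplied options to a behaviour; absent means the (assumed) default. */
    static ClientOverloadBehaviour overloadBehaviour(Map<String, String> options) {
        String value = options.get(THROW_ON_OVERLOAD);
        if (value == null) {
            return ClientOverloadBehaviour.TCP_BACK_PRESSURE; // assumed default
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "1":
            case "true":
                return ClientOverloadBehaviour.OVERLOADED_EXCEPTION;
            case "0":
            case "false":
                return ClientOverloadBehaviour.TCP_BACK_PRESSURE;
            default:
                // Reject unknown values rather than silently guessing the client's intent.
                throw new IllegalArgumentException("Invalid value for " + THROW_ON_OVERLOAD + ": " + value);
        }
    }
}
```

Because the option is per connection, old clients that never send it keep today's behaviour, which matches the point that no operator configuration is needed.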
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768913#comment-16768913 ]

Jason Brown commented on CASSANDRA-15013:

I agree with upping the max queue depth (or unbounded plus size monitoring) as well as stopping reading from the socket (by setting netty's {{autoRead}} to false). I'm not, however, convinced about adding yet another configuration option; adding more config options only complicates the lives of operators. How will an operator know how to set it most appropriately for their use case(s)? We should choose the best solution, *document it*, and go with that as a built-in behavior. (Note: I'm amenable to throwing the OverloadedException, as well.)
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768875#comment-16768875 ]

Sumanth Pasupuleti commented on CASSANDRA-15013:

Regarding the fix,
* Changes to the requestExecutor queue size, either:
** (a) making it unbounded, and tracking the total number of bytes we have read off the wire but not answered, or
** (b) keeping it bounded, but with a bigger size.
I think both options are good, but to keep things simple I am inclined towards (b), increasing the default queue size.
* When the requestExecutor is full, based on the new configuration option, either:
** (a) stop reading from incoming channels (TCP back pressure), or
** (b) throw OverloadedException.
I think we should do both, and have a configurable as you suggest.
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768865#comment-16768865 ]

Sumanth Pasupuleti commented on CASSANDRA-15013:

[~benedict] Your theory seems to be spot on (I now have evidence supporting it from the heap dumps and thread dumps).
* Evidence of the requestExecutor queue being full (indicated by taskPermit), and all 128 workers busy (indicated by workPermit): !RequestExecutorQueueFull.png|thumbnail!
* Evidence of blocked epollEventLoopGroup threads (from the heap dump): !BlockedEpollEventLoopFromHeapDump.png|thumbnail!
* Evidence of blocked epollEventLoopGroup threads (from the thread dump): !BlockedEpollEventLoopFromThreadDump.png|thumbnail!
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765781#comment-16765781 ]

Benedict commented on CASSANDRA-15013:

There is a potential semantic change here that could have negative consequences for clients, so we need to be careful. Presently, clients can expect a response to every message they send to the server while they are connected; the server handles timeouts, failures etc. and makes sure to respond to the client in some way (unless the connection is lost). If we change this, we would need to verify client behaviour: do they all gracefully time out outstanding requests, for instance, or do they kill them on a per-connection basis?

In either case, though, it is a shame to discard work that we've gone to the effort of performing. From a cluster stability perspective, this is more likely to put the cluster under steadily more pressure, rather than less, as the client will no doubt want to retry this work. It is better to either discard work that has yet to be initiated, or to try not to discard work at all and provide back pressure signals to the client.

I think it would also be more relevant as an initial step to remove the blocking behaviour on incoming, so that the eventLoop can always service the outgoing queue to prevent this build-up. There's a strong chance the build-up of outgoing messages you see is down to the eventLoop that must process it being blocked on offering work to the {{requestExecutor}}, and by removing this block the outgoing queue will not accumulate so readily. There are two options if we do this: stop reading from incoming channels when the {{requestExecutor}} is full, or throw {{OverloadedException}}.
In my opinion, this is exactly what TCP back pressure is for, but we also have a world where clients have been depending on the server trying its best to never push back, so they have inadequate queueing models internally, with no support for noticing or handling this back pressure. This to me is a design flaw that should be addressed in clients, but we could mitigate it for now by increasing the size of our {{requestExecutor}} queue (which is actually unnecessarily small), or even making it unbounded and simply tracking the total number of bytes we have read off the wire but not answered. Perhaps we could even make the behaviour of {{OverloadedException}} vs back pressure a connection-configurable option, so that clients with poor flow control can utilise {{OverloadedException}} to handle this, and those with better control can use normal TCP flow control mechanisms. What do you think?
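Removing the blocking handoff means the event loop must learn that the executor is full instead of waiting for room. One way to illustrate this with plain `java.util.concurrent` is a bounded queue with `ThreadPoolExecutor.AbortPolicy`, which rejects rather than blocks. This is a hedged sketch: `NonBlockingHandoff` is a hypothetical name, and this is not necessarily how Cassandra's `requestExecutor` is built.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical event-loop-side handoff that never blocks: when the bounded queue is full the
 * submission is rejected immediately, and the caller chooses how to react.
 */
final class NonBlockingHandoff {
    enum Outcome { ACCEPTED, QUEUE_FULL }

    private final ThreadPoolExecutor executor;

    NonBlockingHandoff(int threads, int queueCapacity) {
        executor = new ThreadPoolExecutor(threads, threads, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy()); // throw on overflow instead of waiting
    }

    Outcome tryDispatch(Runnable request) {
        try {
            executor.execute(request);
            return Outcome.ACCEPTED;
        } catch (RejectedExecutionException e) {
            // Caller's choice: stop reading from the channel, or reply with an overloaded error.
            return Outcome.QUEUE_FULL;
        }
    }

    void shutdown() { executor.shutdown(); }
}
```

With one thread and a queue of one, the first request runs, the second waits in the queue, and the third is refused at once, so the event loop stays free to flush outgoing responses.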
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765356#comment-16765356 ]

Sumanth Pasupuleti commented on CASSANDRA-15013:

[~benedict] By making the flusher queue bounded, my intention is not to make the request executor block on enqueuing if the queue is full; rather, it would drop the response if the flusher queue is full. I believe this should avoid the deadlock situation you are referring to.
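The proposal (a bounded flusher queue where request-executor threads drop the response instead of blocking) can be sketched with `ArrayBlockingQueue.offer`, which returns `false` rather than waiting. `DroppingFlushQueue` is a hypothetical name and the capacity is an assumed parameter; this is an illustration, not the actual Flusher code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical bounded flush queue: producers never block, they drop when the queue is full. */
final class DroppingFlushQueue<T> {
    private final ArrayBlockingQueue<T> queue;
    private final AtomicLong dropped = new AtomicLong();

    DroppingFlushQueue(int capacity) { queue = new ArrayBlockingQueue<>(capacity); }

    /** Called by request-executor threads; returns false if the response was dropped. */
    boolean enqueue(T response) {
        if (queue.offer(response)) return true;
        dropped.incrementAndGet();
        return false;
    }

    /** Called on the event loop: takes everything currently queued for writing and flushing. */
    List<T> drain() {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    long droppedCount() { return dropped.get(); }
}
```

This avoids the deadlock, but as raised in the reply on semantics, a dropped entry means the client never receives a response for work that was already performed.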
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763453#comment-16763453 ]

Benedict commented on CASSANDRA-15013:

Have you tested that this approach resolves your issues? There's a deadlock that could occur with this change, as the request executor is also blocking: the Netty event loop could block waiting for room on the request executor, while the request executor blocks on queueing to the Flusher (which will be executed on the eventLoop). We should probably disable reads from the inbound channel during overflow, in both cases, rather than blocking either the eventLoop or the requestExecutor. The behaviour of blocking the eventLoop could also be the cause of your flusher queue growing so large.
[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763176#comment-16763176 ]

Sumanth Pasupuleti commented on CASSANDRA-15013:

Working on a fix to make the Flusher queue bounded.