This is an automated email from the ASF dual-hosted git repository. mck pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/cassandra-website.git

The following commit(s) were added to refs/heads/master by this push:
     new c99bd7e  Blog Post 2020-09-03 Improving Resiliency
c99bd7e is described below

commit c99bd7eed1a33b8ceff6e045475114b5c004b807
Author: Melissa Logan <loganloganlogan@Logan-2018.local>
AuthorDate: Wed Sep 2 15:35:24 2020 -0700

    Blog Post 2020-09-03 Improving Resiliency
---
 .../2020-09-03-improving-resiliency.markdown       | 105 +++++++++++++++++++++
 src/img/blog-post-improving-resiliency/image1.png  | Bin 0 -> 256522 bytes
 src/img/blog-post-improving-resiliency/image10.png | Bin 0 -> 278163 bytes
 src/img/blog-post-improving-resiliency/image11.png | Bin 0 -> 509083 bytes
 src/img/blog-post-improving-resiliency/image12.png | Bin 0 -> 234728 bytes
 src/img/blog-post-improving-resiliency/image13.png | Bin 0 -> 199026 bytes
 src/img/blog-post-improving-resiliency/image14.png | Bin 0 -> 252461 bytes
 src/img/blog-post-improving-resiliency/image15.png | Bin 0 -> 260371 bytes
 src/img/blog-post-improving-resiliency/image16.png | Bin 0 -> 466079 bytes
 src/img/blog-post-improving-resiliency/image2.png  | Bin 0 -> 355440 bytes
 src/img/blog-post-improving-resiliency/image3.png  | Bin 0 -> 354831 bytes
 src/img/blog-post-improving-resiliency/image4.png  | Bin 0 -> 392171 bytes
 src/img/blog-post-improving-resiliency/image5.png  | Bin 0 -> 274880 bytes
 src/img/blog-post-improving-resiliency/image6.png  | Bin 0 -> 274174 bytes
 src/img/blog-post-improving-resiliency/image7.png  | Bin 0 -> 147739 bytes
 src/img/blog-post-improving-resiliency/image8.png  | Bin 0 -> 606925 bytes
 src/img/blog-post-improving-resiliency/image9.png  | Bin 0 -> 461089 bytes
 17 files changed, 105 insertions(+)

diff --git a/src/_posts/2020-09-03-improving-resiliency.markdown b/src/_posts/2020-09-03-improving-resiliency.markdown
new file mode 100644
index 0000000..1acea24
--- /dev/null
+++ b/src/_posts/2020-09-03-improving-resiliency.markdown
@@ -0,0 +1,105 @@
+---
+layout: post
+title: "Improving Apache Cassandra’s Front Door and Backpressure"
+date: 2020-09-03 09:00:00 -0700
+author: the Apache Cassandra Community
+categories: blog
+---
+
+As part of [CASSANDRA-15013](https://issues.apache.org/jira/browse/CASSANDRA-15013), we have improved Cassandra’s ability to handle high throughput workloads, while having enough safeguards in place to protect itself from potentially running out of memory. To explain the change, let us first look, at a high level, at how Cassandra processed an incoming request before the fix, then at what we changed and the new configuration knobs available.
+
+### How inbound requests were handled before
+
+Let us take the scenario of a client application sending requests to a C* cluster. For the purpose of this blog, let us focus on one of the C* coordinator nodes.
+
+![alt_text](img/blog-post-improving-resiliency/image1.png "image_tooltip")
+
+Below is the microscopic view of the client-server interaction at the C* coordinator node. Each client connection to a Cassandra node happens over a Netty channel, and for efficiency, each Netty eventloop thread is responsible for more than one Netty channel.
+
+![alt_text](img/blog-post-improving-resiliency/image2.png "image_tooltip")
+
+The eventloop threads read requests coming off the Netty channels and enqueue them into a bounded inbound queue in the Cassandra node.
+
+![alt_text](img/blog-post-improving-resiliency/image3.png "image_tooltip")
+
+A thread pool dequeues requests from the inbound queue, processes them asynchronously, and enqueues the responses into an outbound queue. There are multiple outbound queues, one for each eventloop thread, to avoid races.
+
+![alt_text](img/blog-post-improving-resiliency/image4.png "image_tooltip")
+
+![alt_text](img/blog-post-improving-resiliency/image5.png "image_tooltip")
+
+![alt_text](img/blog-post-improving-resiliency/image6.png "image_tooltip")
+
+The same eventloop threads that enqueue incoming requests into the inbound queue are also responsible for dequeuing responses from the outbound queue and shipping them back to the client.
+
+![alt_text](img/blog-post-improving-resiliency/image7.png "image_tooltip")
+
+![alt_text](img/blog-post-improving-resiliency/image8.png "image_tooltip")
+
+#### Issue with this workflow
+
+Let us take a scenario where there is a spike in operations from the client. The eventloop threads are now enqueuing requests at a much higher rate than the native transport thread pool can process them. Eventually, the inbound queue reaches its limit and cannot accept any more requests.
+
+![alt_text](img/blog-post-improving-resiliency/image9.png "image_tooltip")
+
+Consequently, the eventloop threads get into a blocked state as they try to enqueue more requests into an already full inbound queue. They wait until they can successfully enqueue the request in hand.
+
+![alt_text](img/blog-post-improving-resiliency/image10.png "image_tooltip")
+
+As noted earlier, these blocked eventloop threads are also supposed to dequeue responses from the outbound queue. Because they are blocked, the outbound queue (which is unbounded) grows endlessly with all the responses, eventually resulting in C* going out of memory. This is a vicious cycle: since the eventloop threads are blocked, there is no one to ship responses back to the client; eventually client-side timeouts trigger, and clients may send more requests due to retr [...]
+
+![alt_text](img/blog-post-improving-resiliency/image11.png "image_tooltip")
+
+So far, we have built a fair understanding of how the front door of C* works with regard to handling client requests, and how blocked eventloop threads can affect Cassandra.
+
+### What we changed
+
+#### Backpressure
+
+The essential root cause of the issue is that eventloop threads are getting blocked. We avoid blocking them by making the bounded inbound queue unbounded. If we are not careful here, though, we could run out of memory again, this time because of the unbounded inbound queue. So we defined an overloaded state for the node based on the memory usage of the inbound queue.
+
+We introduced two levels of thresholds: one at the node level, and a more granular one per client IP. The per-IP threshold helps isolate rogue client IPs without affecting other, well-behaved clients.
+
+These thresholds can be set in the `cassandra.yaml` file.
+
+```
+native_transport_max_concurrent_requests_in_bytes_per_ip
+native_transport_max_concurrent_requests_in_bytes
+```
+
+These thresholds can also be changed at runtime ([CASSANDRA-15519](https://issues.apache.org/jira/browse/CASSANDRA-15519)).
+
+#### Configurable server response to the client as part of backpressure
+
+If C* happens to be in an overloaded state (as defined by the thresholds mentioned above), C* can react in one of the following ways:
+
+* Apply backpressure by setting “Autoread” to false on the Netty channel in question (default behavior).
+* Respond to the client with an OverloadedException (if the client sets the “THROW_ON_OVERLOAD” connection startup option to “true”).
+
+Let us look at the client request-response workflow again, in both these cases.
+
+#### **THROW_ON_OVERLOAD = false (default)**
+
+Suppose the inbound queue is full (i.e. the thresholds are met).
+
+![alt_text](img/blog-post-improving-resiliency/image12.png "image_tooltip")
+
+C* sets autoread to false on the Netty channel, which means it stops reading bytes off that channel.
+
+![alt_text](img/blog-post-improving-resiliency/image13.png "image_tooltip")
+
+Consequently, the kernel socket inbound buffer becomes full, since the Netty eventloop is no longer reading bytes off it.
+
+![alt_text](img/blog-post-improving-resiliency/image14.png "image_tooltip")
+
+Once the kernel socket inbound buffer is full on the server side, data starts piling up in the kernel socket outbound buffer on the client side, and once that buffer is full, the client starts experiencing backpressure.
+
+![alt_text](img/blog-post-improving-resiliency/image15.png "image_tooltip")
+
+#### **THROW_ON_OVERLOAD = true**
+
+If the inbound queue is full (i.e. the thresholds are met), eventloop threads do not enqueue the request into the inbound queue. Instead, the eventloop thread creates an OverloadedException response message and enqueues it into the flusher queue, from which it is shipped back to the client.
+
+![alt_text](img/blog-post-improving-resiliency/image16.png "image_tooltip")
+
+This way, Cassandra is able to serve very high throughput while protecting itself from memory starvation. This patch has been vetted through thorough performance benchmarking. Detailed performance analysis can be found [here](https://issues.apache.org/jira/browse/CASSANDRA-15013?focusedCommentId=16881762&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16881762).
diff --git a/src/img/blog-post-improving-resiliency/image1.png b/src/img/blog-post-improving-resiliency/image1.png
new file mode 100644
index 0000000..8edf5f8
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image1.png differ
diff --git a/src/img/blog-post-improving-resiliency/image10.png b/src/img/blog-post-improving-resiliency/image10.png
new file mode 100644
index 0000000..ffed820
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image10.png differ
diff --git a/src/img/blog-post-improving-resiliency/image11.png b/src/img/blog-post-improving-resiliency/image11.png
new file mode 100644
index 0000000..c2b4a69
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image11.png differ
diff --git a/src/img/blog-post-improving-resiliency/image12.png b/src/img/blog-post-improving-resiliency/image12.png
new file mode 100644
index 0000000..675e84f
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image12.png differ
diff --git a/src/img/blog-post-improving-resiliency/image13.png b/src/img/blog-post-improving-resiliency/image13.png
new file mode 100644
index 0000000..70f0887
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image13.png differ
diff --git a/src/img/blog-post-improving-resiliency/image14.png b/src/img/blog-post-improving-resiliency/image14.png
new file mode 100644
index 0000000..fd53d62
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image14.png differ
diff --git a/src/img/blog-post-improving-resiliency/image15.png b/src/img/blog-post-improving-resiliency/image15.png
new file mode 100644
index 0000000..df90bd0
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image15.png differ
diff --git a/src/img/blog-post-improving-resiliency/image16.png b/src/img/blog-post-improving-resiliency/image16.png
new file mode 100644
index 0000000..64dcde5
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image16.png differ
diff --git a/src/img/blog-post-improving-resiliency/image2.png b/src/img/blog-post-improving-resiliency/image2.png
new file mode 100644
index 0000000..edea7fe
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image2.png differ
diff --git a/src/img/blog-post-improving-resiliency/image3.png b/src/img/blog-post-improving-resiliency/image3.png
new file mode 100644
index 0000000..7e1f291
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image3.png differ
diff --git a/src/img/blog-post-improving-resiliency/image4.png b/src/img/blog-post-improving-resiliency/image4.png
new file mode 100644
index 0000000..367f7c4
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image4.png differ
diff --git a/src/img/blog-post-improving-resiliency/image5.png b/src/img/blog-post-improving-resiliency/image5.png
new file mode 100644
index 0000000..2c2e65d
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image5.png differ
diff --git a/src/img/blog-post-improving-resiliency/image6.png b/src/img/blog-post-improving-resiliency/image6.png
new file mode 100644
index 0000000..67b9bd2
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image6.png differ
diff --git a/src/img/blog-post-improving-resiliency/image7.png b/src/img/blog-post-improving-resiliency/image7.png
new file mode 100644
index 0000000..ce49186
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image7.png differ
diff --git a/src/img/blog-post-improving-resiliency/image8.png b/src/img/blog-post-improving-resiliency/image8.png
new file mode 100644
index 0000000..e9f0f6f
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image8.png differ
diff --git a/src/img/blog-post-improving-resiliency/image9.png b/src/img/blog-post-improving-resiliency/image9.png
new file mode 100644
index 0000000..c54ec71
Binary files /dev/null and b/src/img/blog-post-improving-resiliency/image9.png differ
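
The two overload reactions described in the blog post in the diff above can be illustrated with a minimal Python sketch. It is not Cassandra's implementation: `InflightLimiter`, `Channel`, `on_request`, and `OverloadedError` are hypothetical names, and the exact comparison (`>` against the threshold) is an assumption. The post also does not say what happens to the request already in hand when autoread is turned off, so the sketch simply does not admit it.

```python
from collections import defaultdict
from dataclasses import dataclass, field


class OverloadedError(Exception):
    """Hypothetical stand-in for the OverloadedException response in the post."""


@dataclass
class InflightLimiter:
    # Assumed analogues of native_transport_max_concurrent_requests_in_bytes
    # and native_transport_max_concurrent_requests_in_bytes_per_ip.
    max_bytes: int
    max_bytes_per_ip: int
    used: int = 0
    used_per_ip: dict = field(default_factory=lambda: defaultdict(int))

    def try_admit(self, ip: str, size: int) -> bool:
        """Reserve `size` bytes, or refuse if either threshold would be exceeded."""
        if self.used + size > self.max_bytes:
            return False  # node-level threshold
        if self.used_per_ip[ip] + size > self.max_bytes_per_ip:
            return False  # per-client-IP threshold
        self.used += size
        self.used_per_ip[ip] += size
        return True

    def release(self, ip: str, size: int) -> None:
        """Give the bytes back once the request has been processed."""
        self.used -= size
        self.used_per_ip[ip] -= size


@dataclass
class Channel:
    ip: str
    throw_on_overload: bool = False  # models the THROW_ON_OVERLOAD startup option
    autoread: bool = True


def on_request(limiter: InflightLimiter, channel: Channel, size: int) -> str:
    """Decide what the eventloop thread does with one incoming request."""
    if limiter.try_admit(channel.ip, size):
        return "enqueued"
    if channel.throw_on_overload:
        # The client asked for an explicit error instead of backpressure.
        raise OverloadedError(f"server overloaded, rejecting request from {channel.ip}")
    # Default: stop reading from this channel so the kernel buffers fill up.
    channel.autoread = False
    return "paused"
```

A caller would invoke `release` when a response has been flushed and, in a fuller model, turn `autoread` back on once usage drops below the thresholds; that re-enabling step is left out here because the post does not describe it.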