Re: Pluggable throttling of read and write queries
On 2017-02-20 22:47 (-0800), Benjamin Roth wrote:
> Thanks.
>
> Depending on the whole infrastructure and business requirements, isn't it
> easier to implement throttling at the client side?
> I did this once to throttle bulk inserts to migrate whole CFs from other
> DBs.

Sometimes it's better to enforce policy centrally rather than expecting lots of different application teams to each do the right thing.
Re: Pluggable throttling of read and write queries
Thanks.

Depending on the whole infrastructure and business requirements, isn't it easier to implement throttling at the client side? I did this once to throttle bulk inserts to migrate whole CFs from other DBs.

2017-02-21 7:43 GMT+01:00 Jeff Jirsa:

> On 2017-02-20 21:35 (-0800), Benjamin Roth wrote:
> > Stupid question:
> > Why do you rate limit a database, especially writes? Wouldn't that cause a
> > lot of new issues like back pressure on the rest of your system or timeouts
> > in case of blocking requests?
> > Also rate limiting has to be based on per-coordinator calculations and not
> > cluster-wide. It reminds me of hinted handoff throttling.
>
> If you're sharing one cluster with 10 (or 20, or 100) applications, breaking one application may be better than slowing down 10/20/100. In many cases, workloads can be throttled and still meet business goals - nightly analytics jobs, for example, may be fine running over the course of 3 hours instead of 15 minutes, especially if the slightly higher response latency over 3 hours is better than much worse response latency for that 15-minute window.

--
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
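The client-side throttling Benjamin describes for bulk inserts can be sketched with a simple token bucket. This is purely illustrative (Python rather than a specific driver), and `insert_row` is a hypothetical stand-in for whatever write call the migration actually uses:

```python
import time

class TokenBucket:
    """Simple token bucket: allows `rate` operations per second on average,
    with bursts of up to `burst` operations."""
    def __init__(self, rate, burst):
        self.rate = rate            # tokens added per second
        self.capacity = burst       # maximum burst size
        self.tokens = burst
        self.last = time.monotonic()

    def acquire(self):
        """Block until one token is available, then consume it."""
        while True:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)

bucket = TokenBucket(rate=500, burst=50)   # cap the migration at ~500 writes/s
# for row in rows_to_migrate:
#     bucket.acquire()       # blocks when the migration runs ahead of the cap
#     insert_row(row)        # hypothetical driver call
```

The upside of doing this client-side is that it needs no server changes; the downside, as noted later in the thread, is that every application team has to do it correctly.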
Re: Pluggable throttling of read and write queries
On 2017-02-20 21:35 (-0800), Benjamin Roth wrote:
> Stupid question:
> Why do you rate limit a database, especially writes? Wouldn't that cause a
> lot of new issues like back pressure on the rest of your system or timeouts
> in case of blocking requests?
> Also rate limiting has to be based on per-coordinator calculations and not
> cluster-wide. It reminds me of hinted handoff throttling.

If you're sharing one cluster with 10 (or 20, or 100) applications, breaking one application may be better than slowing down 10/20/100. In many cases, workloads can be throttled and still meet business goals - nightly analytics jobs, for example, may be fine running over the course of 3 hours instead of 15 minutes, especially if the slightly higher response latency over 3 hours is better than much worse response latency for that 15-minute window.
Re: Pluggable throttling of read and write queries
On 2017-02-17 18:12 (-0800), Abhishek Verma wrote:
> Is there a way to throttle read and write queries in Cassandra currently?
> If not, what would be the right place in the code to implement a pluggable
> interface for doing it? I have briefly considered using triggers, but that
> is invoked only in the write path. The initial goal is to have a custom
> pluggable class which would be a no-op.

The only real tunables you have now are the yaml properties for limiting concurrent reads and concurrent writes - you can tune those per server to limit impact (set them low enough that queries back up rather than crash the cluster).

Other databases tend to express this per user (MySQL, for example, provides options like MAX_QUERIES_PER_HOUR and MAX_UPDATES_PER_HOUR: https://dev.mysql.com/doc/refman/5.7/en/user-resources.html). I'm not aware of any proposed tickets like that (other than some per-partition limits, which are a different protection), but there are some tangential tickets floating around that involve "limiting" user accounts. For example, https://issues.apache.org/jira/browse/CASSANDRA-8303 was designed to protect clusters from less experienced users (notably some of the "features" that tend to be misused) - the linked design doc describes it as being modeled after posix capabilities, so it's not really the same proposal; however, you could look at the code there and come up with something similar.

Given how involved a rate-limiting patch would be, you'll want to take the same approach that Sam took there - write a design doc first (like https://docs.google.com/document/d/1SEdBEF5c4eN2VT2TyQ1j-iE0iJNM6aKFlYdD7bieHhQ/edit) before you write any code, and ask for feedback.
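For intuition, the per-user quota Jeff mentions from MySQL (MAX_QUERIES_PER_HOUR) amounts to a counter per user that resets each hour. A minimal sketch, in Python purely for illustration (Cassandra itself is Java, and nothing like this class exists in either database's codebase):

```python
import time

class UserQuota:
    """Sketch of a per-user hourly query quota, loosely modeled on
    MySQL's MAX_QUERIES_PER_HOUR. All names here are invented."""
    def __init__(self, max_queries_per_hour):
        self.limit = max_queries_per_hour
        self.counts = {}           # user -> (window_start, count)

    def allow(self, user, now=None):
        """Return True and count the query, or False if the quota is spent."""
        now = time.time() if now is None else now
        start, count = self.counts.get(user, (now, 0))
        if now - start >= 3600:    # hour elapsed: start a fresh window
            start, count = now, 0
        if count >= self.limit:
            return False           # reject: quota exhausted for this window
        self.counts[user] = (start, count + 1)
        return True

quota = UserQuota(max_queries_per_hour=2)
print(quota.allow("analytics", now=0))     # True
print(quota.allow("analytics", now=1))     # True
print(quota.allow("analytics", now=2))     # False - third query in the hour
print(quota.allow("analytics", now=3601))  # True - new window
```

A real server-side implementation would additionally have to decide whether the quota is per coordinator or cluster-wide, which is exactly the point Benjamin raises elsewhere in the thread.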
Re: Pluggable throttling of read and write queries
Stupid question:
Why do you rate limit a database, especially writes? Wouldn't that cause a lot of new issues like back pressure on the rest of your system or timeouts in case of blocking requests?
Also rate limiting has to be based on per-coordinator calculations and not cluster-wide. It reminds me of hinted handoff throttling.

On 18.02.2017 03:13, "Abhishek Verma" wrote:

> Cassandra is being used on a large scale at Uber. We usually create
> dedicated clusters for each of our internal use cases, however that is
> difficult to scale and manage.
>
> We are investigating the approach of using a single shared cluster with
> 100s of nodes and handling 10s to 100s of different use cases for different
> products in the same cluster. We can define different keyspaces for each of
> them, but that does not help in case of noisy neighbors.
>
> Does anybody in the community have similar large shared clusters and/or
> face noisy neighbor issues?
>
> Is there a way to throttle read and write queries in Cassandra currently?
> If not, what would be the right place in the code to implement a pluggable
> interface for doing it? I have briefly considered using triggers, but that
> is invoked only in the write path. The initial goal is to have a custom
> pluggable class which would be a no-op.
>
> We would like to enforce these rate limits per table and for different
> query types (point or range queries, or LWT) separately.
>
> Thank you in advance.
>
> -Abhishek.
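The pluggable interface Abhishek asks for - a class consulted per table and per query type, with a no-op default - could look roughly like the following. This is a hypothetical sketch in Python for readability (Cassandra itself is Java); every name here is invented, and a real patch would need the design-doc process discussed elsewhere in the thread:

```python
from abc import ABC, abstractmethod

class QueryThrottler(ABC):
    """Hypothetical pluggable hook, consulted once per query."""
    @abstractmethod
    def should_admit(self, table: str, query_type: str) -> bool:
        """Return False to reject/throttle the query."""

class NoOpThrottler(QueryThrottler):
    """The default plug-in Abhishek describes: never throttles anything."""
    def should_admit(self, table, query_type):
        return True

class PerTableThrottler(QueryThrottler):
    """Example policy: reject query types explicitly denied for a table.
    A production version would track rates, not a static deny set."""
    def __init__(self, denied):
        self.denied = denied       # set of (table, query_type) pairs

    def should_admit(self, table, query_type):
        return (table, query_type) not in self.denied

# Deny range scans on one hot table; everything else passes through.
throttler = PerTableThrottler({("ks.events", "range")})
print(throttler.should_admit("ks.events", "point"))   # True
print(throttler.should_admit("ks.events", "range"))   # False
```

Distinguishing point reads, range scans, and LWTs at this hook is what makes the per-query-type limits in the original request possible.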
Re: Does C* coordinator writes to replicas in same order or different order?
+ The C* coordinator sends async write requests to the replicas. This is very important since it allows it to return a low-latency reply to the client once the CL is reached. You wouldn't want to serialize the replicas one after the other.
+ The client <-> server sync/async isn't related to the coordinator in this case.
+ In the case of concurrent writes (always the case...), the timestamp sets the order. Note that it's possible to work with client timestamps or server timestamps. The client ones are usually the best choice.

Note that in C* each node can be a coordinator (one per request), and that's the desired behavior in order to load balance the incoming requests. Once again, timestamps determine the order among the requests.

Cheers,
Dor

On Mon, Feb 20, 2017 at 4:12 PM, Kant Kodali wrote:

> Hi,
>
> When the C* coordinator writes to replicas, does it write in the same order
> or a different order? In other words, does the replication happen
> synchronously or asynchronously? Also, does this depend on a sync or async
> client? What happens in the case of concurrent writes to a coordinator?
>
> Thanks,
> kant
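Dor's point that "timestamps set the order" can be made concrete: because replicas reconcile cells by timestamp (last write wins), the delivery order to each replica doesn't matter. A small illustrative sketch of that reconciliation rule (simplified - real Cassandra reconciliation also handles tombstones and TTLs):

```python
def reconcile(a, b):
    """Each write is (timestamp_micros, value); the higher timestamp wins.
    On an exact timestamp tie, break by comparing values so that every
    replica still converges on the same answer."""
    if a[0] != b[0]:
        return a if a[0] > b[0] else b
    return a if a[1] >= b[1] else b

w1 = (1487640000000000, "v1")  # client-supplied timestamp
w2 = (1487640000000005, "v2")  # issued 5 microseconds later

# Both delivery orders converge on the same value:
print(reconcile(w1, w2))  # (1487640000000005, 'v2')
print(reconcile(w2, w1))  # (1487640000000005, 'v2')
```

This convergence regardless of arrival order is exactly why the coordinator is free to fire the replica writes asynchronously.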
Pluggable throttling of read and write queries
Older versions had a request scheduler API.

On Monday, February 20, 2017, Ben Slater wrote:

> We've actually had several customers where we've done the opposite - split
> large clusters apart to separate use cases. We found that this allowed us
> to better align hardware with use case requirements (for example, using AWS
> c3.2xlarge for very hot data at low latency, m4.xlarge for more
> general-purpose data). We can also tune JVM settings, etc. to meet those
> use cases.
>
> Cheers
> Ben
>
> On Mon, 20 Feb 2017 at 22:21 Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:
>
>> On Sat, Feb 18, 2017 at 3:12 AM, Abhishek Verma wrote:
>>
>>> Cassandra is being used on a large scale at Uber. We usually create
>>> dedicated clusters for each of our internal use cases, however that is
>>> difficult to scale and manage.
>>>
>>> We are investigating the approach of using a single shared cluster with
>>> 100s of nodes and handling 10s to 100s of different use cases for different
>>> products in the same cluster. We can define different keyspaces for each of
>>> them, but that does not help in case of noisy neighbors.
>>>
>>> Does anybody in the community have similar large shared clusters and/or
>>> face noisy neighbor issues?
>>
>> Hi,
>>
>> We've never tried this approach and given my limited experience I would
>> find this a terrible idea from the perspective of maintenance (remember the
>> old saying about eggs and baskets?)
>>
>> What potential benefits do you see?
>>
>> Regards,
>> --
>> Alex
>
> --
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798

--
Sorry this was sent from mobile. Will do less grammar and spell check than usual.
Does C* coordinator writes to replicas in same order or different order?
Hi,

When the C* coordinator writes to replicas, does it write in the same order or a different order? In other words, does the replication happen synchronously or asynchronously? Also, does this depend on a sync or async client? What happens in the case of concurrent writes to a coordinator?

Thanks,
kant
Are Cassandra Triggers Thread Safe? ("Tough questions perhaps!")
Hi,

1. Are Cassandra triggers thread safe? What happens if two writes invoke the trigger where the trigger is trying to modify the same row in a partition?

2. Has anyone used them successfully in production? If so, any issues? (I am using the latest version of C*, 3.10.)

3. I have partitions that are about 10K rows, and each row in a partition needs to have a pointer to the previous row (the pointer in this case is a hash). Using triggers here would greatly simplify our application logic.

It would be a huge help if I can get answers to this.

Thanks,
kant
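The hash-pointer scheme in point 3 can be sketched independently of triggers: each row stores a hash of the previous row, forming a chain within the partition. This is an application-side sketch (Python for illustration, hashing the row as sorted JSON); nothing here is provided by Cassandra or its trigger API:

```python
import hashlib
import json

def row_hash(row):
    """Stable hash of a row's contents (keys sorted for determinism)."""
    return hashlib.sha256(
        json.dumps(row, sort_keys=True).encode()).hexdigest()

def build_chain(rows):
    """Attach a prev_hash pointer to each row, in clustering order."""
    chained, prev = [], None
    for row in rows:
        entry = dict(row, prev_hash=prev)  # first row points at None
        prev = row_hash(entry)
        chained.append(entry)
    return chained

rows = [{"seq": 1, "data": "a"}, {"seq": 2, "data": "b"}]
chain = build_chain(rows)
print(chain[0]["prev_hash"])                         # None
print(chain[1]["prev_hash"] == row_hash(chain[0]))   # True
```

Note how this sketch relates to question 1: two concurrent writers both trying to read "the previous row" and append to the chain would race, which is exactly why the thread-safety of the trigger path matters for this design.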
Re: Pluggable throttling of read and write queries
We've actually had several customers where we've done the opposite - split large clusters apart to separate use cases. We found that this allowed us to better align hardware with use case requirements (for example, using AWS c3.2xlarge for very hot data at low latency, m4.xlarge for more general-purpose data). We can also tune JVM settings, etc. to meet those use cases.

Cheers
Ben

On Mon, 20 Feb 2017 at 22:21 Oleksandr Shulgin wrote:

> On Sat, Feb 18, 2017 at 3:12 AM, Abhishek Verma wrote:
>
> > Cassandra is being used on a large scale at Uber. We usually create
> > dedicated clusters for each of our internal use cases, however that is
> > difficult to scale and manage.
> >
> > We are investigating the approach of using a single shared cluster with
> > 100s of nodes and handling 10s to 100s of different use cases for different
> > products in the same cluster. We can define different keyspaces for each of
> > them, but that does not help in case of noisy neighbors.
> >
> > Does anybody in the community have similar large shared clusters and/or
> > face noisy neighbor issues?
>
> Hi,
>
> We've never tried this approach and given my limited experience I would
> find this a terrible idea from the perspective of maintenance (remember the
> old saying about eggs and baskets?)
>
> What potential benefits do you see?
>
> Regards,
> --
> Alex

--
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798
Re: [VOTE] Release Apache Cassandra 2.2.9
+1 (non-binding)

On 2017-02-16 02:16, Michael Shuler wrote:

> I propose the following artifacts for release as 2.2.9.
>
> sha1: 70a08f1c35091a36f7d9cc4816259210c2185267
> Git: http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.9-tentative
> Artifacts: https://repository.apache.org/content/repositories/orgapachecassandra-1139/org/apache/cassandra/apache-cassandra/2.2.9/
> Staging repository: https://repository.apache.org/content/repositories/orgapachecassandra-1139/
>
> The Debian packages are available here: http://people.apache.org/~mshuler
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: (CHANGES.txt) https://goo.gl/AYblr5
> [2]: (NEWS.txt) https://goo.gl/gIXxgR
Re: Pluggable throttling of read and write queries
On Sat, Feb 18, 2017 at 3:12 AM, Abhishek Verma wrote:

> Cassandra is being used on a large scale at Uber. We usually create
> dedicated clusters for each of our internal use cases, however that is
> difficult to scale and manage.
>
> We are investigating the approach of using a single shared cluster with
> 100s of nodes and handling 10s to 100s of different use cases for different
> products in the same cluster. We can define different keyspaces for each of
> them, but that does not help in case of noisy neighbors.
>
> Does anybody in the community have similar large shared clusters and/or
> face noisy neighbor issues?

Hi,

We've never tried this approach and given my limited experience I would find this a terrible idea from the perspective of maintenance (remember the old saying about eggs and baskets?)

What potential benefits do you see?

Regards,
--
Alex