Re: Pluggable throttling of read and write queries

2017-02-20 Thread Jeff Jirsa


On 2017-02-20 22:47 (-0800), Benjamin Roth  wrote: 
> Thanks.
> 
> Depending on the infrastructure and business requirements, isn't it
> easier to implement throttling on the client side?
> I did this once to throttle bulk inserts when migrating whole CFs from other
> DBs.
> 

Sometimes it's better to enforce policy centrally rather than expecting lots of 
different application teams to each do the right thing.
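To make that concrete: a central enforcement point could be as small as a
per-role limiter that the coordinator consults before executing a query. A
minimal sketch, assuming Guava's RateLimiter - the class and method names are
hypothetical, not an existing Cassandra API:

    import com.google.common.util.concurrent.RateLimiter;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch of a central, per-role throttle a coordinator could
    // consult before executing a query. Illustrative only - none of these
    // names exist in Cassandra.
    public class PerRoleThrottle {
        private final Map<String, RateLimiter> limiters = new ConcurrentHashMap<>();
        private final double defaultQueriesPerSecond;

        public PerRoleThrottle(double defaultQueriesPerSecond) {
            this.defaultQueriesPerSecond = defaultQueriesPerSecond;
        }

        /** Blocks briefly until the authenticated role is allowed another query. */
        public void acquire(String roleName) {
            limiters.computeIfAbsent(roleName,
                    r -> RateLimiter.create(defaultQueriesPerSecond))
                    .acquire();
        }
    }

The point is that the policy lives in one place on the server, instead of in
every client codebase.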


Re: Pluggable throttling of read and write queries

2017-02-20 Thread Benjamin Roth
Thanks.

Depending on the infrastructure and business requirements, isn't it
easier to implement throttling on the client side?
I did this once to throttle bulk inserts when migrating whole CFs from other
DBs.
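For illustration, that kind of client-side throttling can be little more than
a rate limiter in front of the driver's execute() calls. A sketch using the
DataStax Java driver and Guava's RateLimiter - the contact point, schema and
rate are placeholders, not what was actually used:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.google.common.util.concurrent.RateLimiter;

    // Sketch: throttle bulk inserts on the client by acquiring a permit
    // before each execute().
    public class ThrottledBulkInsert {
        public static void main(String[] args) {
            RateLimiter limiter = RateLimiter.create(500.0); // ~500 writes/s max
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")
                    .build();
                 Session session = cluster.connect("my_keyspace")) {
                for (int i = 0; i < 100_000; i++) {
                    limiter.acquire(); // blocks just long enough to hold the rate
                    session.execute(
                            "INSERT INTO events (id, payload) VALUES (?, ?)",
                            i, "payload-" + i);
                }
            }
        }
    }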

2017-02-21 7:43 GMT+01:00 Jeff Jirsa :

>
>
> On 2017-02-20 21:35 (-0800), Benjamin Roth wrote:
> > Stupid question:
> > Why do you rate limit a database, especially writes? Wouldn't that cause a
> > lot of new issues like back pressure on the rest of your system or timeouts
> > in case of blocking requests?
> > Also rate limiting has to be based on per-coordinator calculations and not
> > cluster wide. It reminds me of hinted handoff throttling.
> >
>
> If you're sharing one cluster with 10 (or 20, or 100) applications,
> breaking one application may be better than slowing down 10/20/100. In many
> cases, workloads can be throttled and still meet business goals - nightly
> analytics jobs, for example, may be fine running over the course of 3 hours
> instead of 15 minutes, especially if the slightly-higher-response-latency
> over 3 hours is better than much-worse-response-latency for that 15 minute
> window.
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Pluggable throttling of read and write queries

2017-02-20 Thread Jeff Jirsa


On 2017-02-20 21:35 (-0800), Benjamin Roth  wrote: 
> Stupid question:
> Why do you rate limit a database, especially writes? Wouldn't that cause a
> lot of new issues like back pressure on the rest of your system or timeouts
> in case of blocking requests?
> Also rate limiting has to be based on per-coordinator calculations and not
> cluster wide. It reminds me of hinted handoff throttling.
> 

If you're sharing one cluster with 10 (or 20, or 100) applications, breaking 
one application may be better than slowing down 10/20/100. In many cases, 
workloads can be throttled and still meet business goals - nightly analytics 
jobs, for example, may be fine running over the course of 3 hours instead of 15 
minutes, especially if the slightly-higher-response-latency over 3 hours is 
better than much-worse-response-latency for that 15 minute window. 


Re: Pluggable throttling of read and write queries

2017-02-20 Thread Jeff Jirsa


On 2017-02-17 18:12 (-0800), Abhishek Verma  wrote: 
> 
> 
> Is there a way to throttle read and write queries in Cassandra currently?
> If not, what would be the right place in the code to implement a pluggable
> interface for doing it? I have briefly considered using triggers, but those
> are invoked only in the write path. The initial goal is to have a custom
> pluggable class which would be a no-op.
> 

The only real tunables you have now are the yaml properties for limiting
concurrent reads and concurrent writes - you can tune those per-server to limit
impact (set them low enough that queries back up rather than crash the
cluster).
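Concretely, those are the concurrent_reads / concurrent_writes knobs in
cassandra.yaml; the values below are just the commonly shipped defaults, not a
recommendation:

    # cassandra.yaml - per-node ceilings on in-flight requests; lowering them
    # makes excess queries queue up instead of overwhelming the node.
    concurrent_reads: 32
    concurrent_writes: 32
    concurrent_counter_writes: 32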

Other databases tend to express this per-user (MySQL, for example, provides
options like MAX_QUERIES_PER_HOUR and MAX_UPDATES_PER_HOUR:
https://dev.mysql.com/doc/refman/5.7/en/user-resources.html) - I'm not aware of
any proposed tickets like that (other than some per-partition limits, which are
a different protection), but there are some tangential tickets floating around
that involve "limiting" user accounts. For example,
https://issues.apache.org/jira/browse/CASSANDRA-8303 was designed to protect
clusters from less experienced users (notably some of the "features" that tend
to be misused) - the linked design doc describes it as being modeled after
POSIX capabilities, so it's not really the same proposal; however, you could
look at the code there and come up with something similar.

Given how involved a rate-limiting patch would be, you'll want to take the same
approach that Sam took there - write a design doc first (like
https://docs.google.com/document/d/1SEdBEF5c4eN2VT2TyQ1j-iE0iJNM6aKFlYdD7bieHhQ/edit)
before you write any code, and ask for feedback.
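As a strawman for the "custom pluggable class which would be a no-op" mentioned
above, the hook might look something like this - the names are hypothetical and
nothing like this exists in the codebase today:

    // Hypothetical plugin point - not an existing Cassandra interface.
    public interface QueryThrottler {
        /**
         * Called by the coordinator before executing a statement.
         * Implementations may block, reject, or simply return.
         */
        void permit(String keyspace, String table, QueryKind kind, String role);

        enum QueryKind { POINT_READ, RANGE_READ, WRITE, LWT }
    }

    // The default implementation would be a no-op, preserving current behavior.
    final class NoOpQueryThrottler implements QueryThrottler {
        @Override
        public void permit(String keyspace, String table, QueryThrottler.QueryKind kind, String role) {
            // intentionally empty
        }
    }

The per-table and per-query-type arguments mirror the requirements in the
original mail (point reads, range reads, writes, LWTs).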


Re: Pluggable throttling of read and write queries

2017-02-20 Thread Benjamin Roth
Stupid question:
Why do you rate limit a database, especially writes? Wouldn't that cause a
lot of new issues like back pressure on the rest of your system or timeouts
in case of blocking requests?
Also rate limiting has to be based on per-coordinator calculations and not
cluster wide. It reminds me of hinted handoff throttling.
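(For reference, the hinted handoff analogy is this existing node-local knob in
cassandra.yaml - the value shown is the usual default:)

    # Maximum hint delivery throttle in KB/s per delivery thread; reduced
    # proportionally to the number of nodes in the cluster.
    hinted_handoff_throttle_in_kb: 1024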

On 18.02.2017 03:13, "Abhishek Verma"  wrote:

> Cassandra is being used on a large scale at Uber. We usually create
> dedicated clusters for each of our internal use cases; however, that is
> difficult to scale and manage.
>
> We are investigating the approach of using a single shared cluster with
> 100s of nodes and handling 10s to 100s of different use cases for different
> products in the same cluster. We can define different keyspaces for each of
> them, but that does not help in case of noisy neighbors.
>
> Does anybody in the community have similar large shared clusters and/or
> face noisy neighbor issues?
>
> Is there a way to throttle read and write queries in Cassandra currently?
> If not, what would be the right place in the code to implement a pluggable
> interface for doing it? I have briefly considered using triggers, but those
> are invoked only in the write path. The initial goal is to have a custom
> pluggable class which would be a no-op.
>
> We would like to enforce these rate limits per table and for different
> query types (point or range queries, or LWT) separately.
>
> Thank you in advance.
>
> -Abhishek.
>


Re: Does the C* coordinator write to replicas in the same order or a different order?

2017-02-20 Thread Dor Laor
+ The C* coordinator sends async write requests to the replicas.
  This is very important since it allows it to return a low-latency
  reply to the client once the CL is reached. You wouldn't want
  to serialize the replicas one after the other.

+ The client <-> server sync/async choice isn't related to the coordinator
  in this case.

+ In the case of concurrent writes (always the case...), the timestamp
  sets the order. Note that it's possible to work with client timestamps
  or server timestamps. The client ones are usually the best choice.

Note that in C* each node can be a coordinator (one per request), and that's
the desired case in order to load balance the incoming requests. Once again,
timestamps determine the order among the requests.
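A sketch of the timestamp point with the Java driver - keyspace, table and the
literal timestamp are placeholders: either let the driver attach monotonic
client-side timestamps to every statement, or set the write timestamp
explicitly in CQL.

    import com.datastax.driver.core.AtomicMonotonicTimestampGenerator;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class ClientTimestamps {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")
                    // Driver-generated, monotonic client-side timestamps.
                    .withTimestampGenerator(new AtomicMonotonicTimestampGenerator())
                    .build();
                 Session session = cluster.connect("my_keyspace")) {

                // Or set the write timestamp explicitly (microseconds):
                session.execute(
                    "INSERT INTO events (id, payload) VALUES (1, 'x') "
                    + "USING TIMESTAMP 1487654321000000");
            }
        }
    }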

Cheers,
Dor

On Mon, Feb 20, 2017 at 4:12 PM, Kant Kodali  wrote:

> Hi,
>
> When the C* coordinator writes to replicas, does it write in the same order or
> a different order? In other words, does the replication happen synchronously or
> asynchronously? Also, does this depend on a sync or async client? What happens in
> the case of concurrent writes to a coordinator?
>
> Thanks,
> kant
>


Pluggable throttling of read and write queries

2017-02-20 Thread Edward Capriolo
Older versions had a request scheduler API.
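If memory serves, it was configured in cassandra.yaml roughly like the sample
below and only applied to Thrift requests - treat the exact keys and values as
recalled from an old config rather than verified:

    request_scheduler: org.apache.cassandra.scheduler.RoundRobinScheduler
    request_scheduler_id: keyspace
    request_scheduler_options:
        throttle_limit: 80
        default_weight: 5
        weights:
          Keyspace1: 1
          Keyspace2: 5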

On Monday, February 20, 2017, Ben Slater wrote:

> We’ve actually had several customers where we’ve done the opposite - split
> large clusters apart to separate use cases. We found that this allowed us
> to better align hardware with use case requirements (for example, using AWS
> c3.2xlarge for very hot data at low latency, m4.xlarge for more general
> purpose data); we can also tune JVM settings, etc., to meet those use cases.
>
> Cheers
> Ben
>
> On Mon, 20 Feb 2017 at 22:21 Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Sat, Feb 18, 2017 at 3:12 AM, Abhishek Verma  wrote:
>>
>>> Cassandra is being used on a large scale at Uber. We usually create
>>> dedicated clusters for each of our internal use cases; however, that is
>>> difficult to scale and manage.
>>>
>>> We are investigating the approach of using a single shared cluster with
>>> 100s of nodes and handling 10s to 100s of different use cases for different
>>> products in the same cluster. We can define different keyspaces for each of
>>> them, but that does not help in case of noisy neighbors.
>>>
>>> Does anybody in the community have similar large shared clusters and/or
>>> face noisy neighbor issues?
>>>
>>
>> Hi,
>>
>> We've never tried this approach, and given my limited experience I would
>> find this a terrible idea from the perspective of maintenance (remember the
>> old saying about eggs and baskets?)
>>
>> What potential benefits do you see?
>>
>> Regards,
>> --
>> Alex
>>
>> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Does the C* coordinator write to replicas in the same order or a different order?

2017-02-20 Thread Kant Kodali
Hi,

When the C* coordinator writes to replicas, does it write in the same order or
a different order? In other words, does the replication happen synchronously or
asynchronously? Also, does this depend on a sync or async client? What happens in
the case of concurrent writes to a coordinator?

Thanks,
kant


Are Cassandra Triggers Thread Safe? ("Tough questions perhaps!")

2017-02-20 Thread Kant Kodali
Hi,

1. Are Cassandra triggers thread safe? What happens if two writes invoke
the trigger and the trigger tries to modify the same row in a partition?
2. Has anyone used them successfully in production? If so, any issues? (I am
using the latest version of C*, 3.10.)
3. I have partitions that are about 10K rows, and each row in a partition
needs to have a pointer to the previous row (the pointer in this case is
a hash). Using triggers here would greatly simplify our application logic.
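(For context on #3, the hook I'd be implementing is
org.apache.cassandra.triggers.ITrigger; as far as I understand, the 3.x
signature is roughly the following - treat it as a sketch rather than the
exact API:)

    import java.util.Collection;
    import org.apache.cassandra.db.Mutation;
    import org.apache.cassandra.db.partitions.Partition;
    import org.apache.cassandra.triggers.ITrigger;

    // augment() is called on the write path and returns extra mutations to
    // apply alongside the original update.
    public class PreviousRowPointerTrigger implements ITrigger
    {
        @Override
        public Collection<Mutation> augment(Partition update)
        {
            // Read the update, compute the hash of the previous row, and
            // return a mutation that writes that pointer.
            return null; // placeholder - a real trigger would return the extra mutation(s)
        }
    }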

It will be a huge help if I can get answers to this.

Thanks,
kant


Re: Pluggable throttling of read and write queries

2017-02-20 Thread Ben Slater
We’ve actually had several customers where we’ve done the opposite - split
large clusters apart to separate use cases. We found that this allowed us
to better align hardware with use case requirements (for example, using AWS
c3.2xlarge for very hot data at low latency, m4.xlarge for more general
purpose data); we can also tune JVM settings, etc., to meet those use cases.

Cheers
Ben

On Mon, 20 Feb 2017 at 22:21 Oleksandr Shulgin 
wrote:

> On Sat, Feb 18, 2017 at 3:12 AM, Abhishek Verma  wrote:
>
> Cassandra is being used on a large scale at Uber. We usually create
> dedicated clusters for each of our internal use cases; however, that is
> difficult to scale and manage.
>
> We are investigating the approach of using a single shared cluster with
> 100s of nodes and handling 10s to 100s of different use cases for different
> products in the same cluster. We can define different keyspaces for each of
> them, but that does not help in case of noisy neighbors.
>
> Does anybody in the community have similar large shared clusters and/or
> face noisy neighbor issues?
>
>
> Hi,
>
> We've never tried this approach, and given my limited experience I would
> find this a terrible idea from the perspective of maintenance (remember the
> old saying about eggs and baskets?)
>
> What potential benefits do you see?
>
> Regards,
> --
> Alex
>
> --

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: [VOTE] Release Apache Cassandra 2.2.9

2017-02-20 Thread Tommy Stendahl

+1 (non-binding)


On 2017-02-16 02:16, Michael Shuler wrote:

I propose the following artifacts for release as 2.2.9.

sha1: 70a08f1c35091a36f7d9cc4816259210c2185267
Git:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.9-tentative
Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1139/org/apache/cassandra/apache-cassandra/2.2.9/
Staging repository:
https://repository.apache.org/content/repositories/orgapachecassandra-1139/

The Debian packages are available here: http://people.apache.org/~mshuler

The vote will be open for 72 hours (longer if needed).

[1]: (CHANGES.txt) https://goo.gl/AYblr5
[2]: (NEWS.txt) https://goo.gl/gIXxgR





Re: Pluggable throttling of read and write queries

2017-02-20 Thread Oleksandr Shulgin
On Sat, Feb 18, 2017 at 3:12 AM, Abhishek Verma  wrote:

> Cassandra is being used on a large scale at Uber. We usually create
> dedicated clusters for each of our internal use cases; however, that is
> difficult to scale and manage.
>
> We are investigating the approach of using a single shared cluster with
> 100s of nodes and handling 10s to 100s of different use cases for different
> products in the same cluster. We can define different keyspaces for each of
> them, but that does not help in case of noisy neighbors.
>
> Does anybody in the community have similar large shared clusters and/or
> face noisy neighbor issues?
>

Hi,

We've never tried this approach, and given my limited experience I would
find this a terrible idea from the perspective of maintenance (remember the
old saying about eggs and baskets?)

What potential benefits do you see?

Regards,
--
Alex