[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506868#comment-15506868 ]

Sergio Bossa commented on CASSANDRA-9318:

Yay! Thanks everyone (but [~Stefania] in particular) for the great feedback and reviews :)

> Bound the number of in-flight requests at the coordinator
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Sergio Bossa
>             Fix For: 3.10
>
>         Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, limit.btm, no_backpressure.png
>
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other issues.
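For illustration, here is a minimal sketch of the high/low watermark idea from the description above: stop reading from client connections once in-flight bytes cross a high watermark and resume once they drop below a low watermark. All class and method names are invented for this sketch; the patch that was eventually committed takes a rate-based approach instead.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of the watermark idea from the issue description.
final class InFlightLimiter
{
    private final long highWatermarkBytes;
    private final long lowWatermarkBytes;
    private final AtomicLong inFlightBytes = new AtomicLong();
    private volatile boolean paused;

    InFlightLimiter(long highWatermarkBytes, long lowWatermarkBytes)
    {
        this.highWatermarkBytes = highWatermarkBytes;
        this.lowWatermarkBytes = lowWatermarkBytes;
    }

    /** Called when a request is admitted; returns true if client reads should be disabled. */
    boolean onRequestStarted(long requestBytes)
    {
        long current = inFlightBytes.addAndGet(requestBytes);
        if (current >= highWatermarkBytes)
            paused = true;
        return paused;
    }

    /** Called when a request completes; returns true if client reads can be re-enabled. */
    boolean onRequestFinished(long requestBytes)
    {
        long current = inFlightBytes.addAndGet(-requestBytes);
        if (paused && current <= lowWatermarkBytes)
            paused = false;
        return !paused;
    }
}
{code}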
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15505212#comment-15505212 ]

Stefania commented on CASSANDRA-9318:

Committed to trunk as d43b9ce5092f8879a1a66afebab74d86e9e127fb, thank you for your excellent patch [~sbtourist]!
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15503635#comment-15503635 ]

Sergio Bossa commented on CASSANDRA-9318:

Thanks [~Stefania], looks good!
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502388#comment-15502388 ]

Stefania commented on CASSANDRA-9318:

Latest dtest [build|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-trunk-dtest/7/] completed without failures.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502215#comment-15502215 ]

Stefania commented on CASSANDRA-9318:

Thanks [~sbtourist], the latest commits and the entire patch LGTM.

The test failures are all unrelated and we have tickets for all of them: CASSANDRA-12664, CASSANDRA-12656 and CASSANDRA-12140. I've launched one more dtest build to cover the final commit and to hopefully shake off the CASSANDRA-12656 failures since these tests shouldn't even be running now.

I've squashed your entire patch into [one commit|https://github.com/stef1927/cassandra/commit/1632f2e9892624f611ac3629fb84a82594fec726] and fixed some formatting issues (mostly trailing spaces) [here|https://github.com/stef1927/cassandra/commit/e3346e5f5a49b2933e10a84405730] on this [branch|https://github.com/stef1927/cassandra/commits/9318]. If you could double check the formatting nits, I can squash them and commit once the final dtest build has also completed.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493898#comment-15493898 ]

Sergio Bossa commented on CASSANDRA-9318:

[~Stefania], I fixed the tests related to the new {{DatabaseDescriptor}} initialization methods.

I've also addressed [~slebresne]'s concerns and modified the back-pressure algorithm to always observe the write timeout: if the rate limit would cause the timeout to be exceeded, then rather than observing the rate limit, we just pause up to the timeout _minus_ the current response time from the replica with the lowest rate. This avoids client timeouts and also gives replicas enough time to actually acknowledge the mutations (at the expense of having more in-flight mutations than the rate limit allows, but I believe this is the right tradeoff).

I've run several rounds of tests and dtests: tests are always green, but some dtests fail intermittently; those failures do not seem related to this issue, but someone more familiar with the failing dtests might want to have a look.

Finally, I've re-run some manual stress tests on an overloaded 4-node RF=3 cluster. Here are the results of inserting 1M rows at CL.ONE:

* SLOW back-pressure:
||Node||Dropped Mutations||Dropped Hints||
|1|18143|0|
|2|10|0|
|3|0|0|
|4|0|0|
Timeouts: 39
Total runtime: 20 mins

* No back-pressure:
||Node||Dropped Mutations||Dropped Hints||
|1|471751|248403|
|2|70996|13571|
|3|640|0|
|4|75318|24801|
Timeouts: 6
Total runtime: 5 mins

At CL.QUORUM:

* SLOW back-pressure:
||Node||Dropped Mutations||Dropped Hints||
|1|27781|8584|
|2|4650|0|
|3|0|0|
|4|0|0|
Timeouts: 37
Total runtime: 17 mins

* No back-pressure:
||Node||Dropped Mutations||Dropped Hints||
|1|353972|133429|
|2|258776|81981|
|3|636|0|
|4|13870|1710|
Timeouts: 74
Total runtime: 6 mins
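For illustration, a minimal sketch of the pause computation described above, under the assumption that the coordinator knows how long the rate limiter would make it wait and the current response time of the slowest replica. The names are invented for this sketch and do not reflect the actual patch code.

{code:java}
import java.util.concurrent.TimeUnit;

final class BackPressureSketch
{
    /**
     * Hypothetical version of the pause described above: never sleep past the write
     * timeout, and leave the slowest replica enough time to acknowledge the mutation.
     *
     * @param rateLimiterDelayNanos      how long the per-replica rate limiter would make us wait
     * @param writeTimeoutNanos          the configured write RPC timeout
     * @param slowestReplicaLatencyNanos current response time of the slowest (lowest-rate) replica
     */
    static long boundedPauseNanos(long rateLimiterDelayNanos,
                                  long writeTimeoutNanos,
                                  long slowestReplicaLatencyNanos)
    {
        // Time we can afford to spend paused while still giving the replica a chance to reply.
        long budget = Math.max(0, writeTimeoutNanos - slowestReplicaLatencyNanos);
        // Honour the rate limit only as far as that budget allows.
        return Math.min(rateLimiterDelayNanos, budget);
    }

    public static void main(String[] args)
    {
        long pause = boundedPauseNanos(TimeUnit.MILLISECONDS.toNanos(3000),
                                       TimeUnit.MILLISECONDS.toNanos(2000),
                                       TimeUnit.MILLISECONDS.toNanos(500));
        // Prints 1500: the rate limiter asked for 3000ms, but only 1500ms fit in the timeout budget.
        System.out.println("pause (ms): " + TimeUnit.NANOSECONDS.toMillis(pause));
    }
}
{code}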
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15490107#comment-15490107 ]

Sergio Bossa commented on CASSANDRA-9318:

bq. It's possible I'm totally missing a point you and Stefania are trying to make, but it seems to me to be the only reasonable way. The timeout is a deadline the user asks us to respect, it's the whole point of it, so it should always be respected as strictly as possible.

I didn't follow the two issues mentioned above: if that's the end goal, I agree we should be strict with it.

bq. that discussion seems to suggest that back-pressure would make it harder for C* to respect a reasonable timeout. I'll admit that sounds counter-intuitive to me as a functioning back-pressure should make it easier by smoothing things over when there is too much pressure.

The load is smoothed on the server; beyond that, it depends on how many replicas are "in trouble" and how aggressive clients are. As an example, say the timeout is 2s, the request incoming rate at the coordinator is 1000/s, the processing rate at replica A is 50/s and at replica B is 1000/s. Then with CL.ONE (assuming the coordinator is not part of the replica group for simplicity):

1) If back-pressure is disabled, we get no client timeouts, but ~900 mutations/s dropped on replica A.
2) If back-pressure is enabled, the back-pressure rate limiting at the coordinator is set at 50/s (assuming the SLOW configuration) to smooth the load between servers, which means ~900 mutations/s will end up in client timeouts, and it will be the client's responsibility to back down to a saner ingestion rate; that is, if it keeps ingesting at a higher rate, there's nothing we can do to smooth its side of the equation.

(A rough worked version of this arithmetic is sketched after this comment.)

In case #2, the client timeout can be seen as a signal for the client to slow down, so I'm fine with that (we also hinted in the past at adding more back-pressure related information to the exception, but it seems this requires a change to the native protocol?).

That said, this is a bit of an edge case: most of the time, when there is such a difference in replica responsiveness, it's because of transient short-lived events such as GC or compaction spikes, and the back-pressure algorithm will not catch that, as it's meant to react to continuous overloading. On the other hand, when the node is continuously overloaded by clients, most of the time all replicas will suffer, and back-pressure will smooth out the load; in such a case, the rate of client timeouts shouldn't really change much, but I'll do another test with these new changes.

Hope this helps clarify things a little bit.
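A back-of-the-envelope version of the arithmetic in the example above, using the same illustrative numbers; the figures are approximations, not measurements.

{code:java}
// Rough CL.ONE arithmetic from the example: coordinator ingests 1000 writes/s,
// replica A sustains 50/s, replica B 1000/s. All numbers are illustrative.
final class OverloadArithmetic
{
    public static void main(String[] args)
    {
        double incomingRate = 1000, replicaA = 50, replicaB = 1000;

        // 1) Back-pressure disabled: every write is acked by B (CL.ONE succeeds),
        //    but A silently drops whatever exceeds its capacity.
        double droppedOnA = incomingRate - replicaA;
        System.out.printf("no back-pressure: ~%.0f/s dropped on A, ~0 client timeouts%n", droppedOnA);

        // 2) SLOW back-pressure: the coordinator throttles to the slowest replica's rate,
        //    so the excess shows up as client timeouts instead of dropped mutations.
        double coordinatorRate = Math.min(replicaA, replicaB);
        double timedOut = incomingRate - coordinatorRate;
        System.out.printf("SLOW back-pressure: ~%.0f/s client timeouts, ~0 dropped on A%n", timedOut);
    }
}
{code}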
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489927#comment-15489927 ]

Sylvain Lebresne commented on CASSANDRA-9318:

bq. If back pressure kicks in, then it may be that mutations are applied by all replicas but the application receives a timeout nonetheless.

That's *not* specific to back-pressure: it can perfectly well happen today, and that's fine. Again, if your application *requires* that the server answers within, say, 500ms, the server should do so and not randomly miss that deadline because back-pressure kicks in. If the deadline you set as a user is too low and you get timeouts too often, then it's your problem and you should either reconsider your deadline because it's unrealistic, or increase your cluster capacity. But I genuinely don't understand how not doing what the user explicitly asks us to do would ever make sense.

bq. 1) We keep the timeout strict and change the back-pressure implementation to wait at most the given timeout, and fail otherwise.

It's possible I'm totally missing a point you and Stefania are trying to make, but it seems to me to be the only reasonable way. The timeout is a deadline the user asks us to respect, it's the whole point of it, so it should always be respected as strictly as possible. If the user sets it too low, that's their mistake (and don't get me wrong, we should certainly educate users to avoid that mistake through documentation as much as possible).

More generally though, that discussion seems to suggest that back-pressure would make it harder for C* to respect a reasonable timeout. I'll admit that sounds counter-intuitive to me, as a functioning back-pressure should make it _easier_ by smoothing things over when there is too much pressure.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489860#comment-15489860 ]

Sergio Bossa commented on CASSANDRA-9318:

[~slebresne], I do understand your concerns. I see [~Stefania] largely answered them already, but it all depends on how strict we want to be with regard to CASSANDRA-12256 and CASSANDRA-2848, so I propose one of the following:

1) We keep the timeout strict and change the back-pressure implementation to wait at most the given timeout, and fail otherwise.
2) We add a back-pressure configuration parameter letting users choose whether they want a strict timeout even in case of back-pressure, or whether they want it "adaptive".

Thoughts?
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489813#comment-15489813 ]

Stefania commented on CASSANDRA-9318:

If back-pressure kicks in, then it may be that mutations are applied by all replicas but the application receives a timeout nonetheless. The timeout may even have already expired after back-pressure is applied but before the mutations are sent to the replicas. So in this case, I think it would make sense to have a timeout long enough to take into account the effects of back-pressure when the system is overloaded.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489764#comment-15489764 ]

Sylvain Lebresne commented on CASSANDRA-9318:

bq. If we don't do this, then users have to manually increase the timeout when enabling back-pressure, by an amount of time that is not known a priori.

Why? Again, what we want the timeout to mean, in the context of CASSANDRA-12256 and CASSANDRA-2848, is an upper bound on the server answering the application (even if that means giving up). In other words, the timeout is about what the application needs, and that shouldn't be influenced by having back-pressure on or not. Besides, back-pressure should be largely transparent from the user point of view, at least as long as nothing goes too badly (and it should hopefully make things better otherwise, not worse), so it makes no sense from the user point of view to have to bump the timeout just because back-pressure is enabled.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489726#comment-15489726 ]

Stefania commented on CASSANDRA-9318:

bq. I don't see why having that violated when back-pressure kicks in would make sense. It just seems like a gotcha for users that I'd rather avoid.

If we don't do this, then users have to manually increase the timeout when enabling back-pressure, by an amount of time that is not known a priori. So neither scenario is ideal.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489709#comment-15489709 ]

Sylvain Lebresne commented on CASSANDRA-9318:

bq. with back-pressure enabled the time spent paused in back-pressure is not counted against

I don't understand that rationale. The goal of CASSANDRA-12256 is to make sure clients can rely on the timeout being an upper bound on when they get an answer from the server. I don't see why having that violated when back-pressure kicks in would make sense. It just seems like a gotcha for users that I'd rather avoid.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489654#comment-15489654 ]

Stefania commented on CASSANDRA-9318:

Sounds good, thanks.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489629#comment-15489629 ]

Sergio Bossa commented on CASSANDRA-9318:

bq. queryStartNanoTime was introduced by CASSANDRA-12256

Yep, I was just looking at that.

bq. Should we increase the timeout by the amount of time spent waiting for the backpressure strategy

Yes, this seems the best solution, plus documenting it, i.e. the fact that with back-pressure enabled the time spent paused in back-pressure is not counted against the timeout.

bq. Also, it should not be changed if the backpressure strategy is disabled.

I changed it regardless in the previous solution for consistency, but I agree that post CASSANDRA-12256 it should be changed only in case of back-pressure.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489612#comment-15489612 ]

Stefania commented on CASSANDRA-9318:

bq. Alright, looks like queryStartNanoTime replaced the old start variable in the latest rebase, let me fix that and check/rerun tests.

I understand now. {{queryStartNanoTime}} was introduced by CASSANDRA-12256; it comes all the way from {{QueryProcessor}}. I think the aim was to take into account the full query processing time from the client point of view, including CQL parsing and so forth. Should we increase the timeout by the amount of time spent waiting for the backpressure strategy, rather than resetting the start time? Also, it should not be changed if the backpressure strategy is disabled.
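A minimal sketch of the adjustment discussed in this exchange, assuming back-pressure is applied before the write is dispatched to replicas; the helper names are invented and do not reflect the actual trunk API.

{code:java}
import java.util.concurrent.TimeUnit;

final class TimeoutAdjustmentSketch
{
    /**
     * Illustrative only: measure how long back-pressure parked the write and stretch the
     * effective timeout by that amount, instead of resetting queryStartNanoTime.
     * applyBackPressure stands in for whatever the strategy does; it is not a real API.
     */
    static long effectiveWriteTimeoutNanos(Runnable applyBackPressure,
                                           long writeTimeoutNanos,
                                           boolean backPressureEnabled)
    {
        long pauseNanos = 0;
        if (backPressureEnabled)
        {
            long before = System.nanoTime();
            applyBackPressure.run();                   // may sleep on a rate limiter
            pauseNanos = System.nanoTime() - before;
        }
        // Only the back-pressure pause is excluded from the client-visible deadline;
        // with back-pressure disabled the timeout is left untouched.
        return writeTimeoutNanos + pauseNanos;
    }

    public static void main(String[] args)
    {
        long timeout = TimeUnit.MILLISECONDS.toNanos(2000);
        long effective = effectiveWriteTimeoutNanos(() -> sleepQuietly(100), timeout, true);
        System.out.println("effective timeout (ms): " + TimeUnit.NANOSECONDS.toMillis(effective));
    }

    private static void sleepQuietly(long millis)
    {
        try { Thread.sleep(millis); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
{code}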
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489578#comment-15489578 ]

Sergio Bossa commented on CASSANDRA-9318:

bq. This method initializes the variable but where is it read?

Alright, looks like {{queryStartNanoTime}} replaced the old {{start}} variable in the latest rebase, let me fix that and check/rerun tests.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489558#comment-15489558 ]

Stefania commented on CASSANDRA-9318:

bq. The latest rebase brought several changes and conflicts so I was waiting for the jenkins test results to fix any related test failures: let me do that and rerun the suite.

OK

bq. It is used here. Or am I missing something?

This method initializes the variable but where is it read?
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489551#comment-15489551 ]

Sergio Bossa commented on CASSANDRA-9318:

Thanks [~Stefania]. The latest rebase brought several changes and conflicts, so I was waiting for the Jenkins test results to fix any related test failures: let me do that and rerun the suite.

Regarding:

bq. Is this commit complete? It looks like AbstractWriteResponseHandler.start is not used.

It is used [here|https://github.com/apache/cassandra/commit/ad729f2d1758ec8c4add7b71ce3c3a680f5beb6d#diff-71f06c193f5b5e270cf8ac695164f43aR1307]. Or am I missing something?
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489119#comment-15489119 ]

Stefania commented on CASSANDRA-9318:

Thanks for the update, and for running the tests. Is it possible to attach some test results? Here is the code review, hopefully we are almost there:

* There is an NPE in {{RateBasedBackPressureTest}}; it looks like we need to call {{DatabaseDescriptor.daemonInitialization();}} (see the sketch after this list).
* {{DatabaseDescriptorRefTest}} has two failures; I think we need to add the new classes that get initialized by DD to the test white list.
* There are several failing dtests compared to trunk, and it seems we have lots of schema related issues. Can we rerun both jobs to see if this is a coincidence or if we have an issue? This sort of error appears on trunk as well, but usually it is very rare and only 1 or 2 tests have schema issues; here we had more than 10.
* Is [this commit|https://github.com/apache/cassandra/commit/ad729f2d1758ec8c4add7b71ce3c3a680f5beb6d] complete? It looks like {{AbstractWriteResponseHandler.start}} is not used.
* Trailing spaces (but they can be fixed on commit).
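A minimal sketch of the fix suggested in the first bullet, assuming a plain JUnit 4 test class. {{DatabaseDescriptor.daemonInitialization()}} is the call named in the review; the test body and the {{getWriteRpcTimeout()}} check are only illustrative and may not match the real test.

{code:java}
import org.junit.BeforeClass;
import org.junit.Test;

import org.apache.cassandra.config.DatabaseDescriptor;

import static org.junit.Assert.assertTrue;

public class RateBasedBackPressureTest
{
    @BeforeClass
    public static void setupDD()
    {
        // Initializes DatabaseDescriptor for unit tests that touch configuration,
        // avoiding NPEs when config-dependent code paths are exercised.
        DatabaseDescriptor.daemonInitialization();
    }

    @Test
    public void configIsInitialized()
    {
        // After daemonInitialization() yaml-backed settings are readable (illustrative check only).
        assertTrue(DatabaseDescriptor.getWriteRpcTimeout() > 0);
    }
}
{code}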
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487836#comment-15487836 ]

Eduard Tudenhoefner commented on CASSANDRA-9318:

[~sbtourist] nothing to add from my side. Stuff looks really good.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487712#comment-15487712 ]

Russ Hatch commented on CASSANDRA-9318:

Nothing extra to add here from me.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487229#comment-15487229 ]

Sergio Bossa commented on CASSANDRA-9318:

Testing should now be complete and all the main points have been verified:

1) No performance regressions with back-pressure disabled.
2) No performance regressions with back-pressure enabled and a normally behaving cluster.
3) Reduced dropped mutations, typically by an order of magnitude, for continuous heavy write workloads, and full recovery via hints: tested with a 4-node RF=3 cluster, ByteMan-based rate limiting applied in several scenarios to one or more nodes, and "slow" back-pressure.

[~eduard.tudenhoefner], [~rhatch], do you have anything to add?

I've also fixed a bug along the way and rebased to the latest trunk: [~Stefania], would you mind having a final (hopefully!) review?

Updated patch and test runs:

| [trunk patch|https://github.com/apache/cassandra/compare/trunk...sbtourist:CASSANDRA-9318-trunk?expand=1] | [testall|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-trunk-testall/] | [dtest|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-trunk-dtest/] | [dtest (back-pressure enabled)|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-bp-true-trunk-dtest/] |
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403537#comment-15403537 ]

Stefania commented on CASSANDRA-9318:

It looks good [~sbtourist], I only noticed a typo in cassandra.yaml at this [line|https://github.com/apache/cassandra/compare/trunk...sbtourist:CASSANDRA-9318-trunk?expand=1#diff-bdaab1104a93e723ce0b609a6477c9c4R1161]: "implement provide". I did not re-review the newly added files, since they should not have changed; only the existing files with modifications.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402137#comment-15402137 ]

Sergio Bossa commented on CASSANDRA-9318:

Patch rebased to trunk, new links:

| [trunk patch|https://github.com/apache/cassandra/compare/trunk...sbtourist:CASSANDRA-9318-trunk?expand=1] | [testall|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-trunk-testall/] | [dtest|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-trunk-dtest/] | [dtest (back-pressure enabled)|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-bp-true-trunk-dtest/] |

[~Stefania], feel free to have another quick look at the new patch (no relevant changes, by the way).

[~eduard.tudenhoefner], in terms of testing, I've already done some on my own as reported in one of my earlier comments, but I am sure you'll have more to add; the objectives of our testing should be:

1) Verify there's no performance impact *at all* when back-pressure is disabled.
2) Verify there's no relevant performance impact when back-pressure is enabled and the cluster is well-behaving.
3) Verify back-pressure works correctly when the cluster is overloaded, and dropped mutations are reduced (how much they are reduced depends on the actual back-pressure configuration, and I hope the {{cassandra.yaml}} comments will help with that; otherwise let me know what's unclear and I'll improve them).

All tests should be run with at least 4 nodes and RF=3; the attached ByteMan rule can be used to simulate "slow replica" scenarios and kick in back-pressure.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398930#comment-15398930 ]

Stefania commented on CASSANDRA-9318:

It's in pretty good shape now! :) Let's rebase and start the testing.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397353#comment-15397353 ]

Sergio Bossa commented on CASSANDRA-9318:

[~Stefania],

bq. HintsDispatcher.Callback goes through the other one. Also, it is not consistent with the fact that supportsBackPressure() is defined at the callback level.

Fair enough, {{supportsBackPressure()}} will do its job in excluding spurious calls anyway.

bq. But the time windows of replicas do not necessarily match, they might be staggered.

The algorithm was meant to work at steady state, assuming even key distribution, but you made me realize it was nonetheless exposed to subtle fluctuations, so thanks for the excellent point :) It should now be fixed in the latest round of commits.

I'm pretty happy with this latest version, so if you're happy too, I'll move forward with a last self-review round and then rebase onto trunk.
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389246#comment-15389246 ]

Stefania commented on CASSANDRA-9318:

bq. Nope, you can actually implement such kind of strategy after my latest changes to the state interface: you just have to keep recording per-replica state, and then eventually compute a global coordinator state at back-pressure application.

Except the state may have nothing to do with a replica, but let's leave it for now; I'm fine with this.

bq. I ignored it on purpose as that's supposed to be for non-mutations.

HintsDispatcher.Callback goes through the other one. Also, it is not consistent with the fact that supportsBackPressure() is defined at the callback level.

bq. This can't actually happen. Given the same replica group (token range), the replicas are sorted in stable rate limiting order, which means the same fast/slow replica will be picked each time for a given back-pressure window; obviously, the replica order might change at the next window, but that's by design, and then all mutations will converge again to the same replica: makes sense?

But the time windows of replicas do not necessarily match, they might be staggered. For example, replicas 1, 2 and 3 acquire a window for mutation A at the same time, then replicas 2, 3 and 4 are used for mutation B: in this case only 3 and 4 will acquire the window, because 2 acquired it earlier, so the order may change even during a window. Or does it work by approximation, assuming we send mutations to all replicas frequently enough?
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389176#comment-15389176 ]

Sergio Bossa commented on CASSANDRA-9318:

[~Stefania],

bq. OK, so we are basically saying other coordinator-based approaches won't be implemented as a back-pressure strategy.

Nope, you can actually implement such kind of strategy after my latest changes to the state interface: you just have to keep recording per-replica state, and then eventually compute a global coordinator state at back-pressure application.

bq. There's another overload of sendRR

I ignored it on purpose as that's supposed to be for non-mutations.

bq. One final thought, now we have N rate limiters, and we call acquire(1) on only one of them. For the next mutation we may pick another one and so forth.

This can't actually happen. Given the same replica group (token range), the replicas are sorted in stable rate limiting order, which means the same fast/slow replica will be picked each time for a given back-pressure window; obviously, the replica order might change at the next window, but that's by design, and then all mutations will converge again to the same replica: makes sense?
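A simplified sketch of the selection being discussed, with invented types ({{ReplicaState}}, {{Flow}}) standing in for the real per-replica back-pressure state: SLOW throttles at the slowest replica's observed rate, FAST at the fastest, and only the selected replica's limiter is acquired for a given replica group.

{code:java}
import java.util.Comparator;
import java.util.List;

import com.google.common.util.concurrent.RateLimiter;

// Invented flow selector: SLOW paces the coordinator at the slowest replica, FAST at the fastest.
enum Flow { SLOW, FAST }

// Invented per-replica snapshot: an observed acknowledgement rate and a limiter sized from it.
final class ReplicaState
{
    final String replica;
    final double rate;            // acknowledged mutations per second over the last window
    final RateLimiter limiter;

    ReplicaState(String replica, double rate)
    {
        this.replica = replica;
        this.rate = rate;
        this.limiter = RateLimiter.create(rate);
    }
}

final class FlowSelectionSketch
{
    /** Picks the replica whose limiter paces the whole group, depending on the flow. */
    static ReplicaState select(List<ReplicaState> group, Flow flow)
    {
        Comparator<ReplicaState> byRate = Comparator.comparingDouble(s -> s.rate);
        return flow == Flow.SLOW
               ? group.stream().min(byRate).orElseThrow(IllegalStateException::new)
               : group.stream().max(byRate).orElseThrow(IllegalStateException::new);
    }

    static void throttle(List<ReplicaState> group, Flow flow)
    {
        // Only the selected replica's limiter is acquired; within a back-pressure window the
        // same replica keeps being selected, so permits accumulate on a single limiter.
        select(group, flow).limiter.acquire(1);
    }
}
{code}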
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388813#comment-15388813 ]

Stefania commented on CASSANDRA-9318:

bq. any strategy that doesn't take into account per-node state is IMHO not a proper back-pressure strategy, and can be implemented in the same way as hints overflowing is implemented (that is, outside the back-pressure APIs).

OK, so we are basically saying other coordinator-based approaches won't be implemented as a back-pressure strategy. Provided [~jbellis] and [~slebresne] are fine with this, then that's OK with me.

bq. Anyway, I've made the BackPressureState interface more generic so it can react on messages when they're about to be sent, which should allow to implement different strategies as you mentioned. This should also make the current RateBased implementation clearer (see below), which is nice

I like what you did here, thanks.

bq. Apologies, the SP method went through several rewrites in our back and forth and I didn't pay enough attention during the last one, totally stupid mistake. Should be fixed now, but I've kept the back-pressure application on top, as in case the strategy implementation wants to terminate the request straight away (i.e. by throwing an exception) it doesn't make sense to partially send messages (that is, either all or nothing is IMHO better).

It looks good now. Agreed on keeping back-pressure on top.

bq. The RateBased implementation doesn't "count" outgoing requests when they're sent, which would cause the bug you mentioned, ...

I totally forgot this yesterday when I was reasoning about non-local replicas, apologies.

bq. This can only happen when a new replica is injected into the system, or if it comes back alive after being dead, otherwise we always send mutations to all replicas, or am I missing something? There's no lock-free way to remove such replicas, because "0 outgoing" is a very transient state, and that's actually the point: the algorithm is designed to work at steady state, and after a single "window" (rpc timeout seconds) the replica state will have received enough updates.

Yes, you are correct: we always receive replies, even from non-local replicas. I was simply suggesting excluding replicas with a positive-infinity rate from the sorted states used to calculate the back-pressure, but that's no longer relevant.

bq. Please have a look at my changes, I will do one more review pass myself as I think some aspects of the rate-based algorithm can be improved, then if we're all happy I will go ahead with rebasing to trunk and then moving into testing.

+1 on starting the tests, just two more easy things:

* Something went wrong with the IDE imports at this line [here|https://github.com/apache/cassandra/commit/3041d58184a4ee63a04204d3c6228cbaa1c4fd05#diff-0cbd76290040064cf257d855fb5dd779R42]; there's a bunch of repeated unused imports that need to be removed.
* There's another overload of {{sendRR}} [just above|https://github.com/apache/cassandra/commit/3041d58184a4ee63a04204d3c6228cbaa1c4fd05#diff-af09288f448c37a525e831ee90ea49f9R766]; I think technically we should cover this too, even though mutations go through the other one.

One final thought: now we have N rate limiters, and we call acquire(1) on only one of them. For the next mutation we may pick another one, and so forth. So the rate limiters of the individual replicas have not 'registered' the actual permits (one per message) and may not throttle as effectively. Have you considered this at all?
> Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388282#comment-15388282 ] Sergio Bossa commented on CASSANDRA-9318: - [~Stefania], bq. We just need to decide if we are happy having a back pressure state for each OutboundTcpConnectionPool, even though back pressure may be disabled or some strategies may use different metrics, e.g. total in-flight requests or memory, or whether the states should be hidden inside the strategy. As I said, if you move the states inside the strategy, you make the strategy "fattier", and I don't like that. I think keeping the states associated to the connections (as a surrogate for the actual nodes) makes sense, because that's what back-pressure is about; any strategy that doesn't take into account per-node state, is IMHO not a proper back-pressure strategy, and can be implemented in the same way as hints overflowing is implemented (that is, outside the back-pressure APIs). Anyway, I've made the {{BackPressureState}} interface more generic so it can react on messages when they're about to be sent, which should allow to implement different strategies as you mentioned. This should also make the current {{RateBased}} implementation clearer (see below), which is nice :) bq. Now the back pressure is applied before hinting, or inserting local mutations, or sending to remote data centers, but after sending to local replicas. I think we may need to rework SP.sendToHintedEndpoints a little bit, I think we want to fire the insert local, then block, then send all messages. Apologies, the {{SP}} method went through several rewrites in our back and forth and I didn't pay enough attention during the last one, totally stupid mistake. Should be fixed now, but I've kept the back-pressure application on top, as in case the strategy implementation wants to terminate the request straight away (i.e. by throwing an exception) it doesn't make sense to partially send messages (that is, either all or nothing is IMHO better). bq. There is one more potential issue in the case of non local replicas. We send the mutation only to one remote replica, which then forwards it to other replicas in that DC. So remote replicas may have the outgoing rate set to zero, and therefore the limiter rate set to positive infinity, which means we won't throttle at all with FAST flow selected. The {{RateBased}} implementation doesn't "count" outgoing requests when they're sent, which would cause the bug you mentioned, but when the response is received or the callback is expired, in order to guarantee a consistent counting between outgoing and incoming messages, i.e. if outgoing messages are more than incoming ones we know that's because of timeouts, not because some requests are still in-flight; this makes the algorithm way more stable, and also overcomes the problem you mentioned. The latest changes to the state interface should hopefully make this implementation detail clearer. bq. In fact, we have no way of reliably knowing the status of remote replicas, or replicas to which we haven't sent any mutations recently, and we should perhaps exclude them. This can only happen when a new replica is injected into the system, or if it comes back alive after being dead, otherwise we always send mutations to all replicas, or am I missing something? 
There's no lock-free way to remove such replicas, because "0 outgoing" is a very transient state, and that's actually the point: the algorithm is designed to work at steady state, and after a single "window" (rpc timeout seconds) the replica state will have received enough updates. bq. I think we can start testing once we have sorted the second discussion point above, as for the API issues, we'll eventually need to reach consensus but we don't need to block the tests for it. Please have a look at my changes, I will do one more review pass myself as I think some aspects of the rate-based algorithm can be improved, then if we're all happy I will go ahead with rebasing to trunk and then moving into testing. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it
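The counting scheme described above can be sketched as follows; the class and method names are assumptions for illustration and do not reproduce the actual {{RateBasedBackPressureState}}. Both counters are updated only when a response arrives or the callback expires, so at the end of each RPC-timeout window the incoming/outgoing ratio reflects timeouts rather than requests still in flight.
{code}
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only; names are assumptions, not the actual state class.
class ReplicaRateStateSketch
{
    private final AtomicLong incoming = new AtomicLong(); // answered within the window
    private final AtomicLong outgoing = new AtomicLong(); // outcome known: answered or expired

    void onResponseReceived()
    {
        // Counted only now, not at send time, so requests still in flight
        // never make the replica look slower than it actually is.
        outgoing.incrementAndGet();
        incoming.incrementAndGet();
    }

    void onCallbackExpired()
    {
        // A timeout counts as "sent but not answered".
        outgoing.incrementAndGet();
    }

    /** Ratio used to derive the back-pressure rate for this replica. */
    double responseRatio()
    {
        long sent = outgoing.get();
        return sent == 0 ? 1.0 : (double) incoming.get() / sent;
    }
}
{code}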
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386978#comment-15386978 ] Stefania commented on CASSANDRA-9318: - bq. I think if we add too many responsibilities to the strategy it really starts to look less and less as an actual strategy; we could rather add a "BackPressureManager", but to be honest, the current responsibility assignment feels natural to me, and back-pressure is naturally a "messaging layer" concern, so I'd keep everything as it is unless you can show any actual advantages in changing the design. I don't think we want a "BackPressureManager". We just need to decide if we are happy having a back pressure state for each OutboundTcpConnectionPool, even though back pressure may be disabled or some strategies may use different metrics, e.g. total in-flight requests or memory, or whether the states should be hidden inside the strategy. bq. This was intentional, so that outgoing requests could have been processed while pausing; but it had the drawback of making the rate limiting less "predictable" (as dependant on which thread was going to be called next), so I ended up changing it. Now the back pressure is applied before hinting, or inserting local mutations, or sending to remote data centers, but _after_ sending to local replicas. I think we may need to rework SP.sendToHintedEndpoints a little bit, I think we want to fire the insert local, then block, then send all messages. There is one more potential issue in the case of *non local replicas*. We send the mutation only to one remote replica, which then forwards it to other replicas in that DC. So remote replicas may have the outgoing rate set to zero, and therefore the limiter rate set to positive infinity, which means we won't throttle at all with FAST flow selected. In fact, we have no way of reliably knowing the status of remote replicas, or replicas to which we haven't sent any mutations recently, and we should perhaps exclude them. bq. It's not a matter of making the sort stable, but a matter of making the comparator fully consistent with equals, which should be already documented by the treeset/comparator javadoc. It's in the comparator javadoc but not in the TreeSet constructor doc; that's why I missed it yesterday. Thanks. bq. Agreed. If we all (/cc Aleksey Yeschenko Sylvain Lebresne Jonathan Ellis) agree on the core concepts of the current implementation, I will rebase to trunk, remove the trailing spaces, and then I would move into testing it, and in the meantime think/work on adding metrics and making any further adjustments. I think we can start testing once we have sorted the second discussion point above, as for the API issues, we'll eventually need to reach consensus but we don't need to block the tests for it. One more tiny nit: * The documentation of {{RateBasedBackPressureState}} still mentions the overloaded flag. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. 
> An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
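The {{SP.sendToHintedEndpoints}} rework suggested above (fire the local insert, then block, then send all messages) can be pictured with the following sketch; all names are made up for illustration and this is not the actual {{StorageProxy}} code:
{code}
import java.net.InetAddress;
import java.util.List;

// Illustrative ordering only; not the actual StorageProxy code.
class WritePathOrderingSketch
{
    interface BackPressure { void apply(List<InetAddress> replicas); } // may sleep or throw

    private final BackPressure backPressure;

    WritePathOrderingSketch(BackPressure backPressure)
    {
        this.backPressure = backPressure;
    }

    void write(String mutation, List<InetAddress> replicas, InetAddress self)
    {
        // Suggested ordering: fire the local insert first...
        if (replicas.contains(self))
            applyLocallyAsync(mutation);

        // ...then block on back-pressure...
        backPressure.apply(replicas);

        // ...then send to the remaining replicas. (The patch ultimately kept
        // back-pressure on top of the method instead, so that a strategy that
        // throws never leaves the mutation partially sent.)
        for (InetAddress replica : replicas)
            if (!replica.equals(self))
                send(mutation, replica);
    }

    private void applyLocallyAsync(String mutation) { /* stub for the sketch */ }
    private void send(String mutation, InetAddress replica) { /* stub for the sketch */ }
}
{code}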
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386265#comment-15386265 ] Sergio Bossa commented on CASSANDRA-9318: - [~Stefania], I've addressed most of your concerns in my latest two commits, also see my following comments: bq. If we removed MessagingService.getBackPressurePerHost(), which doesn't seem to be used, then we can at least remove getBackPressureRateLimit and getHost from BackPressureState, leaving it with only onMessageSent and onMessageReceived. All of those are used for JMX monitoring (which will be improved more) or logging, so I'd keep them. bq. We can also consider moving the states internally to the strategy I think if we add too many responsibilities to the strategy it really starts to look less and less as an actual strategy; we could rather add a "BackPressureManager", but to be honest, the current responsibility assignment feels natural to me, and back-pressure is naturally a "messaging layer" concern, so I'd keep everything as it is unless you can show any actual advantages in changing the design. bq. We are also applying backpressure after sending the messages, is this intentional? This _was_ intentional, so that outgoing requests could have been processed while pausing; but it had the drawback of making the rate limiting less "predictable" (as dependant on which thread was going to be called next), so I ended up changing it. bq. In the very same comparator lambda, if 2 states are not equal but have the exact same rate limit, then I assume it is intentional to compare them by hash code in order to get a stable sort? If so we should add a comment since tree set only requires the items to be mutually comparable and therefore returning zero would have been allowed as well I believe. It's not a matter of making the sort stable, but a matter of making the comparator fully consistent with {{equals}}, which should be already documented by the treeset/comparator javadoc. bq. One failure that needs to be looked at: Fixed! bq. Let's test the strategy first, then metrics can be added later on. Agreed. If we all (/cc [~iamaleksey] [~slebresne] [~jbellis]) agree on the core concepts of the current implementation, I will rebase to trunk, remove the trailing spaces, and then I would move into testing it, and in the meantime think/work on adding metrics and making any further adjustments. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
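The comparator point can be shown with a small self-contained example (hypothetical state class, not the patch code): if the comparator returns 0 for two distinct states with the same rate limit, a {{TreeSet}} treats them as duplicates and silently drops one, which is why the tie is broken on the hash code.
{code}
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical state class used only to illustrate the TreeSet/comparator point.
class StateSketch
{
    final String host;
    final double rateLimit;

    StateSketch(String host, double rateLimit) { this.host = host; this.rateLimit = rateLimit; }

    public static void main(String[] args)
    {
        // Comparator returning 0 for equal rates: the TreeSet keeps only one
        // of two distinct states with the same rate.
        Comparator<StateSketch> byRateOnly = Comparator.comparingDouble(s -> s.rateLimit);

        // Comparator made consistent with equals: ties on rate are broken by
        // identity hash code, so distinct instances never compare as 0.
        Comparator<StateSketch> byRateThenIdentity = byRateOnly.thenComparingInt(System::identityHashCode);

        TreeSet<StateSketch> lossy = new TreeSet<>(byRateOnly);
        TreeSet<StateSketch> safe = new TreeSet<>(byRateThenIdentity);
        StateSketch a = new StateSketch("10.0.0.1", 100.0);
        StateSketch b = new StateSketch("10.0.0.2", 100.0);
        lossy.add(a); lossy.add(b);
        safe.add(a); safe.add(b);
        System.out.println(lossy.size() + " vs " + safe.size()); // prints "1 vs 2"
    }
}
{code}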
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385280#comment-15385280 ] Stefania commented on CASSANDRA-9318: - The current implementation is still not that generic: * If we removed {{MessagingService.getBackPressurePerHost()}}, which doesn't seem to be used, then we can at least remove {{getBackPressureRateLimit}} and {{getHost}} from {{BackPressureState}}, leaving it with only {{onMessageSent}} and {{onMessageReceived}}. * We can also consider moving the states internally to the strategy, at the cost of an O(1) lookup, or at least encapsulate {{MessagingService.updateBackPressureState}} so that the states become opaque. Eventually we could have an update type (message received, memory threshold reached, etc.) so that strategies only deal with the type of updates they require. * The apply method doesn't leave much room in terms of what to do: we can basically sleep after a set of replicas has been selected by SP, or throw an exception. We are also applying backpressure _after sending the messages_, is this intentional? Some other nits: * In RateBasedBackPressure some fields can have a package based access rather than public/protected and the static in the flow enum declaration is redundant. * We may want to add the flow to the logger.info message "Initialized back-pressure..." in the constructor of RateBasedBackPressure. * In {{RateBasedBackPressure.apply(Iterable states)}}, casting the lambda to {{Comparator}} is unnecessary. * In the very same comparator lambda, if 2 states are not equal but have the exact same rate limit, then I assume it is intentional to compare them by hash code in order to get a stable sort? If so we should add a comment since tree set only requires the items to be mutually comparable and therefore returning zero would have been allowed as well I believe. * {{RateBasedBackPressureState}} can be package local, {{windowSize}} can be private and {{getLastAcquire()}} can be package local. * {{sortOrder}} in {{MockBackPressureState}} is not used. * Several trailing spaces. One [failure|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-3.0-dtest/lastCompletedBuild/testReport/cqlsh_tests.cqlsh_copy_tests/CqlshCopyTest/test_bulk_round_trip_with_timeouts_2/] that needs to be looked at:
{code}
ERROR [SharedPool-Worker-1] 2016-07-19 12:40:38,257 QueryMessage.java:128 - Unexpected error during query
java.lang.IllegalArgumentException: Size should be greater than precision.
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122) ~[guava-18.0.jar:na]
	at org.apache.cassandra.utils.SlidingTimeRate.<init>(SlidingTimeRate.java:51) ~[main/:na]
	at org.apache.cassandra.net.RateBasedBackPressureState.<init>(RateBasedBackPressureState.java:65) ~[main/:na]
	at org.apache.cassandra.net.RateBasedBackPressure.newState(RateBasedBackPressure.java:205) ~[main/:na]
	at org.apache.cassandra.net.RateBasedBackPressure.newState(RateBasedBackPressure.java:43) ~[main/:na]
	at org.apache.cassandra.net.MessagingService.getConnectionPool(MessagingService.java:649) ~[main/:na]
	at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:663) ~[main/:na]
	at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:812) ~[main/:na]
	at org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:755) ~[main/:na]
	at org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:733) ~[main/:na]
	at org.apache.cassandra.service.StorageProxy.truncateBlocking(StorageProxy.java:2380) ~[main/:na]
	at org.apache.cassandra.cql3.statements.TruncateStatement.execute(TruncateStatement.java:71) ~[main/:na]
	at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:206) ~[main/:na]
{code}
bq. Metrics can be added in this ticket once we settle on the core algorithm, but then yes any reporting mechanism to clients should be probably dealt with separately as it would probably involve changes to the native protocol (and I'm not sure what's the usual procedure in such case). Let's test the strategy first, then metrics can be added later on. Let's not worry about native protocol changes for now, since we are no longer throwing exceptions. bq. Regarding the memory-threshold strategy, I would like to stress once again the fact it's a coordinator-only back-pressure mechanism which wouldn't directly benefit replicas; also, Ariel Weisberg showed in his tests that such strategy isn't even needed given the local hints back-pressure/overflowing mechanisms already implemented. So my vote goes against it, but if you/others really want it, I'm perfectly fine implementing it, but it will have to be in this ticket, as it requires some more changes to the back-pressure API.
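The failure above is a constructor precondition being violated; a minimal reproduction of that kind of guard, assuming Guava's {{Preconditions}} (parameter names and units are assumptions, not the actual {{SlidingTimeRate}} signature):
{code}
import com.google.common.base.Preconditions;
import java.util.concurrent.TimeUnit;

// Minimal reproduction of the kind of guard that produced the dtest failure above;
// names are assumptions, not the actual SlidingTimeRate constructor.
class SlidingRateGuardSketch
{
    private final long sizeMillis;
    private final long precisionMillis;

    SlidingRateGuardSketch(long size, long precision, TimeUnit unit)
    {
        this.sizeMillis = unit.toMillis(size);
        this.precisionMillis = unit.toMillis(precision);
        // This is the kind of argument check that threw in the dtest: the
        // window size must be strictly greater than its precision.
        Preconditions.checkArgument(sizeMillis > precisionMillis,
                                    "Size should be greater than precision.");
    }
}
{code}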
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382114#comment-15382114 ] Sergio Bossa commented on CASSANDRA-9318: - I've pushed a new update based on the most recent discussions, more specifically: * I've implemented (and locally tested) rate limiting based on either the fastest or slowest replica; rate limiting on the average rate required computing such rate for all replicas in the whole cluster each RPC timeout interval (2 secs by default), which I deemed to be not worth it. * I've removed the low ratio configuration: if we aim to keep the cluster as "responsive" as possible, throwing exceptions or prematurely hinting doesn't make sense. * {{SP.sendToHintedEndpoints}} is largely unchanged: given we only rate limit by a single (fastest/slowest) replica, sorting the endpoints by back-pressure doesn't make sense either. Answering to [~Stefania]'s latest comments: bq. The proposal doesn't address: "The API should provide for reporting load to clients so they can do real load balancing across coordinators and not just round-robin." This is misleading: the implemented back-pressure mechanism is cross-replica, so load balancing to different coordinators would be useless. bq. we can add metrics or another mechanism later on in follow up tickets, as well as consider implementing the memory-threshold strategy. Metrics can be added in this ticket once we settle on the core algorithm, but then yes any reporting mechanism to clients should be probably dealt with separately as it would probably involve changes to the native protocol (and I'm not sure what's the usual procedure in such case). Regarding the memory-threshold strategy, I would like to stress once again the fact it's a coordinator-only back-pressure mechanism which wouldn't directly benefit replicas; also, [~aweisberg] showed in his tests that such strategy isn't even needed given the local hints back-pressure/overflowing mechanisms already implemented. So my vote goes against it, but if you/others really want it, I'm perfectly fine implementing it, but it will have to be in this ticket, as it requires some more changes to the back-pressure API. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
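The fastest/slowest selection mentioned above can be pictured with this sketch; the enum and method names are assumptions, not the actual {{RateBasedBackPressure}} code:
{code}
import java.util.Collections;
import java.util.List;

// Illustrative only; names are assumptions for the sketch.
class FlowSelectionSketch
{
    enum Flow { FAST, SLOW }

    /**
     * Given the per-replica rates (requests/sec) for one write, pick the
     * single rate the coordinator will throttle at.
     */
    static double selectRate(List<Double> replicaRates, Flow flow)
    {
        if (replicaRates.isEmpty())
            return Double.POSITIVE_INFINITY; // nothing to throttle on

        return flow == Flow.FAST
             ? Collections.max(replicaRates)   // least restrictive: fastest replica
             : Collections.min(replicaRates);  // most restrictive: slowest replica
        // An AVERAGE flow was considered but dropped, as computing it for all
        // replicas every RPC-timeout window was deemed not worth the cost.
    }
}
{code}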
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381623#comment-15381623 ] Stefania commented on CASSANDRA-9318: - I'm happy with the suggested way forward. It removes exceptions and it makes the rate based strategy more tunable rather than forcing everything to the slowest replica rate. The proposal doesn't address: bq. The API should provide for reporting load to clients so they can do real load balancing across coordinators and not just round-robin. However, we can add metrics or another mechanism later on in follow up tickets, as well as consider implementing the memory-threshold strategy. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379014#comment-15379014 ] Sergio Bossa commented on CASSANDRA-9318: - Based on everyone's feedback above, I propose to move forward with these changes: * Break the {{BackPressureStrategy}} apply method into two, one to compute the back-pressure state and return it sorted in ascending order (from lower back-pressure to higher/overloader), the other to actually apply it. * Apply back-pressure on all nodes/states together, and have {{RateBasedBackPressure}} apply it based on the faster, slower, or average rate, depending on configuration: this is to "soften" [~jbellis]' concern about always limiting at the slowest rate. * Rework {{SP.sendToHintedEndpoints}} a little bit to send mutations in the order returned by the {{BackPressureStrategy}}: this is a minor optimization based on the reasoning that "slower replicas" might be also slower to accept writes at the TCP level, and I can skip it if people prefer not to touch SP too much. * Also have {{SP.sendToHintedEndpoints}} directly hint overloaded replicas rather than back-pressure them or throw exception; again, this is an optimization to avoid wasting time with dealing with too slow replicas. Regarding the memory-threshold strategy, I'll hold on implementing it until we consolidate the points above. Regarding any changes related to the write request path going fully non-blocking, code placement will probably change but all such core concepts will stay the same, and I'll be happy to work on any changes needed (CASSANDRA-8457 is another one to keep an eye on /cc [~jasobrown]). > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378784#comment-15378784 ] Stefania commented on CASSANDRA-9318: - I don't have much to add regarding the merits of one approach vs. the other, other than to say that I agree with [~slebresne] that we should implement the strategy API so that both strategies can be supported, and this will make it more likely that the API is fit for even more strategies. I would even go one step further and suggest that the second strategy should be relatively easy to implement if the framework is in place and if we can work out a reasonable threshold. Therefore we could consider implementing and testing both, either as part of a follow up ticket or this one. Another thing I would like to point out is that, once we make read and write requests fully non-blocking, either via CASSANDRA-10993 or CASSANDRA-10528, we will probably have to rethink this. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15377074#comment-15377074 ] Ariel Weisberg commented on CASSANDRA-9318: --- I incorrectly thought request threads block until all responses are received. They only block until enough responses to satisfy the consistency level are received. So running out of request threads is not always going to be an issue because you have enough slow nodes involved in the request. So rate limiting will work in that it can artificially increase your CL (not really but still) to reduce throughput and avoid timeouts. This will also have the effect of preventing the coordinator from using more memory because request threads can't meet their CL and move on to process new requests. Instead they will block in a rate limiter. My measurements showed that you can provision enough memory to let the timeout kick in so I am not sure that is a useful behavior. Sure it eliminates timeouts, but if that is the end goal maybe we need a consistency level that is something like CL.ALL but tolerates unavailable nodes. That would have the same effect without rate limiting. It's still a partial solution because you can't write at full speed to ranges that don't contain slow nodes. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376955#comment-15376955 ] Ariel Weisberg commented on CASSANDRA-9318: --- There really isn't much memory to play with when deciding when to backpressure. There are 128 request threads and once those are all consumed by a slow node, which doesn't take long in a small cluster, things stall completely. If things were async you might be able to commit enough memory that requests time out before you need to stall. In other words you can shed via timeouts to nodes and no additional mechanisms are needed. Not reading from clients doesn't address the issue. You have still created a situation in which nodes that are performing well can't make progress because they can no longer read requests from clients because of one slow node. Not reading from clients is the current implementation. Hinting as it works now doesn't address the issue because the slow node may never actually catch up or become faster. Waiting for every request that is going to time out to time out and be hinted is going to restrict the coordinator's ability to coordinate. Hinting also doesn't work because there are only 128 concurrent requests that can be in the process of being hinted; see paragraph #1. If the coordinator wants to continue to make progress it has to read requests from clients and then quickly know if it should shed them. We could shed them silently in which case the upstream client is going to time out and it's going to exhaust its memory or threadpool and we have silently and unfixably moved the problem upstream. I suppose clients can try and implement their own health metrics to duplicate the work we are doing at the coordinator, but they still can't force the coordinator to shed so the client can replace those requests that won't succeed with ones that will. Or we can signal that we aren't going to do that request at this time and the client can engage whatever mitigation strategy it wants to implement. There is a whole separate discussion about what the state of the art needs to be in client drivers to do something useful with this information and how to expose the mechanism and policy choices to applications. Rate limiting isn't really useful. You just end up with all the request threads stuck in the rate limiter and coordinators continue to not make progress. Rate limiting doesn't solve a load issue at the remote end because as I've demonstrated the remote end can buffer up enough requests until shedding kicks in due to the timeout and reduces memory utilization to something the heap can handle. If things were async what would rate limiting look like? Would it be disabling read for clients? How is the coordinator going to make progress then if it can't coordinate requests for healthy nodes? > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. 
> An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376629#comment-15376629 ] Sergio Bossa commented on CASSANDRA-9318: - [~jbellis], yes I think we're going in circles, and probably either one or the other is missing each other point. You said: bq. Pick a number for how much memory we can afford to have taken up by in flight requests (remembering that we need to keep the entirely payload around for potential hint writing) as a fraction of the heap, the way we do with memtables or key cache. If we hit that mark we start throttling new requests and only accept them as old ones drain off. I'll try a last attempt at explaining why _in my opinion_ this is worse _in all accounts_ via an example. Say you have: * U: the memory unit to express back-pressure. * T: the write RPC timeout. * 3 nodes cluster with RF=2. * CL.ONE. Say the coordinator memory threshold is at 10U, and clients hit it with requests worth 20Us, and for simplicity the coordinator is not a replica, which means it has to accommodate 40U of inflight requests. At some point P < T, the first replica answers all of them, while the second replica answers only half, which means the memory threshold is met at 10U requests, which will be drained when either the replica answers (let's call this time interval R) or T elapses. This means: 1) During a time period equal to min(R,T) you'll have 0 throughput. This is made worse if you actually have to wait for the T to elapse, as it gets proportional to T. 2) If replica 2 keeps exhibiting the same slow behaviour, you'll keep having a throughput profile of high peaks and 0 valleys; this is bad not just because of the 0 valleys, but also because during the high peaks the slow replica will end up dropping 10U worth of mutations. Both #1 and #2 look pretty bad outcomes to me and definitely no better than throttling at the speed of the slowest replica. Speaking of which, how would the other algorithm work in such case? Given the same scenario above, we'd have the following: 1) During the first T period (T1), there will be no back-pressure (this is because the algorithm works at time windows of size T), so nothing changes from current behaviour. 2) At T2, it will start throttling at the slow replica rate of 10U (obviously the actual throttling is not memory based in this case but I'll keep the same notation for simplicity): this means that the sustained throughput during T2 will be 10U. 3) For all Tn > T2, throughput will *gradually* increase and eventually reduce, but without ever touching 0, which means _no high peaks causing high dropped mutations rate_, _no 0 valleys of length T_. Hope this clarifies my reasoning, and please let me know if/how my example above is flawed (that is, if I keep missing your point). > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. 
> An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376513#comment-15376513 ] Avi Kivity commented on CASSANDRA-9318: --- FWIW this is exactly what we do. It's not perfect but it seems to work. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375853#comment-15375853 ] Jonathan Ellis commented on CASSANDRA-9318: --- bq. I can't see any better options than what we implement in this patch for those use cases willing to trade performance for overall stability I feel like we're going in circles here. Here is a better option: Pick a number for how much memory we can afford to have taken up by in-flight requests (remembering that we need to keep the entire payload around for potential hint writing) as a fraction of the heap, the way we do with memtables or key cache. If we hit that mark we start throttling new requests and only accept them as old ones drain off. This has the following benefits: # Strictly better than the status quo. I.e., does not make things worse where the current behavior is fine (single replica misbehaves, we write hints but don't slow things down), and makes things better where the current behavior is not (throttle instead of falling over OOM). # No client-side logic is required; all the client sees is slower request acceptance when throttling kicks in. # Gives us a metric we can expose to clients to improve load balancing. # Does not require a lot of tuning. (If the system is overloaded it will eventually reach even a relatively high mark. If it doesn't, well, you're not going to OOM so you don't need to throttle.) > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
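The high-watermark option described above could look roughly like the following minimal sketch; the names, thresholds and byte-based accounting are assumptions for illustration, not a proposed patch:
{code}
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of a byte-based in-flight watermark; names and thresholds are illustrative.
class InFlightWatermarkSketch
{
    private final long highWatermarkBytes;
    private final long lowWatermarkBytes;
    private final AtomicLong inFlightBytes = new AtomicLong();
    private volatile boolean accepting = true;

    InFlightWatermarkSketch(long highWatermarkBytes, long lowWatermarkBytes)
    {
        this.highWatermarkBytes = highWatermarkBytes;
        this.lowWatermarkBytes = lowWatermarkBytes;
    }

    /** Called when a request is admitted; returns false if we should stop accepting new ones. */
    boolean onRequestAccepted(long payloadBytes)
    {
        if (inFlightBytes.addAndGet(payloadBytes) >= highWatermarkBytes)
            accepting = false; // stop accepting until we drain below the low mark
        return accepting;
    }

    /** Called when a request completes, times out, or is hinted. */
    void onRequestDrained(long payloadBytes)
    {
        if (inFlightBytes.addAndGet(-payloadBytes) <= lowWatermarkBytes)
            accepting = true;
    }

    boolean isAccepting()
    {
        return accepting;
    }
}
{code}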
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375752#comment-15375752 ] Jonathan Ellis commented on CASSANDRA-9318: --- bq. if we make the strategy a bit more generic as mentioned above so the decision is made from all replica involved (maybe the strategy should also keep track of the replica-state completely internally so we can implement basic strategy like having a simple high watermark very easy), and we make sure to not throttle too quickly (typically, if a single replica is slow and we don't really need it, start by just hinting him), then I'd be happy moving to the "actually test this" phase and see how it goes. I suppose that's reasonable in principle, with some caveats: # Throwing exceptions shouldn't be part of the API. OverloadedException dates from the Thrift days, where our flow control options were very limited and this was the best we could do to tell clients, "back off." Now that we have our own protocol and full control over Netty we should simply not read more requests until we shed some load. (Since shedding load is a gradual process--requests time out, we write hints, our load goes down--clients will just perceive this as slowing down, which is what we want.) # The API should provide for reporting load to clients so they can do real load balancing across coordinators and not just round-robin. # Throttling requests to the speed of the slowest replica is not something we should ship, even as an option. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
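Point 1 above ("simply not read more requests until we shed some load") maps naturally onto Netty's per-channel auto-read flag; a minimal sketch, not the actual {{org.apache.cassandra.transport}} server code:
{code}
import io.netty.channel.Channel;

// Sketch only: how back-pressure could pause reads on a client connection via
// Netty instead of throwing OverloadedException. Not the actual transport code.
class ClientReadThrottleSketch
{
    /** Stop reading new requests from this client while we are overloaded. */
    static void pauseReads(Channel clientChannel)
    {
        clientChannel.config().setAutoRead(false);
    }

    /** Resume once in-flight load has drained below the low watermark. */
    static void resumeReads(Channel clientChannel)
    {
        clientChannel.config().setAutoRead(true);
    }
}
{code}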
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375191#comment-15375191 ] Sergio Bossa commented on CASSANDRA-9318: - [~slebresne], bq. if we make the strategy a bit more generic as mentioned above so the decision is made from all replica involved (maybe the strategy should also keep track of the replica-state completely internally so we can implement basic strategy like having a simple high watermark very easy) I'm already separating the "computation" phase from the "application", in order to address some previous points from my discussion with [~Stefania], so I think that should do it. bq. and we make sure to not throttle too quickly (typically, if a single replica is slow and we don't really need it, start by just hinting him) This can be easily implemented in the strategy itself as a parameter, i.e. "back-pressure cycles before actually starting to rate limit", so I'll do this eventually later. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375176#comment-15375176 ] Sylvain Lebresne commented on CASSANDRA-9318: - bq. At this point instead of adding more complexity to an approach that fundamentally doesn't solve that, why not back up and use an approach that does the right thing in all 3 cases instead? My understanding of the fundamentals of Sergio's approach is to: # maintain, on the coordinator, a state for each node that keep track of how much in-flight query we have for that node. # on a new write query, check the state for the replicas involved in that query to decide what to do (when to hint the node, when to start rate limiting or when to start rejecting the queries to the client). In that sense, I don't think the approach is fundamentally wrong but I feel the main question is on the "what to do (and when)". And as I'm not sure there is a single perfect answer for that, I do also like the approach of a strategy, if only so experimentation is easier (though technically, instead of just having an {{apply()}} that potentially throws or sleep, I think the strategy should take the replica for the query, and return a list of nodes to query and one to hint (preserving the ability to sleep or throw) to get more options on the "what to do", and not making backpressure a node-per-node thing). In term of the "default" back-pressure strategy we provide, I agree that we should mostly try to solve the scenario 3: we should define some condition where we consider things overloaded and only apply back-pressure from there. Not sure what that exact condition is btw, but I'm not convinced we can come with a good one out of thin air, I think we need to experiment. tl;dr, if we make the strategy a bit more generic as mentioned above so the decision is made from all replica involved (maybe the strategy should also keep track of the replica-state completely internally so we can implement basic strategy like having a simple high watermark very easy), and we make sure to not throttle too quickly (typically, if a single replica is slow and we don't really need it, start by just hinting him), then I'd be happy moving to the "actually test this" phase and see how it goes. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
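The more generic strategy shape suggested here (take the replicas for the query, return the nodes to query and the ones to hint, while preserving the ability to sleep or throw) could be expressed roughly as below; this is only a sketch of the suggestion, with assumed names, not the API that was eventually committed:
{code}
import java.net.InetAddress;
import java.util.List;

// Sketch of the interface shape suggested in the comment above; names are
// illustrative and do not match the committed BackPressureStrategy API.
interface WriteAdmissionStrategySketch
{
    /** Outcome of a back-pressure decision for one write. */
    final class Decision
    {
        final List<InetAddress> toQuery; // replicas to send the mutation to
        final List<InetAddress> toHint;  // replicas to hint instead of querying

        Decision(List<InetAddress> toQuery, List<InetAddress> toHint)
        {
            this.toQuery = toQuery;
            this.toHint = toHint;
        }
    }

    /**
     * Decide what to do with one write given all replicas involved.
     * Implementations may also sleep (rate limit) or throw an unchecked
     * exception (reject) before returning.
     */
    Decision admit(List<InetAddress> replicas);
}
{code}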
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375132#comment-15375132 ] Sergio Bossa commented on CASSANDRA-9318: - [~jbellis], bq. it causes other problems in the other two (non-global-overload) scenarios. I think you are overstating the problem here, because the first two scenarios are either very limited in time (the first), or very limited in magnitude (the second), and the back-pressure algorithm is configurable to be as sensitive and as reactive as you wish, by tuning the incoming/outgoing imbalance you want to tolerate, and the growth factor. bq. I honestly don't see what is "better" about a "slow every write down to the speed of the slowest, possibly sick, replica" approach. Defining a simple high water mark on requests in flight should be much simpler without the negative side effects. Such a threshold would be too arbitrary and coarse-grained, but that's not even the problem; the point is rather what you're going to do when the threshold is met. That is, say the high water mark is met; we really have these options: 1) Throttle at the rate of the slow replicas, which is what we do in this patch. 2) Take the slow replica(s) out, which is even worse in terms of availability. 3) Rate limit the message dequeueing in the outbound connection, but this only moves the back-pressure problem from one place to another. 4) Rate limit at a global rate equal to the water mark, but this only helps the coordinator, as such rate might still be too high for the slow replicas. In the end, I can't see any better options than what we implement in this patch for those use cases willing to trade performance for overall stability, and I would at least have it go through proper QA testing, to see how it behaves on larger clusters, fix any sharp edges, and see how it stands overall. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375012#comment-15375012 ] Jonathan Ellis commented on CASSANDRA-9318: --- bq. Hints are not a solution for chronically overloaded clusters where clients ingest faster than replicas can consume That is the situation I describe in scenario 3, which is the problem I opened this ticket to solve. So, I agree that scenario is a problem, but I don't think this proposal is a very good solution for that, and it causes other problems in the other two (non-global-overload) scenarios. bq. I think we do solve that, actually in a better way, which takes into consideration all replicas, not just the coordinator capacity of acting as a buffer, unless I'm missing a specific case you're referring to? I honestly don't see what is "better" about a "slow every write down to the speed of the slowest, possibly sick, replica" approach. Defining a simple high water mark on requests in flight should be much simpler without the negative side effects. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374158#comment-15374158 ] Stefania commented on CASSANDRA-9318: - bq. Right, but there isn't much we can do without way more invasive changes. Anyway, I don't think that's actually a problem, as if the coordinator is overloaded we'll end up generating too many hints and fail with OverloadedException (this time with its original meaning), so we should be covered. I tend to agree that it is an approximation we can live with; I also would rather not change the lower levels of messaging service for this. bq. Does it mean we should advance the protocol version in this issue, or delegate to a new issue? We have a number of issues waiting for protocol V5, they are labeled as {{protocolv5}}. Either we make this issue dependent on V5 as well or, since we are committing this as disabled, we delegate to a new issue that is dependent on V5. bq. Do you see any complexity I'm missing there? A new flag would involve a new version and it would need to be handled during rolling upgrades. Even if on its own it is not too complex, the system in its entirety becomes even more complex (different versions, compression, cross-node-timeouts, some verbs are droppable, others aren't and the list goes on). Unless it solves a problem, I don't think we should consider it; and we are saying in other parts of this conversation that hints are no longer a problem. bq. as the advantage would be increased consistency at the expense of more resource consumption, We don't increase consistency if the client has been told the mutation failed IMO. If we are instead referring to replicas that were out of the CL pool and temporarily overloaded, I think they are better off dropping mutations and handling them later on through hints. Basically, I see dropping mutations replica side as a self defense mechanism for replicas, I don't think we should remove it, rather we should focus on a backpressure strategy such that replicas don't need to drop mutations. Also, for the time being, I'd rather focus on the major issue, which is that we haven't reached consensus on how to apply backpressure yet, and propose this new idea in a follow up ticket if backpressure is successful. bq. These are valid concerns of course, and given similar concerns from Jonathan Ellis, I'm working on some changes to avoid write timeouts due to healthy replicas unnaturally throttled by unhealthy ones, and depending on Jonathan Ellis answer to my last comment above, maybe only actually back-pressure if the CL is not met. OK, so we are basically trying to address the 3 scenarios by throttling/failing only if the system as a whole cannot handle the mutations (that is at least CL replicas are slow/overloaded) whereas if less than CL replicas are slow/overloaded, those replicas get hinted? > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. 
> An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373750#comment-15373750 ] Sergio Bossa commented on CASSANDRA-9318: - [~jbellis], bq. I think you just explained why that's not a very good reason to add more complexity here, at least not without a demonstration that it's actually still a problem. Hints are not a solution for chronically overloaded clusters where clients ingest faster than replicas can consume: even if hints get delivered timely and reliably, replicas will always play catchup if there's no back-pressure. I find this pretty straightforward, but as a practical example, try injecting the attached byteman rule into a cluster and see it fall over at a rate of hundreds of thousands of dropped mutations per minute. bq. But CL is per request. How do you disentangle that client side? Not sure I follow your objection; can you elaborate? bq. And we're still not solving what I think is (post file-based hints) the real problem, my scenario 3. I think we do solve that, actually in a better way, which takes into consideration all replicas, not just the coordinator's capacity to act as a buffer, unless I'm missing a specific case you're referring to? > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373696#comment-15373696 ] Jonathan Ellis commented on CASSANDRA-9318: --- bq. I have interacted with many people using Cassandra that would actually like to see some rate limiting applied for cases 1 and 2 such that things don't fall over (shouldn't happen with new hints hopefully) I think you just explained why that's not a very good reason to add more complexity here, at least not without a demonstration that it's actually still a problem. bq. What if we tailored the algorithm to only: Rate limit if CL replicas are below the high threshold / Throw exception if CL replicas are below the low threshold. But CL is per request. How do you disentangle that client side? And we're still not solving what I think is (post file-based hints) the real problem, my scenario 3. At this point instead of adding more complexity to an approach that fundamentally doesn't solve that, why not back up and use an approach that does the right thing in all 3 cases instead? > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373624#comment-15373624 ] Sergio Bossa commented on CASSANDRA-9318: - [~Stefania], bq. if a message expires before it is sent, we consider this negatively for that replica, since we increment the outgoing rate but not the incoming rate when the callback expires, and still it may have nothing to do with the replica if the message was not sent, it may be due to the coordinator dealing with too many messages. Right, but there isn't much we can do without way more invasive changes. Anyway, I don't think that's actually a problem, as if the coordinator is overloaded we'll end up generating too many hints and fail with {{OverloadedException}} (this time with its original meaning), so we should be covered. bq. I also observe that if a replica has a low rate, then we may block when acquiring the limiter, and this will indirectly throttle for all following replicas, even if they were ready to receive mutations sooner. See my answer at the end. bq. AbstractWriteResponseHandler sets the start time in the constructor, so the time spent acquiring a rate limiter for slow replicas counts towards the total time before the coordinator throws a write timeout exception. See my answer at the end. bq. SP.sendToHintedEndpoints(), we should apply backpressure only if the destination is alive. I know, I'm holding on these changes until we settle on a plan for the whole write path (in terms of what to do with CL, the exception to be thrown etc.). bq. Let's use UnavailableException since WriteFailureException indicates a non-timeout failure when processing a mutation, and so it is not appropriate for this case. For protocol V4 we cannot change UnavailableException, but for V5 we should add a new parameter to it. At the moment it contains , we should add the number of overloaded replicas, so that drivers can treat the two cases differently. Does it mean we should advance the protocol version in this issue, or delegate to a new issue? bq. Marking messages as throttled would let the replica know if backpressure was enabled, that's true, but it also makes the existing mechanism even more complex. How so? In implementation terms, it should be literally as easy as: 1) Add a byte parameter to {{MessageOut}}. 2) Read such byte parameter from {{MessageIn}} and eventually skip dropping it replica-side. 3) If possible (didn't check it), when a "late" response is received on the coordinator, try to cancel the related hint. Do you see any complexity I'm missing there? bq. dropping mutations that have been in the queue for longer that the RPC write timeout is done not only to shed load on the replica, but also to avoid wasting resources to perform a mutation when the coordinator has already returned a timeout exception to the client. This is very true and that's why I said it's a bit of a wild idea. Obviously, that is true outside of back-pressure, as even now it is possible to return a write timeout to clients and still have some or all mutations applied. In the end, it might be good to optionally enable such behaviour, as the advantage would be increased consistency at the expense of more resource consumption, which is a tradeoff some users might want to make, but to be clear, I'm not strictly lobbying to implement it, just trying to reason about pros and cons. bq. I still have concerns regarding additional write timeout exceptions and whether an overloaded or slow replica can slow everything down. 
These are valid concerns of course, and given similar concerns from [~jbellis], I'm working on some changes to avoid write timeouts due to healthy replicas unnaturally throttled by unhealthy ones, and depending on [~jbellis] answer to my last comment above, maybe only actually back-pressure if the CL is not met. Stay tuned. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce
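To make the "marking" idea from the comment above concrete, here is a minimal, hypothetical sketch. The parameter key and the helper class are illustrative only and not part of the actual patch; the sketch assumes the {{MessageOut#withParameter}} and {{MessageIn#parameters}} facilities of the 3.x messaging code, and omits step 3 (cancelling the related hint), which was explicitly flagged as unverified.
{code}
import java.util.Map;

import org.apache.cassandra.db.Mutation;
import org.apache.cassandra.net.MessageIn;
import org.apache.cassandra.net.MessageOut;

// Hypothetical helper illustrating the idea; names are not from the patch.
public final class BackPressureMarker
{
    // Hypothetical parameter key used to tag throttled mutations.
    static final String BACK_PRESSURE_PARAM = "BP";

    // Coordinator side: tag the outgoing mutation so the replica knows the
    // coordinator already applied back-pressure to it.
    static MessageOut<Mutation> markThrottled(MessageOut<Mutation> message)
    {
        return message.withParameter(BACK_PRESSURE_PARAM, new byte[]{ 1 });
    }

    // Replica side: skip load shedding for messages marked as throttled, since
    // they should still arrive within the RPC write timeout.
    static boolean shouldDrop(MessageIn<Mutation> message, boolean pastWriteTimeout)
    {
        Map<String, byte[]> params = message.parameters;
        boolean throttled = params.containsKey(BACK_PRESSURE_PARAM);
        return pastWriteTimeout && !throttled;
    }
}
{code}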
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373547#comment-15373547 ] Sergio Bossa commented on CASSANDRA-9318: - [~jbellis], bq. Put another way: we do NOT want to limit performance to the slowest node in a set of replicas. That is kind of the opposite of the redundancy we want to provide. What if we tailored the algorithm to only: * Rate limit if CL replicas are below the high threshold. * Throw exception if CL replicas are below the low threshold. By doing so, C* would behave the same provided at least CL replicas behave normally. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373537#comment-15373537 ] Jeremiah Jordan commented on CASSANDRA-9318: [~jbellis] then we can default this to off. I have interacted with *many* people using Cassandra that would actually like to see some rate limiting applied for cases 1 and 2 such that things don't fall over (shouldn't happen with new hints hopefully) or even hint like crazy. Turning this on would allow that to happen. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373512#comment-15373512 ] Jonathan Ellis commented on CASSANDRA-9318: --- The more I think about it the more I think the entire approach may be a bad fit for Cassandra. Consider: # If a node has a "hiccup" of slow performance, e.g. due to a GC pause, we want to hint those writes and return success to the client. No need to rate limit. # If a node has a sustained period of slow performance, we want to hint those writes and return success to the client. No need to rate limit, unless we are overwhelmed with hints. (Not sure if hint overload is actually a problem with the new file based hints.) # Where we DO want to rate limit is when the client is throwing more updates at the coordinator than the system can handle, whether that is for a single node or globally across many nodes. So I see this approach as doing the wrong thing for 1 and 2 and only partially helping with 3. Put another way: we do NOT want to limit performance to the slowest node in a set of replicas. That is kind of the opposite of the redundancy we want to provide. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372114#comment-15372114 ] Stefania commented on CASSANDRA-9318: - bq. But can that really happen? ResponseVerbHandler returns before incrementing back-pressure if the callback is null (i.e. expired), and OutboundTcpConnection doesn't even send outbound messages if they're timed out, or am I missing something? You're correct, we won't count twice because the callback is already null. However, this raises another point: if a message expires before it is sent, we consider this negatively for that replica, since we increment the outgoing rate but not the incoming rate when the callback expires, and still it may have nothing to do with the replica if the message was not sent; it may be due to the coordinator dealing with too many messages. bq. Again, I believe this would make enabling/disabling back-pressure via JMX less user friendly. Fine, let's keep the boolean since it makes life easier for JMX. bq. I do not think sorting replicas is what we really need, as you have to send the mutation to all replicas anyway. I think what you rather need is a way to pre-emptively fail if the write consistency level is not met by enough "non-overloaded" replicas, i.e.: You're correct in that the replicas are not sorted in the write path, only in the read path. I confused the two yesterday. For sure we need to only fail if the write consistency level is not met. I also observe that if a replica has a low rate, then we may block when acquiring the limiter, and this will indirectly throttle all following replicas, even if they were ready to receive mutations sooner. Therefore, even a single overloaded or slow replica may slow the entire write operation. Further, AbstractWriteResponseHandler sets the start time in the constructor, so the time spent acquiring a rate limiter for slow replicas counts towards the total time before the coordinator throws a write timeout exception. So, unless we increase the write RPC timeout or change the existing behavior, we may observe write timeout exceptions and, at CL.ANY, hints. Also, in SP.sendToHintedEndpoints(), we should apply backpressure only if the destination is alive. {quote} This leaves us with two options: Adding a new exception to the native protocol. Reusing a different exception, with WriteFailureException and UnavailableException the most likely candidates. I'm currently leaning towards the latter option. {quote} Let's use UnavailableException since WriteFailureException indicates a non-timeout failure when processing a mutation, and so it is not appropriate for this case. For protocol V4 we cannot change UnavailableException, but for V5 we should add a new parameter to it. At the moment it contains {{}}; we should add the number of overloaded replicas, so that drivers can treat the two cases differently. Another alternative, as suggested by [~slebresne], is to simply consider overloaded replicas as dead and hint them, therefore throwing unavailable exceptions as usual, but this is slightly less accurate than letting clients know that some replicas were unavailable and some simply overloaded. bq. We only need to ensure the coordinator for that specific mutation has back-pressure enabled, and we could do this by "marking" the MessageOut with a special parameter, what do you think? 
Marking messages as throttled would let the replica know if backpressure was enabled, that's true, but it also makes the existing mechanism even more complex. Also, as far as I understand it, dropping mutations that have been in the queue for longer than the RPC write timeout is done not only to shed load on the replica, but also to avoid wasting resources to perform a mutation when the coordinator has already returned a timeout exception to the client. I think this still holds true regardless of backpressure. Since we cannot remove a timeout check in the write response handlers, I don't see how it helps to drop it replica side. If the message was throttled, even with cross_node_timeout enabled, the replica should have time to process it before the RPC write timeout expires, so I don't think the extra complexity is justified. bq. If you all agree with that, I'll move forward and make that change. To summarize, I agree with this change, provided the drivers can separate the two cases (node unavailable vs. node overloaded), which they will be able to do with V5 of the native protocol. The alternative would be to simply consider overloaded replicas as dead and hint them. Further, I still have concerns regarding additional write timeout exceptions and whether an overloaded or slow replica can slow everything down. [~slebresne], [~jbellis], anything else from your side? I think Jonathan's proposal of bounding total outstanding requests to all replicas, is
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371063#comment-15371063 ] Sergio Bossa commented on CASSANDRA-9318: - bq. One thing that worries me is, how do you distinguish between “node X is slow because we are writing too fast and we need to throttle clients down” and “node X is slow because it is dying, we need to ignore it and accept writes based on other replicas?” I.e. this seems to implicitly push everyone to a kind of CL.ALL model once your threshold triggers, where if one replica is slow then we can't make progress. This was already noted by [~slebresne], and you're both right: this initial implementation is heavily biased towards my specific use case :) But the above proposed solution should fix it: bq. I think what you rather need is a way to pre-emptively fail if the write consistency level is not met by enough "non-overloaded" replicas, i.e.: If CL.ONE, fail if all replicas are overloaded... Also, the exception would be sent to the client only if the low threshold is met, and only the first time it is met, for the duration of the back-pressure window (write RPC timeout), i.e.: * Threshold is 0.1, outgoing requests are 100, incoming responses are 10, ratio is 0.1. * Exception is thrown by all write requests for the current back-pressure window. * The outgoing rate limiter is set at 10, which means the next ratio calculation will approach the sustainable rate, and even if replicas still lag behind, the ratio will not go down to 0.1 _unless_ the incoming rate dramatically goes down to 1. This is to say the chances of getting a meltdown due to "overloaded exceptions" are moderate (also, clients are supposed to adjust themselves when getting such an exception), and the above proposal should make things play nicely with the CL too. If you all agree with that, I'll move forward and make that change. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
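As a purely illustrative restatement of the arithmetic in the bullet points above (the variable names and window handling are assumptions, not the patch's actual code):
{code}
// Standalone illustration of the numbers discussed above; names are illustrative.
public class BackPressureRatioExample
{
    public static void main(String[] args)
    {
        double lowThreshold = 0.1; // at or below this ratio the replica counts as overloaded
        double outgoing = 100;     // requests sent to the replica in the last back-pressure window
        double incoming = 10;      // responses received from it in the same window

        double ratio = incoming / outgoing;          // 0.1
        boolean overloaded = ratio <= lowThreshold;  // true: throw the exception for this window

        // The outgoing limiter is set to the observed incoming rate, so the next
        // window approaches the sustainable rate instead of collapsing further.
        double newLimiterRate = incoming;            // 10 requests per window

        System.out.printf("ratio=%.2f overloaded=%b newRate=%.0f%n", ratio, overloaded, newLimiterRate);
    }
}
{code}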
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371010#comment-15371010 ] Jonathan Ellis commented on CASSANDRA-9318: --- If I understand this approach correctly, you're looking at the ratio of sent to acknowledged writes per replica, and throwing Unavailable if that gets too low for a given replica. Very clever. One thing that worries me is, how do you distinguish between “node X is slow because we are writing too fast and we need to throttle clients down” and “node X is slow because it is dying, we need to ignore it and accept writes based on other replicas?” I.e. this seems to implicitly push everyone to a kind of CL.ALL model once your threshold triggers, where if one replica is slow then we can't make progress. If we take a simpler approach of just bounding total outstanding requests to all replicas per coordinator, then we can avoid overload meltdown while allowing CL to continue to work as designed and tolerate as many slow replicas as the requested CL permits. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
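For contrast, here is a minimal sketch of the simpler approach advocated here (and in the ticket description): bound the total outstanding requests per coordinator with high/low watermarks and pause reads from client connections in between. The watermark values and the Netty wiring are assumptions for illustration, not code from this ticket.
{code}
import java.util.concurrent.atomic.AtomicLong;

import io.netty.channel.Channel;

// Illustrative coordinator-wide bound on in-flight requests; not from any patch here.
public class InFlightRequestLimiter
{
    private static final long HIGH_WATERMARK = 10_000; // illustrative values
    private static final long LOW_WATERMARK  = 8_000;

    private final AtomicLong inFlight = new AtomicLong();

    // Called when a client request is dispatched to replicas.
    public void onRequestStart(Channel clientChannel)
    {
        if (inFlight.incrementAndGet() >= HIGH_WATERMARK)
            clientChannel.config().setAutoRead(false); // stop reading new client requests
    }

    // Called once all replica responses (or timeouts) for the request are accounted for.
    public void onRequestEnd(Channel clientChannel)
    {
        if (inFlight.decrementAndGet() <= LOW_WATERMARK)
            clientChannel.config().setAutoRead(true);  // resume reading
    }
}
{code}
Under this scheme the coordinator never waits for any particular replica, so consistency-level handling stays exactly as it is today.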
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370533#comment-15370533 ] Sergio Bossa commented on CASSANDRA-9318: - [~Stefania], bq. Do we track the case where we receive a failed response? Specifically, in ResponseVerbHandler.doVerb, shouldn't we call updateBackPressureState() also when the message is a failure response? Good point: I focused more on the success case, due to dropped mutations, but that sounds like a good thing to do. bq. If we receive a response after it has timed out, won't we count that request twice, incorrectly increasing the rate for that window? But can that really happen? {{ResponseVerbHandler}} returns _before_ incrementing back-pressure if the callback is null (i.e. expired), and {{OutboundTcpConnection}} doesn't even send outbound messages if they're timed out, or am I missing something? bq. I also argue that it is quite easy to comment out the strategy and to have an empty strategy in the code that means no backpressure. Again, I believe this would make enabling/disabling back-pressure via JMX less user friendly. bq. I think what we may need is a new companion snitch that sorts the replicas by backpressure ratio I do not think sorting replicas is what we really need, as you have to send the mutation to all replicas anyway. I think what you rather need is a way to pre-emptively fail if the write consistency level is not met by enough "non-overloaded" replicas, i.e.: * If CL.ONE, fail if *all* replicas are overloaded. * If CL.QUORUM, fail if a *quorum* of replicas are overloaded. * If CL.ALL, fail if *any* replica is overloaded. This can be easily accomplished in {{StorageProxy#sendToHintedEndpoints}}. bq. the exception needs to be different. native_protocol_v4.spec clearly states I missed that too :( This leaves us with two options: * Adding a new exception to the native protocol. * Reusing a different exception, with {{WriteFailureException}} and {{UnavailableException}} the most likely candidates. I'm currently leaning towards the latter option. bq. By "load shedding by the replica" do we mean dropping mutations that have timed out or something else? Yes. bq. Regardless, there is the problem of ensuring that all nodes have backpressure enabled, which may not be trivial. We only need to ensure the coordinator for that specific mutation has back-pressure enabled, and we could do this by "marking" the {{MessageOut}} with a special parameter; what do you think? > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
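A rough sketch of the CL-aware check proposed above, as a simplified stand-in for what would live in {{StorageProxy#sendToHintedEndpoints}} (the types and the placeholder exception are hypothetical, not the actual code):
{code}
import java.util.List;

// Illustrative pre-emptive overload check: fail only if the CL cannot be met by
// non-overloaded replicas (all overloaded for ONE, a quorum for QUORUM, any for ALL).
public class OverloadCheck
{
    interface ReplicaBackPressure
    {
        boolean isOverloaded();
    }

    static void assertConsistencyAchievable(List<? extends ReplicaBackPressure> replicas, int blockFor)
    {
        long healthy = replicas.stream().filter(r -> !r.isOverloaded()).count();
        if (healthy < blockFor)
            // Placeholder for OverloadedException (or UnavailableException, as discussed).
            throw new IllegalStateException("only " + healthy + " non-overloaded replicas, " +
                                            blockFor + " required by the consistency level");
    }
}
{code}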
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370159#comment-15370159 ] Stefania commented on CASSANDRA-9318: - bq. I've pushed a few more commits to address your concerns. The new commits look good: the code is clearly cleaner with an abstract BackPressureState created by the strategy, and with the back-pressure window equal to the write timeout. Some things left to do: * Rebase on trunk * Get rid of the trailing spaces bq. more specifically, the rates are now tracked together when either a response is received or the callback is expired, I have some concerns with this: * Do we track the case where we receive a failed response? Specifically, in ResponseVerbHandler.doVerb, shouldn't we call updateBackPressureState() also when the message is a failure response? * If we receive a response after it has timed out, won't we count that request twice, incorrectly increasing the rate for that window? bq. Configuration-wise, we're now left with only the back_pressure_enabled boolean and the back_pressure_strategy, and I'd really like to keep the former, as it makes it way easier to dynamically turn back-pressure on/off. It's much better right now and I am not strongly opposed to keeping back_pressure_enabled, but I also argue that it is quite easy to comment out the strategy and to have an empty strategy in the code that means no backpressure. bq. Talking about the overloaded state and the usage of OverloadedException, I agree the latter might be misleading, and I agree some failure conditions could lead to requests being wrongly refused, but I'd also like to keep some form of "emergency" feedback towards the client: what about throwing OE only if all (or a given number depending on the CL?) replicas are overloaded? I think what we may need is a new companion snitch that sorts the replicas by backpressure ratio, or an enhanced dynamic snitch that is aware of the backpressure strategy. If we sort replicas correctly, then we can throw an exception to the client if one of the selected replicas is overloaded. However, I think the exception needs to be different. native_protocol_v4.spec clearly states:
{code}
0x1001    Overloaded: the request cannot be processed because the coordinator node is overloaded
{code}
Sorry I totally missed the meaning of this exception in the previous round of review. bq. Finally, one more wild idea to consider: given this patch greatly reduces the number of dropped mutations, and hence the number of inflight hints, what do you think about disabling load shedding on the replica side when back-pressure is enabled? This way we'd trade "full consistency" for a hopefully smaller number of unnecessary hints sent over to "pressured" replicas when their callbacks expire on the coordinator side. By "load shedding by the replica" do we mean dropping mutations that have timed out or something else? Regardless, there is the problem of ensuring that all nodes have backpressure enabled, which may not be trivial. I think replicas should have a mechanism of protecting themselves, rather than relying on the coordinator, but I can comment further if we clarify what we mean exactly. 
> Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366583#comment-15366583 ] Sergio Bossa commented on CASSANDRA-9318: - [~Stefania], [~slebresne], I've pushed a few more commits to address your concerns. First of all, I've got rid of the back-pressure timeout: the back-pressure window for the rate-based algorithm is now equal to the write timeout, and the overall implementation has been improved to better track in/out rates and avoid the need for a larger window; more specifically, the rates are now tracked together when either a response is received or the callback is expired, which avoids edge cases causing an unbalanced in/out rate when a burst of outgoing messages is recorded on the edge of a window. Also, I've abstracted {{BackPressureState}} into an interface as requested. Configuration-wise, we're now left with only the {{back_pressure_enabled}} boolean and the {{back_pressure_strategy}}, and I'd really like to keep the former, as it makes it way easier to dynamically turn back-pressure on/off. Talking about the overloaded state and the usage of {{OverloadedException}}, I agree the latter might be misleading, and I agree some failure conditions could lead to requests being wrongly refused, but I'd also like to keep some form of "emergency" feedback towards the client: what about throwing OE only if _all_ (or a given number depending on the CL?) replicas are overloaded? Regarding when and how to ship this, I'm fine with trunk and I agree it should be off by default for now. Finally, one more wild idea to consider: given this patch greatly reduces the number of dropped mutations, and hence the number of inflight hints, what do you think about disabling load shedding on the replica side when back-pressure is enabled? This way we'd trade "full consistency" for a hopefully smaller number of unnecessary hints sent over to "pressured" replicas when their callbacks expire on the coordinator side. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
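A minimal sketch of the rate tracking just described, with illustrative names rather than the actual classes in the patch: both counters are updated at the same point (response received or callback expired), so a burst of sends at the edge of a window cannot skew the in/out ratio.
{code}
import java.util.concurrent.atomic.AtomicLong;

// Illustrative per-replica back-pressure state; names are not from the patch.
public class ReplicaRateState
{
    private final AtomicLong outgoing = new AtomicLong(); // requests accounted for in this window
    private final AtomicLong incoming = new AtomicLong(); // responses received in this window

    // Called when the replica answers within the write RPC timeout.
    public void onResponseReceived()
    {
        outgoing.incrementAndGet();
        incoming.incrementAndGet();
    }

    // Called when the callback expires: the request counts against the replica,
    // but no response is recorded.
    public void onCallbackExpired()
    {
        outgoing.incrementAndGet();
    }

    // Ratio consumed by the rate-based strategy; 1.0 means the replica keeps up.
    public double inOutRatio()
    {
        long out = outgoing.get();
        return out == 0 ? 1.0 : (double) incoming.get() / out;
    }
}
{code}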
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365817#comment-15365817 ] Sylvain Lebresne commented on CASSANDRA-9318: - A first concern I have, that has been brought up earlier in the conversation already, is that this will create a lot more OverloadedExceptions than we're currently used to, with 2 concrete issues imo: # the new OverloadedException has a different meaning than the old one. The old one meant the coordinator was overloaded (by too much hint writing), so client drivers will tend to try another coordinator when that happens. That's not a good idea here. # I think in many cases that's not what the user wants. If you write at CL.ONE, and 1 of your 3 replicas is struggling (but other nodes are perfectly fine), having all your queries rejected (breaking availability concretely) until that one replica catches up (or dies) feels quite wrong. What I think you'd want is to consider the node dead for that query and hint that slow node (and if a coordinator gets overwhelmed by those hints, we already throw an OE), but proceed with the write. I'll note in particular that when a node dies, it's not detected right away, and detection can actually take a few seconds, which might well mean a node that just died will be considered overloaded temporarily by some coordinator. So considering that overloaded is roughly the same as dead makes some sense (in particular, you wouldn't want to start failing writes due to this mechanism because a node dies, if you have enough nodes for your CL). bq. For trunk, we can enable it by default provided we can test it on a medium size cluster before committing We *should* test it on a medium sized cluster before committing, but even then I'm really reluctant to make it the default on commit. I don't think making new features that are untested in production and can have a huge impact the default right away is smart, even though we have done it numerous times (unsuccessfully most of the time, I'd say). I'd much rather leave it opt-in for a few releases so users can test it, and wait for more feedback before we make it the default (I know the counter argument, that no-one will test it unless it's a default, but I doubt that's true if people are genuinely put off by the existing behavior). In general, as was already brought up earlier in the discussion, I suspect fine tuning the parameters won't be trivial, and I think we'll need a fair amount of testing in different conditions to guarantee the defaults for those parameters are sane. bq. back_pressure_timeout_override: should we call it for what it is, the back_pressure_window_size and recommend that they increase the write timeout in the comments I concur that I don't love the name either, nor the concept of having an override for another setting. I'd also prefer splitting the double meaning, letting {{write_request_timeout_in_ms}} always be the timeout, and adding a {{window_size}} to the strategy parameters. We can additionally do 2 things to get roughly the same default as in the current patch: # make {{window_size}} be equal to the write timeout by default. # make the {{*_request_timeout_in_ms}} option a "silent" default (i.e. commented out by default), and make the default for the write one depend on whether back_pressure is on or not. 
To finish on the yaml options, if we move {{window_size}} to the strategy parameters, we could also get rid of {{back_pressure_enabled}} and instead have a {{NoBackPressure}} strategy (which we can totally special case internally). Don't care terribly about it, but it's imo slightly more in line with other strategies. I'd also suggest abstracting the {{BackPressureState}} state as an interface and making the strategy responsible for creating a new one, since most of the details of the state are likely somewhat strategy-specific. As far as I can tell, all we'd need as an interface is:
{noformat}
interface BackPressureState
{
    void onMessageSent();
    void onResponseReceived();
    double currentBackPressureRate();
}
{noformat}
This won't provide more info for custom implementations just yet, but at least it'll give them more flexibility, and it's imo a bit clearer. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364517#comment-15364517 ] Aleksey Yeschenko commented on CASSANDRA-9318: -- I'm not sure that putting anything like this in 3.0.x is reasonable, sorry. It's not technically a bug fix. Whether or not it should be enabled by default in 3.x is up for debate (would have to be an even feature release, too). > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363543#comment-15363543 ] Stefania commented on CASSANDRA-9318: - Thanks for your input [~jjordan]. I'm +1 with committing this to 3.0/3.9 disabled by default as the impact should be minimal. For trunk, we can enable it by default provided we can test it on a medium size cluster before committing. If that's not the case, we can commit it disabled by default, and open a new follow-up ticket, to test it and enable it by default once it has been tested. We should probably also prepare a short paragraph for NEWS.txt [~sbtourist]. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363288#comment-15363288 ] Jeremiah Jordan commented on CASSANDRA-9318: bq. One thing we did not confirm is whether you are happy committing this only to trunk or whether you need this in 3.0. Strictly speaking 3.0 accepts only bug fixes, not new features. However, this is an optional feature that solves a problem (dropped mutations) and that is disabled by default, so we have a case for an exception. I think it would be nice for users to get this in 3.0/3.9 if you think it will have minimal effect when disabled. I also think it would be good to enable it by default in trunk. We have lots of people who get put off by the fact that you can overload your cluster enough to start dropping mutations, and the system doesn't "slow things down" such that it can cope. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360784#comment-15360784 ] Stefania commented on CASSANDRA-9318: - bq. For that specific test I've got no client timeouts at all, as I wrote at ONE. Sorry I should have been clearer, I meant what were the {{write_request_timeout_in_ms}} and {{back_pressure_timeout_override}} yaml settings? bq. Agreed with all your points. I'll see what I can do, but any help/pointers will be very appreciated. We can do the following: bq. verify we can reduce the number of dropped mutations in a larger (5-10 nodes) cluster with multiple clients writing simultaneously I will ask for help to the TEs, more details to follow. bq. some cstar perf tests to ensure ops per second are not degraded, both read and writes We can launch a comparison test [here|http://cstar.datastax.com], 30M rows should be enough. I can launch it for you if you don't have an account. bq. the dtests should be run with and without backpressure enabled This can be done by temporarily changing cassandra.yaml on your branch and then launching the dtests. bq. we should do a bulk load test, for example for cqlsh COPY FROM I can take care of this. I don't expect problems because COPY FROM should contact the replicas directly, it's just a box I want to tick. Importing 5 to 10M rows with 3 nodes should be sufficient. bq. Please send me a PR and I'll incorporate those in my branch I couldn't create a PR, for some reason sbtourist/cassandra wasn't in the base fork list. I've attached a patch to this ticket, [^9318-3.0-nits-trailing-spaces.patch]. bq. I find the current layout effective and simple enough, but I'll not object if you want to push those under a common "container" option. The encryption options are what I was aiming at, but it's true that for everything else we have a flat layout, so let's leave it as it is. bq. I don't like much that name either, as it doesn't convey very well the (double) meaning; making the back-pressure window the same as the write timeout is not strictly necessary, but it makes the algorithm behave better in terms of reducing dropped mutations as it gives replica more time to process its backlog after the rate is reduced. Let me think about that a bit more, but I'd like to avoid requiring the user to increase the write timeout manually, as again, it reduces the effectiveness of the algorithm. I'll let you think about it. Maybe a boolean property that is true by default and that clearly indicates that the timeout is overridden, although this complicates things somewhat. bq. Sure I can switch to that on trunk, if you think it's worth performance-wise (I can write a JMH test if there isn't one already). The precision is only 10 milliseconds, if this is acceptable it would be interesting to see what the difference in performance is. bq. It is not used in any unit tests code, but it is used in my manual byteman tests, and unfortunately I need it on the C* classpath; is that a problem to keep it? Sorry I missed the byteman imports and helper. Let's just move it to the test source folder and add a comment. -- The rest of the CR points are fine. One thing we did not confirm is whether you are happy committing this only to trunk or whether you need this in 3.0. Strictly speaking 3.0 accepts only bug fixes, not new features. However, this is an optional feature that solves a problem (dropped mutations) and that is disabled by default, so we have a case for an exception. 
> Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, > limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356882#comment-15356882 ] Sergio Bossa commented on CASSANDRA-9318: - Thanks for reviewing [~Stefania]! See my comments below... bq. If I understood correctly, what we are trying to achieve with this new approach, is reducing the number of dropped mutations rather than preventing OOM but, in a way that doesn't cause OOM by invoking a rate-limiter-acquire in a Netty shared worker pool thread, which would slow down our ability to handle new client requests, as well as throw OEs once in a while when a low water-mark is reached. Is this a fair description? Correct. I'd just highlight the fact that reducing dropped mutations and replica instability is to me as important as avoiding OOMEs on the coordinator; [~jshook] has some very good points about that on his comment above. bq. In the CCM test performed with two nodes, what was the difference in write timeout? For that specific test I've got no client timeouts at all, as I wrote at ONE. bq. In terms of testing, here is what I would suggest Agreed with all your points. I'll see what I can do, but any help/pointers will be very appreciated. bq. As for the code review, I've addressed some nits, mostly comments and trailing spaces, directly here. Looks good! Please send me a PR and I'll incorporate those in my branch :) bq. Should we perhaps group all settings under something like back_pressure_options? I find the current layout effective and simple enough, but I'll not object if you want to push those under a common "container" option. bq. back_pressure_timeout_override: should we call it for what it is, the back_pressure_window_size and recommend that they increase the write timeout in the comments, or is an override of the write timeout necessary for correctness? I don't like much that name either, as it doesn't convey very well the (double) meaning; making the back-pressure window the same as the write timeout is not _strictly_ necessary, but it makes the algorithm behave better in terms of reducing dropped mutations as it gives replica more time to process its backlog after the rate is reduced. Let me think about that a bit more, but I'd like to avoid requiring the user to increase the write timeout manually, as again, it reduces the effectiveness of the algorithm. bq. Should we use a dedicated package? I'd keep it plain as it is now. bq. Depending on the precision, in trunk there is an approximate time service with a precision of 10 ms, which is faster than calling System.currentTimeMillis() Sure I can switch to that on trunk, if you think it's worth performance-wise (I can write a JMH test if there isn't one already). bq. TESTRATELIMITER.JAVA - Is this still used? It is not used in any unit tests code, but it is used in my manual byteman tests, and unfortunately I need it on the C* classpath; is that a problem to keep it? (I can add proper javadocs to make that clear) bq. TESTTIMESOURCE.JAVA - I think we need to move it somewhere with the test sources. Leftover there from a previous test implementation; I'll move it. 
> Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: backpressure.png, limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356800#comment-15356800 ] Stefania commented on CASSANDRA-9318: - Well, I read all of the discussion above and it took perhaps longer than actually reviewing the code! :) If I understood correctly, what we are trying to achieve with this new approach is reducing the number of dropped mutations rather than preventing OOM, but in a way that doesn't cause OOM by invoking a rate-limiter-acquire in a Netty shared worker pool thread, which would slow down our ability to handle new client requests, as well as throw OEs once in a while when a low water-mark is reached. Is this a fair description? Assuming we can prove this in a larger scenario, I am +1 on the approach but I would think it should go to trunk rather than 3.0? In the CCM test performed with two nodes, what was the difference in write timeout? In terms of testing, here is what I would suggest (cc [~cassandra-te] for any help they may be able to provide): * verify we can reduce the number of dropped mutations in a larger (5-10 nodes) cluster with multiple clients writing simultaneously * some cstar perf tests to ensure ops per second are not degraded, both read and writes * the dtests should be run with and without backpressure enabled * we should do a bulk load test, for example for cqlsh COPY FROM As for the code review, I've addressed some nits, mostly comments and trailing spaces, directly [here|https://github.com/stef1927/cassandra/commit/334509c10a5f4f759550fb64902aabc8c4a4f40c]. The code is really of excellent quality and very well unit tested, so I didn't really find much, and they are mostly suggestions. I will however have another pass next week, now that I am more familiar with the feature. h5. conf/cassandra.yaml I've edited the comments slightly, mostly to spell out for users why they may need a back pressure strategy, and to put the emphasis on the default implementation first, then say how to create new strategies. As you mentioned, new strategies may require new state parameters anyway, so it's not clear how easy it will be to add new strategies. You may be able to improve on my comments further. Should we perhaps group all settings under something like back_pressure_options? back_pressure_timeout_override: should we call it for what it is, the back_pressure_window_size, and recommend that they increase the write timeout in the comments, or is an override of the write timeout necessary for correctness? h5. src/java/org/apache/cassandra/config/Config.java I moved the creation of the default parameterized class to RateBasedBackPressure, to remove some pollution of constants. h5. src/java/org/apache/cassandra/net/BackPressureState.java Should we use a dedicated package? It would require a couple of setters and getters in the state to keep the properties non-public, which I would prefer anyway. h5. src/java/org/apache/cassandra/net/BackPressureStrategy.java As above, should we use a dedicated package? h5. src/java/org/apache/cassandra/net/RateBasedBackPressure.java OK - I've changed the code slightly to calculate backPressureWindowHalfSize in seconds only once per call. h5. src/java/org/apache/cassandra/utils/SystemTimeSource.java Depending on the precision, in trunk there is an approximate time service with a precision of 10 ms, which is faster than calling System.currentTimeMillis() h5. src/java/org/apache/cassandra/utils/TestRateLimiter.java Is this still used? h5. 
h5. src/java/org/apache/cassandra/utils/TestTimeSource.java
I think we need to move it somewhere with the test sources. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: backpressure.png, limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
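[Editorial note] The time-source abstraction touched on in the review above (SystemTimeSource, TestTimeSource) can be illustrated with a minimal sketch. The interface and class bodies below are assumptions for illustration, not the actual patch code; they only show why a swappable time source makes window- and rate-based logic unit-testable.
{code:java}
// Minimal sketch, assuming a simple millisecond-based interface; the real
// classes in the patch may differ.
import java.util.concurrent.atomic.AtomicLong;

interface TimeSource
{
    long currentTimeMillis();
}

class SystemTimeSource implements TimeSource
{
    public long currentTimeMillis()
    {
        return System.currentTimeMillis();
    }
}

// Test-only: time advances only when the test says so, which is why it belongs
// with the test sources rather than under src/java.
class TestTimeSource implements TimeSource
{
    private final AtomicLong now = new AtomicLong();

    public long currentTimeMillis()
    {
        return now.get();
    }

    void advance(long millis)
    {
        now.addAndGet(millis);
    }
}
{code}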
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345077#comment-15345077 ] Joshua McKenzie commented on CASSANDRA-9318: [~aweisberg]: You've spent a fair amount of time thinking about this problem - would you be up for reviewing this? > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Sergio Bossa > Attachments: backpressure.png, limit.btm, no_backpressure.png > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344958#comment-15344958 ] Sergio Bossa commented on CASSANDRA-9318: - I would like to reopen this ticket and propose the following patch to implement coordinator-based back-pressure:
| [3.0 patch|https://github.com/apache/cassandra/compare/cassandra-3.0...sbtourist:CASSANDRA-9318-3.0?expand=1] | [testall|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-3.0-testall/] | [dtest|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-3.0-dtest/] |
The above patch provides full-blown, end-to-end, replica-to-coordinator-to-client *write* back-pressure, based on the following main concepts:
* The replica itself has no back-pressure knowledge: it keeps trying to write mutations as fast as possible, and still applies load shedding.
* The coordinator tracks the back-pressure state *per replica*, which in the current implementation consists of the incoming and outgoing rate of messages from/to the replica.
* The coordinator is configured with a back-pressure strategy that, based on the back-pressure state, applies a given back-pressure algorithm when sending mutations to each replica.
* The provided default strategy is based on the incoming/outgoing message ratio, used to rate limit outgoing messages towards a given replica.
* The back-pressure strategy is also in charge of signalling the coordinator when a given replica is considered "overloaded", in which case an {{OverloadedException}} is thrown to the client for all mutation requests deemed "overloading", until the strategy considers the overloaded state to be over.
* The provided default strategy uses configurable low/high thresholds to either rate limit or throw an exception back to clients.
While all of that might seem too complex, the patch is actually surprisingly simple. I provided as many unit tests as possible, and I've also tested it on a 2-node CCM cluster, using [ByteMan|http://byteman.jboss.org/] to simulate a slow replica, and I'd say results are quite promising: as an example, see the attached ByteMan script and plots showing a cluster with no back-pressure ending up dropping ~200k mutations, while a cluster with back-pressure enabled drops only ~2k, which means less coordinator overload and an easily recoverable replica state via hints. I can foresee at least two open points:
* We might want to track more "back-pressure state" to allow implementing different strategies; I personally believe strategies based on in/out rates are the most appropriate ones to avoid *both* the overloading and dropped mutations problems, but people might think differently.
* When the {{OverloadedException}} is (eventually) thrown, some requests might have been already sent, which is exactly what currently happens with hint overloading too: we might want to check both kinds of overloading before actually sending any mutations to replicas.
Thoughts? > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. 
> An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
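[Editorial note] The default strategy described in the comment above (per-replica incoming/outgoing rates, a ratio used to rate limit, and low/high thresholds deciding between throttling and signalling overload) can be sketched roughly as follows. This is a hedged illustration: the class and field names, the Guava RateLimiter plumbing, and the exact threshold semantics are assumptions, not the patch's actual RateBasedBackPressure implementation.
{code:java}
import java.util.concurrent.atomic.AtomicLong;

import com.google.common.util.concurrent.RateLimiter;

// Illustrative per-replica state: counts of outgoing mutations and incoming
// acks, plus a rate limiter applied to traffic towards that replica.
class ReplicaBackPressureState
{
    final AtomicLong outgoing = new AtomicLong();
    final AtomicLong incoming = new AtomicLong();
    final RateLimiter limiter = RateLimiter.create(Double.MAX_VALUE); // effectively unlimited
}

class RateBasedStrategySketch
{
    private final double lowRatio;  // below this the replica is treated as overloaded
    private final double highRatio; // above this no throttling is applied

    RateBasedStrategySketch(double lowRatio, double highRatio)
    {
        this.lowRatio = lowRatio;
        this.highRatio = highRatio;
    }

    /**
     * Called once per back-pressure window with the observed outgoing rate.
     * Returns false when the replica should be flagged as overloaded, in which
     * case the coordinator would fail overloading mutations back to the client.
     */
    boolean adjust(ReplicaBackPressureState state, double outgoingRatePerSecond)
    {
        double out = state.outgoing.get();
        double in = state.incoming.get();
        double ratio = out == 0 ? 1.0 : in / out;

        if (ratio >= highRatio)
        {
            state.limiter.setRate(Double.MAX_VALUE); // replica keeping up: no limit
            return true;
        }

        // Replica lagging: only send at roughly the rate it is acknowledging.
        state.limiter.setRate(Math.max(1.0, outgoingRatePerSecond * ratio));
        return ratio >= lowRatio;
    }

    /** Callers acquire a permit before sending each mutation to the replica. */
    void onSend(ReplicaBackPressureState state)
    {
        state.limiter.acquire();
        state.outgoing.incrementAndGet();
    }

    void onResponse(ReplicaBackPressureState state)
    {
        state.incoming.incrementAndGet();
    }
}
{code}
The key design point, as described above, is that all state is kept per replica, so one slow replica only throttles the traffic directed at it while the rest of the cluster proceeds at full speed.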
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118187#comment-15118187 ] Russell Alexander Spitzer commented on CASSANDRA-9318: -- [~aweisberg] I think I can get you a repro pretty easily with the SparkCassandraConnector. We tend to get a few comments a week where the driver ends up excepting out on "NoHostAvailable" exceptions. Usually it's folks who have done RF=1 and then a full table scan, or who are doing heavy writes to a cluster with a very high number of concurrent writers. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 2.1.x, 2.2.x > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091963#comment-15091963 ] Jonathan Ellis commented on CASSANDRA-9318: --- Since most of that discussion is implementation details, I'll quote the relevant part:
bq. With consistency level less than ALL, mutation processing can move to background (meaning the client was answered, but there is still work to do on behalf of the request). If the background request completion rate is lower than the incoming request rate, background requests will accumulate and eventually will exhaust all memory resources. This patch's aim is to prevent this situation by monitoring how much memory all current background requests take and, when some threshold is passed, stop moving requests to background (by not replying to a client until either memory consumption moves below the threshold or the request is fully completed).
bq. There are two main points where each background mutation consumes memory: holding the frozen mutation until the operation is complete (in order to hint it if it does not complete), and on the rpc queue to each replica, where it sits until it's sent out on the wire. The patch accounts for both of those separately and limits the former to be 10% of total memory and the latter to be 6M. Why 6M? The best answer I can give is why not :) But on a more serious note, the number should be small enough so that all the data can be sent out in a reasonable amount of time and one shard is not capable of achieving even close to full bandwidth, so empirical evidence shows 6M to be a good number. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 2.1.x, 2.2.x > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
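[Editorial note] The accounting the quote above describes (bound the memory held by "background" writes to a fraction of the heap, and stop answering clients early once the threshold is passed) amounts to something like the sketch below. The names, the use of an AtomicLong counter, and everything except the 10% figure taken from the quote are illustrative assumptions, not code from that patch.
{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Sketch of threshold-based "background write" accounting.
class BackgroundWriteAccounting
{
    private final AtomicLong outstandingBytes = new AtomicLong();
    private final long limitBytes = Runtime.getRuntime().maxMemory() / 10; // ~10% of heap

    /** @return true if this mutation may complete in the background (client answered early). */
    boolean tryMoveToBackground(long serializedSize)
    {
        long newTotal = outstandingBytes.addAndGet(serializedSize);
        if (newTotal > limitBytes)
        {
            // Over the threshold: undo and keep the client waiting until the
            // write fully completes or memory consumption drops again.
            outstandingBytes.addAndGet(-serializedSize);
            return false;
        }
        return true;
    }

    void onBackgroundWriteComplete(long serializedSize)
    {
        outstandingBytes.addAndGet(-serializedSize);
    }
}
{code}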
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092215#comment-15092215 ] Ariel Weisberg commented on CASSANDRA-9318: --- I am not sure how they have structured things and why they need to do that. With a hard limit on ingest rate and a 2 second timeout there is a soft bound on how much memory can be committed. It's not a hard bound because there is no guarantee that the timeout thread will keep up. I do think that switching requests from async to sync based on load is a better approach than enabling/disabling read on client connections. There can be a lot of client connections so that is something that can be time consuming. I am not sold that we should do anything at all? We could add more code to address this, but I hate to do that when I can't demonstrate that I am solving a problem. I split off what I consider the good bits into CASSANDRA-10971 and CASSANDRA-10972. The delay has been other work, me considering this to be lowish priority since I can't demonstrate it, and difficulty getting hardware going that is capable of demonstrating the problem. I'm slowly iterating on it in the background right now. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 2.1.x, 2.2.x > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075476#comment-15075476 ] Aleksey Yeschenko commented on CASSANDRA-9318: -- Posting myself instead: a link to a very relevant discussion by Cloudius guys: https://groups.google.com/forum/#!topic/scylladb-dev/7Ge6vHN52os > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 2.1.x, 2.2.x > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075444#comment-15075444 ] Ariel Weisberg commented on CASSANDRA-9318: --- HT Aleksey Scylla discussion on a similar topic https://groups.google.com/forum/#!topic/scylladb-dev/7Ge6vHN52os > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 2.1.x, 2.2.x > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062162#comment-15062162 ] Ariel Weisberg commented on CASSANDRA-9318: --- To summarize what I have been experimenting with. I have been limiting disk throughput with a rate limiter both with and without limiting the commit log. I am excluding the commitlog because I don't want to bottleneck the ingest rate. I have also been experimenting with limiting the rate at which remote delivered messages are consumed, which more or less forces the mutations to time out. I added hint backpressure and since then I have been unable to make the server fail or OOM from slow disk or slow remote nodes. What I have found is that I am unable to get the server to start enough concurrent requests to OOM in the disk bound case because the request stage is bounded to 128 requests. In the remote mutation bound case I think I am running out of ingest capacity on a single node and that is preventing me from running up more requests in memory faster than they can be evicted by the 2 second timeout. With gigabit ethernet the reason is obvious. With 10-gigabit ethernet I would still be limited to 1.6 gigabytes of data assuming timeouts are prompt. That's a hefty quantity, but something that could fit in a larger heap. I should be able to test that soon. I noticed OutboundTcpConnection is another potential source of OOM since it can only drop messages if the thread isn't blocked writing to the socket. I am allowing it to wake up regularly in my testing, but if something is unavailable we would have to not OOM before failure detection kicks in, which is a longer timespan. It might make sense to have another thread periodically check if it should expire messages from the queue. This isn't something I can test for without access to 10-gig: you would have to coordinate several hundred (or thousand?) megabytes of transaction data to OOM. This assumes you can get past the ingest limitations of gigabit ethernet. For code changes I think that disabling reads based on memory pressure is mostly harmless, but I can't demonstrate that it adds a lot of value given how things behave with a large enough heap and the network level bounds on ingest rate. I do want to add backpressure to hints and the compressed commit log because that is an easy path to OOM. I'll package those changes up today. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 2.1.x, 2.2.x > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
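[Editorial note] The idea floated above, a separate thread that periodically expires messages from the outbound queue so drops don't depend on the writer thread being unblocked, could look roughly like this. Class and field names are made up for illustration; this is not OutboundTcpConnection's actual structure.
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of a periodic expiry task for an outbound message queue, so that
// timed-out messages are dropped even while the writer thread is blocked on a
// slow or dead socket.
class OutboundQueueExpirySketch
{
    static final class QueuedMessage
    {
        final long enqueuedAtMillis;

        QueuedMessage(long enqueuedAtMillis)
        {
            this.enqueuedAtMillis = enqueuedAtMillis;
        }
    }

    private final BlockingQueue<QueuedMessage> backlog = new LinkedBlockingQueue<>();
    private final long timeoutMillis;

    OutboundQueueExpirySketch(long timeoutMillis, ScheduledExecutorService scheduler)
    {
        this.timeoutMillis = timeoutMillis;
        // Runs independently of the thread writing to the socket.
        scheduler.scheduleWithFixedDelay(this::expireStale, 1, 1, TimeUnit.SECONDS);
    }

    void enqueue(QueuedMessage message)
    {
        backlog.add(message);
    }

    private void expireStale()
    {
        long now = System.currentTimeMillis();
        // A real implementation would count these as dropped rather than discard silently.
        backlog.removeIf(m -> now - m.enqueuedAtMillis > timeoutMillis);
    }
}
{code}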
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056001#comment-15056001 ] Aleksey Yeschenko commented on CASSANDRA-9318: -- CASSANDRA-9834 relevant here (and related discussion in CASSANDRA-6230). > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 2.1.x, 2.2.x > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056411#comment-15056411 ] Ariel Weisberg commented on CASSANDRA-9318: --- Quick note. 65k mutations pending in the mutation stage. 7 memtables pending flush. [I hooked memtables pending flush into the backpressure mechanism.|https://github.com/apache/cassandra/commit/494eabf48ab48f1e86c058c0b583166ab39dcc39] That absolutely wrecked performance as throughput dropped to zero periodically, but throughput is still infinitely higher than when the database has OOMed. Kicked off a few performance runs to demonstrate what happens when you do have backpressure and you try various large limits on in flight memtables/requests. [9318 w/backpressure 64m 8g heap memtables count|http://cstar.datastax.com/tests/id/fa769eec-a283-11e5-bbc9-0256e416528f] [9318 w/backpressure 1g 8g heap memtables count|http://cstar.datastax.com/tests/id/4c52dd6e-a286-11e5-bbc9-0256e416528f] [9318 w/backpressure 2g 8g heap memtables count|http://cstar.datastax.com/tests/id/b3d5b470-a286-11e5-bbc9-0256e416528f] I am setting the point where backpressure turns off to almost the same limit as the one where it turns on. This smooths out performance just enough for stress to not constantly emit huge numbers of errors as writes time out because the database stops serving requests for a long time waiting for a memtable to flush. With pressure from memtables somewhat accounted for, the remaining source of pressure that can bring down a node is remotely delivered mutations. I can throw those into the calculation and add a listener that blocks reads from other cluster nodes. It's a nasty thing to do, but maybe not that different from OOM. I am going to hack together something to force a node to be slow so I can demonstrate overwhelming it with remotely delivered mutations first. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 2.1.x, 2.2.x > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056940#comment-15056940 ] Ariel Weisberg commented on CASSANDRA-9318: --- Further review shows that with an 8 gigabyte heap backpressure isn't getting a chance to kick in. Several memtables end up being queued at once for flushing but the peak memory utilization is only a few hundred megabytes. I am back to not having a workload that demonstrates that backpressure has value. I tried artificially constraining disk throughput to 32 megabytes/sec and ended up hitting OOM in hints. Hints appear to allocate ByteBuffers indefinitely so they can't provide backpressure based on available disk capacity/throughput anymore. Hints are submitted directly from the contributing thread so there is no such thing as an in-flight hint anymore. I'll clean up the existing hint overload protection so that it kicks in based on the number of pending buffers and see where that gets me. I am guessing that in practice hint overload is going to be a lot harder to hit now that it is based on flat files. Clearly sequential IO performance outstrips the ability to process even large incoming mutations. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 2.1.x, 2.2.x > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15054761#comment-15054761 ] Ariel Weisberg commented on CASSANDRA-9318: --- I got two cstar jobs to complete. [This job is set to allow 16 megabytes of transactions per coordinator, and disabled reads until they come back down to 12 megabytes.|http://cstar.datastax.com/graph?command=one_job=d1e720c8-a125-11e5-9051-0256e416528f=op_rate=1_write=1_aggregates=true=0=6664.35=0=11883.3] [This job is set to allow 64 megabytes of transactions per coordinator, and disabled reads until they came back down to 60 megabytes.|http://cstar.datastax.com/graph?command=one_job=26853362-a127-11e5-80c2-0256e416528f=op_rate=1_write=1_aggregates=true=0=322.85=0=12972.3] The job with 64 megabytes in flight kind of looks like it failed after 300 seconds. I didn't expect the threshold for things to fall apart to be quite that low, but generally speaking yeah more data in flight tends to cause bad things to happen. So why did the second one fall apart? First off, mad props to whoever started collecting the GC logs. Lots of continual full GC at the end. Sure enough the heap is only 1 gigabyte. Are we seriously running all our performance tests with a default heap of 1 gigabyte? I don't think it failed due to in flight requests (only had 32 megabytes in flight). I think it OOMed due to other heap pressure. For this in-flight request backpressure to work I think we need to include the weight of memtables when making the decision. I am going to bump up the heap and try again to see if I can reduce the impact of other heap pressure to the point that we can start buffering more requests in flight. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 2.1.x, 2.2.x > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053719#comment-15053719 ] Ariel Weisberg commented on CASSANDRA-9318: --- I tried out limiting based on memory utilization. What I found was that for the small amount of throughput I can get out of my desktop the 2 second timeout is sufficient to evict and hint without OOM. If I extend the timeout enough I can get an OOM because eviction doesn't keep up so that demonstrates that eviction has to take place to avoid OOM. I see this change as useful in that it places a hard bound on working set size, but not sufficient. Performance tanks so badly as the heap fills up with requests being timed out that evicting them is not a problem. If that is the case maybe we should just be evicting them more aggressively so they don't have an impact on performance. Possibly based on perceived odds of receiving a response in a reasonable amount of time. It makes sense to me to use hinting as a method of getting the data off the heap and batching the replication to slow nodes or DCs. If we start evicting requests for a node maybe we should have an adaptive approach and go straight to hinting for the slow node for some % of requests. If the non-immediately hinted requests start succeeding we can gradually increase the % that go straight to the node. I am trying to think of alternatives that don't end up kicking back an error to the application. That's still an important capability to have because growing hints forever is not great, but we can start by ensuring that rest of the cluster can always operate at full speed even if a node is slow. Separately we can tackle bounding the resource utilization issues that presents. Operationally how do people feel about having many gigabytes worth of hints to deliver? Is that useful in that it allows things to continue until the slow node is addressed? > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths, Streaming and Messaging >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 2.1.x, 2.2.x > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
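[Editorial note] The adaptive idea in the comment above, routing some fraction of a slow node's writes straight to hints and adjusting that fraction from the fate of the writes that are still sent, could be prototyped along these lines. Everything here, including the step sizes, is a hypothetical illustration rather than anything in Cassandra.
{code:java}
import java.util.concurrent.ThreadLocalRandom;

// Sketch of adaptive "hint instead of send" routing for a node flagged as slow.
class AdaptiveHintingSketch
{
    private volatile double hintFraction = 0.0; // 0 = send everything, 1 = hint everything

    /** Decide whether to hint this write immediately instead of sending it to the slow node. */
    boolean hintImmediately()
    {
        return ThreadLocalRandom.current().nextDouble() < hintFraction;
    }

    /** Feedback from the writes that were actually sent to the node. */
    void onWriteResult(boolean timedOut)
    {
        if (timedOut)
            hintFraction = Math.min(1.0, hintFraction + 0.10); // node still slow: hint more
        else
            hintFraction = Math.max(0.0, hintFraction - 0.01); // node recovering: send more
    }
}
{code}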
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051559#comment-15051559 ] Ariel Weisberg commented on CASSANDRA-9318: --- The discussion so far has led to the conclusion that throwing OE or UE to the client is too complex a starting point, since the database will throw off more errors than it used to. The advantage of a bound on just memory committed at the coordinator is that it's "free" from the perspective of the user. OOM and blocking are similar levels of unavailability. The difference is that OOM is more disruptive. If the coordinator at least prevents itself from OOMing it can recover without intervention and continue to serve writes. I think we are being optimistic about how many writes you can buffer and time out at coordinators before things start going south. The space amplification is going to be bad and the GC overhead increases linearly with the number of timing out writes in flight. It's going to lead to survivor copying and promotion which is going to lead to fragmentation. I think we will find that we can't buffer enough writes to tolerate a 2 second timeout in a small cluster where a single coordinator must coordinate many requests for a slow node. I think we are going to see what Benedict described, especially in small clusters, where performance is cyclical as the coordinators block waiting for timeouts to fire. I have an implementation of most of this. I am fussing over how to calculate the weight of inflight callbacks and mutations without introducing a lot of overhead walking the graph repeatedly. We already walk it once to calculate the serialized size, but that is not the same thing. I usually just go with a rough guesstimate; serialized size is a good proxy for a weight, but we are trying to fill the heap reliably. It's bad enough not knowing the layout of the heap or the garbage collector induced overhead. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 2.1.x, 2.2.x > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971574#comment-14971574 ] Sergio Bossa commented on CASSANDRA-9318: - bq. The whole point of this ticket is to avoid the complexity of intra-node backpressure, and instead basing coordinator -> client backpressure on the coordinator's local knowledge. Just to be clear, my proposal _does_ keep the back-pressure decision local to the coordinator, that is, there's no communication between nodes in such regard (which I agree would be a much different matter). The difference in my proposal is that we deal with back-pressure on a per-replica basis and in a fine grained way, rather than with coarse grained global memory limits which would end up flooding replicas before being triggered. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement >Reporter: Ariel Weisberg >Assignee: Jacek Lewandowski > Fix For: 2.1.x, 2.2.x > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971157#comment-14971157 ] Jonathan Ellis commented on CASSANDRA-9318: --- bq. I'd like to resurrect this one and if time permits take it by following Jonathan's proposal above, except I'd also like to propose an additional form of back-pressure at the coordinator->replica level. The whole point of this ticket is to avoid the complexity of intra-node backpressure, and instead basing coordinator -> client backpressure on the coordinator's local knowledge. (Which is not a perfect view of cluster state but it is a reasonable approximation, especially since it can use its MessagingService state to guess replica problems, even without explicit replica backpressure.) We can always add explicit intra-node backpressure later, they are complementary. But I'm not sure it will be necessary and I want to avoid over-engineering the problem to start. > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement >Reporter: Ariel Weisberg >Assignee: Jacek Lewandowski > Fix For: 2.1.x, 2.2.x > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960450#comment-14960450 ] Sergio Bossa commented on CASSANDRA-9318: - I'd like to resurrect this one and if time permits take it by following Jonathan's proposal above, except I'd also like to propose an additional form of back-pressure at the coordinator->replica level. Such back-pressure would be applied by the coordinator when sending messages to *each* replica if some kind of flow control condition is met (e.g. number of in-flight requests, drop rate; we can talk about this more at a later stage or even experiment); that is, each replica would have its own flow control, allowing us to better fine-tune the applied back-pressure. The memory-based back-pressure would at this point work as a kind of circuit breaker: if replicas can't keep up, and the applied flow control causes too many requests to accumulate on the coordinator, the memory-based limit will kick in and start pushing back to the client by either pausing or throwing OverloadedException. There are obviously details we need to discuss and/or experiment with, i.e.:
1) The flow control algorithm (we could steal from the TCP literature, using something like CoDel or Adaptive RED).
2) Whether to impose any limit on coordinator-level throttling, i.e. shedding requests that have been throttled for too long (I would say no, because the memory limit should protect against OOMs and allow the in-flight requests to be processed).
3) What to do when the memory limit is reached (we could make this policy-based).
I hope it makes sense, and I hope you see the reason behind that: dropped mutations are a problem for many C* users, and even more for C* applications that cannot rely on QUORUM reads (i.e. inverted index queries, graph queries); the proposal above is not meant to be the definitive solution, but should greatly help reduce the number of dropped mutations on replicas, which memory-based back-pressure alone doesn't (as by the time it kicks in, without flow control, replicas will already be flooded with requests). > Bound the number of in-flight requests at the coordinator > - > > Key: CASSANDRA-9318 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 > Project: Cassandra > Issue Type: Improvement >Reporter: Ariel Weisberg >Assignee: Jacek Lewandowski > Fix For: 2.1.x, 2.2.x > > > It's possible to somewhat bound the amount of load accepted into the cluster > by bounding the number of in-flight requests and request bytes. > An implementation might do something like track the number of outstanding > bytes and requests and if it reaches a high watermark disable read on client > connections until it goes back below some low watermark. > Need to make sure that disabling read on the client connection won't > introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605351#comment-14605351 ] Benedict commented on CASSANDRA-9318: - Of course, things get even hairier with multi-DC, but I'm not as familiar with the logic there. Naively, it looks like a single node could quickly bring down every DC. Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x, 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605600#comment-14605600 ] Benedict commented on CASSANDRA-9318: - That said, in general (perhaps in a separate ticket) we should probably make our heap calculations a bit more robust wrt each other. i.e. we should subtract memtable space from any heap apportionment, in case users set memtable space really high. Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x, 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605597#comment-14605597 ] Benedict commented on CASSANDRA-9318: -
bq. default timeout is 2s not 10, so actually fine in your example of 300MB vs 150MB/s x 2s
Looks like in 2.0 this was 10s, and it was hard-coded in yaml, so anyone upgrading from 2.0 or before likely has a 10s timeout. So we should assume this is by far the most common timeout.
bq. you don't see a complete halt until capacity's worth of requests timeout all at once, because you don't get an entire capacity load accepted at once. it's more continuous than discrete – you pause until the oldest expire, accept more, pause until the oldest expire, etc. so you make slow progress as load shedding can free up memory. thus, load shedding is complementary to flow control.
You see a complete halt as soon as we exhaust space. If we exhaust space in 0.5x timeout, then we will see repeated juddering behaviour.
bq. but we can easily set a higher limit on MS heap – maybe as high as 1/8 heap as default which gives us a lot of room for 8GB heap
If we set this really _aggressively_ high, say min(1/4 heap, 1Gb), until we implement the improved shedding, then I'll quit complaining. Right now we give breathing room up to and beyond collapse. I absolutely agree that breathing room up until just-prior-to-collapse is preferable, but cutting our breathing room by a magnitude is reducing our availability in clusters without their opting into it. 1/4 heap is probably still leaving quite a lot of headroom we would otherwise have safely used in a 2Gb heap (which are quite feasible, and probably preferable, for many users running offheap memtables), but is still very unlikely to cause the server to completely collapse. Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x, 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605629#comment-14605629 ] Jonathan Ellis commented on CASSANDRA-9318: --- bq. If we set this really aggressively high, say min(1/4 heap, 1Gb) until we implement the improved shedding, then I'll quit complaining. Sold! Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x, 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605539#comment-14605539 ] Jonathan Ellis commented on CASSANDRA-9318: ---
* default timeout is 2s not 10, so actually fine in your example of 300MB vs 150MB/s x 2s
* but we can easily set a higher limit on MS heap -- maybe as high as 1/8 heap as default which gives us a *lot* of room for 8GB heap
* you don't see a complete halt until capacity's worth of requests timeout all at once, because you don't get an entire capacity load accepted at once. it's more continuous than discrete -- you pause until the oldest expire, accept more, pause until the oldest expire, etc. so you make slow progress as load shedding can free up memory. thus, load shedding is complementary to flow control.
* aggressively load shedding for outlier nodes is a good idea that we should follow up on in another ticket. again, current behavior of continuing to accept requests until we fall over is worse than imposing flow control, so we should start with that [flow control] in 2.1 and make further improvements in 2.2.
Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x, 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605779#comment-14605779 ] Benedict commented on CASSANDRA-9318: - This still leaves some questions, of varying import and difficulty:
* how can we easily block consumption of writes from clients without stopping reads?
* how should we mandate (or encourage) native clients to behave:
** separate read/write connections?
** configurable blocking queue size for pending writes to avoid unbounded growth?
** should timeouts be from time of despatch, or from time of submission to local client?
* will we implement this for thrift clients?
Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x, 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605814#comment-14605814 ] Benedict commented on CASSANDRA-9318: - Do you mean within a single connection? I was under the impression we absolutely didn't want to stop readers? I disagree about not imposing _expectations_ on client implementations, so that users don't need to reason about each connector they use independently. Along with the protocol spec, other specs such as behaviour in these scenarios should really be spelled out. If clients want to do their own thing, there's not much we can do, but it's helpful for users to have an expectation of behaviour, and for implementors to be made aware of the potential problems their driver may encounter that they do not anticipate (and what everyone will expect them to do). Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x, 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605808#comment-14605808 ] Jonathan Ellis commented on CASSANDRA-9318: ---
# We can't and we shouldn't
# See Above
# No
Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x, 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605817#comment-14605817 ] Jonathan Ellis commented on CASSANDRA-9318: --- https://issues.apache.org/jira/browse/CASSANDRA-9318?focusedCommentId=14604775page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14604775 Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x, 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605825#comment-14605825 ] Benedict commented on CASSANDRA-9318: - My mistake, I clearly misread your earlier comment. I'm not sure I agree with that conclusion, but not strongly enough to prolong the discussion. So I guess that's that then. Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x, 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605345#comment-14605345 ] Benedict commented on CASSANDRA-9318: -
bq. I think you're \[Benedict\] overselling how scary it is to stop reading new requests until we can free up some memory from MS.
The problem is that freeing up memory can be constrained by one or a handful of _dead_ nodes. We can not only stop accepting work, but significantly reduce cluster throughput as a result of a *single* timed-out node. I'm not overselling anything, although I may have a different risk analysis than you do. Take a simple mathematical thought experiment: we have a four node cluster (pretty common), with RF=3, serving 100kop/s per coordinator; these operations in memory occupy around 2K as a Mutation (again, pretty typical). Ordinary round-trip time is 10ms (also, pretty typical). So, under normal load we would see around 2Mb of data maintained for our queries in-flight across the cluster. But now one node goes down. This node is a peer for 3/4 of all writes to the cluster, so we see 150Mb/s of data accumulate in each coordinator. Our limit is probably no more than 300Mb (probably lower). Our timeout is 10s. So we now have 8s during which nothing can be done, across the cluster, due to one node's death. After that 8s has elapsed, we get another flurry. Then another 8s of nothing. Even with a CL of ONE. This really is fundamentally opposed to the whole idea of Cassandra, and I cannot emphasize enough how much I am against it except as a literal last resort when all other strategies have failed.
bq. Hinting is better than leaving things in an unknown state but it's not something we should opt users into if we have a better option, since it basically turns the write into CL.ANY.
I was under the impression we had moved to talking about ACK'd writes. I'm not suggesting we ack with success to the handler. What we do with unack'd writes is actually less important, and we have much freer rein with them. We could throw OE. We could block, as you suggest, since these should be more evenly distributed. However I would prefer we do both, i.e., when we run out of room in the coordinator, we should look to see if there are any nodes that have well in excess of their fair share of entries waiting for a response. Let's call these nodes N.
# if N=0, we block consumption of new writes, as you propose.
# otherwise, we first evict those that have been ACK'd to the client and can be safely hinted (and hint them)
# if this isn't enough, we evict handlers that, if all N were to fail, would break the CL we are waiting on, and we throw OE
Step 3 is necessary both for CL.ALL, and the scenario where 2 failing nodes have significant overlap. Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x, 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
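[Editorial note] The arithmetic behind the thought experiment above can be checked with a quick back-of-the-envelope calculation; the figures are the ones stated in the comment, and the snippet simply works them out.
{code:java}
public class InflightMathSketch
{
    public static void main(String[] args)
    {
        double opsPerSec = 100_000;      // per coordinator
        double mutationBytes = 2 * 1024; // ~2K per Mutation on heap
        double rttSeconds = 0.010;       // ordinary round-trip time

        // Steady state: in-flight data is rate * size * RTT, roughly 2 MB.
        double steadyStateBytes = opsPerSec * mutationBytes * rttSeconds;

        // One node down in a 4-node, RF=3 cluster: it is a replica for ~3/4 of writes,
        // so un-acked mutations accumulate at roughly 150 MB/s per coordinator.
        double accumulationPerSec = opsPerSec * 0.75 * mutationBytes;

        // With a ~300 MB in-flight limit and a 10s timeout, the limit fills in ~2s,
        // leaving ~8s stalled until the oldest requests expire.
        double secondsToFillLimit = 300 * 1024 * 1024 / accumulationPerSec;

        System.out.printf("steady state: %.1f MB, accumulation: %.1f MB/s, limit fills in %.1f s%n",
                          steadyStateBytes / (1024 * 1024),
                          accumulationPerSec / (1024 * 1024),
                          secondsToFillLimit);
    }
}
{code}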
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604662#comment-14604662 ] Benedict commented on CASSANDRA-9318:
-

I'm pretty sure I've made clear a few times that I'm proposing load shedding based on _both_ resource consumption and timeout; i.e., if we are running short of resources, we hint, and if we completely run out of resources, we shed. In this case, shedding is _never_ incapable of keeping us in a happy place, and it ensures we absolutely prevent any spam from bringing down the server.

I think we really need to separate the two concerns, as we seem to be jumping between them: keeping the server alive is best done through shedding; helping users with bulk loaders is best served by pausing single clients that are exceeding our rate of consumption.
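A minimal sketch of the two-tier behaviour described here (hint past a soft limit, shed past a hard limit). The {{InFlightTracker}} class, the threshold values, and the {{Action}} result are assumptions made up for this example; the real decision would of course hook into MessagingService's own accounting.

{code:java}
// Hypothetical two-tier load shedding: hint above a soft limit, shed above a hard limit.
import java.util.concurrent.atomic.AtomicLong;

final class InFlightTracker
{
    private static final long SOFT_LIMIT_BYTES = 200L << 20; // start hinting above ~200MB in flight (illustrative)
    private static final long HARD_LIMIT_BYTES = 300L << 20; // shed outright above ~300MB (illustrative)

    private final AtomicLong inFlightBytes = new AtomicLong();

    enum Action { SEND, HINT, SHED }

    /** Decide what to do with a mutation of the given serialized size. */
    Action admit(long mutationSize)
    {
        long total = inFlightBytes.addAndGet(mutationSize);
        if (total > HARD_LIMIT_BYTES)
            return Action.SHED;  // completely out of room: drop and rely on timeout/retry semantics
        if (total > SOFT_LIMIT_BYTES)
            return Action.HINT;  // running short: write a hint instead of holding the request in memory
        return Action.SEND;      // normal path: dispatch to replicas
    }

    /** Called when a response arrives, the request times out, or it is hinted/shed. */
    void release(long mutationSize)
    {
        inFlightBytes.addAndGet(-mutationSize);
    }
}
{code}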
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604649#comment-14604649 ] Jonathan Ellis commented on CASSANDRA-9318:
---

Here's where I've ended up:
# Continuing to accept writes faster than a coordinator can deliver them to replicas is bad. Even perfect load shedding is worse from a client perspective than throttling, since if we load shed and time out, the client has to guess the right rate to retry at.
# For the same reason, accepting a write but then refusing it with UnavailableException is worse than waiting to accept the write until we have capacity for it.
# It's more important to throttle writes because, while we can get in trouble with large reads too (a small request turns into a big reply), in practice reads are naturally throttled: a client needs to wait for the read before taking action on it. With writes, on the other hand, a new user's first inclination is to see how fast s/he can bulk load stuff.

In practice, I see load shedding and throttling as complementary. Replicas can continue to rely on load shedding. Perhaps we can attempt distributed back pressure later (if every replica is overloaded, we should again throttle clients), but for now let's narrow our scope to throttling clients to the capacity of a coordinator to send out.

I propose we define a limit on the amount of memory MessagingService can consume and pause reading additional requests whenever that limit is hit (a sketch follows below). Note that:
# If MS's load is distributed evenly across all destinations, then this is trivially the right thing to do.
# If MS's load is caused by a single replica falling over or unable to keep up, this is still the right thing to do, because the alternative is worse. MS will load shed timed-out requests, but if clients are sending more requests to a single replica than we can shed (i.e., if rate * timeout > capacity), then we still need to throttle or we will exhaust the heap and fall over.

(The hint-based UnavailableException tries to help with scenario 2, and I will open a ticket to test how well that actually works. But the hint threshold cannot help with scenario 1 at all, and that is the hole this ticket needs to plug.)
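A rough sketch of the coordinator-side throttle proposed here, assuming a Netty-style transport where reads can be paused per connection by toggling {{autoRead}}: track the bytes MessagingService is holding against a high/low watermark and stop reading client requests while above the high watermark. The {{MessagingReservation}} class and the watermark values are hypothetical, chosen only to show the shape of the idea, not the committed implementation.

{code:java}
// Hypothetical high/low watermark throttle: stop reading client requests while
// MessagingService-owned memory is above the high watermark, resume below the low one.
import io.netty.channel.Channel;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class MessagingReservation
{
    private static final long HIGH_WATERMARK = 256L << 20; // pause client reads above ~256MB (illustrative)
    private static final long LOW_WATERMARK  = 192L << 20; // resume once back under ~192MB (illustrative)

    private final AtomicLong outstandingBytes = new AtomicLong();
    private final Set<Channel> clientChannels = ConcurrentHashMap.newKeySet();

    void register(Channel clientChannel)
    {
        clientChannels.add(clientChannel);
    }

    /** Called when a request is handed to MessagingService for delivery to replicas. */
    void acquire(long bytes)
    {
        if (outstandingBytes.addAndGet(bytes) >= HIGH_WATERMARK)
            setAutoRead(false); // stop pulling new requests off every client socket
    }

    /** Called when a replica responds, the request times out, or it is hinted/shed. */
    void release(long bytes)
    {
        if (outstandingBytes.addAndGet(-bytes) <= LOW_WATERMARK)
            setAutoRead(true);  // safe to start reading client requests again
    }

    private void setAutoRead(boolean enabled)
    {
        for (Channel ch : clientChannels)
            ch.config().setAutoRead(enabled);
    }
}
{code}

The gap between the two watermarks is what prevents the throttle from flapping: reads only resume once a meaningful amount of memory has actually been reclaimed.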
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604656#comment-14604656 ] Benedict commented on CASSANDRA-9318:
-

bq. If MS's load is caused by a single replica falling over or unable to keep up, this is still the right thing to do... or we will exhaust the heap and fall over.

How does load shedding (or immediately hinting) not prevent this scenario? The proposal you're making appreciably harms our availability guarantees. Load shedding and/or hinting does not, and it fulfils this most important criterion.

If we pause accepting requests _from a single client_ once that client is using in excess of a lower watermark (based on some fair share of available memory in the MS), and _only that client_ is affected, I think that is an acceptably constrained loss of availability. Enforcing it globally seems to me to harm our most central USP far too significantly.
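For contrast with the global throttle sketched earlier, a rough sketch of the per-client variant suggested here: only the connection whose in-flight bytes exceed its fair share of the MessagingService budget gets paused. The {{PerClientThrottle}} class, the fair-share formula, and the resume point are assumptions for illustration only.

{code:java}
// Hypothetical per-client watermark: only the client exceeding its fair share of the budget is paused.
import io.netty.channel.Channel;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class PerClientThrottle
{
    private static final long MS_BUDGET_BYTES = 256L << 20; // total in-flight memory budget (illustrative)

    private final Map<Channel, AtomicLong> perClientBytes = new ConcurrentHashMap<>();

    void acquire(Channel client, long bytes)
    {
        AtomicLong counter = perClientBytes.computeIfAbsent(client, c -> new AtomicLong());
        // Fair share = total budget divided by the number of currently tracked clients.
        long fairShare = MS_BUDGET_BYTES / Math.max(1, perClientBytes.size());
        if (counter.addAndGet(bytes) > fairShare)
            client.config().setAutoRead(false); // pause reads from this client only
    }

    void release(Channel client, long bytes)
    {
        AtomicLong counter = perClientBytes.get(client);
        long fairShare = MS_BUDGET_BYTES / Math.max(1, perClientBytes.size());
        if (counter != null && counter.addAndGet(-bytes) <= fairShare / 2) // resume below half of fair share
            client.config().setAutoRead(true);
    }
}
{code}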
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604697#comment-14604697 ] Jonathan Ellis commented on CASSANDRA-9318:
---

# You can't just shed indiscriminately without breaking the contract that once you've accepted a write (and not handed back UnavailableException) you must deliver it or hint it. You can't just drop it on the floor, even to save yourself from falling over. (If you fall over, then at least it's clear to the user that you weren't able to fulfill your contract. No, logging it isn't good enough.)
# Again, shedding is strictly worse from a user's point of view than not accepting a write we can't handle.