[jira] [Commented] (CASSANDRA-7542) Reduce CAS contention

2014-08-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098326#comment-14098326
 ] 

Benedict commented on CASSANDRA-7542:
-

OK. Not sure if it is worth our pursuing this right now then, at least as far 
as a 2.0 delivery is concerned. When I get some more free time I'll create some 
benchmarks to test how much of an improvement these (or future) changes have.

> Reduce CAS contention
> -
>
> Key: CASSANDRA-7542
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7542
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Benedict
> Fix For: 2.0.10
>
>
> CAS updates on same CQL partition can lead to heavy contention inside C*. I 
> am looking for simple ways(no algorithmic changes) to reduce contention as 
> the penalty of it is high in terms of latency, specially for reads. 
> We can put some sort of synchronization on CQL partition at StorageProxy 
> level. This will reduce contention at least for all requests landing on one 
> box for same partition. 
> Here is an example of why it will help:
> 1) Say 1 write and 2 read CAS requests for the same partition key is send to 
> C* in parallel. 
> 2) Since client is token-aware, it sends these 3 request to the same C* 
> instance A. (Lets assume that all 3 requests goto same instance A) 
> 3) In this C* instance A, all 3 CAS requests will contend with each other in 
> Paxos. (This is bad)
> To improve contention in 3), what I am proposing is to add a lock on 
> partition key similar to what we do in PaxosState.java to serialize these 3 
> requests. This will remove the contention and improve performance as these 3 
> requests will not collide with each other.
> Another improvement we can do in client is to pick a deterministic live 
> replica for a given partition doing CAS.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7542) Reduce CAS contention

2014-08-14 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097747#comment-14097747
 ] 

sankalp kohli commented on CASSANDRA-7542:
--

No we did not get time to use this. Also we have reduced the contention through 
other approaches so this is a less useful to us. However, it might be a good 
way to reduce contention in general.

> Reduce CAS contention
> -
>
> Key: CASSANDRA-7542
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7542
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Benedict
> Fix For: 2.0.10
>
>
> CAS updates on same CQL partition can lead to heavy contention inside C*. I 
> am looking for simple ways(no algorithmic changes) to reduce contention as 
> the penalty of it is high in terms of latency, specially for reads. 
> We can put some sort of synchronization on CQL partition at StorageProxy 
> level. This will reduce contention at least for all requests landing on one 
> box for same partition. 
> Here is an example of why it will help:
> 1) Say 1 write and 2 read CAS requests for the same partition key is send to 
> C* in parallel. 
> 2) Since client is token-aware, it sends these 3 request to the same C* 
> instance A. (Lets assume that all 3 requests goto same instance A) 
> 3) In this C* instance A, all 3 CAS requests will contend with each other in 
> Paxos. (This is bad)
> To improve contention in 3), what I am proposing is to add a lock on 
> partition key similar to what we do in PaxosState.java to serialize these 3 
> requests. This will remove the contention and improve performance as these 3 
> requests will not collide with each other.
> Another improvement we can do in client is to pick a deterministic live 
> replica for a given partition doing CAS.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7542) Reduce CAS contention

2014-08-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096805#comment-14096805
 ] 

Benedict commented on CASSANDRA-7542:
-

[~kohlisankalp] any news?

> Reduce CAS contention
> -
>
> Key: CASSANDRA-7542
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7542
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Benedict
> Fix For: 2.0.10
>
>
> CAS updates on same CQL partition can lead to heavy contention inside C*. I 
> am looking for simple ways(no algorithmic changes) to reduce contention as 
> the penalty of it is high in terms of latency, specially for reads. 
> We can put some sort of synchronization on CQL partition at StorageProxy 
> level. This will reduce contention at least for all requests landing on one 
> box for same partition. 
> Here is an example of why it will help:
> 1) Say 1 write and 2 read CAS requests for the same partition key is send to 
> C* in parallel. 
> 2) Since client is token-aware, it sends these 3 request to the same C* 
> instance A. (Lets assume that all 3 requests goto same instance A) 
> 3) In this C* instance A, all 3 CAS requests will contend with each other in 
> Paxos. (This is bad)
> To improve contention in 3), what I am proposing is to add a lock on 
> partition key similar to what we do in PaxosState.java to serialize these 3 
> requests. This will remove the contention and improve performance as these 3 
> requests will not collide with each other.
> Another improvement we can do in client is to pick a deterministic live 
> replica for a given partition doing CAS.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7542) Reduce CAS contention

2014-07-24 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073790#comment-14073790
 ] 

sankalp kohli commented on CASSANDRA-7542:
--

Thanks. I will take a look at the patch and get it perf tested and see what 
improvements we see. 

> Reduce CAS contention
> -
>
> Key: CASSANDRA-7542
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7542
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Benedict
> Fix For: 2.0.10
>
>
> CAS updates on same CQL partition can lead to heavy contention inside C*. I 
> am looking for simple ways(no algorithmic changes) to reduce contention as 
> the penalty of it is high in terms of latency, specially for reads. 
> We can put some sort of synchronization on CQL partition at StorageProxy 
> level. This will reduce contention at least for all requests landing on one 
> box for same partition. 
> Here is an example of why it will help:
> 1) Say 1 write and 2 read CAS requests for the same partition key is send to 
> C* in parallel. 
> 2) Since client is token-aware, it sends these 3 request to the same C* 
> instance A. (Lets assume that all 3 requests goto same instance A) 
> 3) In this C* instance A, all 3 CAS requests will contend with each other in 
> Paxos. (This is bad)
> To improve contention in 3), what I am proposing is to add a lock on 
> partition key similar to what we do in PaxosState.java to serialize these 3 
> requests. This will remove the contention and improve performance as these 3 
> requests will not collide with each other.
> Another improvement we can do in client is to pick a deterministic live 
> replica for a given partition doing CAS.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7542) Reduce CAS contention

2014-07-24 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073445#comment-14073445
 ] 

Benedict commented on CASSANDRA-7542:
-

I decided to have a quick crack at this to see if I could get an initial patch 
done for you to take for a spin. It's available 
[here|https://github.com/belliottsmith/cassandra/tree/7542-cascontend]

I find on my local machine that it speeds up the paxos dtest by about 20%, 
which may be a little conservative for improvement as it takes a little while 
for it to get an idea of how long a paxos round takes. There are some 
improvements that should probably be made before rolling it out generally, such 
as tracking latency for each token range independently, and we should probably 
track the level of recent contention so that we can exponentially backoff if 
we're too aggressive (this might permit us to be a little _more_ aggressive in 
the typical case).

On my box, it's worth noting this patch brings the average wait time due to 
contention down to around 15ms, instead of 50ms. Since this is more than a 20% 
decline, there is probably some more tuning to be done besides to get improved 
throughput.

This only explores two of the potential ideas: reducing intra-node competition, 
and reducing sleep interval. 

> Reduce CAS contention
> -
>
> Key: CASSANDRA-7542
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7542
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Benedict
> Fix For: 2.0.10
>
>
> CAS updates on same CQL partition can lead to heavy contention inside C*. I 
> am looking for simple ways(no algorithmic changes) to reduce contention as 
> the penalty of it is high in terms of latency, specially for reads. 
> We can put some sort of synchronization on CQL partition at StorageProxy 
> level. This will reduce contention at least for all requests landing on one 
> box for same partition. 
> Here is an example of why it will help:
> 1) Say 1 write and 2 read CAS requests for the same partition key is send to 
> C* in parallel. 
> 2) Since client is token-aware, it sends these 3 request to the same C* 
> instance A. (Lets assume that all 3 requests goto same instance A) 
> 3) In this C* instance A, all 3 CAS requests will contend with each other in 
> Paxos. (This is bad)
> To improve contention in 3), what I am proposing is to add a lock on 
> partition key similar to what we do in PaxosState.java to serialize these 3 
> requests. This will remove the contention and improve performance as these 3 
> requests will not collide with each other.
> Another improvement we can do in client is to pick a deterministic live 
> replica for a given partition doing CAS.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7542) Reduce CAS contention

2014-07-23 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072314#comment-14072314
 ] 

Benedict commented on CASSANDRA-7542:
-

I'm on holiday for a week after tomorrow, so no hope of getting this done 
before then. Sorry! :) 

> Reduce CAS contention
> -
>
> Key: CASSANDRA-7542
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7542
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Benedict
> Fix For: 2.0.10
>
>
> CAS updates on same CQL partition can lead to heavy contention inside C*. I 
> am looking for simple ways(no algorithmic changes) to reduce contention as 
> the penalty of it is high in terms of latency, specially for reads. 
> We can put some sort of synchronization on CQL partition at StorageProxy 
> level. This will reduce contention at least for all requests landing on one 
> box for same partition. 
> Here is an example of why it will help:
> 1) Say 1 write and 2 read CAS requests for the same partition key is send to 
> C* in parallel. 
> 2) Since client is token-aware, it sends these 3 request to the same C* 
> instance A. (Lets assume that all 3 requests goto same instance A) 
> 3) In this C* instance A, all 3 CAS requests will contend with each other in 
> Paxos. (This is bad)
> To improve contention in 3), what I am proposing is to add a lock on 
> partition key similar to what we do in PaxosState.java to serialize these 3 
> requests. This will remove the contention and improve performance as these 3 
> requests will not collide with each other.
> Another improvement we can do in client is to pick a deterministic live 
> replica for a given partition doing CAS.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7542) Reduce CAS contention

2014-07-23 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072308#comment-14072308
 ] 

sankalp kohli commented on CASSANDRA-7542:
--

[~benedict] Any progress :)

> Reduce CAS contention
> -
>
> Key: CASSANDRA-7542
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7542
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Benedict
> Fix For: 2.0.10
>
>
> CAS updates on same CQL partition can lead to heavy contention inside C*. I 
> am looking for simple ways(no algorithmic changes) to reduce contention as 
> the penalty of it is high in terms of latency, specially for reads. 
> We can put some sort of synchronization on CQL partition at StorageProxy 
> level. This will reduce contention at least for all requests landing on one 
> box for same partition. 
> Here is an example of why it will help:
> 1) Say 1 write and 2 read CAS requests for the same partition key is send to 
> C* in parallel. 
> 2) Since client is token-aware, it sends these 3 request to the same C* 
> instance A. (Lets assume that all 3 requests goto same instance A) 
> 3) In this C* instance A, all 3 CAS requests will contend with each other in 
> Paxos. (This is bad)
> To improve contention in 3), what I am proposing is to add a lock on 
> partition key similar to what we do in PaxosState.java to serialize these 3 
> requests. This will remove the contention and improve performance as these 3 
> requests will not collide with each other.
> Another improvement we can do in client is to pick a deterministic live 
> replica for a given partition doing CAS.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7542) Reduce CAS contention

2014-07-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063405#comment-14063405
 ] 

Benedict commented on CASSANDRA-7542:
-

It seems to me the first and simplest improvement is to bound the _cost_ of 
contention by tracking the, say, ~90th %ile of paxos round time, and to sleep 
for anywhere between this time, and twice this time, once we detect contention. 
Secondly, we could potentially have a lock within each C* process, where for 
some period of time only one paxos request may be in flight for the given 
partition+host (as it looks to me that a read against the same host as a write 
can cause the write's paxos round to be interrupted, so the host fights with 
itself... though I need to digest the code a little more to be sure about this 
one).


> Reduce CAS contention
> -
>
> Key: CASSANDRA-7542
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7542
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Benedict
> Fix For: 2.0.10
>
>
> CAS updates on same CQL partition can lead to heavy contention inside C*. I 
> am looking for simple ways(no algorithmic changes) to reduce contention as 
> the penalty of it is high in terms of latency, specially for reads. 
> We can put some sort of synchronization on CQL partition at StorageProxy 
> level. This will reduce contention at least for all requests landing on one 
> box for same partition. 
> Here is an example of why it will help:
> 1) Say 1 write and 2 read CAS requests for the same partition key is send to 
> C* in parallel. 
> 2) Since client is token-aware, it sends these 3 request to the same C* 
> instance A. (Lets assume that all 3 requests goto same instance A) 
> 3) In this C* instance A, all 3 CAS requests will contend with each other in 
> Paxos. (This is bad)
> To improve contention in 3), what I am proposing is to add a lock on 
> partition key similar to what we do in PaxosState.java to serialize these 3 
> requests. This will remove the contention and improve performance as these 3 
> requests will not collide with each other.
> Another improvement we can do in client is to pick a deterministic live 
> replica for a given partition doing CAS.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7542) Reduce CAS contention

2014-07-15 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062247#comment-14062247
 ] 

sankalp kohli commented on CASSANDRA-7542:
--

Thanks Brandon. I have updated the fix version as 2.0.10 for this. 

> Reduce CAS contention
> -
>
> Key: CASSANDRA-7542
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7542
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Benedict
> Fix For: 2.0.10
>
>
> CAS updates on same CQL partition can lead to heavy contention inside C*. I 
> am looking for simple ways(no algorithmic changes) to reduce contention as 
> the penalty of it is high in terms of latency, specially for reads. 
> We can put some sort of synchronization on CQL partition at StorageProxy 
> level. This will reduce contention at least for all requests landing on one 
> box for same partition. 
> Here is an example of why it will help:
> 1) Say 1 write and 2 read CAS requests for the same partition key is send to 
> C* in parallel. 
> 2) Since client is token-aware, it sends these 3 request to the same C* 
> instance A. (Lets assume that all 3 requests goto same instance A) 
> 3) In this C* instance A, all 3 CAS requests will contend with each other in 
> Paxos. (This is bad)
> To improve contention in 3), what I am proposing is to add a lock on 
> partition key similar to what we do in PaxosState.java to serialize these 3 
> requests. This will remove the contention and improve performance as these 3 
> requests will not collide with each other.
> Another improvement we can do in client is to pick a deterministic live 
> replica for a given partition doing CAS.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)