[jira] [Issue Comment Deleted] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

2015-11-05 Thread Aaron Whiteside (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Whiteside updated CASSANDRA-9328:
---
Comment: was deleted

(was: Completely agree here: if you need to add some sort of 
versioning/transaction id to detect changes, then using CAS/LWT is pointless, 
because you can achieve the same result with Cassandra's default eventual 
consistency behavior + versioning/transaction id. 

Which means CAS/LWT are completely broken and meaningless.)

> WriteTimeoutException thrown when LWT concurrency > 1, despite the query 
> duration taking MUCH less than cas_contention_timeout_in_ms
> 
>
> Key: CASSANDRA-9328
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9328
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Aaron Whiteside
> Fix For: 2.1.x
>
> Attachments: CassandraLWTTest.java, CassandraLWTTest2.java
>
>
> WriteTimeoutException thrown when LWT concurrency > 1, despite the query 
> duration taking MUCH less than cas_contention_timeout_in_ms.
> Unit test attached, run against a 3 node cluster running 2.1.5.
> If you reduce the threadCount to 1, you never see a WriteTimeoutException. If 
> the WTE is due to not being able to communicate with other nodes, why does 
> the concurrency >1 cause inter-node communication to fail?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

2015-11-05 Thread Aaron Whiteside (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992789#comment-14992789
 ] 

Aaron Whiteside commented on CASSANDRA-9328:


Using a version id (to execute the conditional update on) and a transaction id 
(to determine whether a write that threw a WTE but really succeeded was applied 
by the current thread/transaction/operation) still does not work.

Thread A: reads version 1
Thread A: updates version 1 to 2, transaction id to ABC, and sets account 
balance to $0+$100=$100, but receives a WTE.
Thread B: reads version 2
Thread B: updates version 2 to 3, transaction id to XYZ, and sets account 
balance to $100+$500=$600, wins the race, no WTEs anywhere in sight.
Thread B: is happy!
Thread A: tries again, reads version 3 this time, sees that version 3 is 
greater than its previous version 2, now it checks the transaction id and 
finds it's also different.

How can thread A know whether its update failed or succeeded, since between it 
doing the update and reading the record again, someone else has updated it?

At this point thread A might assume it failed and try again and add another 
$100 to the balance, causing more money to appear in the account than would be 
expected. Or it might choose to abandon the transaction, but if the WTE was 
actually due to a timeout and not contention the balance will have $100 less 
than is expected.

And no one is happy.
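
The scenario above can be replayed with a small in-memory model. This is only 
a sketch, not Cassandra code: Row, cas_deposit and replay are hypothetical 
names, and the third writer (transaction id DEF) is an assumption added to 
show a history in which thread A's update was never applied. Under those 
assumptions, both histories leave the same row for thread A to read back.

```python
from dataclasses import dataclass


@dataclass
class Row:
    version: int = 1
    txid: str = ""
    balance: int = 0


def cas_deposit(row, expected_version, txid, amount):
    """Conditional update: applies only if the version still matches."""
    if row.version != expected_version:
        return False
    row.version += 1
    row.txid = txid
    row.balance += amount
    return True


def replay(a_write_applied):
    """Return the row thread A reads back after receiving its WTE."""
    row = Row()
    if a_write_applied:
        # A's update was applied; A only saw a WTE instead of a result.
        cas_deposit(row, 1, "ABC", 100)
    else:
        # A's update was never applied; hypothetical writer DEF got in instead.
        cas_deposit(row, 1, "DEF", 100)
    # Thread B reads version 2 and wins its own conditional update.
    cas_deposit(row, 2, "XYZ", 500)
    return row


applied = replay(a_write_applied=True)
lost = replay(a_write_applied=False)
print(applied)
print(lost)
print("same row in both histories:", applied == lost)
```

In both histories the row ends at version 3 with transaction id XYZ and a 
balance of $600, so nothing thread A reads back tells it whether its own $100 
is included.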



[jira] [Comment Edited] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

2015-11-05 Thread Aaron Whiteside (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992789#comment-14992789
 ] 

Aaron Whiteside edited comment on CASSANDRA-9328 at 11/6/15 3:28 AM:
-

Using a version id (to execute the conditional update on) and a transaction id 
(to determine whether a write that threw a WTE really succeeded, by identifying 
the current thread/transaction/operation) still does not work.

Thread A: reads version 1
Thread A: updates version 1 to 2, transaction id to ABC, and sets account 
balance to $0+$100=$100, successfully applies the update but still receives a 
WTE.
Thread B: reads version 2
Thread B: updates version 2 to 3, transaction id to XYZ, and sets account 
balance to $100+$500=$600, wins the race, no WTEs anywhere in sight.
Thread B: is happy!
Thread A: tries again, reads version 3 this time, sees that version 3 is 
greater than its previous version 2, now it checks the transaction id and 
finds it's also different.

How can thread A know whether its update failed or succeeded, since between it 
doing the update and reading the record again, someone else has updated it?

At this point thread A might assume it failed and try again and add another 
$100 to the balance, causing more money to appear in the account than would be 
expected. Or it might choose to abandon the transaction, but if the WTE was 
actually due to a timeout and not contention the balance will have $100 less 
than is expected.

And no one is happy.


was (Author: aaronjwhiteside):
Using a version id (to execute the conditional update on) and a transaction id 
(to determine whether a write that threw a WTE really succeeded, by identifying 
the current thread/transaction/operation) still does not work.

Thread A: reads version 1
Thread A: updates version 1 to 2, transaction id to ABC, and sets account 
balance to $0+$100=$100, but receives a WTE.
Thread B: reads version 2
Thread B: updates version 2 to 3, transaction id to XYZ, and sets account 
balance to $100+$500=$600, wins the race, no WTEs anywhere in sight.
Thread B: is happy!
Thread A: tries again, reads version 3 this time, sees that version 3 is 
greater than its previous version 2, now it checks the transaction id and 
finds it's also different.

How can thread A know whether its update failed or succeeded, since between it 
doing the update and reading the record again, someone else has updated it?

At this point thread A might assume it failed and try again and add another 
$100 to the balance, causing more money to appear in the account than would be 
expected. Or it might choose to abandon the transaction, but if the WTE was 
actually due to a timeout and not contention the balance will have $100 less 
than is expected.

And no one is happy.



[jira] [Comment Edited] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

2015-11-05 Thread Aaron Whiteside (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992789#comment-14992789
 ] 

Aaron Whiteside edited comment on CASSANDRA-9328 at 11/6/15 3:28 AM:
-

Using a version id (to execute the conditional update on) and a transaction id 
(to determine whether a write that threw a WTE really succeeded, by identifying 
the current thread/transaction/operation) still does not work.

Thread A: reads version 1
Thread A: updates version 1 to 2, transaction id to ABC, and sets account 
balance to $0+$100=$100, but receives a WTE.
Thread B: reads version 2
Thread B: updates version 2 to 3, transaction id to XYZ, and sets account 
balance to $100+$500=$600, wins the race, no WTEs anywhere in sight.
Thread B: is happy!
Thread A: tries again, reads version 3 this time, sees that version 3 is 
greater than its previous version 2, now it checks the transaction id and 
finds it's also different.

How can thread A know whether its update failed or succeeded, since between it 
doing the update and reading the record again, someone else has updated it?

At this point thread A might assume it failed and try again and add another 
$100 to the balance, causing more money to appear in the account than would be 
expected. Or it might choose to abandon the transaction, but if the WTE was 
actually due to a timeout and not contention the balance will have $100 less 
than is expected.

And no one is happy.


was (Author: aaronjwhiteside):
Using a version id (to execute the conditional update on) and a transaction id 
(to determine whether a write that threw a WTE but really succeeded was applied 
by the current thread/transaction/operation) still does not work.

Thread A: reads version 1
Thread A: updates version 1 to 2, transaction id to ABC, and sets account 
balance to $0+$100=$100, but receives a WTE.
Thread B: reads version 2
Thread B: updates version 2 to 3, transaction id to XYZ, and sets account 
balance to $100+$500=$600, wins the race, no WTEs anywhere in sight.
Thread B: is happy!
Thread A: tries again, reads version 3 this time, sees that version 3 is 
greater than its previous version 2, now it checks the transaction id and 
finds it's also different.

How can thread A know whether its update failed or succeeded, since between it 
doing the update and reading the record again, someone else has updated it?

At this point thread A might assume it failed and try again and add another 
$100 to the balance, causing more money to appear in the account than would be 
expected. Or it might choose to abandon the transaction, but if the WTE was 
actually due to a timeout and not contention the balance will have $100 less 
than is expected.

And no one is happy.



[jira] [Commented] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

2015-10-29 Thread Aaron Whiteside (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981227#comment-14981227
 ] 

Aaron Whiteside commented on CASSANDRA-9328:


Completely agree here: if you need to add some sort of versioning/transaction 
id to detect changes, then using CAS/LWT is pointless, because you can achieve 
the same result with Cassandra's default eventual consistency behavior + 
versioning/transaction id. 

Which means CAS/LWT are completely broken and meaningless.

> WriteTimeoutException thrown when LWT concurrency > 1, despite the query 
> duration taking MUCH less than cas_contention_timeout_in_ms
> 
>
> Key: CASSANDRA-9328
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9328
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Aaron Whiteside
>Priority: Critical
> Fix For: 2.1.x
>
> Attachments: CassandraLWTTest.java, CassandraLWTTest2.java
>
>
> WriteTimeoutException thrown when LWT concurrency > 1, despite the query 
> duration taking MUCH less than cas_contention_timeout_in_ms.
> Unit test attached, run against a 3 node cluster running 2.1.5.
> If you reduce the threadCount to 1, you never see a WriteTimeoutException. If 
> the WTE is due to not being able to communicate with other nodes, why does 
> the concurrency >1 cause inter-node communication to fail?





[jira] [Commented] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

2015-10-29 Thread Aaron Whiteside (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981230#comment-14981230
 ] 

Aaron Whiteside commented on CASSANDRA-9328:


Personally I think this is acceptable, as you will retry the CAS operation and 
it will fail again (already applied, or someone else won).

The behavior should be correct under ideal conditions; currently it's 
non-deterministic under ideal conditions.



[jira] [Commented] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

2015-10-26 Thread Aaron Whiteside (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974817#comment-14974817
 ] 

Aaron Whiteside commented on CASSANDRA-9328:


If this is a known issue, and there is no other ticket to represent this issue, 
then please tell me again why you want to close it? This ticket should remain 
OPEN until the issue is resolved, regardless of the fact that there is no known 
solution.

And I don't see any documentation on this feature that says it will provide 
non-deterministic behavior under light (2 threads) contention. 

I disagree with your point that you can read the value after writing it to 
determine if the LWT was successful. You forget that in a concurrent 
environment this is the very definition of a race condition. With the current 
LWT implementation you can NEVER know 100% whether an update succeeded or not. 
If you think this is not true, please provide sample code showing how to 
accomplish this. If such a thing exists, it should also be added to the 
official documentation as a workaround for how to use LWT "correctly".


> WriteTimeoutException thrown when LWT concurrency > 1, despite the query 
> duration taking MUCH less than cas_contention_timeout_in_ms
> 
>
> Key: CASSANDRA-9328
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9328
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Aaron Whiteside
>Priority: Critical
> Fix For: 2.1.x
>
> Attachments: CassandraLWTTest.java, CassandraLWTTest2.java
>
>
> WriteTimeoutException thrown when LWT concurrency > 1, despite the query 
> duration taking MUCH less than cas_contention_timeout_in_ms.
> Unit test attached, run against a 3 node cluster running 2.1.5.
> If you reduce the threadCount to 1, you never see a WriteTimeoutException. If 
> the WTE is due to not being able to communicate with other nodes, why does 
> the concurrency >1 cause inter-node communication to fail?





[jira] [Reopened] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

2015-10-26 Thread Aaron Whiteside (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Whiteside reopened CASSANDRA-9328:




[jira] [Updated] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

2015-05-08 Thread Aaron Whiteside (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Whiteside updated CASSANDRA-9328:
---
Attachment: CassandraLWTTest2.java

> WriteTimeoutException thrown when LWT concurrency > 1, despite the query 
> duration taking MUCH less than cas_contention_timeout_in_ms
> 
>
> Key: CASSANDRA-9328
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9328
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Aaron Whiteside
>Assignee: Benjamin Lerer
>Priority: Critical
> Fix For: 2.1.x
>
> Attachments: CassandraLWTTest.java, CassandraLWTTest2.java
>
>
> WriteTimeoutException thrown when LWT concurrency > 1, despite the query 
> duration taking MUCH less than cas_contention_timeout_in_ms.
> Unit test attached, run against a 3 node cluster running 2.1.5.
> If you reduce the threadCount to 1, you never see a WriteTimeoutException. If 
> the WTE is due to not being able to communicate with other nodes, why does 
> the concurrency >1 cause inter-node communication to fail?





[jira] [Commented] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

2015-05-08 Thread Aaron Whiteside (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535107#comment-14535107
 ] 

Aaron Whiteside commented on CASSANDRA-9328:


I've found that if I retry on WriteTimeoutExceptions I get corrupted data; it 
seems that some updates that throw a WTE really succeed.

See the newly attached test.

I'd say that LWT/CAS does not work at all; it's completely broken (not atomic). 
If it were just slow during contention that would be fine, since the result 
would be correct (one would hope).
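
The effect described here can be reproduced without a cluster. The sketch 
below is not the attached test and makes no claim about Cassandra internals: 
Counter, WriteTimeout and ack_loss_rate are hypothetical names, and the model 
simply assumes that some writes are applied while the caller still gets a 
timeout. It is single-threaded on purpose, to isolate what a blind retry does 
in that situation.

```python
import random


class WriteTimeout(Exception):
    """Stand-in for WriteTimeoutException: the outcome is unknown."""


class Counter:
    def __init__(self, rng, ack_loss_rate):
        self.value = 0
        self.rng = rng
        self.ack_loss_rate = ack_loss_rate

    def cas(self, expected, new):
        """Compare-and-set that may apply the write yet report a timeout."""
        if self.value != expected:
            return False
        self.value = new
        if self.rng.random() < self.ack_loss_rate:
            raise WriteTimeout()
        return True


def increment_with_blind_retry(counter):
    """Read, CAS to value + 1, and start over on a failed CAS or a WTE."""
    while True:
        current = counter.value
        try:
            if counter.cas(current, current + 1):
                return
        except WriteTimeout:
            continue  # the write may already be applied; retrying adds again


def run(increments, ack_loss_rate, seed=0):
    """Perform the given number of logical increments and return the total."""
    counter = Counter(random.Random(seed), ack_loss_rate)
    for _ in range(increments):
        increment_with_blind_retry(counter)
    return counter.value


print("no timeouts:  ", run(100, 0.0))
print("with timeouts:", run(100, 0.2))
```

With ack_loss_rate set to 0.0 the total equals the number of increments. Once 
some applied writes surface as timeouts, each retried timeout adds one more, 
so the total overshoots.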



[jira] [Comment Edited] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

2015-05-08 Thread Aaron Whiteside (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535391#comment-14535391
 ] 

Aaron Whiteside edited comment on CASSANDRA-9328 at 5/8/15 8:00 PM:


{quote}
 And since a WriteTimeoutException already means I don't know, we throw it in 
that case too, even though it's not a proper timeout per-se. The point being, 
you should handle it as if it was a timeout.
{quote}
Is at odds with..
{quote}
A WTE means that update may or may not be applied, so yes, the update may have 
succeeded if you get a WTE.
{quote}

CAS is not atomic, and you shouldn't retry WTEs because retrying may cause 
inconsistent data. The application cannot go back and read a record after a 
WTE, because someone else might have updated the value (a race), and it might 
not be possible for the application to tell, based on the value, whether it 
should retry.

For CAS in Cassandra to work correctly (at the moment), you must ensure that 
the updates you make are idempotent, and if that is the case you probably 
wouldn't be using CAS in the first place.
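
To illustrate what an idempotent update could look like, here is a minimal 
sketch. It is an assumption for illustration, not something proposed in the 
ticket: each deposit is recorded under its own operation id (the hypothetical 
names are Account, record and deposit_with_retry), so applying the same 
operation again changes nothing and a retry after a timeout is harmless. Note 
that record() has no condition at all, which matches the point that such a 
design would not need CAS.

```python
class WriteTimeout(Exception):
    """Stand-in for WriteTimeoutException: the outcome is unknown."""


class Account:
    def __init__(self):
        self.deposits = {}  # operation id -> amount

    def record(self, op_id, amount, lose_ack=False):
        """Idempotent write: recording the same op_id again changes nothing."""
        self.deposits.setdefault(op_id, amount)
        if lose_ack:
            raise WriteTimeout()  # applied, but the caller cannot tell

    @property
    def balance(self):
        return sum(self.deposits.values())


def deposit_with_retry(account, op_id, amount, timeouts=1):
    """Retry after each simulated timeout; the first `timeouts` attempts time out."""
    for attempt in range(timeouts + 1):
        try:
            account.record(op_id, amount, lose_ack=attempt < timeouts)
            return
        except WriteTimeout:
            continue


account = Account()
deposit_with_retry(account, "ABC", 100, timeouts=3)  # applied once, retried 3x
deposit_with_retry(account, "XYZ", 500, timeouts=0)
print(account.balance)
```

However many times the ABC deposit is retried, it is counted once.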



was (Author: aaronjwhiteside):
{quote}
 And since a WriteTimeoutException already means I don't know, we throw it in 
that case too, even though it's not a proper timeout per-se. The point being, 
you should handle it as if it was a timeout.
{quote}
Is at odds with..
{quote}
A WTE means that update may or may not be applied, so yes, the update may have 
succeeded if you get a WTE.
{quote}

CAS is not atomic, and you shouldn't retry WTEs because retrying may cause 
inconsistent data. The application cannot go back and read a record after a 
WTE, because someone else might have updated the value (a race), and it might 
not be possible for the application to tell, based on the value, whether it 
should retry.





[jira] [Comment Edited] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

2015-05-08 Thread Aaron Whiteside (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535391#comment-14535391
 ] 

Aaron Whiteside edited comment on CASSANDRA-9328 at 5/8/15 8:11 PM:


{quote}
 And since a WriteTimeoutException already means I don't know, we throw it in 
that case too, even though it's not a proper timeout per-se. The point being, 
you should handle it as if it was a timeout.
{quote}
Is at odds with..
{quote}
A WTE means that update may or may not be applied, so yes, the update may have 
succeeded if you get a WTE.
{quote}

If CAS is not atomic, then you shouldn't retry WTEs because retrying may cause 
inconsistent data. The application cannot go back and read a record after a 
WTE, because someone else might have updated the value (a race), and it might 
not be possible for the application to tell, based on the value, whether it 
should retry.

For CAS in Cassandra to work correctly (at the moment), you must ensure that 
the updates you make are idempotent, and if that is the case you probably 
wouldn't be using CAS in the first place.



was (Author: aaronjwhiteside):
{quote}
 And since a WriteTimeoutException already means I don't know, we throw it in 
that case too, even though it's not a proper timeout per-se. The point being, 
you should handle it as if it was a timeout.
{quote}
Is at odds with..
{quote}
A WTE means that update may or may not be applied, so yes, the update may have 
succeeded if you get a WTE.
{quote}

CAS is not atomic, and you shouldn't retry WTEs because retrying may cause 
inconsistent data. The application cannot go back and read a record after a 
WTE, because someone else might have updated the value (a race), and it might 
not be possible for the application to tell, based on the value, whether it 
should retry.

For CAS in Cassandra to work correctly (at the moment), you must ensure that 
the updates you make are idempotent, and if that is the case you probably 
wouldn't be using CAS in the first place.




[jira] [Comment Edited] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

2015-05-08 Thread Aaron Whiteside (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535391#comment-14535391
 ] 

Aaron Whiteside edited comment on CASSANDRA-9328 at 5/8/15 7:57 PM:


{quote}
 And since a WriteTimeoutException already means I don't know, we throw it in 
that case too, even though it's not a proper timeout per-se. The point being, 
you should handle it as if it was a timeout.
{quote}
Is at odds with..
{quote}
A WTE means that update may or may not be applied, so yes, the update may have 
succeeded if you get a WTE.
{quote}

CAS is not atomic, and you shouldn't retry WTEs because retrying may cause 
inconsistent data. The application cannot go back and read a record after a 
WTE, because someone else might have updated the value (a race), and it might 
not be possible for the application to tell, based on the value, whether it 
should retry.




was (Author: aaronjwhiteside):
{quote}
 And since a WriteTimeoutException already means I don't know, we throw it in 
that case too, even though it's not a proper timeout per-se. The point being, 
you should handle it as if it was a timeout.
{quote}
Is at odds with..
{quote}
A WTE means that update may or may not be applied, so yes, the update may have 
succeeded if you get a WTE.
{quote}

CAS is not atomic, and you shouldn't retry WTEs because retrying may cause 
inconsistent data. The application cannot go back and read a record after a 
WTE, because someone else might have updated the value (a race), and it might 
not be possible for the application to tell whether it should retry.





[jira] [Commented] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

2015-05-08 Thread Aaron Whiteside (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535391#comment-14535391
 ] 

Aaron Whiteside commented on CASSANDRA-9328:


{quote}
 And since a WriteTimeoutException already means I don't know, we throw it in 
that case too, even though it's not a proper timeout per-se. The point being, 
you should handle it as if it was a timeout.
{quote}
Is at odds with..
{quote}
A WTE means that update may or may not be applied, so yes, the update may have 
succeeded if you get a WTE.
{quote}

CAS is not atomic, and you shouldn't retry WTEs because retrying may cause 
inconsistent data. The application cannot go back and read a record after a 
WTE, because someone else might have updated the value (a race), and it might 
not be possible for the application to tell whether it should retry.



 WriteTimeoutException thrown when LWT concurrency  1, despite the query 
 duration taking MUCH less than cas_contention_timeout_in_ms
 

 Key: CASSANDRA-9328
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9328
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Aaron Whiteside
Assignee: Benjamin Lerer
Priority: Critical
 Fix For: 2.1.x

 Attachments: CassandraLWTTest.java, CassandraLWTTest2.java


 WriteTimeoutException thrown when LWT concurrency  1, despite the query 
 duration taking MUCH less than cas_contention_timeout_in_ms.
 Unit test attached, run against a 3 node cluster running 2.1.5.
 If you reduce the threadCount to 1, you never see a WriteTimeoutException. If 
 the WTE is due to not being able to communicate with other nodes, why does 
 the concurrency > 1 cause inter-node communication to fail?





[jira] [Commented] (CASSANDRA-9329) Make CAS retry logic configurable

2015-05-07 Thread Aaron Whiteside (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533614#comment-14533614
 ] 

Aaron Whiteside commented on CASSANDRA-9329:


Can we at least get something into the next 2.x release? 

Perhaps if cas_contention_timeout_in_ms is set to 0, then we never sleep and 
only try the condition once?
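
A minimal sketch of the semantics proposed above, assuming a retry loop that is given a contention budget in milliseconds. The class ContentionLoopSketch and its parameters are hypothetical and do not mirror Cassandra's actual coordinator code; the only point is that a budget of 0 means one attempt and no sleep.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical retry loop; it does not mirror Cassandra's actual coordinator code. */
final class ContentionLoopSketch {
    /**
     * Calls attempt until it succeeds or the contention budget runs out.
     * A budget of 0 means: try exactly once and never sleep.
     * Returns the number of attempts made.
     */
    static int run(BooleanSupplier attempt, long contentionTimeoutMillis, long sleepMillis) {
        long deadline = System.nanoTime() + contentionTimeoutMillis * 1_000_000L;
        int attempts = 0;
        while (true) {
            attempts++;
            if (attempt.getAsBoolean()) {
                return attempts; // condition met
            }
            if (contentionTimeoutMillis == 0 || System.nanoTime() - deadline >= 0) {
                return attempts; // give up and leave any retry to the client
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return attempts;
            }
        }
    }
}
```

With a budget of 0 the caller gets control back after a single failed attempt and can apply its own retry policy.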



 Make CAS retry logic configurable
 -

 Key: CASSANDRA-9329
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9329
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Aaron Whiteside
 Fix For: 3.x


 Make CAS retry logic configurable:
 One should be able to disable the internal CAS retry loop (when the condition 
 is not met) and let the client choose how to do retries (so the client does 
 not have to incur the server side random sleep of up to 100ms). Basically let 
 the client handle all CAS retries in a manner it sees fit.
 Secondly, the hardcoded sleep of up to 100ms that happens when Cassandra fails to 
 meet the CAS condition should be configurable.
 - The max duration should be configurable
 - The algorithm used to choose the duration should be configurable (Random, 
 Exponential, etc).





[jira] [Comment Edited] (CASSANDRA-9329) Make CAS retry logic configurable

2015-05-07 Thread Aaron Whiteside (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533614#comment-14533614
 ] 

Aaron Whiteside edited comment on CASSANDRA-9329 at 5/7/15 11:43 PM:
-

Can we at least get something into the next 2.x release? 

Perhaps if cas_contention_timeout_in_ms is set to 0, then we never sleep and 
only try the condition once?

The configurable algorithm and timeout can come later.


was (Author: aaronjwhiteside):
Can we at least get something into the next 2.x release? 

Perhaps if cas_contention_timeout_in_ms is set to 0, then we never sleep and 
only try the condition once?



 Make CAS retry logic configurable
 -

 Key: CASSANDRA-9329
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9329
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Aaron Whiteside
 Fix For: 3.x


 Make CAS retry logic configurable:
 One should be able to disable the internal CAS retry loop (when the condition 
 is not met) and let the client choose how to do retries (so the client does 
 not have to incur the server side random sleep of up to 100ms). Basically let 
 the client handle all CAS retries in a manner it sees fit.
 Secondly, the hardcoded sleep of up to 100ms that happens when Cassandra fails to 
 meet the CAS condition should be configurable.
 - The max duration should be configurable
 - The algorithm used to choose the duration should be configurable (Random, 
 Exponential, etc).





[jira] [Created] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

2015-05-07 Thread Aaron Whiteside (JIRA)
Aaron Whiteside created CASSANDRA-9328:
--

 Summary: WriteTimeoutException thrown when LWT concurrency > 1, 
despite the query duration taking MUCH less than cas_contention_timeout_in_ms
 Key: CASSANDRA-9328
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9328
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Aaron Whiteside
Priority: Critical
 Attachments: CassandraLWTTest.java

WriteTimeoutException thrown when LWT concurrency > 1, despite the query 
duration taking MUCH less than cas_contention_timeout_in_ms.

Unit test attached, run against a 3 node cluster running 2.1.5.

If you reduce the threadCount to 1, you never see a WriteTimeoutException. If 
the WTE is due to not being able to communicate with other nodes, why does the 
concurrency > 1 cause inter-node communication to fail?





[jira] [Created] (CASSANDRA-9330) CAS timeout errors should use a different exception than WriteTimeoutException as WTE can happen when nodes fail to respond.

2015-05-07 Thread Aaron Whiteside (JIRA)
Aaron Whiteside created CASSANDRA-9330:
--

 Summary: CAS timeout errors should use a different exception than 
WriteTimeoutException as WTE can happen when nodes fail to respond.
 Key: CASSANDRA-9330
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9330
 Project: Cassandra
  Issue Type: Improvement
  Components: Core, Drivers (now out of tree)
Reporter: Aaron Whiteside


Perhaps a CASContentionTimeoutException?
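
A sketch of what such an exception might look like from the client's side. Everything here is hypothetical: SketchWriteTimeoutException is only a stand-in for the existing write timeout (it is not the real Cassandra or driver class), and the contentions field and the choice to subclass are assumptions made for illustration.

```java
/** Hypothetical stand-in for the existing write timeout; not the real Cassandra or driver class. */
class SketchWriteTimeoutException extends RuntimeException {
    SketchWriteTimeoutException(String message) {
        super(message);
    }
}

/** Hypothetical exception for a CAS that gave up because of contention. */
class CASContentionTimeoutException extends SketchWriteTimeoutException {
    private final int contentions;

    CASContentionTimeoutException(String message, int contentions) {
        super(message);
        this.contentions = contentions;
    }

    /** How many competing proposals were seen before giving up (an invented field). */
    int getContentions() {
        return contentions;
    }
}

final class TimeoutClassifier {
    /** Lets a client tell contention apart from replicas failing to respond. */
    static String classify(RuntimeException e) {
        if (e instanceof CASContentionTimeoutException) {
            return "cas-contention";
        }
        if (e instanceof SketchWriteTimeoutException) {
            return "replica-timeout";
        }
        return "other";
    }
}
```

Subclassing the timeout in this sketch would keep existing catch blocks working while letting new code branch on the contention case; a sibling exception type would be an equally plausible design.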





[jira] [Created] (CASSANDRA-9329) Make CAS retry logic configurable

2015-05-07 Thread Aaron Whiteside (JIRA)
Aaron Whiteside created CASSANDRA-9329:
--

 Summary: Make CAS retry logic configurable
 Key: CASSANDRA-9329
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9329
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Aaron Whiteside


Make CAS retry logic configurable:

One should be able to disable the internal CAS retry loop (when the condition 
is not met) and let the client choose how to do retries (so the client does not 
have to incur the server side random sleep of up to 100ms). Basically let the 
client handle all CAS retries in a manner it sees fit.

Secondly, the hardcoded sleep of up to 100ms that happens when Cassandra fails to 
meet the CAS condition should be configurable.
- The max duration should be configurable
- The algorithm used to choose the duration should be configurable (Random, 
Exponential, etc).
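
The two configuration points above (a maximum duration and a pluggable algorithm) could be sketched as follows. The interface BackoffStrategy and the factory class Backoffs are hypothetical names introduced for illustration; nothing here is an existing Cassandra or driver API.

```java
import java.util.Random;

/** Hypothetical policy: how long to wait before retry number attempt (0-based). */
interface BackoffStrategy {
    long delayMillis(int attempt);
}

final class Backoffs {
    private Backoffs() {}

    /** No sleep at all: retry immediately. */
    static BackoffStrategy none() {
        return attempt -> 0L;
    }

    /** Uniform random delay in [0, maxMillis], in the spirit of the random sleep described above. */
    static BackoffStrategy uniformRandom(int maxMillis, Random rng) {
        return attempt -> rng.nextInt(maxMillis + 1);
    }

    /** baseMillis * 2^attempt, capped at maxMillis. */
    static BackoffStrategy exponential(long baseMillis, long maxMillis) {
        return attempt -> {
            int shift = Math.min(Math.max(attempt, 0), 20); // bound the shift to avoid overflow
            return Math.min(baseMillis << shift, maxMillis);
        };
    }
}
```

For example, Backoffs.uniformRandom(100, new Random()) would give a random sleep of up to 100ms, while Backoffs.exponential(10, 100) grows the delay per attempt and caps it at the configured maximum.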


