[jira] [Issue Comment Deleted] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Whiteside updated CASSANDRA-9328:
---
    Comment: was deleted

(was: Completely agree here. If you need to add some sort of versioning/transaction id to detect changes, then using CAS/LWT is pointless, and you can achieve the same result with Cassandra's default eventual consistency behavior + versioning/transaction id. Which means CAS/LWT are completely broken and meaningless.)

> WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
>
> Key: CASSANDRA-9328
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9328
> Project: Cassandra
> Issue Type: Bug
> Components: Coordination
> Reporter: Aaron Whiteside
> Fix For: 2.1.x
>
> Attachments: CassandraLWTTest.java, CassandraLWTTest2.java
>
>
> WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms.
> Unit test attached, run against a 3 node cluster running 2.1.5.
> If you reduce the threadCount to 1, you never see a WriteTimeoutException. If the WTE is due to not being able to communicate with other nodes, why does the concurrency > 1 cause inter-node communication to fail?

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992789#comment-14992789 ]

Aaron Whiteside commented on CASSANDRA-9328:

Using a version id (to execute the conditional update on) and a transaction id (to determine whether a WTE that really succeeded was applied by the current thread/transaction/operation) still does not work.

Thread A: reads version 1
Thread A: updates version 1 to 2, transaction id to ABC, and sets account balance to $0+$100=$100, but receives a WTE.
Thread B: reads version 2
Thread B: updates version 2 to 3, transaction id to XYZ, and sets account balance to $100+$500=$600, wins the race, no WTEs anywhere in sight.
Thread B: is happy!
Thread A: tries again, reads version 3 this time, sees that version 3 is greater than its previous version 2, now it checks the transaction id and finds it's also different.

How can thread A know whether its update failed or succeeded, since between doing the update and reading the record again, someone else has updated it?

At this point thread A might assume it failed, try again, and add another $100 to the balance, causing more money to appear in the account than would be expected.
Or it might choose to abandon the transaction, but if the WTE was actually due to a timeout and not contention, the balance will have $100 less than is expected.

And no one is happy.
[jira] [Comment Edited] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992789#comment-14992789 ]

Aaron Whiteside edited comment on CASSANDRA-9328 at 11/6/15 3:28 AM:

Using a version id (to execute the conditional update on) and a transaction id (to determine if a WTE really succeeded, representing the current thread/transaction/operation) still does not work.

Thread A: reads version 1
Thread A: updates version 1 to 2, transaction id to ABC, and sets account balance to $0+$100=$100, successfully applies the update but still receives a WTE.
Thread B: reads version 2
Thread B: updates version 2 to 3, transaction id to XYZ, and sets account balance to $100+$500=$600, wins the race, no WTEs anywhere in sight.
Thread B: is happy!
Thread A: tries again, reads version 3 this time, sees that version 3 is greater than its previous version 2, now it checks the transaction id and finds it's also different.

How can thread A know whether its update failed or succeeded, since between doing the update and reading the record again, someone else has updated it?

At this point thread A might assume it failed, try again, and add another $100 to the balance, causing more money to appear in the account than would be expected.
Or it might choose to abandon the transaction, but if the WTE was actually due to a timeout and not contention, the balance will have $100 less than is expected.

And no one is happy.

was (Author: aaronjwhiteside):
Using a version id (to execute the conditional update on) and a transaction id (to determine if a WTE really succeeded, representing the current thread/transaction/operation) still does not work.

Thread A: reads version 1
Thread A: updates version 1 to 2, transaction id to ABC, and sets account balance to $0+$100=$100, but receives a WTE.
Thread B: reads version 2
Thread B: updates version 2 to 3, transaction id to XYZ, and sets account balance to $100+$500=$600, wins the race, no WTEs anywhere in sight.
Thread B: is happy!
Thread A: tries again, reads version 3 this time, sees that version 3 is greater than its previous version 2, now it checks the transaction id and finds it's also different.

How can thread A know whether its update failed or succeeded, since between doing the update and reading the record again, someone else has updated it?

At this point thread A might assume it failed, try again, and add another $100 to the balance, causing more money to appear in the account than would be expected.
Or it might choose to abandon the transaction, but if the WTE was actually due to a timeout and not contention, the balance will have $100 less than is expected.

And no one is happy.
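The ambiguity in the Thread A / Thread B scenario can be sketched without a cluster. The following is only an in-memory model, not the Cassandra driver and not the attached tests: the class LwtAmbiguityDemo, the "INIT" transaction id, and the extra client "C" with transaction id "DEF" are all assumptions made for illustration. It builds two histories, one where A's timed-out update was applied and one where it was lost, and shows that A reads the same row in both.

```java
import java.util.Objects;

/** In-memory stand-in for a single account row; this is not the Cassandra driver. */
final class LwtAmbiguityDemo {

    /** Immutable snapshot of the row as a client would read it. */
    static final class Row {
        final int version;
        final String txId;
        final long balance;

        Row(int version, String txId, long balance) {
            this.version = version;
            this.txId = txId;
            this.balance = balance;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Row)) {
                return false;
            }
            Row other = (Row) o;
            return version == other.version
                    && balance == other.balance
                    && Objects.equals(txId, other.txId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(version, txId, balance);
        }

        @Override
        public String toString() {
            return "Row{version=" + version + ", txId=" + txId + ", balance=" + balance + "}";
        }
    }

    // Starting state: version 1, balance $0 ("INIT" is a placeholder id).
    private Row row = new Row(1, "INIT", 0);

    /** Conditional update: applied only if the stored version matches. */
    synchronized boolean depositIfVersion(int expectedVersion, String txId, long amount) {
        if (row.version != expectedVersion) {
            return false;
        }
        row = new Row(expectedVersion + 1, txId, row.balance + amount);
        return true;
    }

    synchronized Row read() {
        return row;
    }

    /** History 1: A's deposit lands (A is told "timeout"), then B deposits. */
    static Row historyWhereAWasApplied() {
        LwtAmbiguityDemo store = new LwtAmbiguityDemo();
        store.depositIfVersion(1, "ABC", 100); // thread A, applied despite the WTE
        store.depositIfVersion(2, "XYZ", 500); // thread B
        return store.read();
    }

    /** History 2: A's deposit never lands; hypothetical client C deposits, then B. */
    static Row historyWhereAWasLost() {
        LwtAmbiguityDemo store = new LwtAmbiguityDemo();
        store.depositIfVersion(1, "DEF", 100); // hypothetical client C
        store.depositIfVersion(2, "XYZ", 500); // thread B
        return store.read();
    }

    public static void main(String[] args) {
        Row applied = historyWhereAWasApplied();
        Row lost = historyWhereAWasLost();
        System.out.println("A applied: " + applied);
        System.out.println("A lost:    " + lost);
        System.out.println("indistinguishable to A: " + applied.equals(lost));
    }
}
```

In this model, a re-read after the WTE returns version 3 and transaction id XYZ in both histories, so the version and the last writer's transaction id alone do not tell thread A whether to retry.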
[jira] [Comment Edited] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992789#comment-14992789 ]

Aaron Whiteside edited comment on CASSANDRA-9328 at 11/6/15 3:28 AM:

Using a version id (to execute the conditional update on) and a transaction id (to determine if a WTE really succeeded, representing the current thread/transaction/operation) still does not work.

Thread A: reads version 1
Thread A: updates version 1 to 2, transaction id to ABC, and sets account balance to $0+$100=$100, but receives a WTE.
Thread B: reads version 2
Thread B: updates version 2 to 3, transaction id to XYZ, and sets account balance to $100+$500=$600, wins the race, no WTEs anywhere in sight.
Thread B: is happy!
Thread A: tries again, reads version 3 this time, sees that version 3 is greater than its previous version 2, now it checks the transaction id and finds it's also different.

How can thread A know whether its update failed or succeeded, since between doing the update and reading the record again, someone else has updated it?

At this point thread A might assume it failed, try again, and add another $100 to the balance, causing more money to appear in the account than would be expected.
Or it might choose to abandon the transaction, but if the WTE was actually due to a timeout and not contention, the balance will have $100 less than is expected.

And no one is happy.

was (Author: aaronjwhiteside):
Using a version id (to execute the conditional update on) and a transaction id (to determine whether a WTE that really succeeded was applied by the current thread/transaction/operation) still does not work.

Thread A: reads version 1
Thread A: updates version 1 to 2, transaction id to ABC, and sets account balance to $0+$100=$100, but receives a WTE.
Thread B: reads version 2
Thread B: updates version 2 to 3, transaction id to XYZ, and sets account balance to $100+$500=$600, wins the race, no WTEs anywhere in sight.
Thread B: is happy!
Thread A: tries again, reads version 3 this time, sees that version 3 is greater than its previous version 2, now it checks the transaction id and finds it's also different.

How can thread A know whether its update failed or succeeded, since between doing the update and reading the record again, someone else has updated it?

At this point thread A might assume it failed, try again, and add another $100 to the balance, causing more money to appear in the account than would be expected.
Or it might choose to abandon the transaction, but if the WTE was actually due to a timeout and not contention, the balance will have $100 less than is expected.

And no one is happy.
[jira] [Commented] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981227#comment-14981227 ]

Aaron Whiteside commented on CASSANDRA-9328:

Completely agree here. If you need to add some sort of versioning/transaction id to detect changes, then using CAS/LWT is pointless, and you can achieve the same result with Cassandra's default eventual consistency behavior + versioning/transaction id. Which means CAS/LWT are completely broken and meaningless.
[jira] [Commented] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981230#comment-14981230 ]

Aaron Whiteside commented on CASSANDRA-9328:

Personally I think this is acceptable, as you will retry the CAS operation and it will fail again (already applied, or someone else won). The behavior should be correct under ideal conditions; currently it's non-deterministic under ideal conditions.
[jira] [Commented] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974817#comment-14974817 ]

Aaron Whiteside commented on CASSANDRA-9328:

If this is a known issue, and there is no other ticket to represent it, then please tell me again why you want to close it? This ticket should remain OPEN until the issue is resolved, regardless of the fact that there is no known solution. And I don't see any documentation on this feature that says it will provide non-deterministic behavior under light (2 threads) contention.

I disagree with your point that you can read the value after writing it to determine if the LWT was successful. You forget that in a concurrent environment this is the very definition of a race condition. With the current LWT implementation you can NEVER know 100% whether an update succeeded or not.

If you think this is not true, please provide sample code showing how to accomplish this. If such a thing exists, it should also be added to the official documentation as a workaround for how to use LWT "correctly".
[jira] [Reopened] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Whiteside reopened CASSANDRA-9328: > WriteTimeoutException thrown when LWT concurrency > 1, despite the query > duration taking MUCH less than cas_contention_timeout_in_ms > > > Key: CASSANDRA-9328 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9328 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Aaron Whiteside >Priority: Critical > Fix For: 2.1.x > > Attachments: CassandraLWTTest.java, CassandraLWTTest2.java > > > WriteTimeoutException thrown when LWT concurrency > 1, despite the query > duration taking MUCH less than cas_contention_timeout_in_ms. > Unit test attached, run against a 3 node cluster running 2.1.5. > If you reduce the threadCount to 1, you never see a WriteTimeoutException. If > the WTE is due to not being able to communicate with other nodes, why does > the concurrency >1 cause inter-node communication to fail? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Whiteside updated CASSANDRA-9328:
---
    Attachment: CassandraLWTTest2.java

> WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
>
> Key: CASSANDRA-9328
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9328
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Aaron Whiteside
> Assignee: Benjamin Lerer
> Priority: Critical
> Fix For: 2.1.x
>
> Attachments: CassandraLWTTest.java, CassandraLWTTest2.java
>
>
> WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms.
> Unit test attached, run against a 3 node cluster running 2.1.5.
> If you reduce the threadCount to 1, you never see a WriteTimeoutException. If the WTE is due to not being able to communicate with other nodes, why does the concurrency > 1 cause inter-node communication to fail?
[jira] [Commented] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535107#comment-14535107 ]

Aaron Whiteside commented on CASSANDRA-9328:

I've found that if I retry the WriteTimeoutExceptions I get corrupted data; it seems that some updates that throw a WTE really succeed. See the new attached test.

I'd say that LWT/CAS does not work at all; it's completely broken (not atomic). If it were just slow under contention, that would be fine, since the result would be correct (one would hope).
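The corruption described in this comment can be reproduced in miniature. The sketch below is an assumption-laden, single-threaded model and is not taken from CassandraLWTTest2.java: BlindRetryDemo, UncertainWriteException and depositWithBlindRetry are hypothetical names, and the "store" is a plain field whose compare-and-set applies the write and then reports a timeout once. A client that re-reads and retries after that timeout ends up applying a non-idempotent deposit twice.

```java
/** Single-threaded model of "the write was applied, but the caller saw a timeout". */
final class BlindRetryDemo {

    /** Stand-in for a WriteTimeoutException: the outcome of the write is unknown. */
    static final class UncertainWriteException extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    private long balance = 0;
    private boolean timeoutPending = true; // the first successful write reports a timeout

    /** Compare-and-set that applies the update and then, once, throws anyway. */
    boolean compareAndSet(long expected, long updated) {
        if (balance != expected) {
            return false;
        }
        balance = updated;
        if (timeoutPending) {
            timeoutPending = false;
            throw new UncertainWriteException();
        }
        return true;
    }

    long read() {
        return balance;
    }

    /** Naive client: on any failure or timeout, re-read the balance and try again. */
    static long depositWithBlindRetry(BlindRetryDemo store, long amount) {
        while (true) {
            long current = store.read();
            try {
                if (store.compareAndSet(current, current + amount)) {
                    return store.read();
                }
            } catch (UncertainWriteException e) {
                // The write may or may not have been applied; this client retries regardless.
            }
        }
    }

    public static void main(String[] args) {
        BlindRetryDemo store = new BlindRetryDemo();
        long finalBalance = depositWithBlindRetry(store, 100);
        System.out.println("deposited 100 once, balance is now " + finalBalance);
    }
}
```

In this model a single $100 deposit leaves a balance of 200, because the retry cannot tell that the first attempt had already been applied.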
[jira] [Comment Edited] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535391#comment-14535391 ]

Aaron Whiteside edited comment on CASSANDRA-9328 at 5/8/15 8:00 PM:

{quote}
And since a WriteTimeoutException already means I don't know, we throw it in that case too, even though it's not a proper timeout per-se. The point being, you should handle it as if it was a timeout.
{quote}

Is at odds with:

{quote}
A WTE means that update may or may not be applied, so yes, the update may have succeeded if you get a WTE.
{quote}

CAS is not atomic, and you shouldn't retry WTEs because it may cause inconsistent data.

The application cannot go back and read a record after a WTE, because someone else might have updated the value (race), and it might not be possible for the application to tell whether it should retry or not (based on the value).

For CAS in Cassandra to work correctly (at the moment), you must ensure the update you apply is idempotent, and if that is the case you probably wouldn't be using CAS in the first place.

was (Author: aaronjwhiteside):
{quote}
And since a WriteTimeoutException already means I don't know, we throw it in that case too, even though it's not a proper timeout per-se. The point being, you should handle it as if it was a timeout.
{quote}

Is at odds with:

{quote}
A WTE means that update may or may not be applied, so yes, the update may have succeeded if you get a WTE.
{quote}

CAS is not atomic, and you shouldn't retry WTEs because it may cause inconsistent data.

The application cannot go back and read a record after a WTE, because someone else might have updated the value (race), and it might not be possible for the application to tell whether it should retry or not (based on the value).
[jira] [Comment Edited] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535391#comment-14535391 ]

Aaron Whiteside edited comment on CASSANDRA-9328 at 5/8/15 8:11 PM:

{quote}
And since a WriteTimeoutException already means I don't know, we throw it in that case too, even though it's not a proper timeout per-se. The point being, you should handle it as if it was a timeout.
{quote}

Is at odds with:

{quote}
A WTE means that update may or may not be applied, so yes, the update may have succeeded if you get a WTE.
{quote}

If CAS is not atomic, and you shouldn't retry WTEs because it may cause inconsistent data..

The application cannot go back and read a record after a WTE, because someone else might have updated the value (race), and it might not be possible for the application to tell whether it should retry or not (based on the value).

For CAS in Cassandra to work correctly (at the moment), you must ensure the update you apply is idempotent, and if that is the case you probably wouldn't be using CAS in the first place.

was (Author: aaronjwhiteside):
{quote}
And since a WriteTimeoutException already means I don't know, we throw it in that case too, even though it's not a proper timeout per-se. The point being, you should handle it as if it was a timeout.
{quote}

Is at odds with:

{quote}
A WTE means that update may or may not be applied, so yes, the update may have succeeded if you get a WTE.
{quote}

CAS is not atomic, and you shouldn't retry WTEs because it may cause inconsistent data.

The application cannot go back and read a record after a WTE, because someone else might have updated the value (race), and it might not be possible for the application to tell whether it should retry or not (based on the value).

For CAS in Cassandra to work correctly (at the moment), you must ensure the update you apply is idempotent, and if that is the case you probably wouldn't be using CAS in the first place.
[jira] [Comment Edited] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535391#comment-14535391 ]

Aaron Whiteside edited comment on CASSANDRA-9328 at 5/8/15 7:57 PM:

{quote}
And since a WriteTimeoutException already means I don't know, we throw it in that case too, even though it's not a proper timeout per-se. The point being, you should handle it as if it was a timeout.
{quote}

Is at odds with:

{quote}
A WTE means that update may or may not be applied, so yes, the update may have succeeded if you get a WTE.
{quote}

CAS is not atomic, and you shouldn't retry WTEs because it may cause inconsistent data.

The application cannot go back and read a record after a WTE, because someone else might have updated the value (race), and it might not be possible for the application to tell whether it should retry or not (based on the value).

was (Author: aaronjwhiteside):
{quote}
And since a WriteTimeoutException already means I don't know, we throw it in that case too, even though it's not a proper timeout per-se. The point being, you should handle it as if it was a timeout.
{quote}

Is at odds with:

{quote}
A WTE means that update may or may not be applied, so yes, the update may have succeeded if you get a WTE.
{quote}

CAS is not atomic, and you shouldn't retry WTEs because it may cause inconsistent data.

The application cannot go back and read a record after a WTE, because someone else might have updated the value (race), and it might not be possible for the application to tell whether it should retry or not.
[jira] [Commented] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535391#comment-14535391 ]

Aaron Whiteside commented on CASSANDRA-9328:

{quote}
And since a WriteTimeoutException already means I don't know, we throw it in that case too, even though it's not a proper timeout per-se. The point being, you should handle it as if it was a timeout.
{quote}

Is at odds with:

{quote}
A WTE means that update may or may not be applied, so yes, the update may have succeeded if you get a WTE.
{quote}

CAS is not atomic, and you shouldn't retry WTEs because it may cause inconsistent data.

The application cannot go back and read a record after a WTE, because someone else might have updated the value (race), and it might not be possible for the application to tell whether it should retry or not.
[jira] [Commented] (CASSANDRA-9329) Make CAS retry logic configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-9329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533614#comment-14533614 ]

Aaron Whiteside commented on CASSANDRA-9329:

Can we at least get something into the next 2.x release?

Perhaps if cas_contention_timeout_in_ms is set to 0, then we never sleep and only try the condition once?

> Make CAS retry logic configurable
>
> Key: CASSANDRA-9329
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9329
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Aaron Whiteside
> Fix For: 3.x
>
>
> Make CAS retry logic configurable:
> One should be able to disable the internal CAS retry loop (when the condition is not met) and let the client choose how to do retries (so the client does not have to incur the server-side random sleep of up to 100ms). Basically, let the client handle all CAS retries in a manner it sees fit.
> Secondly, the hardcoded sleep of up to 100ms that happens when Cassandra fails to meet the CAS condition should be configurable.
> - The max duration should be configurable
> - The algorithm used to choose the duration should be configurable (Random, Exponential, etc).
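To make the two requested knobs concrete (a configurable maximum sleep and a pluggable algorithm), here is a rough client-side sketch. It is not Cassandra code and does not reflect any existing configuration option: CasBackoffSketch, Backoff, uniformRandom and exponential are hypothetical names chosen for illustration. A maximum of 0 yields a delay of 0, which corresponds to the "never sleep" behaviour suggested in the comment above.

```java
import java.util.Random;

/** Hypothetical pluggable backoff policies for retrying a failed CAS condition. */
final class CasBackoffSketch {

    /** Chooses how long to wait before retry number {@code attempt} (0-based). */
    interface Backoff {
        long delayMillis(int attempt);
    }

    /** Random policy: a uniformly random delay in [0, maxMillis]. */
    static Backoff uniformRandom(int maxMillis, Random rng) {
        if (maxMillis < 0 || maxMillis == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("maxMillis out of range: " + maxMillis);
        }
        return attempt -> rng.nextInt(maxMillis + 1);
    }

    /** Exponential policy: baseMillis * 2^attempt, capped at maxMillis. */
    static Backoff exponential(int baseMillis, int maxMillis) {
        if (baseMillis < 0 || maxMillis < 0) {
            throw new IllegalArgumentException("delays must be >= 0");
        }
        return attempt -> {
            // Clamp the shift so the multiplication cannot overflow a long.
            int shift = Math.min(Math.max(attempt, 0), 30);
            return Math.min((long) maxMillis, (long) baseMillis << shift);
        };
    }

    public static void main(String[] args) {
        Backoff random = uniformRandom(100, new Random());
        Backoff exp = exponential(10, 100);
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt
                    + ": random=" + random.delayMillis(attempt) + "ms"
                    + ", exponential=" + exp.delayMillis(attempt) + "ms");
        }
    }
}
```

With a base of 10 ms and a cap of 100 ms, the exponential policy yields 10, 20, 40, 80 and then 100 ms for every later attempt; the random policy with a cap of 100 ms mirrors the "random sleep of up to 100ms" described in the ticket.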
[jira] [Comment Edited] (CASSANDRA-9329) Make CAS retry logic configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-9329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533614#comment-14533614 ]

Aaron Whiteside edited comment on CASSANDRA-9329 at 5/7/15 11:43 PM:

Can we at least get something into the next 2.x release? Perhaps if cas_contention_timeout_in_ms is set to 0, then we never sleep and only try the condition once?

The configurable algorithm and timeout can come later.

was (Author: aaronjwhiteside):
Can we at least get something into the next 2.x release? Perhaps if cas_contention_timeout_in_ms is set to 0, then we never sleep and only try the condition once?
[jira] [Created] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
Aaron Whiteside created CASSANDRA-9328:

Summary: WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
Key: CASSANDRA-9328
URL: https://issues.apache.org/jira/browse/CASSANDRA-9328
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Aaron Whiteside
Priority: Critical
Attachments: CassandraLWTTest.java

WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms.
Unit test attached, run against a 3 node cluster running 2.1.5.
If you reduce the threadCount to 1, you never see a WriteTimeoutException. If the WTE is due to not being able to communicate with other nodes, why does the concurrency > 1 cause inter-node communication to fail?
[jira] [Created] (CASSANDRA-9330) CAS timeout errors should use a different exception than WriteTimeoutException as WTE can happen when nodes fail to respond.
Aaron Whiteside created CASSANDRA-9330:

Summary: CAS timeout errors should use a different exception than WriteTimeoutException as WTE can happen when nodes fail to respond.
Key: CASSANDRA-9330
URL: https://issues.apache.org/jira/browse/CASSANDRA-9330
Project: Cassandra
Issue Type: Improvement
Components: Core, Drivers (now out of tree)
Reporter: Aaron Whiteside

Perhaps a CASContentionTimeoutException?
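To illustrate what a separate exception type would let a client do, here is a minimal sketch. Everything in it is an assumption made for illustration: `WriteTimeout` is a hypothetical stand-in for the existing write timeout, `CASContentionTimeoutException` only borrows the name suggested in the ticket, and modelling it as a subtype is one possible design that the ticket does not specify.

```java
/** Hypothetical stand-in for the existing write timeout; not the driver's class. */
class WriteTimeout extends RuntimeException {
    WriteTimeout(String message) {
        super(message);
    }
}

/**
 * Name suggested in the ticket. Modelled here as a subtype so that code
 * already catching the general timeout would keep working unchanged.
 */
class CASContentionTimeoutException extends WriteTimeout {
    CASContentionTimeoutException(String message) {
        super(message);
    }
}

final class TimeoutKinds {
    private TimeoutKinds() {}

    /** Tells the two timeout causes apart; the more specific type is checked first. */
    static String describe(RuntimeException e) {
        if (e instanceof CASContentionTimeoutException) {
            return "cas-contention";
        }
        if (e instanceof WriteTimeout) {
            return "replica-timeout";
        }
        return "other";
    }
}
```

With a distinction like this, a client could choose a different reaction to contention than to unresponsive nodes, instead of treating every WTE the same way.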
[jira] [Created] (CASSANDRA-9329) Make CAS retry logic configurable
Aaron Whiteside created CASSANDRA-9329:

Summary: Make CAS retry logic configurable
Key: CASSANDRA-9329
URL: https://issues.apache.org/jira/browse/CASSANDRA-9329
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Aaron Whiteside

Make CAS retry logic configurable:
One should be able to disable the internal CAS retry loop (when the condition is not met) and let the client choose how to do retries (so the client does not have to incur the server-side random sleep of up to 100ms). Basically, let the client handle all CAS retries in a manner it sees fit.
Secondly, the hardcoded sleep of up to 100ms that happens when Cassandra fails to meet the CAS condition should be configurable.
- The max duration should be configurable
- The algorithm used to choose the duration should be configurable (Random, Exponential, etc.).
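The two configuration points requested above (maximum duration and choice of algorithm) could be expressed as a pluggable policy. The following is a minimal sketch under stated assumptions, not a proposed patch: `BackoffPolicy` and its factory methods are hypothetical names, delays are in milliseconds, attempts are counted from 1, and `baseMs` and `maxMs` are assumed to be non-negative.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical policy: how long to pause before retry number attempt (1-based). */
interface BackoffPolicy {
    long nextDelayMs(int attempt);

    /** No pause at all; the client is left to pace its own retries. */
    static BackoffPolicy none() {
        return attempt -> 0L;
    }

    /** Uniformly random pause in [0, maxMs]. */
    static BackoffPolicy random(long maxMs) {
        return attempt -> ThreadLocalRandom.current().nextLong(maxMs + 1);
    }

    /** Pause doubles with each attempt, starting at baseMs and capped at maxMs. */
    static BackoffPolicy exponential(long baseMs, long maxMs) {
        return attempt -> {
            long delay = Math.min(baseMs, maxMs);
            for (int i = 1; i < attempt && delay < maxMs; i++) {
                // Jump straight to the cap when doubling would pass it.
                delay = (delay > maxMs / 2) ? maxMs : delay * 2;
            }
            return delay;
        };
    }
}
```

For example, `BackoffPolicy.exponential(10, 100)` yields 10, 20, 40, 80, 100 and 100 ms for attempts 1 through 6, while `BackoffPolicy.random(100)` corresponds to the random sleep of up to 100ms mentioned in the description.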