[jira] [Created] (CASSANDRA-19951) Non-serial single partition reads on Accord

2024-09-24 Thread Ariel Weisberg (Jira)
Ariel Weisberg created CASSANDRA-19951:
--

 Summary: Non-serial single partition reads on Accord
 Key: CASSANDRA-19951
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19951
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord, Consistency/Coordination
Reporter: Ariel Weisberg


Factor out just single partition reads on Accord without live migration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19944) Synchronously persist command store fields and flush memtables before setting RedundantBefore

2024-09-24 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19944:
---
Attachment: ci_summary.html

> Synchronously persist command store fields and flush memtables before setting 
> RedundantBefore
> -
>
> Key: CASSANDRA-19944
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19944
> Project: Cassandra
>  Issue Type: Task
>  Components: Accord
>Reporter: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> Recovery won't be correct if these fields aren't persisted synchronously.






[jira] [Updated] (CASSANDRA-19944) Synchronously persist command store fields and flush memtables before setting RedundantBefore

2024-09-23 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19944:
---
Test and Documentation Plan: Add simulated persistence delays to 
InMemoryCommandStore, existing Cassandra tests 
 Status: Patch Available  (was: Open)

> Synchronously persist command store fields and flush memtables before setting 
> RedundantBefore
> -
>
> Key: CASSANDRA-19944
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19944
> Project: Cassandra
>  Issue Type: Task
>  Components: Accord
>Reporter: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
>
> Recovery won't be correct if these fields aren't persisted synchronously.






[jira] [Updated] (CASSANDRA-19944) Synchronously persist command store fields and flush memtables before setting RedundantBefore

2024-09-23 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19944:
---
Change Category: Semantic
 Complexity: Normal
  Fix Version/s: 5.x
  Reviewers: Alex Petrov, Benedict Elliott Smith
 Status: Open  (was: Triage Needed)

> Synchronously persist command store fields and flush memtables before setting 
> RedundantBefore
> -
>
> Key: CASSANDRA-19944
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19944
> Project: Cassandra
>  Issue Type: Task
>  Components: Accord
>Reporter: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
>
> Recovery won't be correct if these fields aren't persisted synchronously.






[jira] [Updated] (CASSANDRA-19945) Reverse cursor and iteration support for Trie based memtables

2024-09-23 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19945:
---
Fix Version/s: 5.x
   (was: 5.1-beta)

> Reverse cursor and iteration support for Trie based memtables
> -
>
> Key: CASSANDRA-19945
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19945
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Memtable, Local/SSTable
>Reporter: Ariel Weisberg
>Assignee: Branimir Lambov
>Priority: Normal
> Fix For: 5.x
>
>
> Cherry-pick 
> [https://github.com/datastax/cassandra/commit/196b931c677829d681406f14cf1da814ff5a6624]
> For Accord in particular this is useful to avoid flushing memtables that 
> don't intersect with the range that is going to start having metadata GCed so 
> we can flush less frequently/later.






[jira] [Updated] (CASSANDRA-19945) Reverse cursor and iteration support for Trie based memtables

2024-09-23 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19945:
---
Fix Version/s: 5.1-beta

> Reverse cursor and iteration support for Trie based memtables
> -
>
> Key: CASSANDRA-19945
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19945
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Memtable, Local/SSTable
>Reporter: Ariel Weisberg
>Assignee: Branimir Lambov
>Priority: Normal
> Fix For: 5.1-beta
>
>
> Cherry-pick 
> [https://github.com/datastax/cassandra/commit/196b931c677829d681406f14cf1da814ff5a6624]
> For Accord in particular this is useful to avoid flushing memtables that 
> don't intersect with the range that is going to start having metadata GCed so 
> we can flush less frequently/later.






[jira] [Created] (CASSANDRA-19945) Reverse cursor and iteration support for Trie based memtables

2024-09-23 Thread Ariel Weisberg (Jira)
Ariel Weisberg created CASSANDRA-19945:
--

 Summary: Reverse cursor and iteration support for Trie based memtables
 Key: CASSANDRA-19945
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19945
 Project: Cassandra
  Issue Type: Improvement
  Components: Local/Memtable, Local/SSTable
Reporter: Ariel Weisberg
Assignee: Branimir Lambov


Cherry-pick 
[https://github.com/datastax/cassandra/commit/196b931c677829d681406f14cf1da814ff5a6624]

For Accord in particular this is useful to avoid flushing memtables that don't 
intersect with the range that is going to start having metadata GCed so we can 
flush less frequently/later.
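
A minimal sketch of the idea described above, not the cherry-picked commit: if each memtable can cheaply report its smallest and largest token (one plausible use of a reverse cursor is finding the largest), only memtables whose span overlaps the range about to have metadata GCed need flushing. `MemtableBounds`, `FlushSelection`, and the plain `long` tokens are hypothetical names for illustration, and ring wrap-around is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a memtable: only the smallest and largest token it holds.
final class MemtableBounds
{
    final String name;
    final long minToken;
    final long maxToken;

    MemtableBounds(String name, long minToken, long maxToken)
    {
        this.name = name;
        this.minToken = minToken;
        this.maxToken = maxToken;
    }
}

final class FlushSelection
{
    // Returns the memtables whose [minToken, maxToken] span overlaps the
    // half-open range (rangeStart, rangeEnd]; the others can be left unflushed.
    static List<MemtableBounds> needingFlush(List<MemtableBounds> memtables, long rangeStart, long rangeEnd)
    {
        List<MemtableBounds> result = new ArrayList<>();
        for (MemtableBounds m : memtables)
        {
            boolean intersects = m.maxToken > rangeStart && m.minToken <= rangeEnd;
            if (intersects)
                result.add(m);
        }
        return result;
    }
}
```

The check is conservative: a memtable whose span overlaps the range is selected even if none of its keys actually fall inside it.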






[jira] [Created] (CASSANDRA-19944) Synchronously persist command store fields and flush memtables before setting RedundantBefore

2024-09-23 Thread Ariel Weisberg (Jira)
Ariel Weisberg created CASSANDRA-19944:
--

 Summary: Synchronously persist command store fields and flush memtables before setting RedundantBefore
 Key: CASSANDRA-19944
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19944
 Project: Cassandra
  Issue Type: Task
  Components: Accord
Reporter: Ariel Weisberg


Recovery won't be correct if these fields aren't persisted synchronously.






[jira] [Updated] (CASSANDRA-19926) CEP-15 (C*) increase message timeouts for range barrier messages

2024-09-16 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19926:
---
Reviewers: Ariel Weisberg, Ariel Weisberg  (was: Ariel Weisberg)
   Status: Review In Progress  (was: Patch Available)

> CEP-15 (C*) increase message timeouts for range barrier messages
> 
>
> Key: CASSANDRA-19926
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19926
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> Messages involved in the coordination of accord barriers and sync points can 
> take longer than those involved in typical operations and are not blocking 
> client requests. Subjecting them to the same timeouts as client operations 
> can destabilize the system by preventing background bookkeeping operations 
> from completing and putting them into a retry loop. This patch special cases 
> the timeouts for messages coordinating range barriers.
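
A rough sketch of what special-casing the timeout could look like, assuming a simple per-message-kind lookup. `MessageKind`, `TimeoutPolicy`, and the durations used below are hypothetical and are not taken from the patch.

```java
import java.time.Duration;

// Hypothetical message categories; the real Accord verbs are not modeled here.
enum MessageKind { CLIENT_READ, CLIENT_WRITE, RANGE_BARRIER }

final class TimeoutPolicy
{
    private final Duration clientTimeout;
    private final Duration rangeBarrierTimeout;

    TimeoutPolicy(Duration clientTimeout, Duration rangeBarrierTimeout)
    {
        this.clientTimeout = clientTimeout;
        this.rangeBarrierTimeout = rangeBarrierTimeout;
    }

    // Range barrier coordination gets its own (typically longer) timeout so that
    // background bookkeeping is not cut short by limits meant for client requests.
    Duration timeoutFor(MessageKind kind)
    {
        return kind == MessageKind.RANGE_BARRIER ? rangeBarrierTimeout : clientTimeout;
    }
}
```

Keeping the lookup in one place would also make it easy to extend to other commands later.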






[jira] [Updated] (CASSANDRA-19926) CEP-15 (C*) increase message timeouts for range barrier messages

2024-09-16 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19926:
---
Status: Ready to Commit  (was: Review In Progress)

+1. Arguably we might want to do this for more commands than just sync points, 
but it is easily updated, so there is no need to over-engineer now.

> CEP-15 (C*) increase message timeouts for range barrier messages
> 
>
> Key: CASSANDRA-19926
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19926
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
>
> Messages involved in the coordination of accord barriers and sync points can 
> take longer than those involved in typical operations and are not blocking 
> client requests. Subjecting them to the same timeouts as client operations 
> can destabilize the system by preventing background bookkeeping operations 
> from completing and putting them into a retry loop. This patch special cases 
> the timeouts for messages coordinating range barriers.






[jira] [Commented] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied

2024-08-29 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877789#comment-17877789
 ] 

Ariel Weisberg commented on CASSANDRA-19651:


`waitToSettle` asserts that the number of endpoints in Gossip equals the 
expected number for some period of time. If an endpoint is in Gossip, I think 
it means we have pulled its various application states, but it could be a lot 
more complicated than that, with only a subset being present.
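
To make the "equal to the expected number for some period of time" idea concrete, here is a small sketch of a settle check. It is not the code of `Gossiper.waitToSettle`; the class name, parameters, and polling scheme are assumptions for illustration.

```java
import java.util.function.IntSupplier;

final class SettleCheck
{
    // Polls endpointCount until it has matched 'expected' on requiredStablePolls
    // consecutive polls, or gives up after maxPolls polls. Any mismatch resets the streak.
    static boolean waitToSettle(IntSupplier endpointCount, int expected,
                                int requiredStablePolls, int maxPolls, long pollIntervalMillis)
    {
        int stable = 0;
        for (int i = 0; i < maxPolls; i++)
        {
            stable = endpointCount.getAsInt() == expected ? stable + 1 : 0;
            if (stable >= requiredStablePolls)
                return true;
            try
            {
                Thread.sleep(pollIntervalMillis);
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return false;
    }
}
```

Note that a check like this only counts endpoints; it says nothing about whether every application state for each endpoint has arrived.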

> idealCLWriteLatency metric reports the worst response time instead of the 
> time when ideal CL is satisfied
> -
>
> Key: CASSANDRA-19651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19651
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Observability
>Reporter: Dmitry Konstantinov
>Assignee: Dmitry Konstantinov
>Priority: Normal
> Fix For: 4.0.14, 4.1.7, 5.0.1, 5.1
>
> Attachments: 
> ci_summary-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.html, 
> ci_summary-cassandra-4.1-1ed312f881c0c170c8833ff9fbf397ab8fc625cc.html, 
> ci_summary-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.html, 
> ci_summary-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.html, 
> result_details-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.tar.gz, 
> result_details-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.tar.gz, 
> result_details-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.tar.gz, 
> select-junit-tests-rerun-4.1.zip
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.service.AbstractWriteResponseHandler:
> {code:java}
> private final void decrementResponseOrExpired()
> {
>     int decrementedValue = responsesAndExpirations.decrementAndGet();
>     if (decrementedValue == 0)
>     {
>         // The condition being signaled is a valid proxy for the CL being achieved
>         // Only mark it as failed if the requested CL was achieved.
>         if (!condition.isSignalled() && requestedCLAchieved)
>         {
>             replicaPlan.keyspace().metric.writeFailedIdealCL.inc();
>         }
>         else
>         {
>             replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - queryStartNanoTime);
>         }
>     }
> } {code}
> Actual result: responsesAndExpirations is a total number of replicas across 
> all DCs which does not depend on the ideal CL, so the metric value for 
> replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the 
> latest response/timeout for all replicas.
> Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated 
> when we get enough responses from replicas according to the ideal CL.
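
The expected behaviour can be sketched as follows. This is an illustration under stated assumptions, not the committed fix: `IdealClTracker` and its fields are hypothetical, timestamps are passed in explicitly so the example is deterministic, and the `requestedCLAchieved` condition from the original code is left out for brevity.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class IdealClTracker
{
    private final int idealBlockFor;              // responses needed to satisfy the ideal CL
    private final long startNanos;
    private final AtomicInteger successes = new AtomicInteger();
    private final AtomicInteger remaining;        // responses plus expirations still outstanding
    volatile long recordedLatencyNanos = -1;      // stays -1 until the ideal CL is satisfied
    volatile int failedIdealCl = 0;

    IdealClTracker(int totalReplicas, int idealBlockFor, long startNanos)
    {
        this.idealBlockFor = idealBlockFor;
        this.startNanos = startNanos;
        this.remaining = new AtomicInteger(totalReplicas);
    }

    void onResponse(long nowNanos)
    {
        // Record latency at the moment the ideal CL is reached, not when the last replica answers.
        if (successes.incrementAndGet() == idealBlockFor)
            recordedLatencyNanos = nowNanos - startNanos;
        onCompleted();
    }

    void onExpired()
    {
        onCompleted();
    }

    private void onCompleted()
    {
        // Only once every replica has responded or expired do we know the ideal CL was missed.
        if (remaining.decrementAndGet() == 0 && successes.get() < idealBlockFor)
            failedIdealCl = 1;
    }
}
```

With three replicas and an ideal CL needing two responses, the latency is taken from the second response, and a slow third replica no longer inflates the metric.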






[jira] [Updated] (CASSANDRA-19744) Accord migration and interop correctness

2024-08-22 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19744:
---
Attachment: ci_summary.html

> Accord migration and interop correctness
> 
>
> Key: CASSANDRA-19744
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19744
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>  Labels: pull-request-available
> Attachments: ci_summary.html
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> There are several issues around splitting and retrying mutations, using the 
> original timestamp for batchlog/hints, batchlog/hint support in general, 
> running Accord barriers only against the ranges actually owned by Accord.






[jira] [Updated] (CASSANDRA-19744) Accord migration and interop correctness

2024-08-22 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19744:
---
Attachment: (was: ci_summary.html)

> Accord migration and interop correctness
> 
>
> Key: CASSANDRA-19744
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19744
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>  Labels: pull-request-available
> Attachments: ci_summary.html
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> There are several issues around splitting and retrying mutations, using the 
> original timestamp for batchlog/hints, batchlog/hint support in general, 
> running Accord barriers only against the ranges actually owned by Accord.






[jira] [Commented] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied

2024-08-22 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876087#comment-17876087
 ] 

Ariel Weisberg commented on CASSANDRA-19651:


We have `spinAssertEquals` for this purpose. I'm not 100% sure the issue is 
that we aren't waiting: if `Instance.startup` has finished, shouldn't that 
Gossip state already be present? It may be that we have schema and gossip state 
from some, but not all, nodes once startup is done. I would double check that 
startup is working as intended, in terms of waiting until it has received 
enough state from Gossip that it is actually done, and that there isn't some 
other root cause bug here.

See `Gossiper.waitToSettle` for context.

In 5.1 all those checks are removed and it just checks the underlying table to 
see if the data is there. I wonder why it needs to do the `pullSchemaFrom`.
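
For readers unfamiliar with the pattern, a spin assertion re-evaluates a value until it matches or a deadline passes. The sketch below is a generic version written for illustration; its class name and signature are assumptions and may differ from the `spinAssertEquals` helper in the Cassandra test utilities.

```java
import java.util.Objects;
import java.util.function.Supplier;

final class SpinAssert
{
    // Re-reads 'actual' until it equals 'expected' or timeoutMillis elapses, then fails if still unequal.
    static <T> void spinAssertEquals(T expected, Supplier<T> actual, long timeoutMillis)
    {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        T last = actual.get();
        while (!Objects.equals(expected, last) && System.nanoTime() - deadline < 0)
        {
            Thread.yield(); // let other threads make progress between checks
            last = actual.get();
        }
        if (!Objects.equals(expected, last))
            throw new AssertionError("expected " + expected + " but was " + last);
    }
}
```

A spin assertion hides a timing gap in a test, which is why it is worth confirming that startup itself waits for the right state before relying on one.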

> idealCLWriteLatency metric reports the worst response time instead of the 
> time when ideal CL is satisfied
> -
>
> Key: CASSANDRA-19651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19651
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Observability
>Reporter: Dmitry Konstantinov
>Assignee: Dmitry Konstantinov
>Priority: Normal
> Fix For: 4.0.14, 5.0.1, 5.1, 4.1.7
>
> Attachments: 
> ci_summary-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.html, 
> ci_summary-cassandra-4.1-1ed312f881c0c170c8833ff9fbf397ab8fc625cc.html, 
> ci_summary-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.html, 
> ci_summary-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.html, 
> result_details-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.tar.gz, 
> result_details-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.tar.gz, 
> result_details-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.tar.gz, 
> select-junit-tests-rerun-4.1.zip
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.service.AbstractWriteResponseHandler:
> {code:java}
> private final void decrementResponseOrExpired()
> {
>     int decrementedValue = responsesAndExpirations.decrementAndGet();
>     if (decrementedValue == 0)
>     {
>         // The condition being signaled is a valid proxy for the CL being achieved
>         // Only mark it as failed if the requested CL was achieved.
>         if (!condition.isSignalled() && requestedCLAchieved)
>         {
>             replicaPlan.keyspace().metric.writeFailedIdealCL.inc();
>         }
>         else
>         {
>             replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - queryStartNanoTime);
>         }
>     }
> } {code}
> Actual result: responsesAndExpirations is a total number of replicas across 
> all DCs which does not depend on the ideal CL, so the metric value for 
> replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the 
> latest response/timeout for all replicas.
> Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated 
> when we get enough responses from replicas according to the ideal CL.






[jira] [Updated] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied

2024-08-21 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19651:
---
  Fix Version/s: 4.1.7
 4.0.14
 5.0.1
 5.1
 (was: 5.x)
 (was: 4.1.x)
 (was: 5.0.x)
Source Control Link: 
https://github.com/apache/cassandra/commit/93415c91af3d06504593a87c8b8d7e5d2d65b1ac
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> idealCLWriteLatency metric reports the worst response time instead of the 
> time when ideal CL is satisfied
> -
>
> Key: CASSANDRA-19651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19651
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Observability
>Reporter: Dmitry Konstantinov
>Assignee: Dmitry Konstantinov
>Priority: Normal
> Fix For: 4.1.7, 4.0.14, 5.0.1, 5.1
>
> Attachments: 
> ci_summary-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.html, 
> ci_summary-cassandra-4.1-1ed312f881c0c170c8833ff9fbf397ab8fc625cc.html, 
> ci_summary-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.html, 
> ci_summary-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.html, 
> result_details-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.tar.gz, 
> result_details-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.tar.gz, 
> result_details-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.tar.gz, 
> select-junit-tests-rerun-4.1.zip
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.service.AbstractWriteResponseHandler:
> {code:java}
> private final void decrementResponseOrExpired()
> {
>     int decrementedValue = responsesAndExpirations.decrementAndGet();
>     if (decrementedValue == 0)
>     {
>         // The condition being signaled is a valid proxy for the CL being achieved
>         // Only mark it as failed if the requested CL was achieved.
>         if (!condition.isSignalled() && requestedCLAchieved)
>         {
>             replicaPlan.keyspace().metric.writeFailedIdealCL.inc();
>         }
>         else
>         {
>             replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - queryStartNanoTime);
>         }
>     }
> } {code}
> Actual result: responsesAndExpirations is a total number of replicas across 
> all DCs which does not depend on the ideal CL, so the metric value for 
> replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the 
> latest response/timeout for all replicas.
> Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated 
> when we get enough responses from replicas according to the ideal CL.






[jira] [Updated] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied

2024-08-21 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19651:
---
Status: Ready to Commit  (was: Review In Progress)

> idealCLWriteLatency metric reports the worst response time instead of the 
> time when ideal CL is satisfied
> -
>
> Key: CASSANDRA-19651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19651
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Observability
>Reporter: Dmitry Konstantinov
>Assignee: Dmitry Konstantinov
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: 
> ci_summary-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.html, 
> ci_summary-cassandra-4.1-1ed312f881c0c170c8833ff9fbf397ab8fc625cc.html, 
> ci_summary-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.html, 
> ci_summary-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.html, 
> result_details-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.tar.gz, 
> result_details-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.tar.gz, 
> result_details-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.tar.gz, 
> select-junit-tests-rerun-4.1.zip
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.service.AbstractWriteResponseHandler:
> {code:java}
> private final void decrementResponseOrExpired()
> {
>     int decrementedValue = responsesAndExpirations.decrementAndGet();
>     if (decrementedValue == 0)
>     {
>         // The condition being signaled is a valid proxy for the CL being achieved
>         // Only mark it as failed if the requested CL was achieved.
>         if (!condition.isSignalled() && requestedCLAchieved)
>         {
>             replicaPlan.keyspace().metric.writeFailedIdealCL.inc();
>         }
>         else
>         {
>             replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - queryStartNanoTime);
>         }
>     }
> } {code}
> Actual result: responsesAndExpirations is a total number of replicas across 
> all DCs which does not depend on the ideal CL, so the metric value for 
> replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the 
> latest response/timeout for all replicas.
> Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated 
> when we get enough responses from replicas according to the ideal CL.






[jira] [Commented] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied

2024-08-21 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875642#comment-17875642
 ] 

Ariel Weisberg commented on CASSANDRA-19651:


Merged as 
[93415c91af3d06504593a87c8b8d7e5d2d65b1ac|https://github.com/apache/cassandra/commit/93415c91af3d06504593a87c8b8d7e5d2d65b1ac],
 TY!

> idealCLWriteLatency metric reports the worst response time instead of the 
> time when ideal CL is satisfied
> -
>
> Key: CASSANDRA-19651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19651
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Observability
>Reporter: Dmitry Konstantinov
>Assignee: Dmitry Konstantinov
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: 
> ci_summary-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.html, 
> ci_summary-cassandra-4.1-1ed312f881c0c170c8833ff9fbf397ab8fc625cc.html, 
> ci_summary-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.html, 
> ci_summary-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.html, 
> result_details-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.tar.gz, 
> result_details-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.tar.gz, 
> result_details-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.tar.gz, 
> select-junit-tests-rerun-4.1.zip
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.service.AbstractWriteResponseHandler:
> {code:java}
> private final void decrementResponseOrExpired()
> {
>     int decrementedValue = responsesAndExpirations.decrementAndGet();
>     if (decrementedValue == 0)
>     {
>         // The condition being signaled is a valid proxy for the CL being achieved
>         // Only mark it as failed if the requested CL was achieved.
>         if (!condition.isSignalled() && requestedCLAchieved)
>         {
>             replicaPlan.keyspace().metric.writeFailedIdealCL.inc();
>         }
>         else
>         {
>             replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - queryStartNanoTime);
>         }
>     }
> } {code}
> Actual result: responsesAndExpirations is a total number of replicas across 
> all DCs which does not depend on the ideal CL, so the metric value for 
> replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the 
> latest response/timeout for all replicas.
> Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated 
> when we get enough responses from replicas according to the ideal CL.






[jira] [Commented] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied

2024-08-21 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875547#comment-17875547
 ] 

Ariel Weisberg commented on CASSANDRA-19651:


Thanks, I've also been triaging them. I don't think they are related to your 
patch; it's our CI environment. When I did a baseline run of trunk, some of 
them didn't seem to fail, but there was also a lot of noise.

I am going to try it on ASF CI for 5.0/5.1 and if that looks good I'll go ahead 
and merge.

These don't appear yet since they are queued and haven't started:
https://ci-cassandra.apache.org/job/Cassandra-devbranch-5/54/
https://ci-cassandra.apache.org/job/Cassandra-devbranch-5/55/

> idealCLWriteLatency metric reports the worst response time instead of the 
> time when ideal CL is satisfied
> -
>
> Key: CASSANDRA-19651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19651
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Observability
>Reporter: Dmitry Konstantinov
>Assignee: Dmitry Konstantinov
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: 
> ci_summary-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.html, 
> ci_summary-cassandra-4.1-1ed312f881c0c170c8833ff9fbf397ab8fc625cc.html, 
> ci_summary-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.html, 
> ci_summary-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.html, 
> result_details-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.tar.gz, 
> result_details-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.tar.gz, 
> result_details-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.tar.gz, 
> select-junit-tests-rerun-4.1.zip
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.service.AbstractWriteResponseHandler:
> {code:java}
> private final void decrementResponseOrExpired()
> {
>     int decrementedValue = responsesAndExpirations.decrementAndGet();
>     if (decrementedValue == 0)
>     {
>         // The condition being signaled is a valid proxy for the CL being achieved
>         // Only mark it as failed if the requested CL was achieved.
>         if (!condition.isSignalled() && requestedCLAchieved)
>         {
>             replicaPlan.keyspace().metric.writeFailedIdealCL.inc();
>         }
>         else
>         {
>             replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - queryStartNanoTime);
>         }
>     }
> } {code}
> Actual result: responsesAndExpirations is a total number of replicas across 
> all DCs which does not depend on the ideal CL, so the metric value for 
> replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the 
> latest response/timeout for all replicas.
> Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated 
> when we get enough responses from replicas according to the ideal CL.






[jira] [Updated] (CASSANDRA-19843) Add tracing support to Accord

2024-08-20 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19843:
---
Change Category: Operability
 Complexity: Normal
 Status: Open  (was: Triage Needed)

> Add tracing support to Accord
> -
>
> Key: CASSANDRA-19843
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19843
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord, Observability/Tracing
>Reporter: Ariel Weisberg
>Priority: Normal
>







[jira] [Updated] (CASSANDRA-19843) Add tracing support to Accord

2024-08-20 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19843:
---
Authors: Ariel Weisberg
Test and Documentation Plan: Added unit test
 Status: Patch Available  (was: Open)

> Add tracing support to Accord
> -
>
> Key: CASSANDRA-19843
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19843
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Accord, Observability/Tracing
>Reporter: Ariel Weisberg
>Priority: Normal
>







[jira] [Created] (CASSANDRA-19843) Add tracing support to Accord

2024-08-20 Thread Ariel Weisberg (Jira)
Ariel Weisberg created CASSANDRA-19843:
--

 Summary: Add tracing support to Accord
 Key: CASSANDRA-19843
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19843
 Project: Cassandra
  Issue Type: Improvement
  Components: Accord, Observability/Tracing
Reporter: Ariel Weisberg









[jira] [Updated] (CASSANDRA-19438) Accord barriers need to handle racing with topology changes

2024-08-14 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19438:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> Accord barriers need to handle racing with topology changes
> ---
>
> Key: CASSANDRA-19438
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19438
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ariel Weisberg
>Priority: Normal
>
> Topology changes can result in the ranges sent to Accord including ranges 
> that Accord does not manage. It may be enough to have the range barriers 
> automatically remove the unsupported subranges, since that may be all the 
> caller needs.
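
A minimal sketch of removing unsupported subranges, under simplifying assumptions: `TokenRange` is a hypothetical half-open range over `long` tokens with no wrap-around, not Cassandra's `Range<Token>`, and `restrictToManaged` is an invented helper name. It intersects each requested range with the ranges Accord manages and keeps only the non-empty overlaps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; TokenRange is a simplified stand-in, not Cassandra's Range<Token>.
final class BarrierRangeSketch
{
    // Half-open range [start, end) over long tokens, with no wrap-around.
    static final class TokenRange
    {
        final long start;
        final long end;

        TokenRange(long start, long end)
        {
            this.start = start;
            this.end = end;
        }
    }

    // Keeps only the parts of the requested ranges that overlap a range managed by Accord.
    static List<TokenRange> restrictToManaged(List<TokenRange> requested, List<TokenRange> managed)
    {
        List<TokenRange> result = new ArrayList<>();
        for (TokenRange r : requested)
        {
            for (TokenRange m : managed)
            {
                long start = Math.max(r.start, m.start);
                long end = Math.min(r.end, m.end);
                if (start < end) // non-empty overlap
                    result.add(new TokenRange(start, end));
            }
        }
        return result;
    }
}
```

For a requested range [0, 100) and managed ranges [10, 20) and [50, 150), the barrier would run against [10, 20) and [50, 100) only.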






[jira] [Updated] (CASSANDRA-19440) Non-serial writes can race with Accord topology changes

2024-08-14 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19440:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> Non-serial writes can race with Accord topology changes
> ---
>
> Key: CASSANDRA-19440
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19440
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ariel Weisberg
>Priority: Normal
>
> Accord and Paxos handle these, but non-SERIAL writes don't check for this 
> condition and can't retry the portions of the write that failed on the 
> correct system until the entire thing succeeds.






[jira] [Updated] (CASSANDRA-19435) Hint delivery doesn't write through Accord

2024-08-14 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19435:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> Hint delivery doesn't write through Accord
> --
>
> Key: CASSANDRA-19435
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19435
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ariel Weisberg
>Priority: Normal
>
> Hint delivery doesn't write through Accord, which would make txn recovery 
> non-deterministic.






[jira] [Updated] (CASSANDRA-19430) Read repair through Accord needs to only route the read repair through Accord if the range is actually migrated/running on Accord

2024-08-14 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19430:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> Read repair through Accord needs to only route the read repair through Accord 
> if the range is actually migrated/running on Accord
> -
>
> Key: CASSANDRA-19430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ariel Weisberg
>Priority: Normal
>
> This is because the read repair will simply fail if Accord doesn't manage 
> that range. Not only does it need to be routed through Accord but if it races 
> with topology change it needs to retry and not surface an error.
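
The routing-and-retry requirement can be sketched as follows. Everything here is hypothetical: `TopologyChangedException`, the `Attempt` callback, and `repairWithRetry` are invented names, and the real code presumably works with epochs and ranges rather than a boolean. The sketch only shows the control flow: re-check ownership before each attempt, and retry instead of surfacing an error when the attempt races with a topology change.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration of "route by current ownership, retry on a race"; not Cassandra code.
final class ReadRepairRoutingSketch
{
    // Signals that range ownership changed while the attempt was in flight.
    static final class TopologyChangedException extends RuntimeException {}

    // One read repair attempt, told whether to go through Accord.
    interface Attempt
    {
        void run(boolean viaAccord);
    }

    // Returns the number of attempts used; throws only after maxAttempts races in a row.
    static int repairWithRetry(BooleanSupplier accordManagesRange, Attempt attempt, int maxAttempts)
    {
        for (int i = 1; i <= maxAttempts; i++)
        {
            try
            {
                // Re-evaluate ownership on every attempt so a retry uses the new topology.
                attempt.run(accordManagesRange.getAsBoolean());
                return i;
            }
            catch (TopologyChangedException e)
            {
                // Ownership changed mid-flight: loop and retry rather than surface an error.
            }
        }
        throw new IllegalStateException("read repair kept racing with topology changes");
    }
}
```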






[jira] [Updated] (CASSANDRA-19434) Batch log doesn't write through Accord during Accord migration

2024-08-14 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19434:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> Batch log doesn't write through Accord during Accord migration
> --
>
> Key: CASSANDRA-19434
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19434
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ariel Weisberg
>Priority: Normal
>
> This can result in writes that bypass Accord, which makes txn 
> recovery non-deterministic.






[jira] [Updated] (CASSANDRA-19736) Batchlog and hint replay have timestamps replaced by Accord

2024-08-14 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19736:
---
Resolution: Fixed
Status: Resolved  (was: Open)

> Batchlog and hint replay have timestamps replaced by Accord
> ---
>
> Key: CASSANDRA-19736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> The issue is that we might create the transaction at a much later time and 
> then the operation would be written to Cassandra with a later timestamp. It 
> should be fine to use the minimum of the two.
> This also means that `USING TIMESTAMP` will also work as long as the provided 
> timestamp is < the Accord timestamp.
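
A minimal sketch of the timestamp rule proposed above, assuming both values are write timestamps in the same unit (CQL write timestamps are conventionally microseconds since the epoch). The class and method names are hypothetical, not part of Cassandra or Accord.

```java
// Hypothetical helper; not the actual Cassandra or Accord code.
final class ReplayTimestampSketch
{
    private ReplayTimestampSketch() {}

    // Returns the timestamp to write with: the earlier of the original mutation
    // timestamp (from batchlog/hint replay or USING TIMESTAMP) and the Accord-assigned one.
    static long effectiveTimestamp(long originalMicros, long accordMicros)
    {
        return Math.min(originalMicros, accordMicros);
    }
}
```

Under this rule a replayed hint keeps its original, older timestamp, while a user-supplied timestamp later than the Accord timestamp is capped at the Accord timestamp.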






[jira] [Updated] (CASSANDRA-19744) Accord migration and interop correctness

2024-08-14 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19744:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> Accord migration and interop correctness
> 
>
> Key: CASSANDRA-19744
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19744
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>  Labels: pull-request-available
> Attachments: ci_summary.html
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> There are several issues around splitting and retrying mutations, using the 
> original timestamp for batchlog/hints, batchlog/hint support in general, 
> and running Accord barriers only against the ranges actually owned by Accord.






[jira] [Updated] (CASSANDRA-19744) Accord migration and interop correctness

2024-08-14 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19744:
---
Attachment: ci_summary.html

> Accord migration and interop correctness
> 
>
> Key: CASSANDRA-19744
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19744
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>  Labels: pull-request-available
> Attachments: ci_summary.html
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> There are several issues around splitting and retrying mutations, using the 
> original timestamp for batchlog/hints, batchlog/hint support in general, 
> and running Accord barriers only against the ranges actually owned by Accord.






[jira] [Updated] (CASSANDRA-19744) Accord migration and interop correctness

2024-08-14 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19744:
---
Description: 
There are several issues around splitting and retrying mutations, using the 
original timestamp for batchlog/hints, batchlog/hint support in general, 
and running Accord barriers only against the ranges actually owned by Accord.



  was:There are several issues around splitting and retrying mutations, using 
the original timestamp for batchlog/hints, batchlog/hint support in general, 
and running Accord barriers only against the ranges actually owned by Accord.


> Accord migration and interop correctness
> 
>
> Key: CASSANDRA-19744
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19744
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> There are several issues around splitting and retrying mutations, using the 
> original timestamp for batchlog/hints, batchlog/hint support in general, 
> and running Accord barriers only against the ranges actually owned by Accord.






[jira] [Updated] (CASSANDRA-19744) Accord migration and interop correctness

2024-08-14 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19744:
---
Reviewers: Blake Eggleston

> Accord migration and interop correctness
> 
>
> Key: CASSANDRA-19744
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19744
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> There are several issues around splitting and retrying mutations, using the 
> original timestamp for batchlog/hints, batchlog/hint support in general, 
> and running Accord barriers only against the ranges actually owned by Accord.






[jira] [Updated] (CASSANDRA-19737) Accord migration mode FULL always runs with interop

2024-08-14 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19737:
---
Resolution: Not A Bug
Status: Resolved  (was: Triage Needed)

> Accord migration mode FULL always runs with interop
> ---
>
> Key: CASSANDRA-19737
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19737
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> Whether to use interop is not decided per transaction. Accord always seems to 
> run with interop for every transaction when it is constructed with the 
> factory that creates interop execution.






[jira] [Updated] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied

2024-08-06 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19651:
---
Attachment: ci_summary-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.html
            ci_summary-cassandra-4.1-1ed312f881c0c170c8833ff9fbf397ab8fc625cc.html
            ci_summary-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.html
            ci_summary-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.html
            result_details-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.tar.gz
            result_details-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.tar.gz
            result_details-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.tar.gz

> idealCLWriteLatency metric reports the worst response time instead of the 
> time when ideal CL is satisfied
> -
>
> Key: CASSANDRA-19651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19651
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Observability
>Reporter: Dmitry Konstantinov
>Assignee: Dmitry Konstantinov
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: 
> ci_summary-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.html, 
> ci_summary-cassandra-4.1-1ed312f881c0c170c8833ff9fbf397ab8fc625cc.html, 
> ci_summary-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.html, 
> ci_summary-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.html, 
> result_details-cassandra-4.0-a75f6c3e81f677e50c0a0d467dd5dad672f923e3.tar.gz, 
> result_details-cassandra-5.0-009f2982ac88d9c9bc0a7a7d29220f055aa7f11e.tar.gz, 
> result_details-trunk-da68729322515b4a7a698b73a0154ecdeb3abf39.tar.gz
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.service.AbstractWriteResponseHandler:
> {code:java}
> private final void decrementResponseOrExpired()
> {
>     int decrementedValue = responsesAndExpirations.decrementAndGet();
>     if (decrementedValue == 0)
>     {
>         // The condition being signaled is a valid proxy for the CL being achieved
>         // Only mark it as failed if the requested CL was achieved.
>         if (!condition.isSignalled() && requestedCLAchieved)
>         {
>             replicaPlan.keyspace().metric.writeFailedIdealCL.inc();
>         }
>         else
>         {
>             replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - queryStartNanoTime);
>         }
>     }
> }
> {code}
> Actual result: responsesAndExpirations is a total number of replicas across 
> all DCs which does not depend on the ideal CL, so the metric value for 
> replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the 
> latest response/timeout for all replicas.
> Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated 
> when we get enough responses from replicas according to the ideal CL.






[jira] [Commented] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied

2024-07-23 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868096#comment-17868096
 ] 

Ariel Weisberg commented on CASSANDRA-19651:


I want to make sure you aren't waiting on me. Can you create PRs for 4.0, 4.1, 
5.0, and trunk? Thanks

> idealCLWriteLatency metric reports the worst response time instead of the 
> time when ideal CL is satisfied
> -
>
> Key: CASSANDRA-19651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19651
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Observability
>Reporter: Dmitry Konstantinov
>Assignee: Dmitry Konstantinov
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
>
> org.apache.cassandra.service.AbstractWriteResponseHandler:
> {code:java}
> private final void decrementResponseOrExpired()
> {
>     int decrementedValue = responsesAndExpirations.decrementAndGet();
>     if (decrementedValue == 0)
>     {
>         // The condition being signaled is a valid proxy for the CL being achieved
>         // Only mark it as failed if the requested CL was achieved.
>         if (!condition.isSignalled() && requestedCLAchieved)
>         {
>             replicaPlan.keyspace().metric.writeFailedIdealCL.inc();
>         }
>         else
>         {
>             replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - queryStartNanoTime);
>         }
>     }
> }
> {code}
> Actual result: responsesAndExpirations is a total number of replicas across 
> all DCs which does not depend on the ideal CL, so the metric value for 
> replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the 
> latest response/timeout for all replicas.
> Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated 
> when we get enough responses from replicas according to the ideal CL.






[jira] [Commented] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied

2024-07-16 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866452#comment-17866452
 ] 

Ariel Weisberg commented on CASSANDRA-19651:


The patch attached to the ticket seems a bit out of date compared to the 
branch; it would be better to remove it. This will need patches for 4.0, 4.1, 5.0, 
and trunk. You can skip modifying `CHANGES.txt`, I will add that when I commit. 
When you do modify `CHANGES.txt` always insert your new change at the top of 
the list for the version.

> idealCLWriteLatency metric reports the worst response time instead of the 
> time when ideal CL is satisfied
> -
>
> Key: CASSANDRA-19651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19651
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Observability
>Reporter: Dmitry Konstantinov
>Assignee: Dmitry Konstantinov
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: 19651-4.1.patch
>
>
> org.apache.cassandra.service.AbstractWriteResponseHandler:
> {code:java}
> private final void decrementResponseOrExpired()
> {
>     int decrementedValue = responsesAndExpirations.decrementAndGet();
>     if (decrementedValue == 0)
>     {
>         // The condition being signaled is a valid proxy for the CL being achieved
>         // Only mark it as failed if the requested CL was achieved.
>         if (!condition.isSignalled() && requestedCLAchieved)
>         {
>             replicaPlan.keyspace().metric.writeFailedIdealCL.inc();
>         }
>         else
>         {
>             replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - queryStartNanoTime);
>         }
>     }
> }
> {code}
> Actual result: responsesAndExpirations is a total number of replicas across 
> all DCs which does not depend on the ideal CL, so the metric value for 
> replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the 
> latest response/timeout for all replicas.
> Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated 
> when we get enough responses from replicas according to the ideal CL.






[jira] [Commented] (CASSANDRA-19748) [Analytics] Refactor Analytics to move standalone code into common module with minimal dependencies

2024-07-12 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865531#comment-17865531
 ] 

Ariel Weisberg commented on CASSANDRA-19748:


+1 TY

> [Analytics] Refactor Analytics to move standalone code into common module 
> with minimal dependencies
> ---
>
> Key: CASSANDRA-19748
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19748
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Analytics Library
>Reporter: James Berragan
>Assignee: James Berragan
>Priority: Low
> Fix For: NA
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> The Analytics codebase is heavily tied to Spark. In an effort to re-use code 
> across projects (like CDC) we should move standalone Pojos and util classes 
> into an cassandra-analytics-common module that exists standalone without 
> dependencies to Cassandra or Spark and with minimal standard dependencies 
> (Kryo, Guava, Jackson, Apache Commons Lang etc).






[jira] [Updated] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied

2024-07-10 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19651:
---
Reviewers: Ariel Weisberg
   Status: Review In Progress  (was: Patch Available)

Thanks for spotting this. I’ll take a look.

> idealCLWriteLatency metric reports the worst response time instead of the 
> time when ideal CL is satisfied
> -
>
> Key: CASSANDRA-19651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19651
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Observability
>Reporter: Dmitry Konstantinov
>Assignee: Dmitry Konstantinov
>Priority: Normal
> Fix For: 4.1.x, 5.0.x, 5.x
>
> Attachments: 19651-4.1.patch
>
>
> org.apache.cassandra.service.AbstractWriteResponseHandler:
> {code:java}
> private final void decrementResponseOrExpired()
> {
>     int decrementedValue = responsesAndExpirations.decrementAndGet();
>     if (decrementedValue == 0)
>     {
>         // The condition being signaled is a valid proxy for the CL being achieved
>         // Only mark it as failed if the requested CL was achieved.
>         if (!condition.isSignalled() && requestedCLAchieved)
>         {
>             replicaPlan.keyspace().metric.writeFailedIdealCL.inc();
>         }
>         else
>         {
>             replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - queryStartNanoTime);
>         }
>     }
> }
> {code}
> Actual result: responsesAndExpirations is a total number of replicas across 
> all DCs which does not depend on the ideal CL, so the metric value for 
> replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the 
> latest response/timeout for all replicas.
> Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated 
> when we get enough responses from replicas according to the ideal CL.






[jira] [Comment Edited] (CASSANDRA-19437) Non-serial reads/range reads need to be done through Accord for Accord to support async apply/commit

2024-07-10 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864747#comment-17864747
 ] 

Ariel Weisberg edited comment on CASSANDRA-19437 at 7/10/24 4:55 PM:
-

Another thing to keep in mind is that `TxnQuery` will need to validate that range 
reads against Accord are against ranges that are managed by Accord in the epoch 
that the read executes in.

`TxnQuery` also currently has a problem with its validation where it doesn't 
have access to the migration information for the epoch of the transaction and 
just has whatever the latest information is.


was (Author: aweisberg):
Another thing to keep in mind is that `TxnQuery` will need to validate that range 
reads against Accord are against ranges that are managed by Accord in the epoch 
that the read executes in.

> Non-serial reads/range reads need to be done through Accord for Accord to 
> support async apply/commit
> 
>
> Key: CASSANDRA-19437
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19437
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ariel Weisberg
>Priority: Normal
>
> Currently they haven't been implemented. We have a path forward for it using 
> ephemeral reads.






[jira] [Commented] (CASSANDRA-19437) Non-serial reads/range reads need to be done through Accord for Accord to support async apply/commit

2024-07-10 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864747#comment-17864747
 ] 

Ariel Weisberg commented on CASSANDRA-19437:


Another thing to keep in mind is that `TxnQuery` will need to validate that range 
reads against Accord are against ranges that are managed by Accord in the epoch 
that the read executes in.

> Non-serial reads/range reads need to be done through Accord for Accord to 
> support async apply/commit
> 
>
> Key: CASSANDRA-19437
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19437
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ariel Weisberg
>Priority: Normal
>
> Currently they haven't been implemented. We have a path forward for it using 
> ephemeral reads.






[jira] [Updated] (CASSANDRA-19436) When transitioning to Accord migration it's not safe to read immediately using Accord due to concurrent non-serial writes

2024-07-09 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19436:
---
Description: 
Concurrent writes at the same time that migration starts make it unsafe to read 
from Accord because txn recovery will not be deterministic in the presence of 
writes not done through Accord.

Migration to Accord needs to be split into two phases. In the first phase we 
write through Accord and always respect the consistency level and do 
synchronous commit. This allows Paxos and non-serial writes to continue while 
Accord's metadata covers everything needed for future reads. Paxos continues to 
operate as normal since it has enough metadata to allow key migration in the 
second phase and we need to stay online so something needs to handle LWTs.

Then repair runs and makes it possible for Accord to read any data written 
non-transactionally and we can then do key migration and route all updates 
(conditional or blind) through Accord while Paxos repair runs so we can stop 
doing key migration.

  was:
Concurrent writes at the same time that migration starts make it unsafe to read 
from Accord because txn recovery will not be deterministic in the presence of 
writes not done through Accord.

Adding key migration to non-serial writes could solve this by causing writes 
not going through Accord to be rejected at nodes where key migration already 
occurred.


> When transitioning to Accord migration it's not safe to read immediately 
> using Accord due to concurrent non-serial writes
> -
>
> Key: CASSANDRA-19436
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19436
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ariel Weisberg
>Priority: Normal
>
> Concurrent writes at the same time that migration starts make it unsafe to 
> read from Accord because txn recovery will not be deterministic in the 
> presence of writes not done through Accord.
> Migration to Accord needs to be split into two phases. In the first phase we 
> write through Accord and always respect the consistency level and do 
> synchronous commit. This allows Paxos and non-serial writes to continue while 
> Accord's metadata covers everything needed for future reads. Paxos continues 
> to operate as normal since it has enough metadata to allow key migration in 
> the second phase and we need to stay online so something needs to handle LWTs.
> Then repair runs and makes it possible for Accord to read any data written 
> non-transactionally and we can then do key migration and route all updates 
> (conditional or blind) through Accord while Paxos repair runs so we can stop 
> doing key migration.
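
One possible reading of the two-phase plan above, written as a small state model. This is a sketch under assumptions, not Cassandra code: the `Phase` enum, its constant names, and the helper methods are all hypothetical, and the transitions simply encode "repair must finish before phase two" and "Paxos repair must finish before key migration can stop".

```java
// Hypothetical state model of the two-phase migration described above; not Cassandra code.
final class AccordMigrationSketch
{
    enum Phase
    {
        NOT_STARTED,
        PHASE_ONE, // writes go through Accord with synchronous commit at the requested CL; Paxos still handles LWTs
        PHASE_TWO, // after repair: key migration, all updates (conditional or blind) routed through Accord
        COMPLETE   // after Paxos repair: key migration is no longer needed
    }

    // Accord reads become safe only once repair has made non-transactional writes readable.
    static boolean accordReadsAreSafe(Phase phase)
    {
        return phase == Phase.PHASE_TWO || phase == Phase.COMPLETE;
    }

    // Paxos keeps serving LWTs until updates are routed through Accord.
    static boolean paxosHandlesLwts(Phase phase)
    {
        return phase == Phase.NOT_STARTED || phase == Phase.PHASE_ONE;
    }

    // Key migration is only needed while Paxos repair is still running.
    static boolean keyMigrationNeeded(Phase phase)
    {
        return phase == Phase.PHASE_TWO;
    }

    // Advances at most one phase, gated on the repair that must complete first.
    static Phase advance(Phase phase, boolean dataRepairComplete, boolean paxosRepairComplete)
    {
        switch (phase)
        {
            case NOT_STARTED:
                return Phase.PHASE_ONE;
            case PHASE_ONE:
                return dataRepairComplete ? Phase.PHASE_TWO : Phase.PHASE_ONE;
            case PHASE_TWO:
                return paxosRepairComplete ? Phase.COMPLETE : Phase.PHASE_TWO;
            default:
                return Phase.COMPLETE;
        }
    }
}
```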






[jira] [Updated] (CASSANDRA-19687) ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch

2024-07-09 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19687:
---
Source Control Link: 
[37c957c719491634f081b39900ebf708079ef3ee|https://github.com/apache/cassandra-accord/commit/37c957c719491634f081b39900ebf708079ef3ee]
 
[24fb418adb70f37dfd717fef2a2f33a8802https://github.com/apache/cassandra/commit/24fb418adb70f37dfd717fef2a2f33a8802f21a7]
  (was: 
https://github.com/apache/cassandra-accord/commit/37c957c719491634f081b39900ebf708079ef3ee
 
https://github.com/apache/cassandra/commit/24fb418adb70f37dfd717fef2a2f33a8802f21a7)

> ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch
> -
>
> Key: CASSANDRA-19687
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19687
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html
>
>
> It's from the `txnId` not the `executeAt`






[jira] [Updated] (CASSANDRA-19687) ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch

2024-07-09 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19687:
---
Source Control Link: 
https://github.com/apache/cassandra-accord/commit/37c957c719491634f081b39900ebf708079ef3ee
 
24fb418adb70f37dfd717fef2a2f33a8802https://github.com/apache/cassandra/commit/24fb418adb70f37dfd717fef2a2f33a8802f21a7
  (was: 
[37c957c719491634f081b39900ebf708079ef3ee|https://github.com/apache/cassandra-accord/commit/37c957c719491634f081b39900ebf708079ef3ee]
 
[24fb418adb70f37dfd717fef2a2f33a8802https://github.com/apache/cassandra/commit/24fb418adb70f37dfd717fef2a2f33a8802f21a7])

> ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch
> -
>
> Key: CASSANDRA-19687
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19687
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html
>
>
> It's from the `txnId` not the `executeAt`






[jira] [Updated] (CASSANDRA-19687) ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch

2024-07-09 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19687:
---
Source Control Link: 
https://github.com/apache/cassandra-accord/commit/37c957c719491634f081b39900ebf708079ef3ee
 
https://github.com/apache/cassandra/commit/24fb418adb70f37dfd717fef2a2f33a8802f21a7
  (was: 
https://github.com/apache/cassandra-accord/commit/37c957c719491634f081b39900ebf708079ef3ee,https://github.com/apache/cassandra/commit/24fb418adb70f37dfd717fef2a2f33a8802f21a7)

> ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch
> -
>
> Key: CASSANDRA-19687
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19687
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html
>
>
> It's from the `txnId` not the `executeAt`






[jira] [Updated] (CASSANDRA-19687) ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch

2024-07-09 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19687:
---
  Fix Version/s: 5.1
  Since Version: 5.1
Source Control Link: 
https://github.com/apache/cassandra-accord/commit/37c957c719491634f081b39900ebf708079ef3ee,https://github.com/apache/cassandra/commit/24fb418adb70f37dfd717fef2a2f33a8802f21a7
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch
> -
>
> Key: CASSANDRA-19687
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19687
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html
>
>
> It's from the `txnId` not the `executeAt`






[jira] [Updated] (CASSANDRA-19687) ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch

2024-07-09 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19687:
---
Source Control Link: 
https://github.com/apache/cassandra-accord/commit/37c957c719491634f081b39900ebf708079ef3ee
 
https://github.com/apache/cassandra/commit/24fb418adb70f37dfd717fef2a2f33a8802f21a7
  (was: 
https://github.com/apache/cassandra-accord/commit/37c957c719491634f081b39900ebf708079ef3ee
 
24fb418adb70f37dfd717fef2a2f33a8802https://github.com/apache/cassandra/commit/24fb418adb70f37dfd717fef2a2f33a8802f21a7)

> ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch
> -
>
> Key: CASSANDRA-19687
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19687
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html
>
>
> It's from the `txnId` not the `executeAt`






[jira] [Updated] (CASSANDRA-19687) ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch

2024-07-09 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19687:
---
Status: Ready to Commit  (was: Review In Progress)

Committed as Accord 
[37c957c719491634f081b39900ebf708079ef3ee|https://github.com/apache/cassandra-accord/commit/37c957c719491634f081b39900ebf708079ef3ee]
 and Cassandra 
[24fb418adb70f37dfd717fef2a2f33a8802f21a7|https://github.com/apache/cassandra/commit/24fb418adb70f37dfd717fef2a2f33a8802f21a7]

> ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch
> -
>
> Key: CASSANDRA-19687
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19687
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Attachments: ci_summary.html
>
>
> It's from the `txnId` not the `executeAt`






[jira] [Updated] (CASSANDRA-19687) ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch

2024-07-09 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19687:
---
Reviewers: Benedict Elliott Smith, Ariel Weisberg  (was: Benedict Elliott 
Smith)
   Benedict Elliott Smith, Ariel Weisberg  (was: Ariel Weisberg, 
Benedict Elliott Smith)
   Status: Review In Progress  (was: Patch Available)

> ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch
> -
>
> Key: CASSANDRA-19687
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19687
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Attachments: ci_summary.html
>
>
> It's from the `txnId` not the `executeAt`






[jira] [Comment Edited] (CASSANDRA-19737) Accord migration mode FULL always runs with interop

2024-07-08 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863829#comment-17863829
 ] 

Ariel Weisberg edited comment on CASSANDRA-19737 at 7/8/24 3:22 PM:


I should add that some of interop might be working as expected with FULL 
because, even though it's an interop execution, we do handle consistency levels 
differently and provide null for the read and commit CLs to the interop code, and 
I see code that should handle this. Where I thought I saw an issue was the read 
for Accord occurring through the interop read path, which is a read executor; 
that surprised me, as it pretty much obsoletes Accord's internal read path.


was (Author: aweisberg):
I should add that some of interop might be working as expected with FULL 
because, even though it's an interop execution, we do handle consistency levels 
differently and provide null for the read and commit CLs to the interop code, and 
I see code that should handle this. Where I saw issues was read repair 
occurring when I thought the transactional mode was full. 

> Accord migration mode FULL always runs with interop
> ---
>
> Key: CASSANDRA-19737
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19737
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> Whether we use interop is not done per transaction. Accord always seems to 
> run with interop for every transaction when it is constructed with the 
> factory that creates interop execution.






[jira] [Commented] (CASSANDRA-19737) Accord migration mode FULL always runs with interop

2024-07-08 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863829#comment-17863829
 ] 

Ariel Weisberg commented on CASSANDRA-19737:


I should add that some of interop might be working as expected with FULL 
because, even though it's an interop execution, we do handle consistency levels 
differently and provide null for the read and commit CLs to the interop code, and 
I see code that should handle this. Where I saw issues was read repair 
occurring when I thought the transactional mode was full. 

> Accord migration mode FULL always runs with interop
> ---
>
> Key: CASSANDRA-19737
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19737
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> Whether we use interop is not done per transaction. Accord always seems to 
> run with interop for every transaction when it is constructed with the 
> factory that creates interop execution.






[jira] [Updated] (CASSANDRA-19419) Non-transactional schema updates can interfere with Accord transaction execution

2024-07-02 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19419:
---
Summary: Non-transactional schema updates can interfere with Accord 
transaction execution  (was: Non-transactional schema updates can interfere 
with Accord transaction execuion)

> Non-transactional schema updates can interfere with Accord transaction 
> execution
> 
>
> Key: CASSANDRA-19419
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19419
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ariel Weisberg
>Priority: Normal
>
> While Accord can handle topology changes correctly, it can’t handle 
> non-transactional schema updates because those execute outside of Accord. When 
> Accord tries to execute a transaction against the schema in the epoch the 
> transaction is supposed to execute in, it is possible for different nodes 
> to see different schemas when reading or writing data as part of a 
> transaction.
> Dropping a needed column or table is the most likely issue, as we don't 
> support altering column types.
> Because commit is async, it is possible for a table to be dropped after a 
> write was acknowledged but before the write can be propagated, and no error 
> is signaled. Although the table was dropped, it's possible the client needed 
> the error to know that the request was processed improperly or that it needed 
> to take some other action client side.
> Or a table may be added, in which case the original coordinator can't read 
> the table but the recovery coordinator can, and it might apply different 
> results to different replicas.






[jira] [Created] (CASSANDRA-19744) Accord migration and interop correctness

2024-07-02 Thread Ariel Weisberg (Jira)
Ariel Weisberg created CASSANDRA-19744:
--

 Summary: Accord migration and interop correctness
 Key: CASSANDRA-19744
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19744
 Project: Cassandra
  Issue Type: Bug
  Components: Accord
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg


There are several issues around splitting and retrying mutations, using the 
original timestamp for batchlog/hints, batchlog/hint support in general, and 
running Accord barriers only against the ranges actually owned by Accord.






[jira] [Created] (CASSANDRA-19737) Accord migration mode FULL always runs with interop

2024-07-01 Thread Ariel Weisberg (Jira)
Ariel Weisberg created CASSANDRA-19737:
--

 Summary: Accord migration mode FULL always runs with interop
 Key: CASSANDRA-19737
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19737
 Project: Cassandra
  Issue Type: Bug
  Components: Accord
Reporter: Ariel Weisberg


Whether we use interop is not done per transaction. Accord always seems to run 
with interop for every transaction when it is constructed with the factory that 
creates interop execution.
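
To make the distinction concrete, here is a minimal sketch contrasting a choice fixed at construction time with a choice made per transaction. The `Mode` enum, its `MIGRATING` member, and both classes are hypothetical stand-ins, not Accord or Cassandra types; the sketch also assumes that FULL is the mode that should skip interop.

```python
from enum import Enum


class Mode(Enum):
    FULL = "full"            # assumed: Accord owns the table, no interop needed
    MIGRATING = "migrating"  # hypothetical mode that still needs interop


class FactoryLevelChoice:
    """Mirrors the reported behavior: the choice is fixed when constructed."""

    def __init__(self, use_interop):
        self.use_interop = use_interop

    def execution_for(self, mode):
        # The per-transaction mode is ignored entirely.
        return "interop" if self.use_interop else "accord"


class PerTransactionChoice:
    """Sketch of the alternative: decide from each transaction's mode."""

    def execution_for(self, mode):
        return "accord" if mode is Mode.FULL else "interop"
```

With `FactoryLevelChoice(use_interop=True)`, a FULL transaction still runs through interop, which is the symptom the ticket describes.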






[jira] [Assigned] (CASSANDRA-19737) Accord migration mode FULL always runs with interop

2024-07-01 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg reassigned CASSANDRA-19737:
--

Assignee: Ariel Weisberg

> Accord migration mode FULL always runs with interop
> ---
>
> Key: CASSANDRA-19737
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19737
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> Whether we use interop is not done per transaction. Accord always seems to 
> run with interop for every transaction when it is constructed with the 
> factory that creates interop execution.






[jira] [Updated] (CASSANDRA-19736) Batchlog and hint replay have timestamps replaced by Accord

2024-07-01 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19736:
---
 Bug Category: Parent values: Correctness(12982)Level 1 values: 
Unrecoverable Corruption / Loss(13161)
   Complexity: Normal
Discovered By: Code Inspection
 Severity: Critical
   Status: Open  (was: Triage Needed)

> Batchlog and hint replay have timestamps replaced by Accord
> ---
>
> Key: CASSANDRA-19736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> The issue is that we might create the transaction at a much later time and 
> then the operation would be written to Cassandra with a later timestamp. It 
> should be fine to use the minimum of the two.
> This also means that `USING TIMESTAMP` will also work as long as the provided 
> timestamp is < the Accord timestamp.






[jira] [Assigned] (CASSANDRA-19736) Batchlog and hint replay have timestamps replaced by Accord

2024-07-01 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg reassigned CASSANDRA-19736:
--

Assignee: Ariel Weisberg

> Batchlog and hint replay have timestamps replaced by Accord
> ---
>
> Key: CASSANDRA-19736
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19736
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> The issue is that we might create the transaction at a much later time and 
> then the operation would be written to Cassandra with a later timestamp. It 
> should be fine to use the minimum of the two.
> This also means that `USING TIMESTAMP` will also work as long as the provided 
> timestamp is < the Accord timestamp.






[jira] [Created] (CASSANDRA-19736) Batchlog and hint replay have timestamps replaced by Accord

2024-07-01 Thread Ariel Weisberg (Jira)
Ariel Weisberg created CASSANDRA-19736:
--

 Summary: Batchlog and hint replay have timestamps replaced by 
Accord
 Key: CASSANDRA-19736
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19736
 Project: Cassandra
  Issue Type: Bug
  Components: Accord
Reporter: Ariel Weisberg


The issue is that we might create the transaction at a much later time and then 
the operation would be written to Cassandra with a later timestamp. It should 
be fine to use the minimum of the two.

This also means that `USING TIMESTAMP` will also work as long as the provided 
timestamp is < the Accord timestamp.
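
The "minimum of the two" rule can be sketched as below. This is only an illustration under the assumption that both timestamps are comparable integers (for example microseconds); `write_timestamp` and its parameter names are hypothetical, not Cassandra functions.

```python
def write_timestamp(original_micros, accord_micros):
    """Pick the timestamp to write with: the earlier of the original
    (batchlog/hint or USING TIMESTAMP) value and the Accord timestamp."""
    return min(original_micros, accord_micros)
```

Under this rule a replayed hint created at 100 and executed by Accord at 250 is written at 100, while a client-supplied timestamp of 300 would be capped at 250, which matches the note that `USING TIMESTAMP` works only when the provided value is below the Accord timestamp.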






[jira] [Updated] (CASSANDRA-19718) CEP-15: (Accord) SyncPoint timeouts become a Exhausted rather than a Timeout and doesn’t get retried

2024-06-18 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19718:
---
Reviewers: Ariel Weisberg

> CEP-15: (Accord) SyncPoint timeouts become a Exhausted rather than a Timeout 
> and doesn’t get retried
> 
>
> Key: CASSANDRA-19718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19718
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: NA
>
>
> In Cassandra we try to make sure coordinators return a timeout if every call 
> under them was also a timeout. This makes it easier to understand what is going 
> on (coordination failure due to timeouts looks very different from us just 
> timing out), but Accord doesn't do this, leading to an Exhausted error (which 
> we don't retry).
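
The aggregation rule described in the ticket could look roughly like this. The exception classes and `coordinator_failure` are hypothetical names for this sketch and do not correspond to Accord's actual types.

```python
class Timeout(Exception):
    """A call (or the whole coordination) timed out; assumed to be retryable."""


class Exhausted(Exception):
    """Coordination ran out of replicas to try; assumed not to be retried."""


def coordinator_failure(failures):
    """Collapse per-call failures into the single error the coordinator reports."""
    # If every underlying call timed out, surface a Timeout so callers retry.
    if failures and all(isinstance(f, Timeout) for f in failures):
        return Timeout("every underlying call timed out")
    # Any other mix of failures is reported as Exhausted.
    return Exhausted("no replica produced a usable response")
```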






[jira] [Updated] (CASSANDRA-19687) ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch

2024-06-12 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19687:
---
Attachment: ci_summary.html

> ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch
> -
>
> Key: CASSANDRA-19687
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19687
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Attachments: ci_summary.html
>
>
> It's from the `txnId` not the `executeAt`






[jira] [Updated] (CASSANDRA-19687) ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch

2024-06-06 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19687:
---
Test and Documentation Plan: Burn test
 Status: Patch Available  (was: Open)

> ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch
> -
>
> Key: CASSANDRA-19687
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19687
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> It's from the `txnId` not the `executeAt`






[jira] [Updated] (CASSANDRA-19687) ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch

2024-06-06 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19687:
---
 Bug Category: Parent values: Correctness(12982)
   Complexity: Normal
Discovered By: Fuzz Test
Reviewers: Benedict Elliott Smith
 Severity: Normal
   Status: Open  (was: Triage Needed)

> ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch
> -
>
> Key: CASSANDRA-19687
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19687
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> It's from the `txnId` not the `executeAt`






[jira] [Created] (CASSANDRA-19687) ApplyThenWaitUntilApplied supplies wrong epoch for executeAtEpoch

2024-06-06 Thread Ariel Weisberg (Jira)
Ariel Weisberg created CASSANDRA-19687:
--

 Summary: ApplyThenWaitUntilApplied supplies wrong epoch for 
executeAtEpoch
 Key: CASSANDRA-19687
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19687
 Project: Cassandra
  Issue Type: Bug
  Components: Accord
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg


It's from the `txnId` not the `executeAt`
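
Reading the description as "the epoch is taken from `txnId` when it should come from `executeAt`", the difference can be shown with a small sketch. The `Timestamp` class and both functions are hypothetical and only model the two values carrying separate epochs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timestamp:
    epoch: int  # topology epoch the timestamp belongs to
    hlc: int    # logical clock component


def execute_at_epoch_buggy(txn_id, execute_at):
    # Reported behavior: the epoch comes from the transaction id.
    return txn_id.epoch


def execute_at_epoch_fixed(txn_id, execute_at):
    # Intended behavior under this reading: use the epoch of executeAt.
    return execute_at.epoch
```

The two only disagree when a transaction is proposed in one epoch and executes in a later one, which may be why a fuzz test was needed to find it.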






[jira] [Updated] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest

2024-05-31 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19641:
---
  Fix Version/s: 5.x
  Since Version: 5.x
Source Control Link: 
https://github.com/apache/cassandra/commit/3b99044d6d5491304d4a25d8dcea54510cfd3215
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

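Committed as Cassandra 
[3b99044d6d5491304d4a25d8dcea54510cfd3215|https://github.com/apache/cassandra/commit/3b99044d6d5491304d4a25d8dcea54510cfd3215]
 and Accord 
[4e8bcae81f9751b9d732fd5056bce31c97ad58f3|https://github.com/apache/cassandra-accord/commit/4e8bcae81f9751b9d732fd5056bce31c97ad58f3].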

> Accord barriers/inclusive sync points cause failures in BurnTest
> 
>
> Key: CASSANDRA-19641
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19641
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> The burn test fails almost every run at the moment; we found several things to 
> fix.






[jira] [Updated] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest

2024-05-31 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19641:
---
Status: Ready to Commit  (was: Review In Progress)

> Accord barriers/inclusive sync points cause failures in BurnTest
> 
>
> Key: CASSANDRA-19641
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19641
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Attachments: ci_summary.html
>
>
> The burn test fails almost every run at the moment; we found several things to 
> fix.






[jira] [Updated] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest

2024-05-31 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19641:
---
Reviewers: Benedict Elliott Smith, Ariel Weisberg
   Benedict Elliott Smith, Ariel Weisberg  (was: Ariel Weisberg, 
Benedict Elliott Smith)
   Status: Review In Progress  (was: Patch Available)

> Accord barriers/inclusive sync points cause failures in BurnTest
> 
>
> Key: CASSANDRA-19641
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19641
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Attachments: ci_summary.html
>
>
> The burn test fails almost every run at the moment; we found several things to 
> fix.






[jira] [Updated] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest

2024-05-21 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19641:
---
Attachment: ci_summary.html

> Accord barriers/inclusive sync points cause failures in BurnTest
> 
>
> Key: CASSANDRA-19641
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19641
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Attachments: ci_summary.html
>
>
> The burn test fails almost every run at the moment; we found several things to 
> fix.






[jira] [Updated] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest

2024-05-21 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19641:
---
Test and Documentation Plan: Small tweaks to one of the Accord tests, 
covered by existing simulator tests, going to add checks in AccordMigrationTest 
that validate that the cache and system table for migrated keys is being 
correctly populated
 Status: Patch Available  (was: Open)

> Accord barriers/inclusive sync points cause failures in BurnTest
> 
>
> Key: CASSANDRA-19641
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19641
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> The burn test fails almost every run at the moment; we found several things to 
> fix.






[jira] [Updated] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest

2024-05-17 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19641:
---
 Bug Category: Parent values: Correctness(12982)Level 1 values: Test 
Failure(12990)
   Complexity: Normal
Discovered By: Fuzz Test
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Accord barriers/inclusive sync points cause failures in BurnTest
> 
>
> Key: CASSANDRA-19641
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19641
> Project: Cassandra
>  Issue Type: Bug
>  Components: Accord
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> The burn test fails almost every run at the moment; we found several things to 
> fix.






[jira] [Created] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest

2024-05-17 Thread Ariel Weisberg (Jira)
Ariel Weisberg created CASSANDRA-19641:
--

 Summary: Accord barriers/inclusive sync points cause failures in 
BurnTest
 Key: CASSANDRA-19641
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19641
 Project: Cassandra
  Issue Type: Bug
  Components: Accord
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg


The burn test fails almost every run at the moment; we found several things to 
fix.






[jira] [Commented] (CASSANDRA-19636) Fix CCM for Cassandra 5.0 and add arg to the command line which let the user explicitly select JVM

2024-05-17 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847378#comment-17847378
 ] 

Ariel Weisberg commented on CASSANDRA-19636:


I didn't test this yet (still working on getting the existing changes to run), 
but +1 on what I saw in the PR and its description.

> Fix CCM for Cassandra 5.0 and add arg to the command line which let the user 
> explicitly select JVM
> --
>
> Key: CASSANDRA-19636
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19636
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Attachments: CASSANDRA-19636_50_75_ci_summary.html, 
> CASSANDRA-19636_50_75_results_details.tar.xz, 
> CASSANDRA-19636_trunk_76_ci_summary.html, 
> CASSANDRA-19636_trunk_76_results_details.tar.xz
>
>
> CCM fails to select the right Java version for Cassandra 5 binary 
> distribution.
> There are also two additional changes proposed here:
>  * add {{--jvm-version}} argument to let the user explicitly select Java 
> version when starting a node from command line
>  * fail if {{java}} command is available on the {{PATH}} and points to a 
> different Java version than Java distribution defined in {{JAVA_HOME}} 
> because there is no obvious way for the user to figure out which one is going 
> to be used
>  






[jira] [Comment Edited] (CASSANDRA-19636) Fix CCM for Cassandra 5.0 and add arg to the command line which let the user explicitly select JVM

2024-05-16 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846980#comment-17846980
 ] 

Ariel Weisberg edited comment on CASSANDRA-19636 at 5/16/24 2:37 PM:
-

Great!

I assume we will separately update [[upgrade_manifest.py|http://example.com]] 
to not depend on JAVAX_HOME so we have a more canonical set of things to test?


was (Author: aweisberg):
Great!

I assume we will separately update 
[upgrade_manifest.py](https://github.com/apache/cassandra-dtest/blob/trunk/upgrade_tests/upgrade_manifest.py#L228)
 to not depend on JAVAX_HOME so we have a more canonical set of things to 
test?

> Fix CCM for Cassandra 5.0 and add arg to the command line which let the user 
> explicitly select JVM
> --
>
> Key: CASSANDRA-19636
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19636
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Attachments: CASSANDRA-19636_50_75_ci_summary.html, 
> CASSANDRA-19636_50_75_results_details.tar.xz, 
> CASSANDRA-19636_trunk_76_ci_summary.html, 
> CASSANDRA-19636_trunk_76_results_details.tar.xz
>
>
> CCM fails to select the right Java version for Cassandra 5 binary 
> distribution.
> There are also two additional changes proposed here:
>  * add {{--jvm-version}} argument to let the user explicitly select Java 
> version when starting a node from command line
>  * fail if {{java}} command is available on the {{PATH}} and points to a 
> different Java version than Java distribution defined in {{JAVA_HOME}} 
> because there is no obvious way for the user to figure out which one is going 
> to be used
>  






[jira] [Comment Edited] (CASSANDRA-19636) Fix CCM for Cassandra 5.0 and add arg to the command line which let the user explicitly select JVM

2024-05-16 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846980#comment-17846980
 ] 

Ariel Weisberg edited comment on CASSANDRA-19636 at 5/16/24 2:37 PM:
-

Great!

I assume we will separately update [upgrade_manifest.py|http://example.com] to 
not depend on JAVAX_HOME so we have a more canonical set of things to test?


was (Author: aweisberg):
Great!

I assume we will separately update [[upgrade_manifest.py|http://example.com]] 
to not depend on JAVAX_HOME so we have a more canonical set of things to test?

> Fix CCM for Cassandra 5.0 and add arg to the command line which let the user 
> explicitly select JVM
> --
>
> Key: CASSANDRA-19636
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19636
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Attachments: CASSANDRA-19636_50_75_ci_summary.html, 
> CASSANDRA-19636_50_75_results_details.tar.xz, 
> CASSANDRA-19636_trunk_76_ci_summary.html, 
> CASSANDRA-19636_trunk_76_results_details.tar.xz
>
>
> CCM fails to select the right Java version for Cassandra 5 binary 
> distribution.
> There are also two additional changes proposed here:
>  * add {{--jvm-version}} argument to let the user explicitly select Java 
> version when starting a node from command line
>  * fail if {{java}} command is available on the {{PATH}} and points to a 
> different Java version than Java distribution defined in {{JAVA_HOME}} 
> because there is no obvious way for the user to figure out which one is going 
> to be used
>  






[jira] [Commented] (CASSANDRA-19636) Fix CCM for Cassandra 5.0 and add arg to the command line which let the user explicitly select JVM

2024-05-16 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846980#comment-17846980
 ] 

Ariel Weisberg commented on CASSANDRA-19636:


Great!

I assume we will separately update 
[upgrade_manifest.py](https://github.com/apache/cassandra-dtest/blob/trunk/upgrade_tests/upgrade_manifest.py#L228)
 to not depend on JAVAX_HOME so we have a more canonical set of things to 
test?

> Fix CCM for Cassandra 5.0 and add arg to the command line which let the user 
> explicitly select JVM
> --
>
> Key: CASSANDRA-19636
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19636
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Attachments: CASSANDRA-19636_50_75_ci_summary.html, 
> CASSANDRA-19636_50_75_results_details.tar.xz, 
> CASSANDRA-19636_trunk_76_ci_summary.html, 
> CASSANDRA-19636_trunk_76_results_details.tar.xz
>
>
> CCM fails to select the right Java version for Cassandra 5 binary 
> distribution.
> There are also two additional changes proposed here:
>  * add {{--jvm-version}} argument to let the user explicitly select Java 
> version when starting a node from command line
>  * fail if {{java}} command is available on the {{PATH}} and points to a 
> different Java version than Java distribution defined in {{JAVA_HOME}} 
> because there is no obvious way for the user to figure out which one is going 
> to be used
>  






[jira] [Commented] (CASSANDRA-19636) Fix CCM for Cassandra 5.0 and add arg to the command line which let the user explicitly select JVM

2024-05-16 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846957#comment-17846957
 ] 

Ariel Weisberg commented on CASSANDRA-19636:


In terms of future direction for CCM behavior: if CCM automatically selecting a 
compatible version goes away, we should minimize the number of things you need 
to manage to make CCM do the thing.

* Ignore PATH and only use JAVA_HOME
* If the JAVA_HOME JDK is incompatible, return an error
* Allow specifying the JDK version as a parameter and then look up the actual 
JDK location from JAVAX_HOME

That way existing users don't need to modify environment variables to do 
whatever it is they are trying to do with CCM.
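
The three rules above could be sketched roughly as follows, in Python since CCM 
is written in Python. This is only an illustration and not CCM code: the names 
resolve_java_home, JavaSelectionError and the detect_version callback are 
hypothetical, and reading "JAVAX_HOME" as JAVA<major version>_HOME (for example 
JAVA11_HOME) is an assumption.

```python
class JavaSelectionError(Exception):
    """Raised when no usable JDK can be selected (hypothetical error type)."""


def resolve_java_home(env, supported_versions, detect_version,
                      requested_version=None):
    """Pick a JDK directory without ever consulting PATH.

    env                -- mapping of environment variables, e.g. os.environ
    supported_versions -- major Java versions the Cassandra build accepts
    detect_version     -- hypothetical callback: JDK directory -> major version
    requested_version  -- major version explicitly requested by the user
    """
    if requested_version is not None:
        # Explicit request: look the location up from JAVA<N>_HOME (assumed naming).
        key = "JAVA%d_HOME" % requested_version
        home = env.get(key)
        if not home:
            raise JavaSelectionError("%s is not set" % key)
        return home

    # Default: only JAVA_HOME counts; PATH is ignored entirely.
    home = env.get("JAVA_HOME")
    if not home:
        raise JavaSelectionError("JAVA_HOME is not set")
    version = detect_version(home)
    if version not in supported_versions:
        # An incompatible JDK is an error rather than a silent fallback.
        raise JavaSelectionError(
            "JAVA_HOME points at Java %s, supported: %s"
            % (version, sorted(supported_versions)))
    return home
```

With this shape a user who does nothing special keeps using JAVA_HOME, and a 
user who wants a specific JDK passes a version instead of editing environment 
variables.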

> Fix CCM for Cassandra 5.0 and add arg to the command line which let the user 
> explicitly select JVM
> --
>
> Key: CASSANDRA-19636
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19636
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Attachments: CASSANDRA-19636_50_75_ci_summary.html, 
> CASSANDRA-19636_50_75_results_details.tar.xz, 
> CASSANDRA-19636_trunk_76_ci_summary.html, 
> CASSANDRA-19636_trunk_76_results_details.tar.xz
>
>
> CCM fails to select the right Java version for Cassandra 5 binary 
> distribution.
> There are also two additional changes proposed here:
>  * add {{--jvm-version}} argument to let the user explicitly select Java 
> version when starting a node from command line
>  * fail if {{java}} command is available on the {{PATH}} and points to a 
> different Java version than Java distribution defined in {{JAVA_HOME}} 
> because there is no obvious way for the user to figure out which one is going 
> to be used
>  






[jira] [Updated] (CASSANDRA-19596) IntervalTree build throughput is low enough to be a bottleneck

2024-05-09 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19596:
---
Attachment: ci_summary.html

> IntervalTree build throughput is low enough to be a bottleneck
> --
>
> Key: CASSANDRA-19596
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19596
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction, Local/SSTable
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> With several terabytes of data and 8 compactors it’s possible for the 
> compactors to spend a lot of time blocked waiting on IntervalTrees to be 
> built.
> There is also a lot of wasted CPU because it’s updated optimistically so most 
> of them end up being thrown away.
> This can end up being quite painful because it can block memtable flushing as 
> well and then a single slow CFS can block unrelated CFS because the memtable 
> post flush executor is single threaded and shared across all CFS. 






[jira] [Commented] (CASSANDRA-19596) IntervalTree build throughput is low enough to be a bottleneck

2024-05-09 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845106#comment-17845106
 ] 

Ariel Weisberg commented on CASSANDRA-19596:


This is a quick and dirty improvement that removes the redundant sorting and 
replaces it with re-use of the existing sorted data.

So instead of having to repeat the n * lg(n) sort to construct every node, we 
only have to do linear scans of the already sorted data that is in that node's 
subtree.
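
As a rough illustration of the idea (sort once, then only partition with 
order-preserving linear scans), here is a generic centered interval tree in 
Python. It is not the Cassandra IntervalTree code from the patch; the names 
Node, build, _partition and search, and the use of closed (start, end) tuples, 
are assumptions made for the sketch.

```python
class Node:
    def __init__(self, center, by_min, by_max, left, right):
        self.center = center
        self.by_min = by_min  # intervals containing center, ascending by start
        self.by_max = by_max  # the same intervals, ascending by end
        self.left = left      # intervals entirely below center
        self.right = right    # intervals entirely above center


def build(intervals):
    """Build the tree; the only sorts happen here, once for the whole input."""
    by_min = sorted(intervals, key=lambda iv: iv[0])
    by_max = sorted(intervals, key=lambda iv: iv[1])
    return _build(by_min, by_max)


def _partition(sorted_intervals, center):
    """Single linear scan; each output list keeps the input's sort order."""
    left, here, right = [], [], []
    for iv in sorted_intervals:
        if iv[1] < center:
            left.append(iv)
        elif iv[0] > center:
            right.append(iv)
        else:
            here.append(iv)
    return left, here, right


def _build(by_min, by_max):
    if not by_min:
        return None
    # The median start point is available in O(1) because by_min is sorted.
    center = by_min[len(by_min) // 2][0]
    left_min, here_min, right_min = _partition(by_min, center)
    left_max, here_max, right_max = _partition(by_max, center)
    return Node(center, here_min, here_max,
                _build(left_min, left_max), _build(right_min, right_max))


def search(node, point):
    """Return every interval that contains point."""
    found = []
    while node is not None:
        if point < node.center:
            for iv in node.by_min:            # starts ascending
                if iv[0] > point:
                    break
                found.append(iv)
            node = node.left
        elif point > node.center:
            for iv in reversed(node.by_max):  # ends descending
                if iv[1] < point:
                    break
                found.append(iv)
            node = node.right
        else:
            found.extend(node.by_min)
            break
    return found
```

Because every node receives lists that are already sorted by start and by end, 
no node has to sort its own subtree again.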

> IntervalTree build throughput is low enough to be a bottleneck
> --
>
> Key: CASSANDRA-19596
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19596
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction, Local/SSTable
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
>
> With several terabytes of data and 8 compactors it’s possible for the 
> compactors to spend a lot of time blocked waiting on IntervalTrees to be 
> built.
> There is also a lot of wasted CPU because it’s updated optimistically so most 
> of them end up being thrown away.
> This can end up being quite painful because it can block memtable flushing as 
> well and then a single slow CFS can block unrelated CFS because the memtable 
> post flush executor is single threaded and shared across all CFS. 






[jira] [Updated] (CASSANDRA-19596) IntervalTree build throughput is low enough to be a bottleneck

2024-05-09 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19596:
---
Change Category: Performance
 Complexity: Low Hanging Fruit
  Fix Version/s: 5.x
 Status: Open  (was: Triage Needed)

> IntervalTree build throughput is low enough to be a bottleneck
> --
>
> Key: CASSANDRA-19596
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19596
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction, Local/SSTable
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
>
> With several terabytes of data and 8 compactors it’s possible for the 
> compactors to spend a lot of time blocked waiting on IntervalTrees to be 
> built.
> There is also a lot of wasted CPU because it’s updated optimistically so most 
> of them end up being thrown away.
> This can end up being quite painful because it can block memtable flushing as 
> well and then a single slow CFS can block unrelated CFS because the memtable 
> post flush executor is single threaded and shared across all CFS. 






[jira] [Updated] (CASSANDRA-19596) IntervalTree build throughput is low enough to be a bottleneck

2024-05-09 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19596:
---
Test and Documentation Plan: Existing unit tests + a new QT based test
 Status: Patch Available  (was: Open)

> IntervalTree build throughput is low enough to be a bottleneck
> --
>
> Key: CASSANDRA-19596
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19596
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction, Local/SSTable
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
>
> With several terabytes of data and 8 compactors it’s possible for the 
> compactors to spend a lot of time blocked waiting on IntervalTrees to be 
> built.
> There is also a lot of wasted CPU because it’s updated optimistically so most 
> of them end up being thrown away.
> This can end up being quite painful because it can block memtable flushing as 
> well and then a single slow CFS can block unrelated CFS because the memtable 
> post flush executor is single threaded and shared across all CFS. 






[jira] [Assigned] (CASSANDRA-19596) IntervalTree build throughput is low enough to be a bottleneck

2024-05-09 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg reassigned CASSANDRA-19596:
--

Assignee: Ariel Weisberg

> IntervalTree build throughput is low enough to be a bottleneck
> --
>
> Key: CASSANDRA-19596
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19596
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction, Local/SSTable
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> With several terabytes of data and 8 compactors it’s possible for the 
> compactors to spend a lot of time blocked waiting on IntervalTrees to be 
> built.
> There is also a lot of wasted CPU because it’s updated optimistically so most 
> of them end up being thrown away.
> This can end up being quite painful because it can block memtable flushing as 
> well and then a single slow CFS can block unrelated CFS because the memtable 
> post flush executor is single threaded and shared across all CFS. 






[jira] [Commented] (CASSANDRA-19597) SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction

2024-04-30 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842456#comment-17842456
 ] 

Ariel Weisberg commented on CASSANDRA-19597:


I have a patch for this. I think I need to add a test as flushing and doing 
post flush things in order doesn't seem like it is very well covered. 
`CommitLogTest` has something, but it doesn't look like it actually checks that 
the post flush stuff runs in order or makes it run out of order.

CFS also doesn't look very testable so I need to spend some time figuring out 
how to test it without making a mess.

> SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction
> -
>
> Key: CASSANDRA-19597
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19597
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Memtable
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Attachments: ci_summary.html
>
>
> There is a single post flush thread and that thread processes tasks in order 
> and one of those tasks can be a memtable flush for an unrelated keyspace/cfs, 
> and that memtable flush can be blocked by slow IntervalTree building and 
> racing with compactors to try and build an interval tree.
> Unless there is a requirement for ordering we probably want to loosen this to 
> the actual ordering requirement so that problems in one keyspace can’t affect 
> another.
> SystemKeyspace and Gossip in particular cause lots of weird problems like 
> nodes marking each other down because Gossip can’t process nodes being 
> removed (blocking flush each time in SystemKeyspace.removeNode)
> A very simple fix here might be to queue the post flush task at the same time 
> as the flush in a per CFS queue, and then submit the task only once the flush 
> is completed.
> If flushes complete out of order the queue will still ensure their 
> completions are processed in order.






[jira] [Updated] (CASSANDRA-19597) SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction

2024-04-30 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19597:
---
Attachment: ci_summary.html

> SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction
> -
>
> Key: CASSANDRA-19597
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19597
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Memtable
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Attachments: ci_summary.html
>
>
> There is a single post flush thread and that thread processes tasks in order 
> and one of those tasks can be a memtable flush for an unrelated keyspace/cfs, 
> and that memtable flush can be blocked by slow IntervalTree building and 
> racing with compactors to try and build an interval tree.
> Unless there is a requirement for ordering we probably want to loosen this to 
> the actual ordering requirement so that problems in one keyspace can’t affect 
> another.
> SystemKeyspace and Gossip in particular cause lots of weird problems like 
> nodes marking each other down because Gossip can’t process nodes being 
> removed (blocking flush each time in SystemKeyspace.removeNode)
> A very simple fix here might be to queue the post flush task at the same time 
> as the flush in a per CFS queue, and then submit the task only once the flush 
> is completed.
> If flushes complete out of order the queue will still ensure their 
> completions are processed in order.






[jira] [Updated] (CASSANDRA-19597) SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction

2024-04-30 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19597:
---
 Bug Category: Parent values: Availability(12983)Level 1 values: 
Unavailable(12994)
   Complexity: Normal
  Component/s: Local/Memtable
Discovered By: User Report
 Severity: Normal
   Status: Open  (was: Triage Needed)

> SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction
> -
>
> Key: CASSANDRA-19597
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19597
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Memtable
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> There is a single post flush thread and that thread processes tasks in order 
> and one of those tasks can be a memtable flush for an unrelated keyspace/cfs, 
> and that memtable flush can be blocked by slow IntervalTree building and 
> racing with compactors to try and build an interval tree.
> Unless there is a requirement for ordering we probably want to loosen this to 
> the actual ordering requirement so that problems in one keyspace can’t affect 
> another.
> SystemKeyspace and Gossip in particular cause lots of weird problems like 
> nodes marking each other down because Gossip can’t process nodes being 
> removed (blocking flush each time in SystemKeyspace.removeNode)
> A very simple fix here might be to queue the post flush task at the same time 
> as the flush in a per CFS queue, and then submit the task only once the flush 
> is completed.
> If flushes complete out of order the queue will still ensure their 
> completions are processed in order.






[jira] [Commented] (CASSANDRA-19597) SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction

2024-04-29 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842091#comment-17842091
 ] 

Ariel Weisberg commented on CASSANDRA-19597:


[~benedict] is the requirement for post flush processing that it be done in 
order per CFS, so that a per CFS queue would actually address the problem of 
keeping the post flush processing in order?

> SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction
> -
>
> Key: CASSANDRA-19597
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19597
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ariel Weisberg
>Priority: Normal
>
> There is a single post flush thread and that thread processes tasks in order 
> and one of those tasks can be a memtable flush for an unrelated keyspace/cfs, 
> and that memtable flush can be blocked by slow IntervalTree building and 
> racing with compactors to try and build an interval tree.
> Unless there is a requirement for ordering we probably want to loosen this to 
> the actual ordering requirement so that problems in one keyspace can’t affect 
> another.
> SystemKeyspace and Gossip in particular cause lots of weird problems like 
> nodes marking each other down because Gossip can’t process nodes being 
> removed (blocking flush each time in SystemKeyspace.removeNode)
> A very simple fix here might be to queue the post flush task at the same time 
> as the flush in a per CFS queue, and then submit the task only once the flush 
> is completed.
> If flushes complete out of order the queue will still ensure their 
> completions are processed in order.






[jira] [Assigned] (CASSANDRA-19597) SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction

2024-04-29 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg reassigned CASSANDRA-19597:
--

Assignee: Ariel Weisberg

> SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction
> -
>
> Key: CASSANDRA-19597
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19597
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
>
> There is a single post flush thread and that thread processes tasks in order 
> and one of those tasks can be a memtable flush for an unrelated keyspace/cfs, 
> and that memtable flush can be blocked by slow IntervalTree building and 
> racing with compactors to try and build an interval tree.
> Unless there is a requirement for ordering we probably want to loosen this to 
> the actual ordering requirement so that problems in one keyspace can’t affect 
> another.
> SystemKeyspace and Gossip in particular cause lots of weird problems like 
> nodes marking each other down because Gossip can’t process nodes being 
> removed (blocking flush each time in SystemKeyspace.removeNode)
> A very simple fix here might be to queue the post flush task at the same time 
> as the flush in a per CFS queue, and then submit the task only once the flush 
> is completed.
> If flushes complete out of order the queue will still ensure their 
> completions are processed in order.






[jira] [Created] (CASSANDRA-19597) SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction

2024-04-29 Thread Ariel Weisberg (Jira)
Ariel Weisberg created CASSANDRA-19597:
--

 Summary: SystemKeyspace CFS flushing blocked by unrelated keyspace 
flushing/compaction
 Key: CASSANDRA-19597
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19597
 Project: Cassandra
  Issue Type: Bug
Reporter: Ariel Weisberg


There is a single post flush thread and that thread processes tasks in order 
and one of those tasks can be a memtable flush for an unrelated keyspace/cfs, 
and that memtable flush can be blocked by slow IntervalTree building and racing 
with compactors to try and build an interval tree.

Unless there is a requirement for ordering we probably want to loosen this to 
the actual ordering requirement so that problems in one keyspace can’t affect 
another.

SystemKeyspace and Gossip in particular cause lots of weird problems like nodes 
marking each other down because Gossip can’t process nodes being removed 
(blocking flush each time in SystemKeyspace.removeNode)

A very simple fix here might be to queue the post flush task at the same time 
as the flush in a per CFS queue, and then submit the task only once the flush 
is completed.

If flushes complete out of order the queue will still ensure their completions 
are processed in order.
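
A minimal sketch of the per CFS queue idea, written in Python purely for 
illustration (Cassandra itself is Java, and this is not the patch). The class 
name PostFlushQueue, its methods, and the submit callable that stands in for 
handing a task to the post flush executor are all hypothetical.

```python
import threading
from collections import deque


class PostFlushQueue:
    """Orders post flush tasks for ONE table by the order flushes started."""

    def __init__(self, submit):
        # submit is assumed to hand the task to an executor and return quickly.
        self._submit = submit
        self._lock = threading.Lock()
        self._pending = deque()  # entries are [task, flush_completed]

    def register(self, task):
        """Call when the flush is queued; returns a handle for flush_completed."""
        entry = [task, False]
        with self._lock:
            self._pending.append(entry)
        return entry

    def flush_completed(self, entry):
        """Call when the flush finishes, in whatever order that happens."""
        with self._lock:
            entry[1] = True
            # Release tasks only from the head, so a flush that finished early
            # waits for the flushes that were started before it.
            while self._pending and self._pending[0][1]:
                task = self._pending.popleft()[0]
                self._submit(task)


# One queue per table: a slow flush in one table cannot delay another table.
ran = []
queues = {
    "system.peers": PostFlushQueue(lambda task: task()),
    "ks.big_table": PostFlushQueue(lambda task: task()),
}
slow = queues["ks.big_table"].register(lambda: ran.append("big_table"))
fast = queues["system.peers"].register(lambda: ran.append("peers"))
queues["system.peers"].flush_completed(fast)  # not blocked by big_table
```

Here the task for system.peers runs as soon as its own flush completes, while 
the ks.big_table task stays queued until flush_completed is called for it.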






[jira] [Created] (CASSANDRA-19596) IntervalTree build throughput is low enough to be a bottleneck

2024-04-29 Thread Ariel Weisberg (Jira)
Ariel Weisberg created CASSANDRA-19596:
--

 Summary: IntervalTree build throughput is low enough to be a 
bottleneck
 Key: CASSANDRA-19596
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19596
 Project: Cassandra
  Issue Type: Improvement
  Components: Local/Compaction, Local/SSTable
Reporter: Ariel Weisberg


With several terabytes of data and 8 compactors it’s possible for the 
compactors to spend a lot of time blocked waiting on IntervalTrees to be built.

There is also a lot of wasted CPU because it’s updated optimistically so most 
of them end up being thrown away.

This can end up being quite painful because it can block memtable flushing as 
well and then a single slow CFS can block unrelated CFS because the memtable 
post flush executor is single threaded and shared across all CFS. 






[jira] [Commented] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests

2024-04-16 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837791#comment-17837791
 ] 

Ariel Weisberg commented on CASSANDRA-19551:


TY!

> CCM nodes share the same environment variable map breaking upgrade tests
> 
>
> Key: CASSANDRA-19551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19551
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 3.0.31, 3.11.17, 4.0.13, 5.0-beta2, 5.1
>
> Attachments: ci_summary.html
>
>
> In {{node.py}} {{__environment_variables}} is generally always set with a map 
> that is passed in from {{cluster.py}} so it is [shared between 
> nodes|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151]
>  and if nodes modify the map, such as in {{start}} when [updating the Java 
> version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860]
>  then when {{get_env}} runs it will [overwrite the Java 
> version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244]
>  that is selected by {{update_java_version}}.
> This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in 
> some of the upgrade tests because after the first node upgrades to 4.0 it's 
> no longer possible for the subsequent nodes to select a Java version that 
> isn't 11 because it's overridden by {{__environment_variables}}.
> I'm not even 100% clear on why the code in {{start}} should update 
> {{__environment_variables}} at all if we calculate the correct java version 
> on every invocation of other tools.






[jira] [Updated] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests

2024-04-16 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19551:
---
Description: 
In {{node.py}} {{__environment_variables}} is generally always set with a map 
that is passed in from {{cluster.py}} so it is [shared between 
nodes|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151]
 and if nodes modify the map, such as in {{start}} when [updating the Java 
version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860]
 then when {{get_env}} runs it will [overwrite the Java 
version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244]
 that is selected by {{update_java_version}}.

This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in 
some of the upgrade tests because after the first node upgrades to 4.0 it's no 
longer possible for the subsequent nodes to select a Java version that isn't 11 
because it's overridden by {{__environment_variables}}.

I'm not even 100% clear on why the code in {{start}} should update 
{{__environment_variables}} at all if we calculate the correct java version on 
every invocation of other tools.

  was:
In {{node.py}} {{__environment_variables}} is generally always set with a map 
that is passed in from {{cluster.py}} so it is [shared between 
nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151)
 and if nodes modify the map, such as in {{start}} when [updating the Java 
version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860)
 then when {{get_env}} runs it will [overwrite the Java 
version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244)
 that is selected by {{update_java_version}}.

This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in 
some of the upgrade tests because after the first node upgrades to 4.0 it's no 
longer possible for the subsequent nodes to select a Java version that isn't 11 
because it's overridden by {{__environment_variables}}.

I'm not even 100% clear on why the code in {{start}} should update 
{{__environment_variables}} at all if we calculate the correct java version on 
every invocation of other tools.


> CCM nodes share the same environment variable map breaking upgrade tests
> 
>
> Key: CASSANDRA-19551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19551
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> In {{node.py}} {{__environment_variables}} is generally always set with a map 
> that is passed in from {{cluster.py}} so it is [shared between 
> nodes|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151]
>  and if nodes modify the map, such as in {{start}} when [updating the Java 
> version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860]
>  then when {{get_env}} runs it will [overwrite the Java 
> version|https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244]
>  that is selected by {{update_java_version}}.
> This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in 
> some of the upgrade tests because after the first node upgrades to 4.0 it's 
> no longer possible for the subsequent nodes to select a Java version that 
> isn't 11 because it's overridden by {{__environment_variables}}.
> I'm not even 100% clear on why the code in {{start}} should update 
> {{__environment_variables}} at all if we calculate the correct java version 
> on every invocation of other tools.






[jira] [Comment Edited] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests

2024-04-15 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837447#comment-17837447
 ] 

Ariel Weisberg edited comment on CASSANDRA-19551 at 4/15/24 9:12 PM:
-

Looks like {{TestGossip::test_assassinate_valid_node}} and 
{{TestLargeColumn::test_cleanup}} failed consistently in each of the past 5 
runs, but I haven't seen a failure for 
{{bootstrap_test.py::TestBootstrap::test_cleanup}}.


was (Author: aweisberg):
Looks like `TestGossip::test_assassinate_valid_node` and 
`TestLargeColumn::test_cleanup` failed consistently in each of the past 5 
runs, but I haven't seen a failure for 
`bootstrap_test.py::TestBootstrap::test_cleanup`.

> CCM nodes share the same environment variable map breaking upgrade tests
> 
>
> Key: CASSANDRA-19551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19551
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> In {{node.py}} {{__environment_variables}} is generally always set with a map 
> that is passed in from {{cluster.py}} so it is [shared between 
> nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151)
>  and if nodes modify the map, such as in {{start}} when [updating the Java 
> version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860)
>  then when {{get_env}} runs it will [overwrite the Java 
> version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244)
>  that is selected by {{update_java_version}}.
> This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in 
> some of the upgrade tests because after the first node upgrades to 4.0 it's 
> no longer possible for the subsequent nodes to select a Java version that 
> isn't 11 because it's overridden by {{__environment_variables}}.
> I'm not even 100% clear on why the code in {{start}} should update 
> {{__environment_variables}} at all if we calculate the correct java version 
> on every invocation of other tools.






[jira] [Commented] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests

2024-04-15 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837447#comment-17837447
 ] 

Ariel Weisberg commented on CASSANDRA-19551:


Looks like `TestGossip::test_assassinate_valid_node` and 
`TestLargeColumn::test_cleanup` failed consistently in each of the past 5 
runs, but I haven't seen a failure for 
`bootstrap_test.py::TestBootstrap::test_cleanup`.

> CCM nodes share the same environment variable map breaking upgrade tests
> 
>
> Key: CASSANDRA-19551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19551
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> In {{node.py}} {{__environment_variables}} is generally always set with a map 
> that is passed in from {{cluster.py}} so it is [shared between 
> nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151)
>  and if nodes modify the map, such as in {{start}} when [updating the Java 
> version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860)
>  then when {{get_env}} runs it will [overwrite the Java 
> version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244)
>  that is selected by {{update_java_version}}.
> This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in 
> some of the upgrade tests because after the first node upgrades to 4.0 it's 
> no longer possible for the subsequent nodes to select a Java version that 
> isn't 11 because it's overridden by {{__environment_variables}}.
> I'm not even 100% clear on why the code in {{start}} should update 
> {{__environment_variables}} at all if we calculate the correct java version 
> on every invocation of other tools.






[jira] [Commented] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests

2024-04-15 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837444#comment-17837444
 ] 

Ariel Weisberg commented on CASSANDRA-19551:


Attached result of running on trunk with a copy of the environment variables 
for each node.

One failure is an assertion on some values which looks like an unrelated 
problem since the cluster is coming up and working.

Looking into the other failures now. I'll also have baseline nightlies tomorrow.
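
For illustration only, the "copy of the environment variables for each node" approach could look roughly like the sketch below. The {{Node}} class, its attribute names, and the {{MAX_HEAP_SIZE}} value are hypothetical stand-ins, not ccmlib's actual code or the attached patch.

```python
import copy


class Node:
    """Hypothetical stand-in for a ccm node; not ccmlib's real class."""

    def __init__(self, name, environment_variables=None):
        self.name = name
        # Take a private copy so one node's changes never reach another node.
        self.environment_variables = copy.deepcopy(environment_variables or {})

    def start(self, java_home):
        # Record the Java version chosen for this node only.
        self.environment_variables["JAVA_HOME"] = java_home


cluster_env = {"MAX_HEAP_SIZE": "512M"}  # illustrative cluster-wide settings
node1 = Node("node1", cluster_env)
node2 = Node("node2", cluster_env)

node1.start("/opt/java11")  # node1 has upgraded to 4.0

print(node1.environment_variables.get("JAVA_HOME"))
print(node2.environment_variables.get("JAVA_HOME"))
print(cluster_env)
```

With per-node copies, node2 and the cluster-level map are unaffected by node1's {{start}}, so node2 remains free to pick a different Java version.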

> CCM nodes share the same environment variable map breaking upgrade tests
> 
>
> Key: CASSANDRA-19551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19551
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> In {{node.py}} {{__environment_variables}} is generally always set with a map 
> that is passed in from {{cluster.py}} so it is [shared between 
> nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151)
>  and if nodes modify the map, such as in {{start}} when [updating the Java 
> version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860)
>  then when {{get_env}} runs it will [overwrite the Java 
> version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244)
>  that is selected by {{update_java_version}}.
> This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in 
> some of the upgrade tests because after the first node upgrades to 4.0 it's 
> no longer possible for the subsequent nodes to select a Java version that 
> isn't 11 because it's overridden by {{__environment_variables}}.
> I'm not even 100% clear on why the code in {{start}} should update 
> {{__environment_variables}} at all if we calculate the correct java version 
> on every invocation of other tools.






[jira] [Updated] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests

2024-04-15 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19551:
---
Attachment: ci_summary.html

> CCM nodes share the same environment variable map breaking upgrade tests
> 
>
> Key: CASSANDRA-19551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19551
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> In {{node.py}} {{__environment_variables}} is generally always set with a map 
> that is passed in from {{cluster.py}} so it is [shared between 
> nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151)
>  and if nodes modify the map, such as in {{start}} when [updating the Java 
> version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860)
>  then when {{get_env}} runs it will [overwrite the Java 
> version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244)
>  that is selected by {{update_java_version}}.
> This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in 
> some of the upgrade tests because after the first node upgrades to 4.0 it's 
> no longer possible for the subsequent nodes to select a Java version that 
> isn't 11 because it's overridden by {{__environment_variables}}.
> I'm not even 100% clear on why the code in {{start}} should update 
> {{__environment_variables}} at all if we calculate the correct java version 
> on every invocation of other tools.






[jira] [Commented] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests

2024-04-10 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835866#comment-17835866
 ] 

Ariel Weisberg commented on CASSANDRA-19551:


This doesn't make sense to me: 
https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L844
Every time {{start}} is called after an upgrade we revert to the old 
{{JAVA_HOME}} from before the upgrade, and then replace that anyway with 
{{update_java_version}}. Nothing in {{update_java_version}} looks dependent on 
the existing value of {{JAVA_HOME}} in {{env}} and it doesn't have visibility 
into {{__environment_variables}} at all.

> CCM nodes share the same environment variable map breaking upgrade tests
> 
>
> Key: CASSANDRA-19551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19551
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
>
> In {{node.py}} {{__environment_variables}} is generally always set with a map 
> that is passed in from {{cluster.py}} so it is [shared between 
> nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151)
>  and if nodes modify the map, such as in {{start}} when [updating the Java 
> version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860)
>  then when {{get_env}} runs it will [overwrite the Java 
> version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244)
>  that is selected by {{update_java_version}}.
> This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in 
> some of the upgrade tests because after the first node upgrades to 4.0 it's 
> no longer possible for the subsequent nodes to select a Java version that 
> isn't 11 because it's overridden by {{__environment_variables}}.
> I'm not even 100% clear on why the code in {{start}} should update 
> {{__environment_variables}} at all if we calculate the correct java version 
> on every invocation of other tools.






[jira] [Updated] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests

2024-04-10 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19551:
---
Test and Documentation Plan: Run all python dtests
 Status: Patch Available  (was: Open)

> CCM nodes share the same environment variable map breaking upgrade tests
> 
>
> Key: CASSANDRA-19551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19551
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
>
> In {{node.py}} {{__environment_variables}} is generally always set with a map 
> that is passed in from {{cluster.py}} so it is [shared between 
> nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151)
>  and if nodes modify the map, such as in {{start}} when [updating the Java 
> version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860)
>  then when {{get_env}} runs it will [overwrite the Java 
> version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244)
>  that is selected by {{update_java_version}}.
> This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in 
> some of the upgrade tests because after the first node upgrades to 4.0 it's 
> no longer possible for the subsequent nodes to select a Java version that 
> isn't 11 because it's overridden by {{__environment_variables}}.
> I'm not even 100% clear on why the code in {{start}} should update 
> {{__environment_variables}} at all if we calculate the correct java version 
> on every invocation of other tools.






[jira] [Updated] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests

2024-04-10 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19551:
---
 Bug Category: Parent values: Correctness(12982); Level 1 values: Test Failure(12990)
   Complexity: Low Hanging Fruit
Discovered By: DTest
Fix Version/s: 5.x
Reviewers: Joshua McKenzie
 Severity: Normal
 Assignee: Ariel Weisberg
   Status: Open  (was: Triage Needed)

> CCM nodes share the same environment variable map breaking upgrade tests
> 
>
> Key: CASSANDRA-19551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19551
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 5.x
>
>
> In {{node.py}} {{__environment_variables}} is generally always set with a map 
> that is passed in from {{cluster.py}} so it is [shared between 
> nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151)
>  and if nodes modify the map, such as in {{start}} when [updating the Java 
> version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860)
>  then when {{get_env}} runs it will [overwrite the Java 
> version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244)
>  that is selected by {{update_java_version}}.
> This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in 
> some of the upgrade tests because after the first node upgrades to 4.0 it's 
> no longer possible for the subsequent nodes to select a Java version that 
> isn't 11 because it's overridden by {{__environment_variables}}.
> I'm not even 100% clear on why the code in {{start}} should update 
> {{__environment_variables}} at all if we calculate the correct java version 
> on every invocation of other tools.






[jira] [Created] (CASSANDRA-19551) CCM nodes share the same environment variable map breaking upgrade tests

2024-04-10 Thread Ariel Weisberg (Jira)
Ariel Weisberg created CASSANDRA-19551:
--

 Summary: CCM nodes share the same environment variable map 
breaking upgrade tests
 Key: CASSANDRA-19551
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19551
 Project: Cassandra
  Issue Type: Bug
  Components: Test/dtest/python
Reporter: Ariel Weisberg


In {{node.py}} {{__environment_variables}} is generally always set with a map 
that is passed in from {{cluster.py}} so it is [shared between 
nodes](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L151)
 and if nodes modify the map, such as in {{start}} when [updating the Java 
version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L860)
 then when {{get_env}} runs it will [overwrite the Java 
version](https://github.com/riptano/ccm/blob/ac264706c8ca007cc584871ce907d48db334d36d/ccmlib/node.py#L244)
 that is selected by {{update_java_version}}.

This results in {{nodetool drain}} failing when upgrading from 3.11 to 4.0 in 
some of the upgrade tests because after the first node upgrades to 4.0 it's no 
longer possible for the subsequent nodes to select a Java version that isn't 11 
because it's overridden by {{__environment_variables}}.

I'm not even 100% clear on why the code in {{start}} should update 
{{__environment_variables}} at all if we calculate the correct java version on 
every invocation of other tools.
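
The sharing problem described above can be reproduced with a minimal sketch. The {{Node}} class, its methods, and the Java paths here are simplified, hypothetical stand-ins chosen for illustration; they do not mirror ccmlib's real implementation.

```python
class Node:
    """Hypothetical stand-in for a ccm node; not ccmlib's real class."""

    def __init__(self, name, environment_variables=None):
        self.name = name
        # Bug pattern: the caller's dict is stored by reference, not copied.
        if environment_variables is None:
            environment_variables = {}
        self.environment_variables = environment_variables

    def start(self, java_home):
        # Simulates start() recording the Java version chosen for this node.
        self.environment_variables["JAVA_HOME"] = java_home

    def get_env(self, selected_java_home):
        # Simulates get_env(): begin with the freshly selected Java version,
        # then let the stored variables override it.
        env = {"JAVA_HOME": selected_java_home}
        env.update(self.environment_variables)
        return env


shared = {}  # one map handed to every node, as the cluster does
node1 = Node("node1", shared)
node2 = Node("node2", shared)

node1.start("/opt/java11")         # node1 has upgraded to 4.0
env = node2.get_env("/opt/java8")  # node2 is still on 3.11 and selects Java 8

print(env["JAVA_HOME"])
```

In this sketch node2 ends up with node1's {{JAVA_HOME}} even though it selected a different one, which is the same shape of failure as the {{nodetool drain}} problem.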






[jira] [Commented] (CASSANDRA-19444) AccordRepairJob should be async like CassandraRepairJob

2024-04-01 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832886#comment-17832886
 ] 

Ariel Weisberg commented on CASSANDRA-19444:


Blake will be fixing this in CASSANDRA-19472

> AccordRepairJob should be async like CassandraRepairJob
> ---
>
> Key: CASSANDRA-19444
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19444
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ariel Weisberg
>Priority: Normal
>
> The thread that manages repairs needs to be available and not block.






[jira] [Updated] (CASSANDRA-19444) AccordRepairJob should be async like CassandraRepairJob

2024-04-01 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19444:
---
Resolution: Fixed
Status: Resolved  (was: Triage Needed)

> AccordRepairJob should be async like CassandraRepairJob
> ---
>
> Key: CASSANDRA-19444
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19444
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ariel Weisberg
>Priority: Normal
>
> The thread that manages repairs needs to be available and not block.






[jira] [Commented] (CASSANDRA-19496) Add properties for redirecting build-resolve to mirrors

2024-03-27 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831465#comment-17831465
 ] 

Ariel Weisberg commented on CASSANDRA-19496:


Committed. We would need to release {{3.0.30}} and {{3.11.17}} and point to 
those in {{upgrade_manifest.py}} for this to be helpful. Or at least create 
some tag to use. It would be helpful to stick with the existing format just 
because some things do very kludgy parsing of {{upgrade_manifest.py}}.

> Add properties for redirecting build-resolve to mirrors
> ---
>
> Key: CASSANDRA-19496
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19496
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 3.0.30, 3.11.17
>
>
> When running upgrade tests in CI it's not always possible to reach the public 
> mirrors. Currently we have properties for configuring private mirrors in 4.0+ 
> but we don't have this for 3.x.






[jira] [Updated] (CASSANDRA-19496) Add properties for redirecting build-resolve to mirrors

2024-03-27 Thread Ariel Weisberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-19496:
---
  Fix Version/s: 3.0.30
 3.11.17
 (was: 3.0.x)
 (was: 3.11.x)
Source Control Link: 
https://github.com/apache/cassandra/commit/56d3efff0c574a7c1ac2ebb6c90d283c1d256ee8
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Add properties for redirecting build-resolve to mirrors
> ---
>
> Key: CASSANDRA-19496
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19496
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 3.0.30, 3.11.17
>
>
> When running upgrade tests in CI it's not always possible to reach the public 
> mirrors. Currently we have properties for configuring private mirrors in 4.0+ 
> but we don't have this for 3.x.






[jira] [Commented] (CASSANDRA-19496) Add properties for redirecting build-resolve to mirrors

2024-03-27 Thread Ariel Weisberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831405#comment-17831405
 ] 

Ariel Weisberg commented on CASSANDRA-19496:


[Looks like this should change from 4.0.11 to 
4.0.16?|https://github.com/apache/cassandra-dtest/blob/trunk/upgrade_tests/upgrade_manifest.py#L172]

> Add properties for redirecting build-resolve to mirrors
> ---
>
> Key: CASSANDRA-19496
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19496
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>
> When running upgrade tests in CI it's not always possible to reach the public 
> mirrors. Currently we have properties for configuring private mirrors in 4.0+ 
> but we don't have this for 3.x.





