[jira] [Updated] (CASSANDRA-19292) Update target Cassandra versions for integration tests, support new 4.0.x and 4.1.x
[ https://issues.apache.org/jira/browse/CASSANDRA-19292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bret McGuire updated CASSANDRA-19292:
-------------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Open)

> Update target Cassandra versions for integration tests, support new 4.0.x and 4.1.x
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19292
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19292
>             Project: Cassandra
>          Issue Type: Task
>          Components: Client/java-driver
>            Reporter: Abe Ratnofsky
>            Assignee: Abe Ratnofsky
>          Priority: Normal
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, apache/cassandra-java-driver runs against 4.0.0 but not newer 4.0.x or 4.1.x releases:
> https://github.com/apache/cassandra-java-driver/blob/4.x/core/src/main/java/com/datastax/oss/driver/api/core/Version.java#L54C1-L55C1
> 4.1 introduces changes to config as well, so there are failures to start CCM clusters if we do a naive version bump, like:
> "org.apache.cassandra.exceptions.ConfigurationException: Config contains both old and new keys for the same configuration parameters, migrate old -> new: [enable_user_defined_functions -> user_defined_functions_enabled]"
> I have a patch ready for this, working on preparing it.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
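The ConfigurationException quoted above comes from 4.1 refusing a config that carries both the old and the new spelling of a setting. As a minimal sketch of how a test harness might rewrite old keys before starting a CCM cluster: the `ConfigKeyMigrator` class and its method are hypothetical names invented here; only the key names come from the error message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for the 4.1 config-key migration described above:
// rename old keys to their new names so a config never contains both
// spellings, which 4.1 rejects at startup. Only the key names below are
// taken from the error message; everything else is illustrative.
public class ConfigKeyMigrator {
    private static final Map<String, String> OLD_TO_NEW = Map.of(
        "enable_user_defined_functions", "user_defined_functions_enabled");

    // Returns a copy of the config with deprecated keys renamed; throws if
    // the config carries both spellings with conflicting values.
    public static Map<String, Object> migrate(Map<String, ?> config) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, ?> e : config.entrySet()) {
            String key = OLD_TO_NEW.getOrDefault(e.getKey(), e.getKey());
            Object previous = result.put(key, e.getValue());
            if (previous != null && !previous.equals(e.getValue()))
                throw new IllegalArgumentException(
                    "Conflicting old/new values for " + key);
        }
        return result;
    }
}
```

This keeps older target versions working unchanged, since configs without deprecated keys pass through untouched.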
Re: [PR] CASSANDRA-19292 Enable Jenkins to test against Cassandra 4.1.x [cassandra-java-driver]
absurdfarce merged PR #1924: URL: https://github.com/apache/cassandra-java-driver/pull/1924 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(cassandra-java-driver) branch 4.x updated: CASSANDRA-19292: Enable Jenkins to test against Cassandra 4.1.x
This is an automated email from the ASF dual-hosted git repository. absurdfarce pushed a commit to branch 4.x in repository https://gitbox.apache.org/repos/asf/cassandra-java-driver.git The following commit(s) were added to refs/heads/4.x by this push: new 1492d6ced CASSANDRA-19292: Enable Jenkins to test against Cassandra 4.1.x 1492d6ced is described below commit 1492d6ced9d54bdd68deb043a0bfe232eaa2a8fc Author: absurdfarce AuthorDate: Fri Mar 29 00:46:46 2024 -0500 CASSANDRA-19292: Enable Jenkins to test against Cassandra 4.1.x patch by Bret McGuire; reviewed by Bret McGuire, Alexandre Dutra for CASSANDRA-19292 --- Jenkinsfile| 20 --- .../com/datastax/oss/driver/api/core/Version.java | 1 + .../oss/driver/core/metadata/SchemaIT.java | 13 + .../oss/driver/api/testinfra/ccm/CcmBridge.java| 61 -- 4 files changed, 86 insertions(+), 9 deletions(-) diff --git a/Jenkinsfile b/Jenkinsfile index 8d2b74c5b..0bfa4ca7f 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -256,8 +256,10 @@ pipeline { choices: ['2.1', // Legacy Apache CassandraⓇ '2.2', // Legacy Apache CassandraⓇ '3.0', // Previous Apache CassandraⓇ -'3.11', // Current Apache CassandraⓇ -'4.0', // Development Apache CassandraⓇ +'3.11', // Previous Apache CassandraⓇ +'4.0', // Previous Apache CassandraⓇ +'4.1', // Current Apache CassandraⓇ +'5.0', // Development Apache CassandraⓇ 'dse-4.8.16', // Previous EOSL DataStax Enterprise 'dse-5.0.15', // Long Term Support DataStax Enterprise 'dse-5.1.35', // Legacy DataStax Enterprise @@ -291,7 +293,11 @@ pipeline { 4.0 - Apache Cassandra v4.x (CURRENTLY UNDER DEVELOPMENT) + Apache Cassandra v4.0.x + + + 4.1 + Apache Cassandra v4.1.x dse-4.8.16 @@ -445,7 +451,7 @@ pipeline { axis { name 'SERVER_VERSION' values '3.11', // Latest stable Apache CassandraⓇ - '4.0', // Development Apache CassandraⓇ + '4.1', // Development Apache CassandraⓇ 'dse-6.8.30' // Current DataStax Enterprise } axis { @@ -554,8 +560,10 @@ pipeline { name 'SERVER_VERSION' values '2.1', // Legacy Apache 
CassandraⓇ '3.0', // Previous Apache CassandraⓇ - '3.11', // Current Apache CassandraⓇ - '4.0', // Development Apache CassandraⓇ + '3.11', // Previous Apache CassandraⓇ + '4.0', // Previous Apache CassandraⓇ + '4.1', // Current Apache CassandraⓇ + '5.0', // Development Apache CassandraⓇ 'dse-4.8.16', // Previous EOSL DataStax Enterprise 'dse-5.0.15', // Last EOSL DataStax Enterprise 'dse-5.1.35', // Legacy DataStax Enterprise diff --git a/core/src/main/java/com/datastax/oss/driver/api/core/Version.java b/core/src/main/java/com/datastax/oss/driver/api/core/Version.java index cc4931fe2..3f12c54fa 100644 --- a/core/src/main/java/com/datastax/oss/driver/api/core/Version.java +++ b/core/src/main/java/com/datastax/oss/driver/api/core/Version.java @@ -52,6 +52,7 @@ public class Version implements Comparable, Serializable { @NonNull public static final Version V2_2_0 = Objects.requireNonNull(parse("2.2.0")); @NonNull public static final Version V3_0_0 = Objects.requireNonNull(parse("3.0.0")); @NonNull public static final Version V4_0_0 = Objects.requireNonNull(parse("4.0.0")); + @NonNull public static final Version V4_1_0 = Objects.requireNonNull(parse("4.1.0")); @NonNull public static final Version V5_0_0 = Objects.requireNonNull(parse("5.0.0")); @NonNull public static final Version V6_7_0 = Objects.requireNonNull(parse("6.7.0")); @NonNull public static final Version V6_8_0 = Objects.requireNonNull(parse("6.8.0")); diff --git a/integration-tests/src/test/java/com/datastax/oss/driver/core/metadata/SchemaIT.java b/integration-tests/src/test/java/com/datastax/oss/driver/core/metadata/SchemaIT.java index caa96a647..6495b451d 100644 --- a/integration-tests/src/test/java/com/datastax/oss/driver/core/metadata/SchemaIT.java +++ b/integration-tests/src/test/java/com/datastax/oss/driver/core/metadata/SchemaIT.java @@ -265,6 +265,19 @@ public class SchemaIT { + "total
[jira] [Commented] (CASSANDRA-19580) Unable to contact any seeds with node in hibernate status
[ https://issues.apache.org/jira/browse/CASSANDRA-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840266#comment-17840266 ] Cameron Zemek commented on CASSANDRA-19580: --- > If you have internode_compression=dc then replacement with the same IP will > not work, you need to use a different IP because the compression has already > been negotiated on the other nodes. Not to get too off topic from the issue at hand, but I am able to do a replacement with the same IP with internode compression enabled. So what doesn't work about this? > Unable to contact any seeds with node in hibernate status > - > > Key: CASSANDRA-19580 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19580 > Project: Cassandra > Issue Type: Bug > Reporter: Cameron Zemek > Priority: Normal > > We have a customer running into the error 'Unable to contact any seeds!'. I > have been able to reproduce this issue by killing Cassandra as it's joining, > which puts the node into hibernate status. Once a node is in hibernate it > will no longer receive any SYN messages from other nodes during startup, and > as it sends only itself as a digest in outbound SYN messages it never receives > any states in any of the ACK replies. So once it gets to the `seenAnySeed` > check, it fails as the endpointStateMap is empty. > > A workaround is copying the system.peers table from another node, but this is > less than ideal. 
I tested modifying maybeGossipToSeed as follows:
> {code:java}
>     /* Possibly gossip to a seed for facilitating partition healing */
>     private void maybeGossipToSeed(MessageOut<GossipDigestSyn> prod)
>     {
>         int size = seeds.size();
>         if (size > 0)
>         {
>             if (size == 1 && seeds.contains(FBUtilities.getBroadcastAddress()))
>             {
>                 return;
>             }
>             if (liveEndpoints.size() == 0)
>             {
>                 List<GossipDigest> gDigests = prod.payload.gDigests;
>                 if (gDigests.size() == 1 && gDigests.get(0).endpoint.equals(FBUtilities.getBroadcastAddress()))
>                 {
>                     // Send an empty digest list, as in a shadow round, so the
>                     // seed replies with all of its states.
>                     gDigests = new ArrayList<GossipDigest>();
>                     GossipDigestSyn digestSynMessage = new GossipDigestSyn(DatabaseDescriptor.getClusterName(),
>                                                                            DatabaseDescriptor.getPartitionerName(),
>                                                                            gDigests);
>                     MessageOut<GossipDigestSyn> message = new MessageOut<GossipDigestSyn>(MessagingService.Verb.GOSSIP_DIGEST_SYN,
>                                                                                          digestSynMessage,
>                                                                                          GossipDigestSyn.serializer);
>                     sendGossip(message, seeds);
>                 }
>                 else
>                 {
>                     sendGossip(prod, seeds);
>                 }
>             }
>             else
>             {
>                 /* Gossip with the seed with some probability. */
>                 double probability = seeds.size() / (double) (liveEndpoints.size() + unreachableEndpoints.size());
>                 double randDbl = random.nextDouble();
>                 if (randDbl <= probability)
>                     sendGossip(prod, seeds);
>             }
>         }
>     }
> {code}
> The only problem is that this is the same as a SYN from the shadow round. It does resolve the issue, however, as the node then receives an ACK with all the states.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
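The failure mode in this report can be modeled with a toy version of the digest exchange: per the reporter's description, a receiver answers a SYN only for the endpoints named in its digests, and only when the receiver's local version is newer, while an empty digest list (a shadow-round-style SYN, which the patch above imitates) is answered with every state the receiver has. All names below are illustrative, not Cassandra's actual Gossiper API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the SYN/ACK digest exchange described in the report above.
// receiverVersions maps endpoint -> the version the receiver knows;
// synDigests maps endpoint -> the version the sender advertised.
public class GossipDigestModel {
    static Map<String, Integer> answerSyn(Map<String, Integer> receiverVersions,
                                          Map<String, Integer> synDigests) {
        Map<String, Integer> ack = new HashMap<>();
        if (synDigests.isEmpty()) {
            // Shadow-round style SYN: answer with everything we have.
            ack.putAll(receiverVersions);
            return ack;
        }
        // Normal round: only endpoints named in the SYN are considered,
        // and a state is returned only if our copy is strictly newer.
        for (Map.Entry<String, Integer> d : synDigests.entrySet()) {
            Integer local = receiverVersions.get(d.getKey());
            if (local != null && local > d.getValue())
                ack.put(d.getKey(), local);
        }
        return ack;
    }
}
```

Under this model, a hibernating node that advertises only its own digest (at its own, highest, version) gets an empty ACK back, matching the empty endpointStateMap failure; the empty-digest SYN in the patch gets all states.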
[jira] [Commented] (CASSANDRA-19580) Unable to contact any seeds with node in hibernate status
[ https://issues.apache.org/jira/browse/CASSANDRA-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840267#comment-17840267 ] Brandon Williams commented on CASSANDRA-19580: -- Set compression to all so there are no special cases and test again. > Unable to contact any seeds with node in hibernate status > - > > Key: CASSANDRA-19580 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19580 > Project: Cassandra > Issue Type: Bug >Reporter: Cameron Zemek >Priority: Normal > > We have customer running into the error 'Unable to contact any seeds!' . I > have been able to reproduce this issue if I kill Cassandra as its joining > which will put the node into hibernate status. Once a node is in hibernate it > will no longer receive any SYN messages from other nodes during startup and > as it sends only itself as digest in outbound SYN messages it never receives > any states in any of the ACK replies. So once it gets to the check > `seenAnySeed` in it fails as the endpointStateMap is empty. > > A workaround is copying the system.peers table from other node but this is > less than ideal. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19585) syntax formatting on CQL doc is garbled
[ https://issues.apache.org/jira/browse/CASSANDRA-19585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19585: - Bug Category: Parent values: Correctness(12982) Complexity: Normal Discovered By: User Report Severity: Normal Status: Open (was: Triage Needed) > syntax formatting on CQL doc is garbled > --- > > Key: CASSANDRA-19585 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19585 > Project: Cassandra > Issue Type: Bug > Components: Documentation/Website >Reporter: Jon Haddad >Priority: Normal > Attachments: image-2024-04-23-17-37-54-438.png > > > It looks like the build process for the 4.1 docs isn't correctly processed. > Screenshot attached. > https://cassandra.apache.org/doc/4.1/cassandra/cql/cql_singlefile.html#alterTableStmt > !image-2024-04-23-17-37-54-438.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19585) syntax formatting on CQL doc is garbled
Jon Haddad created CASSANDRA-19585: -- Summary: syntax formatting on CQL doc is garbled Key: CASSANDRA-19585 URL: https://issues.apache.org/jira/browse/CASSANDRA-19585 Project: Cassandra Issue Type: Bug Components: Documentation/Website Reporter: Jon Haddad Attachments: image-2024-04-23-17-37-54-438.png It looks like the build process for the 4.1 docs isn't correctly processed. Screenshot attached. https://cassandra.apache.org/doc/4.1/cassandra/cql/cql_singlefile.html#alterTableStmt !image-2024-04-23-17-37-54-438.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19584) Glossary labeled as DataStax glossary
Jon Haddad created CASSANDRA-19584: -- Summary: Glossary labeled as DataStax glossary Key: CASSANDRA-19584 URL: https://issues.apache.org/jira/browse/CASSANDRA-19584 Project: Cassandra Issue Type: Bug Components: Documentation/Website Reporter: Jon Haddad Should be Cassandra glossary https://cassandra.apache.org/_/glossary.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19583) setting compaction throughput to 0 throws a startup error
[ https://issues.apache.org/jira/browse/CASSANDRA-19583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840253#comment-17840253 ] Jon Haddad commented on CASSANDRA-19583: I found it testing 5.0, but I'm guessing it's present since whenever we updated the config. This is with the new compaction_throughput setting, not compaction_throughput_in_mb. > setting compaction throughput to 0 throws a startup error > - > > Key: CASSANDRA-19583 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19583 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Jon Haddad >Priority: Normal > > The inline docs say: > {noformat} > Setting this to 0 disables throttling. > {noformat} > However, on startup, we throw this error: > {noformat} > Caused by: java.lang.IllegalArgumentException: Invalid data rate: 0 Accepted > units: MiB/s, KiB/s, B/s where case matters and only non-negative values a> > Apr 23 23:12:01 cassandra0 cassandra[3424]: at > org.apache.cassandra.config.DataRateSpec.(DataRateSpec.java:52) > Apr 23 23:12:01 cassandra0 cassandra[3424]: at > org.apache.cassandra.config.DataRateSpec.(DataRateSpec.java:61) > Apr 23 23:12:01 cassandra0 cassandra[3424]: at > org.apache.cassandra.config.DataRateSpec$LongBytesPerSecondBound.(DataRateSpec.java:232) > Apr 23 23:12:01 cassandra0 cassandra[3424]: ... 27 common frames > omitted > {noformat} > We should allow 0 as per the inline doc. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19583) setting compaction throughput to 0 throws a startup error
[ https://issues.apache.org/jira/browse/CASSANDRA-19583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840251#comment-17840251 ] Brandon Williams commented on CASSANDRA-19583: -- Which version was this? > setting compaction throughput to 0 throws a startup error > - > > Key: CASSANDRA-19583 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19583 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Jon Haddad >Priority: Normal > > The inline docs say: > {noformat} > Setting this to 0 disables throttling. > {noformat} > However, on startup, we throw this error: > {noformat} > Caused by: java.lang.IllegalArgumentException: Invalid data rate: 0 Accepted > units: MiB/s, KiB/s, B/s where case matters and only non-negative values a> > Apr 23 23:12:01 cassandra0 cassandra[3424]: at > org.apache.cassandra.config.DataRateSpec.(DataRateSpec.java:52) > Apr 23 23:12:01 cassandra0 cassandra[3424]: at > org.apache.cassandra.config.DataRateSpec.(DataRateSpec.java:61) > Apr 23 23:12:01 cassandra0 cassandra[3424]: at > org.apache.cassandra.config.DataRateSpec$LongBytesPerSecondBound.(DataRateSpec.java:232) > Apr 23 23:12:01 cassandra0 cassandra[3424]: ... 27 common frames > omitted > {noformat} > We should allow 0 as per the inline doc. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19583) setting compaction throughput to 0 throws a startup error
[ https://issues.apache.org/jira/browse/CASSANDRA-19583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jon Haddad updated CASSANDRA-19583: --- Description: The inline docs say: {noformat} Setting this to 0 disables throttling. {noformat} However, on startup, we throw this error: {noformat} Caused by: java.lang.IllegalArgumentException: Invalid data rate: 0 Accepted units: MiB/s, KiB/s, B/s where case matters and only non-negative values a> Apr 23 23:12:01 cassandra0 cassandra[3424]: at org.apache.cassandra.config.DataRateSpec.(DataRateSpec.java:52) Apr 23 23:12:01 cassandra0 cassandra[3424]: at org.apache.cassandra.config.DataRateSpec.(DataRateSpec.java:61) Apr 23 23:12:01 cassandra0 cassandra[3424]: at org.apache.cassandra.config.DataRateSpec$LongBytesPerSecondBound.(DataRateSpec.java:232) Apr 23 23:12:01 cassandra0 cassandra[3424]: ... 27 common frames omitted {noformat} We should allow 0 as per the inline doc. was: The inline docs say: {noformat} Setting this to 0 disables throttling. {noformat} However, on startup, we throw this error: {noformat} Caused by: java.lang.IllegalArgumentException: Invalid data rate: 0 Accepted units: MiB/s, KiB/s, B/s where case matters and only non-negative values a> Apr 23 23:12:01 cassandra0 cassandra[3424]: at org.apache.cassandra.config.DataRateSpec.(DataRateSpec.java:52) Apr 23 23:12:01 cassandra0 cassandra[3424]: at org.apache.cassandra.config.DataRateSpec.(DataRateSpec.java:61) Apr 23 23:12:01 cassandra0 cassandra[3424]: at org.apache.cassandra.config.DataRateSpec$LongBytesPerSecondBound.(DataRateSpec.java:232) Apr 23 23:12:01 cassandra0 cassandra[3424]: ... 
27 common frames omitted {noformat} > setting compaction throughput to 0 throws a startup error > - > > Key: CASSANDRA-19583 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19583 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Jon Haddad >Priority: Normal > > The inline docs say: > {noformat} > Setting this to 0 disables throttling. > {noformat} > However, on startup, we throw this error: > {noformat} > Caused by: java.lang.IllegalArgumentException: Invalid data rate: 0 Accepted > units: MiB/s, KiB/s, B/s where case matters and only non-negative values a> > Apr 23 23:12:01 cassandra0 cassandra[3424]: at > org.apache.cassandra.config.DataRateSpec.(DataRateSpec.java:52) > Apr 23 23:12:01 cassandra0 cassandra[3424]: at > org.apache.cassandra.config.DataRateSpec.(DataRateSpec.java:61) > Apr 23 23:12:01 cassandra0 cassandra[3424]: at > org.apache.cassandra.config.DataRateSpec$LongBytesPerSecondBound.(DataRateSpec.java:232) > Apr 23 23:12:01 cassandra0 cassandra[3424]: ... 27 common frames > omitted > {noformat} > We should allow 0 as per the inline doc. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19583) setting compaction throughput to 0 throws a startup error
Jon Haddad created CASSANDRA-19583: -- Summary: setting compaction throughput to 0 throws a startup error Key: CASSANDRA-19583 URL: https://issues.apache.org/jira/browse/CASSANDRA-19583 Project: Cassandra Issue Type: Bug Components: Local/Config Reporter: Jon Haddad The inline docs say: {noformat} Setting this to 0 disables throttling. {noformat} However, on startup, we throw this error: {noformat} Caused by: java.lang.IllegalArgumentException: Invalid data rate: 0 Accepted units: MiB/s, KiB/s, B/s where case matters and only non-negative values a> Apr 23 23:12:01 cassandra0 cassandra[3424]: at org.apache.cassandra.config.DataRateSpec.(DataRateSpec.java:52) Apr 23 23:12:01 cassandra0 cassandra[3424]: at org.apache.cassandra.config.DataRateSpec.(DataRateSpec.java:61) Apr 23 23:12:01 cassandra0 cassandra[3424]: at org.apache.cassandra.config.DataRateSpec$LongBytesPerSecondBound.(DataRateSpec.java:232) Apr 23 23:12:01 cassandra0 cassandra[3424]: ... 27 common frames omitted {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
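The mismatch reported in this thread is between the inline docs ("Setting this to 0 disables throttling") and the parser, which rejects a rate of 0 outright. A hedged sketch of one possible resolution is to map 0 to an "unthrottled" sentinel instead of throwing; `ThroughputSetting` and its method are hypothetical names invented here, not Cassandra's actual DataRateSpec API.

```java
// Sketch of the documented-vs-enforced mismatch above: the docs promise
// that 0 disables throttling, but the validator throws on 0. One option
// is to treat 0 as a "no limit" sentinel while still rejecting negative
// values. Names are illustrative only.
public class ThroughputSetting {
    public static final long UNTHROTTLED = Long.MAX_VALUE;

    // Parses a MiB/s throughput value; 0 means "disabled" per the inline docs.
    public static long parseMibPerSec(long value) {
        if (value < 0)
            throw new IllegalArgumentException("Invalid data rate: " + value);
        return value == 0 ? UNTHROTTLED : value;
    }
}
```

The alternative fix is to update the inline docs to say throttling cannot be disabled, but the sentinel approach preserves the long-standing behavior of the pre-rename `compaction_throughput_mb_per_sec` setting that users expect.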
[jira] [Commented] (CASSANDRA-19580) Unable to contact any seeds with node in hibernate status
[ https://issues.apache.org/jira/browse/CASSANDRA-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840241#comment-17840241 ] Cameron Zemek commented on CASSANDRA-19580: --- Yeah so what breaks if use same state as when replacing with different address? I looked through CASSANDRA-8523 and didn't understand what different about replacing when reusing the same IP address. Why isn't the node in UJ state when doing replacements, that is receiving writes but not reads. What do you think would be the correct fix here? Is sending an empty SYN like shadow round okay? Why does examineGossiper not send back states for missing digests (it only compares for the digests in the SYN)? Considering that SYN messages are sent randomly, it seems like could also end up with this 'Unable to contact any seeds!' path if none of the nodes randomly pick the replacement node to send a SYN to. > Unable to contact any seeds with node in hibernate status > - > > Key: CASSANDRA-19580 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19580 > Project: Cassandra > Issue Type: Bug >Reporter: Cameron Zemek >Priority: Normal > > We have customer running into the error 'Unable to contact any seeds!' . I > have been able to reproduce this issue if I kill Cassandra as its joining > which will put the node into hibernate status. Once a node is in hibernate it > will no longer receive any SYN messages from other nodes during startup and > as it sends only itself as digest in outbound SYN messages it never receives > any states in any of the ACK replies. So once it gets to the check > `seenAnySeed` in it fails as the endpointStateMap is empty. > > A workaround is copying the system.peers table from other node but this is > less than ideal. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15439) Token metadata for bootstrapping nodes is lost under temporary failures
[ https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840228#comment-17840228 ] Brandon Williams commented on CASSANDRA-15439: -- With the failed bootstrap timeout separated out, we could take this opportunity to also increase it to give users some protection from the scenario you ran into by default, and also aid resumable bootstrap. WDYT? /cc [~paulo] > Token metadata for bootstrapping nodes is lost under temporary failures > --- > > Key: CASSANDRA-15439 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15439 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: Josh Snyder >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 20m > Remaining Estimate: 0h > > In CASSANDRA-8838, [~pauloricardomg] asked "hints will not be stored to the > bootstrapping node after RING_DELAY, since it will evicted from the TMD > pending ranges. Should we create a ticket to address this?" > CASSANDRA-15264 relates to the most likely cause of such situations, where > the Cassandra daemon on the bootstrapping node completely crashes. Based on > testing with {{kill -STOP}} on a bootstrapping Cassandra JVM, I believe it > also is possible to remove token metadata (and thus pending ranges, and thus > hints) for a bootstrapping node, simply by affecting its status in the > failure detector. 
> A node in the cluster sees the bootstrapping node this way: > {noformat} > INFO [GossipStage:1] 2019-11-27 20:41:41,101 Gossiper.java: - Node > /PUBLIC-IP is now part of the cluster > INFO [GossipStage:1] 2019-11-27 20:41:41,199 Gossiper.java:1073 - > InetAddress /PUBLIC-IP is now UP > INFO [HANDSHAKE-/PRIVATE-IP] 2019-11-27 20:41:41,412 > OutboundTcpConnection.java:565 - Handshaking version with /PRIVATE-IP > INFO [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,019 > StreamResultFuture.java:112 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 > ID#0] Creating new streaming plan for Bootstrap > INFO [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,020 > StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, > ID#0] Received streaming plan for Bootstrap > INFO [STREAM-INIT-/PRIVATE-IP:56003] 2019-11-27 20:42:10,112 > StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, > ID#0] Received streaming plan for Bootstrap > INFO [STREAM-IN-/PUBLIC-IP] 2019-11-27 20:42:10,179 > StreamResultFuture.java:169 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 > ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 833 > files(139744616815 bytes) > INFO [GossipStage:1] 2019-11-27 20:54:47,547 Gossiper.java:1089 - > InetAddress /PUBLIC-IP is now DOWN > INFO [GossipTasks:1] 2019-11-27 20:54:57,551 Gossiper.java:849 - FatClient > /PUBLIC-IP has been silent for 3ms, removing from gossip > {noformat} > Since the bootstrapping node has no tokens, it is treated like a fat client, > and it is removed from the ring. For correctness purposes, I believe we must > keep storing hints for the downed bootstrapping node until it is either > assassinated or until a replacement attempts to bootstrap for the same token. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
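The eviction hazard described in this report is that a silent, token-less member looks identical to a fat client, so a paused bootstrapping node loses its token metadata and pending hints. A minimal sketch of a guard that spares endpoints with pending ranges follows; all names are illustrative, not Cassandra's actual Gossiper API.

```java
// Sketch of the fat-client eviction check described above: a member is
// evicted when it has been silent past a quarantine window and owns no
// tokens -- which also matches a bootstrapping node whose daemon merely
// paused. Sparing endpoints that still have pending ranges would keep
// their token metadata (and thus hints) alive. Names are illustrative.
public class FatClientEviction {
    public static boolean shouldEvict(boolean hasTokens,
                                      boolean hasPendingRanges,
                                      long silentMillis,
                                      long quarantineMillis) {
        if (hasTokens) return false;            // normal ring member, never a fat client
        if (hasPendingRanges) return false;     // bootstrapping: keep its metadata
        return silentMillis > quarantineMillis; // true fat client, safe to evict
    }
}
```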
[jira] [Updated] (CASSANDRA-15439) Token metadata for bootstrapping nodes is lost under temporary failures
[ https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-15439: - Fix Version/s: (was: 3.0.x) (was: 3.11.x) (was: 5.x) > Token metadata for bootstrapping nodes is lost under temporary failures > --- > > Key: CASSANDRA-15439 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15439 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: Josh Snyder >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x > > Time Spent: 20m > Remaining Estimate: 0h > > In CASSANDRA-8838, [~pauloricardomg] asked "hints will not be stored to the > bootstrapping node after RING_DELAY, since it will evicted from the TMD > pending ranges. Should we create a ticket to address this?" > CASSANDRA-15264 relates to the most likely cause of such situations, where > the Cassandra daemon on the bootstrapping node completely crashes. Based on > testing with {{kill -STOP}} on a bootstrapping Cassandra JVM, I believe it > also is possible to remove token metadata (and thus pending ranges, and thus > hints) for a bootstrapping node, simply by affecting its status in the > failure detector. 
> A node in the cluster sees the bootstrapping node this way: > {noformat} > INFO [GossipStage:1] 2019-11-27 20:41:41,101 Gossiper.java: - Node > /PUBLIC-IP is now part of the cluster > INFO [GossipStage:1] 2019-11-27 20:41:41,199 Gossiper.java:1073 - > InetAddress /PUBLIC-IP is now UP > INFO [HANDSHAKE-/PRIVATE-IP] 2019-11-27 20:41:41,412 > OutboundTcpConnection.java:565 - Handshaking version with /PRIVATE-IP > INFO [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,019 > StreamResultFuture.java:112 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 > ID#0] Creating new streaming plan for Bootstrap > INFO [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,020 > StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, > ID#0] Received streaming plan for Bootstrap > INFO [STREAM-INIT-/PRIVATE-IP:56003] 2019-11-27 20:42:10,112 > StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, > ID#0] Received streaming plan for Bootstrap > INFO [STREAM-IN-/PUBLIC-IP] 2019-11-27 20:42:10,179 > StreamResultFuture.java:169 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 > ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 833 > files(139744616815 bytes) > INFO [GossipStage:1] 2019-11-27 20:54:47,547 Gossiper.java:1089 - > InetAddress /PUBLIC-IP is now DOWN > INFO [GossipTasks:1] 2019-11-27 20:54:57,551 Gossiper.java:849 - FatClient > /PUBLIC-IP has been silent for 3ms, removing from gossip > {noformat} > Since the bootstrapping node has no tokens, it is treated like a fat client, > and it is removed from the ring. For correctness purposes, I believe we must > keep storing hints for the downed bootstrapping node until it is either > assassinated or until a replacement attempts to bootstrap for the same token. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19572) Test failure: org.apache.cassandra.db.ImportTest flakiness
[ https://issues.apache.org/jira/browse/CASSANDRA-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840220#comment-17840220 ] Stefan Miklosovic edited comment on CASSANDRA-19572 at 4/23/24 8:28 PM: OK so more digging ... I was trying to put into each afterTest "SSTableReader.resetTidying();" and it did help, below is each job with 5k repetitions. 4.0 https://app.circleci.com/pipelines/github/instaclustr/cassandra/4223/workflows/a82d0483-a0df-44ed-8127-088b303c78ba/jobs/225432/steps 4.1 https://app.circleci.com/pipelines/github/instaclustr/cassandra/4224/workflows/eae7a5e2-89dd-46cd-aaca-1e4250d0fa8b/jobs/225531/steps 5.0 j11 https://app.circleci.com/pipelines/github/instaclustr/cassandra/4226/workflows/9805ec75-fd02-4c5a-8996-fa5bce71e8c2/jobs/225728/steps 5.0 j17 https://app.circleci.com/pipelines/github/instaclustr/cassandra/4226/workflows/9805ec75-fd02-4c5a-8996-fa5bce71e8c2/jobs/225727/steps However, I just noticed that there is already afterTest in CQLTester which ImportTest extends and I was _not_ calling it (super.afterTest()) in my afterTest. What CQLTester's afterTest does is this (1). It removes the tables and it deletes all SSTables on the disk, so I guess it also calls tidying, just by other means, but that whole operation runs in ScheduledExecutors.optionalTasks which is asynchronous. So, what happens, when we run a test method, then afterTest is invoked and removal is done asynchronously? Then JUnit does not wait until it is finished, right? I think that this work then might leak beyond the scope of afterTest and a new test is run etc ... I feel uneasy about this and that is probably the real cause of the issues we see when it comes to these refs. What I am doing right now is that I am tidying it up before calling super.afterTest and I run multiplex on 4.0 again. If it fails, I guess the next step will be to run the logic in afterTest synchronously. 
(1) https://github.com/apache/cassandra/blob/cassandra-4.1/test/unit/org/apache/cassandra/cql3/CQLTester.java#L417-L433 was (Author: smiklosovic): OK so more digging ... I was trying to put into each afterTest "SSTableReader.resetTidying();" and it did help, below is each job with 5k repetitions. 4.0 https://app.circleci.com/pipelines/github/instaclustr/cassandra/4223/workflows/a82d0483-a0df-44ed-8127-088b303c78ba/jobs/225432/steps 4.1 https://app.circleci.com/pipelines/github/instaclustr/cassandra/4224/workflows/eae7a5e2-89dd-46cd-aaca-1e4250d0fa8b/jobs/225531/steps 5.0 j11 https://app.circleci.com/pipelines/github/instaclustr/cassandra/4226/workflows/9805ec75-fd02-4c5a-8996-fa5bce71e8c2/jobs/225728/steps 5.0 j17 https://app.circleci.com/pipelines/github/instaclustr/cassandra/4226/workflows/9805ec75-fd02-4c5a-8996-fa5bce71e8c2/jobs/225727/steps However, I just noticed that there is already afterTest in CQLTester which ImportTest extends and I was _not_ calling it (super.afterTest()) in my afterTest. What CQLTester's afterTest does is this (1). It removes the tables and it deletes all SSTables on the disk, so I guess it also calls tidying, just by other means, but that whole operation runs in ScheduledExecutors.optionalTasks which is asynchronous. So, what happens, when we run a test method, then afterTest is invoked and removal is done asynchronously? Then JUnit does not wait until is is finished, right? I think that this work then might leak beyond the scope of afterTest and a new test is run etc ... I feel uneasy about this and that is probably the real cause of the issues we see when it comes to these refs. What I am doing right now is that I am tidying it up before calling super.afterTest and I run multiplex on 4.0 again. If it fails, I guess the next step will be to run the logic in afterTest synchronously. 
(1) https://github.com/apache/cassandra/blob/cassandra-4.1/test/unit/org/apache/cassandra/cql3/CQLTester.java#L417-L433 > Test failure: org.apache.cassandra.db.ImportTest flakiness > -- > > Key: CASSANDRA-19572 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19572 > Project: Cassandra > Issue Type: Bug > Components: Tool/bulk load >Reporter: Brandon Williams >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x > > > As discovered on CASSANDRA-19401, the tests in this class are flaky, at least > the following: > * testImportCorruptWithoutValidationWithCopying > * testImportInvalidateCache > * testImportCorruptWithCopying > * testImportCacheEnabledWithoutSrcDir > [https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests] -- This message was sent by Atlassian Jira (v8.20.10#820010)
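The race described in the comment — `afterTest()` handing cleanup to an asynchronous executor that JUnit never waits on, so the work leaks into the next test — can be reproduced in miniature. This is a hedged sketch, not the real CQLTester/ScheduledExecutors code; blocking on the returned `Future` is the "run it synchronously" fix the comment is considering:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hedged sketch of the teardown race discussed above. The executor stands
// in for ScheduledExecutors.optionalTasks; nothing here is the actual
// CQLTester implementation.
class AsyncTeardownLeak {
    static String runScenario() throws Exception {
        ExecutorService optionalTasks = Executors.newSingleThreadExecutor();
        StringBuilder log = new StringBuilder();

        // afterTest() submits cleanup and returns immediately.
        Future<?> cleanup = optionalTasks.submit(() -> {
            try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            synchronized (log) { log.append("cleaned;"); }
        });

        // JUnit moves on to the next test without waiting...
        synchronized (log) { log.append("nextTestStarted;"); }

        // ...unless the test explicitly blocks until cleanup finishes.
        cleanup.get();
        optionalTasks.shutdown();
        synchronized (log) { return log.toString(); }
    }

    public static void main(String[] args) throws Exception {
        // Shows the next "test" starting before the async cleanup runs.
        System.out.println(runScenario()); // nextTestStarted;cleaned;
    }
}
```

Without the `cleanup.get()` call, the `cleaned;` entry could land at any point during the following test, which is the leak the comment suspects.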
[jira] [Updated] (CASSANDRA-18130) Log hardware and container params during test runs to help troubleshoot intermittent failures
[ https://issues.apache.org/jira/browse/CASSANDRA-18130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated CASSANDRA-18130: --- Fix Version/s: 2.2.20 3.0.31 3.11.18 4.0.13 4.1.5 5.0-beta2 5.0 5.1 (was: 5.x) (was: 4.0.x) (was: 4.1.x) > Log hardware and container params during test runs to help troubleshoot > intermittent failures > - > > Key: CASSANDRA-18130 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18130 > Project: Cassandra > Issue Type: Task > Components: Test/dtest/java, Test/dtest/python, Test/unit >Reporter: Josh McKenzie >Assignee: Michael Semb Wever >Priority: Normal > Fix For: 2.2.20, 3.0.31, 3.11.18, 4.0.13, 4.1.5, 5.0-beta2, 5.0, > 5.1 > > > {color:#00}We’ve long had flakiness in our containerized ASF CI > environment that we don’t see in circleci. The environment itself is both > containerized and heterogenous, so there are differences in both the hardware > environment and the software environment in which it executes. For reference, > see: > [https://github.com/apache/cassandra-builds/blob/trunk/ASF-jenkins-agents.md#current-agents]{color} > {color:#00} {color} > {color:#00}We should log a variety of hardware, container, and software > environment details to help get to the bottom of where some test failures may > be occurring. As we don’t have shell access to the machines it’ll be easier > to have this information logged / retrieved during test runs than to try and > profile each host independently.{color} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-18130) Log hardware and container params during test runs to help troubleshoot intermittent failures
[ https://issues.apache.org/jira/browse/CASSANDRA-18130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated CASSANDRA-18130: --- Resolution: Fixed Status: Resolved (was: Open) Committed as https://github.com/apache/cassandra-builds/commit/eab310bd76329be5d47c7a8c4e8837bbb3e2fff0 as part of CASSANDRA-19558 > Log hardware and container params during test runs to help troubleshoot > intermittent failures > - > > Key: CASSANDRA-18130 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18130 > Project: Cassandra > Issue Type: Task > Components: Test/dtest/java, Test/dtest/python, Test/unit >Reporter: Josh McKenzie >Assignee: Michael Semb Wever >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.x > > > {color:#00}We’ve long had flakiness in our containerized ASF CI > environment that we don’t see in circleci. The environment itself is both > containerized and heterogenous, so there are differences in both the hardware > environment and the software environment in which it executes. For reference, > see: > [https://github.com/apache/cassandra-builds/blob/trunk/ASF-jenkins-agents.md#current-agents]{color} > {color:#00} {color} > {color:#00}We should log a variety of hardware, container, and software > environment details to help get to the bottom of where some test failures may > be occurring. As we don’t have shell access to the machines it’ll be easier > to have this information logged / retrieved during test runs than to try and > profile each host independently.{color} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
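The committed solution is a shell script run on the Jenkins agents (see the `agent_report.sh` commit below in this digest); purely as a rough illustration of the kind of environment details the ticket wants captured during a test run, a JVM can report some of them directly. This sketch is not part of the actual patch:

```java
// Hedged illustration of CASSANDRA-18130's goal: log the hardware/software
// environment from inside the test run, since there is no shell access to
// the heterogeneous CI agents. Not the committed implementation.
class AgentReport {
    static String report() {
        Runtime rt = Runtime.getRuntime();
        StringBuilder sb = new StringBuilder();
        sb.append("processors=").append(rt.availableProcessors()).append('\n');
        sb.append("maxHeapMB=").append(rt.maxMemory() / (1024 * 1024)).append('\n');
        sb.append("os=").append(System.getProperty("os.name"))
          .append(' ').append(System.getProperty("os.arch")).append('\n');
        sb.append("java=").append(System.getProperty("java.version"));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Emitting this at test start lets a flaky failure be correlated
        // with the agent it ran on.
        System.out.println(report());
    }
}
```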
[jira] [Updated] (CASSANDRA-19582) [Analytics] Consume new Sidecar client API to stream SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-19582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francisco Guerrero updated CASSANDRA-19582: --- Fix Version/s: NA Source Control Link: https://github.com/apache/cassandra-analytics/commit/86420f9d52991fb148b322031df55494669532d3 Resolution: Fixed Status: Resolved (was: Ready to Commit) > [Analytics] Consume new Sidecar client API to stream SSTables > - > > Key: CASSANDRA-19582 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19582 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Francisco Guerrero >Assignee: Francisco Guerrero >Priority: Normal > Fix For: NA > > Time Spent: 20m > Remaining Estimate: 0h > > A new client API was recently introduced in Sidecar to stream SSTables. > Cassandra Analytics needs to start consuming the new API in order to take > advantage of the fixes when streaming SSTables from a Cassandra installation > with more than one data directory. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra-analytics) branch trunk updated: CASSANDRA-19582: Consume new Sidecar client API to stream SSTables (#54)
This is an automated email from the ASF dual-hosted git repository.

frankgh pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-analytics.git

The following commit(s) were added to refs/heads/trunk by this push:
     new 86420f9  CASSANDRA-19582: Consume new Sidecar client API to stream SSTables (#54)
86420f9 is described below

commit 86420f9d52991fb148b322031df55494669532d3
Author: Francisco Guerrero
AuthorDate: Tue Apr 23 12:51:53 2024 -0700

    CASSANDRA-19582: Consume new Sidecar client API to stream SSTables (#54)

    Patch by Francisco Guerrero; Reviewed by Yifan Cai for CASSANDRA-19582
---
 .../cassandra/spark/bulkwriter/BulkSparkConf.java |  7 +++--
 .../spark/data/SidecarProvisionedSSTable.java     | 32 +-
 .../spark/data/SidecarProvisionedSSTableTest.java |  1 +
 scripts/build-sidecar.sh                          |  2 +-
 4 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/BulkSparkConf.java b/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/BulkSparkConf.java
index 2db19d5..8bc9a7f 100644
--- a/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/BulkSparkConf.java
+++ b/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/BulkSparkConf.java
@@ -29,6 +29,7 @@ import java.util.Base64;
 import java.util.Collections;
 import java.util.List;
 import java.util.Map;
+import java.util.Objects;
 import java.util.Optional;
 import java.util.Set;
 import java.util.concurrent.TimeUnit;
@@ -155,7 +156,7 @@ public class BulkSparkConf implements Serializable
         Optional sidecarPortFromOptions = MapUtils.getOptionalInt(options, WriterOptions.SIDECAR_PORT.name(), "sidecar port");
         this.userProvidedSidecarPort = sidecarPortFromOptions.isPresent() ? sidecarPortFromOptions.get() : getOptionalInt(SIDECAR_PORT).orElse(-1);
         this.effectiveSidecarPort = this.userProvidedSidecarPort == -1 ? DEFAULT_SIDECAR_PORT : this.userProvidedSidecarPort;
-        this.sidecarInstancesValue = MapUtils.getOrThrow(options, WriterOptions.SIDECAR_INSTANCES.name(), "sidecar_instances");
+        this.sidecarInstancesValue = MapUtils.getOrDefault(options, WriterOptions.SIDECAR_INSTANCES.name(), null);
         this.sidecarInstances = sidecarInstances();
         this.keyspace = MapUtils.getOrThrow(options, WriterOptions.KEYSPACE.name());
         this.table = MapUtils.getOrThrow(options, WriterOptions.TABLE.name());
@@ -264,7 +265,9 @@ public class BulkSparkConf implements Serializable
     protected Set buildSidecarInstances()
     {
-        return Arrays.stream(sidecarInstancesValue.split(","))
+        String[] split = Objects.requireNonNull(sidecarInstancesValue, "Unable to build sidecar instances from null value")
+                                .split(",");
+        return Arrays.stream(split)
                      .map(hostname -> new SidecarInstanceImpl(hostname, effectiveSidecarPort))
                      .collect(Collectors.toSet());
     }

diff --git a/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/SidecarProvisionedSSTable.java b/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/SidecarProvisionedSSTable.java
index 6e4ff0f..db9e2fd 100644
--- a/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/SidecarProvisionedSSTable.java
+++ b/cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/data/SidecarProvisionedSSTable.java
@@ -124,7 +124,7 @@ public class SidecarProvisionedSSTable extends SSTable
         {
             return null;
         }
-        return openStream(snapshotFile.fileName, snapshotFile.size, fileType);
+        return openStream(snapshotFile, fileType);
     }

     public long length(FileType fileType)
@@ -144,20 +144,20 @@ public class SidecarProvisionedSSTable extends SSTable
     }

     @Nullable
-    private InputStream openStream(String component, long size, FileType fileType)
+    private InputStream openStream(ListSnapshotFilesResponse.FileInfo snapshotFile, FileType fileType)
     {
-        if (component == null)
+        if (snapshotFile == null)
         {
             return null;
         }
         if (fileType == FileType.COMPRESSION_INFO)
         {
-            String key = String.format("%s/%s/%s/%s/%s", instance.hostname(), keyspace, table, snapshotName, component);
+            String key = String.format("%s/%s/%s/%s/%s", instance.hostname(), keyspace, table, snapshotName, snapshotFile.fileName);
             byte[] bytes;
             try
             {
-                bytes = COMPRESSION_CACHE.get(key, () -> IOUtils.toByteArray(open(component, fileType, size)));
+                bytes = COMPRESSION_CACHE.get(key, () -> IOUtils.toByteArray(open(snapshotFile, fileType)));
             }
             catch
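The `BulkSparkConf` change in the diff moves `SIDECAR_INSTANCES` from fail-on-construction (`getOrThrow`) to default-to-null, with an `Objects.requireNonNull` check at the point of use. A standalone sketch of that pattern follows; method names like `getOrDefault`/`buildInstances` are illustrative stand-ins, not the real `MapUtils`/`SidecarInstanceImpl` API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hedged sketch of the configuration-lookup pattern in the diff above:
// a missing option no longer throws during construction; it fails with a
// clear message only when the value is actually needed.
class DeferredRequiredOption {
    // Before: a getOrThrow-style lookup failed eagerly. After: null default.
    static String getOrDefault(Map<String, String> options, String key) {
        return options.getOrDefault(key, null);
    }

    // Deferred check, mirroring the requireNonNull added in buildSidecarInstances().
    static List<String> buildInstances(String csv, int port) {
        String[] split = Objects.requireNonNull(csv, "Unable to build sidecar instances from null value")
                                .split(",");
        List<String> out = new ArrayList<>();
        for (String host : split) {
            out.add(host + ":" + port); // stands in for new SidecarInstanceImpl(host, port)
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> options = Map.of("SIDECAR_INSTANCES", "a,b");
        System.out.println(buildInstances(getOrDefault(options, "SIDECAR_INSTANCES"), 9043));
        // -> [a:9043, b:9043]
    }
}
```

Deferring the check lets callers that never build sidecar instances construct the config without supplying the option.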
Re: [PR] CASSANDRA-19582: Consume new Sidecar client API to stream SSTables [cassandra-analytics]
frankgh merged PR #54: URL: https://github.com/apache/cassandra-analytics/pull/54 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19558) Standalone jenkinsfile first round bug fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-19558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated CASSANDRA-19558: --- Source Control Link: https://github.com/apache/cassandra-builds/commit/5a9ba1a1962794a338cecaa7d8e7f23cd0ea09fd https://github.com/apache/cassandra-builds/commit/eab310bd76329be5d47c7a8c4e8837bbb3e2fff0 > Standalone jenkinsfile first round bug fixes > > > Key: CASSANDRA-19558 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19558 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > Fix For: 5.0.x, 5.x > > Attachments: CASSANDRA-19558_50_#5_ci_summary.html, > CASSANDRA-19558_50_#5_results_details.tar.xz, > CASSANDRA-19558-5.0_#13_ci_summary.html, > CASSANDRA-19558-5.0_#13_results_details.tar.xz, > CASSANDRA-19558-5.0_#16_ci_summary.html, > CASSANDRA-19558-5.0_#16_results_details.tar.xz, > CASSANDRA-19558_#8_ci_summary.html, CASSANDRA-19558_#8_results_details.tar.xz > > > A few follow up improvements and bug fixes for the standalone jenkinsfile. 
> - add at top a list of test failures in ci_summary.html > - docker scripts always try to login (as base images need to be pulled too) > - move simulator-dtests to large containers (they need 8g just heap) > - in ubuntu2004_test.docker make sure /home/cassandra exists and has correct > perms (from marcuse) > - persist the jenkinsfile parameters from run to run (important for the > post-commit jobs to keep their non-default branch and profile values) (was > CASSANDRA-19536) > - increase jvm-dtest splits from 8 to 12 > - when on ci-cassandra, replace use of copyArtifacts in Jenkinsfile > generateTestReports() with manual wget of test files, allowing the summary > phase to be run on any agent (copyArtifact would take >4hrs otherwise) (was > INFRA-25694) > - copy ci_summary.html and results_details.tar.xz to nightlies -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
(cassandra-builds) branch trunk updated: Add agent_scripts/ for reporting and cleaning agents in a jenkins installation with permanent agents
This is an automated email from the ASF dual-hosted git repository. mck pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git The following commit(s) were added to refs/heads/trunk by this push: new eab310b Add agent_scripts/ for reporting and cleaning agents in a jenkins installation with permament agents eab310b is described below commit eab310bd76329be5d47c7a8c4e8837bbb3e2fff0 Author: Mick Semb Wever AuthorDate: Sat Apr 20 22:10:49 2024 +0200 Add agent_scripts/ for reporting and cleaning agents in a jenkins installation with permament agents This scripts are not embedded into the in-tree Jenkinsfile's so that they can be more easily edited. (They are infrastructure related, rather than release branch related.) Also solves CASSANDRA-18130 patch by Mick Semb Wever; reviewed by Brandon Williams for CASSANDRA-19558 --- jenkins-dsl/agent_scripts/agent_report.sh | 30 +++ jenkins-dsl/agent_scripts/docker_agent_cleaner.sh | 52 + jenkins-dsl/agent_scripts/docker_image_pruner.py | 62 + jenkins-dsl/cassandra_job_dsl_seed.groovy | 264 +- jenkins-dsl/cassandra_pipeline.groovy | 2 +- 5 files changed, 257 insertions(+), 153 deletions(-) diff --git a/jenkins-dsl/agent_scripts/agent_report.sh b/jenkins-dsl/agent_scripts/agent_report.sh new file mode 100644 index 000..d190b4f --- /dev/null +++ b/jenkins-dsl/agent_scripts/agent_report.sh @@ -0,0 +1,30 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Report agent hardware and storage +# +# CASSANDRA-18130 + +echo "" +echo $(date) +echo "${JOB_NAME} ${BUILD_NUMBER} ${STAGE_NAME}" +echo +du -xmh / 2>/dev/null | sort -rh | head -n 30 +echo +df -h +echo +docker system df -v \ No newline at end of file diff --git a/jenkins-dsl/agent_scripts/docker_agent_cleaner.sh b/jenkins-dsl/agent_scripts/docker_agent_cleaner.sh new file mode 100644 index 000..1da2259 --- /dev/null +++ b/jenkins-dsl/agent_scripts/docker_agent_cleaner.sh @@ -0,0 +1,52 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Cleans jenkins agents. 
+# Primarily used by ci-cassandra.a.o
+#
+# First argument is `maxJobHours`; all docker objects older than this are pruned
+#
+# Assumes a CI running multiple C* branches and other jobs
+
+# error() must be defined before the pre-condition checks use it
+error() {
+    echo >&2 "$2"
+    exit "$1"
+}
+
+# pre-conditions
+command -v docker >/dev/null 2>&1 || { error 1 "docker needs to be installed"; }
+command -v virtualenv >/dev/null 2>&1 || { error 1 "virtualenv needs to be installed"; }
+(docker info >/dev/null 2>&1) || { error 1 "docker needs to be running"; }
+[ -f "./docker_image_pruner.py" ] || { error 1 "./docker_image_pruner.py must exist"; }
+
+# arguments
+maxJobHours=12
+[ "$#" -gt 0 ] && maxJobHours=$1
+
+echo -n "docker system prune --all --force --filter \"until=${maxJobHours}h\" : "
+docker system prune --all --force --filter "until=${maxJobHours}h"
+if ! ( pgrep -xa docker &>/dev/null || pgrep -af "build/docker" &>/dev/null || pgrep -af "cassandra-builds/build-scripts" &>/dev/null ) ; then
+    echo -n "docker system prune --force : "
+    docker system prune --force || true
+fi
+
+virtualenv -p python3 -q .venv
+source .venv/bin/activate
+pip -q install requests
+python docker_image_pruner.py
+deactivate
\ No newline at end of file
diff --git a/jenkins-dsl/agent_scripts/docker_image_pruner.py
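The diff above is truncated before the body of docker_image_pruner.py, so that file's contents are not shown here. As an illustration only — this is not the actual file, and the function name and record shape are assumptions — the core selection logic of such a pruner might look like:

```python
from datetime import datetime, timedelta, timezone

def images_to_prune(images, max_age_hours, now=None):
    """Select image ids that are untagged and older than the cutoff.

    `images` is a list of dicts with "id", "created" (a datetime) and
    "tags" (possibly empty) -- loosely mirroring what a Docker image
    listing provides. Hypothetical sketch, not the committed script.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_age_hours)
    return [img["id"] for img in images
            if not img["tags"] and img["created"] < cutoff]

# Example: one stale untagged image, one fresh, one tagged.
now = datetime(2024, 4, 20, 22, 0, tzinfo=timezone.utc)
images = [
    {"id": "sha256:aaa", "created": now - timedelta(hours=48), "tags": []},
    {"id": "sha256:bbb", "created": now - timedelta(hours=1), "tags": []},
    {"id": "sha256:ccc", "created": now - timedelta(hours=48), "tags": ["cassandra:5.0"]},
]
print(images_to_prune(images, 12, now))  # -> ['sha256:aaa']
```

The real script would feed such a selection to a deletion call; only the age/tag filtering logic is sketched here.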
[jira] [Comment Edited] (CASSANDRA-19558) Standalone jenkinsfile first round bug fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-19558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839287#comment-17839287 ] Michael Semb Wever edited comment on CASSANDRA-19558 at 4/23/24 7:31 PM:
-
A number of issues have been identified:
– need to capture the generateTestReports logs
– fix ant version in centos7-build.docker
– remove docker login (if credentials exist then docker is logged in)
– prefetch docker images from jfrog (to reduce dockerhub pull rate limits)
– use scripts from cassandra-builds to clean and report on agents
– stream xz where possible (not an issue, just a perf improvement)

The docker pull rate limit was the most serious, blocking.

patches…
1. this is part of CASSANDRA-18594 (most of it is already running (manually deployed) to ci-cassandra) [https://github.com/apache/cassandra-builds/compare/trunk...thelastpickle:cassandra-builds:mck/18594] ( [https://github.com/apache/cassandra-builds/commit/eb3eb5e] )
2. and for CASSANDRA-19558 [https://github.com/apache/cassandra-builds/compare/trunk...thelastpickle:cassandra-builds:mck/19558] ( [https://github.com/thelastpickle/cassandra-builds/commit/9f8ae9dcacd0d744992cc4eaf50f29a8836ffdbd] )
3. and then (also for CASSANDRA-19558) but comes last (i can test it manually once (2) is committed) [https://github.com/apache/cassandra/commit/92c0cb7] (this is on top of the previous commit (that already passed review) in [https://github.com/apache/cassandra/compare/apache:cassandra:cassandra-5.0...thelastpickle:cassandra:mck/jenkinsfile-persist-parameters-5.0|https://github.com/apache/cassandra/compare/apache:cassandra:cassandra-5.0...thelastpickle:cassandra:mck/jenkinsfile-persist-parameters-5.0] )

was (Author: michaelsembwever): A number of issues have been identified.
– need to capture the generateTestReports logs, – fix ant version in centos7-build.docker – remove docker login (if credentials exist then docker is logged in) – prefetch docker images from jfrog (to reduce dockerhub pull rate limits) – use scripts from cassandra-builds to clean and report on agents – stream xz where possible (not an issue, just perf improvement) The docker pull rate limit was the most serious, blocking. patches… 1. this is part of CASSANDRA-18594 (most of it is already running (manually deployed) to ci-cassandra) [https://github.com/apache/cassandra-builds/compare/trunk...thelastpickle:cassandra-builds:mck/18594] ( [https://github.com/apache/cassandra-builds/commit/eb3eb5e] ) 2. and for CASSANDRA-19558 [https://github.com/thelastpickle/cassandra-builds/compare/mck/18594...thelastpickle:cassandra-builds:mck/19558] ( [https://github.com/thelastpickle/cassandra-builds/commit/9f8ae9dcacd0d744992cc4eaf50f29a8836ffdbd] ) 3. and then (also for CASSANDRA-19558 ) but comes last (i can test it manually once (2) is committed) [https://github.com/apache/cassandra/commit/92c0cb7] (this is on top of the previous commit (that already passed review) in [https://github.com/apache/cassandra/compare/apache:cassandra:cassandra-5.0...thelastpickle:cassandra:mck/jenkinsfile-persist-parameters-5.0|https://github.com/apache/cassandra/compare/apache:cassandra:cassandra-5.0...thelastpickle:cassandra:mck/jenkinsfile-persist-parameters-5.0] ) > Standalone jenkinsfile first round bug fixes > > > Key: CASSANDRA-19558 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19558 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > Fix For: 5.0.x, 5.x > > Attachments: CASSANDRA-19558_50_#5_ci_summary.html, > CASSANDRA-19558_50_#5_results_details.tar.xz, > CASSANDRA-19558-5.0_#13_ci_summary.html, > CASSANDRA-19558-5.0_#13_results_details.tar.xz, > CASSANDRA-19558-5.0_#16_ci_summary.html, > 
CASSANDRA-19558-5.0_#16_results_details.tar.xz, > CASSANDRA-19558_#8_ci_summary.html, CASSANDRA-19558_#8_results_details.tar.xz > > > A few follow up improvements and bug fixes for the standalone jenkinsfile. > - add at top a list of test failures in ci_summary.html > - docker scripts always try to login (as base images need to be pulled too) > - move simulator-dtests to large containers (they need 8g just heap) > - in ubuntu2004_test.docker make sure /home/cassandra exists and has correct > perms (from marcuse) > - persist the jenkinsfile parameters from run to run (important for the > post-commit jobs to keep their non-default branch and profile values) (was > CASSANDRA-19536) > - increase jvm-dtest splits from 8 to 12 > - when on ci-cassandra, replace use of copyArtifacts in Jenkinsfile > generateTestReports() with manual wget of test files, allowing the summary > phase to be run on any agent (copyArtifact would
[jira] [Comment Edited] (CASSANDRA-19558) Standalone jenkinsfile first round bug fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-19558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839287#comment-17839287 ] Michael Semb Wever edited comment on CASSANDRA-19558 at 4/23/24 7:26 PM:
-
A number of issues have been identified:
– need to capture the generateTestReports logs
– fix ant version in centos7-build.docker
– remove docker login (if credentials exist then docker is logged in)
– prefetch docker images from jfrog (to reduce dockerhub pull rate limits)
– use scripts from cassandra-builds to clean and report on agents
– stream xz where possible (not an issue, just a perf improvement)

The docker pull rate limit was the most serious, blocking.

patches…
1. this is part of CASSANDRA-18594 (most of it is already running (manually deployed) to ci-cassandra) [https://github.com/apache/cassandra-builds/compare/trunk...thelastpickle:cassandra-builds:mck/18594] ( [https://github.com/apache/cassandra-builds/commit/eb3eb5e] )
2. and for CASSANDRA-19558 [https://github.com/thelastpickle/cassandra-builds/compare/mck/18594...thelastpickle:cassandra-builds:mck/19558] ( [https://github.com/thelastpickle/cassandra-builds/commit/9f8ae9dcacd0d744992cc4eaf50f29a8836ffdbd] )
3. and then (also for CASSANDRA-19558) but comes last (i can test it manually once (2) is committed) [https://github.com/apache/cassandra/commit/92c0cb7] (this is on top of the previous commit (that already passed review) in [https://github.com/apache/cassandra/compare/apache:cassandra:cassandra-5.0...thelastpickle:cassandra:mck/jenkinsfile-persist-parameters-5.0|https://github.com/apache/cassandra/compare/apache:cassandra:cassandra-5.0...thelastpickle:cassandra:mck/jenkinsfile-persist-parameters-5.0] )

was (Author: michaelsembwever): A number of issues have been identified.
– need to capture the generateTestReports logs, – fix ant version in centos7-build.docker – remove docker login (if credentials exist then docker is logged in) – prefetch docker images from jfrog (to reduce dockerhub pull rate limits) – use scripts from cassandra-builds to clean and report on agents – stream xz where possible (not an issue, just perf improvement) The docker pull rate limit was the most serious, blocking. patches… 1. this is part of CASSANDRA-18594 (most of it is already running (manually deployed) to ci-cassandra) [https://github.com/apache/cassandra-builds/compare/trunk...thelastpickle:cassandra-builds:mck/18594] ( [https://github.com/apache/cassandra-builds/commit/eb3eb5e] ) 2. and for CASSANDRA-19558 [https://github.com/thelastpickle/cassandra-builds/compare/mck/18594...thelastpickle:cassandra-builds:mck/19558] ( [https://github.com/thelastpickle/cassandra-builds/commit/9f8ae9dcacd0d744992cc4eaf50f29a8836ffdbd] ) 3. and then (also for CASSANDRA-19558 ) but comes last (i can test it manually once (2) is committed) [https://github.com/apache/cassandra/commit/b3e1e40658e99c37f2a142e247ca4305fcc52eb0] (this is on top of the previous commit (that already passed review) in [https://github.com/apache/cassandra/compare/apache:cassandra:cassandra-5.0...thelastpickle:cassandra:mck/mck/jenkinsfile-persist-parameters-5.0-test] ) > Standalone jenkinsfile first round bug fixes > > > Key: CASSANDRA-19558 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19558 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > Fix For: 5.0.x, 5.x > > Attachments: CASSANDRA-19558_50_#5_ci_summary.html, > CASSANDRA-19558_50_#5_results_details.tar.xz, > CASSANDRA-19558-5.0_#13_ci_summary.html, > CASSANDRA-19558-5.0_#13_results_details.tar.xz, > CASSANDRA-19558-5.0_#16_ci_summary.html, > CASSANDRA-19558-5.0_#16_results_details.tar.xz, > CASSANDRA-19558_#8_ci_summary.html, 
CASSANDRA-19558_#8_results_details.tar.xz > > > A few follow up improvements and bug fixes for the standalone jenkinsfile. > - add at top a list of test failures in ci_summary.html > - docker scripts always try to login (as base images need to be pulled too) > - move simulator-dtests to large containers (they need 8g just heap) > - in ubuntu2004_test.docker make sure /home/cassandra exists and has correct > perms (from marcuse) > - persist the jenkinsfile parameters from run to run (important for the > post-commit jobs to keep their non-default branch and profile values) (was > CASSANDRA-19536) > - increase jvm-dtest splits from 8 to 12 > - when on ci-cassandra, replace use of copyArtifacts in Jenkinsfile > generateTestReports() with manual wget of test files, allowing the summary > phase to be run on any agent (copyArtifact would take >4hrs otherwise) (was > INFRA-25694) > - copy ci_summary.html and
[jira] [Updated] (CASSANDRA-19567) Minimize the heap consumption when registering metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-19567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-19567: Test and Documentation Plan: n/a Status: Patch Available (was: In Progress) > Minimize the heap consumption when registering metrics > -- > > Key: CASSANDRA-19567 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19567 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Maxim Muzafarov >Assignee: Maxim Muzafarov >Priority: Normal > Fix For: 5.x > > Attachments: summary.png > > Time Spent: 10m > Remaining Estimate: 0h > > The problem is only reproducible on x86 machines; it is not > reproducible on arm64. A quick analysis showed a lot of MetricName > objects stored in the heap; although the real cause could be related to > something else, the MetricName object requires extra attention. > To reproduce, run the command locally: > {code} > ant test-jvm-dtest-some > -Dtest.name=org.apache.cassandra.distributed.test.ReadRepairTest > {code} > The error: > {code:java} > [junit-timeout] Exception in thread "main" java.lang.OutOfMemoryError: Java > heap space > [junit-timeout] at > java.base/java.lang.StringLatin1.newString(StringLatin1.java:769) > [junit-timeout] at > java.base/java.lang.StringBuffer.toString(StringBuffer.java:716) > [junit-timeout] at > org.apache.cassandra.CassandraBriefJUnitResultFormatter.endTestSuite(CassandraBriefJUnitResultFormatter.java:191) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.fireEndTestSuite(JUnitTestRunner.java:854) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:578) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1197) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1042) > [junit-timeout]
Testsuite: > org.apache.cassandra.distributed.test.ReadRepairTest-cassandra.testtag_IS_UNDEFINED > [junit-timeout] Testsuite: > org.apache.cassandra.distributed.test.ReadRepairTest-cassandra.testtag_IS_UNDEFINED > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 sec > [junit-timeout] > [junit-timeout] Testcase: > org.apache.cassandra.distributed.test.ReadRepairTest:readRepairRTRangeMovementTest-cassandra.testtag_IS_UNDEFINED: > Caused an ERROR > [junit-timeout] Forked Java VM exited abnormally. Please note the time in the > report does not reflect the time until the VM exit. > [junit-timeout] junit.framework.AssertionFailedError: Forked Java VM exited > abnormally. Please note the time in the report does not reflect the time > until the VM exit. > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at java.base/java.util.Vector.forEach(Vector.java:1365) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at java.base/java.util.Vector.forEach(Vector.java:1365) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] > [junit-timeout] > [junit-timeout] Test org.apache.cassandra.distributed.test.ReadRepairTest > FAILED (crashed)BUILD FAILED > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
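The OOM above points at many duplicate MetricName objects on the heap. One generic way to minimize the heap cost of registering many metrics — shown here as a language-neutral sketch in Python, not as the actual Cassandra patch, with all names hypothetical — is to intern the name objects so that equal names share a single instance:

```python
class MetricName:
    """Interned metric-name value object (illustrative sketch only)."""
    __slots__ = ("group", "type", "name", "scope")
    _interned = {}  # shared cache: one instance per unique name

    def __new__(cls, group, type_, name, scope):
        # Return the cached instance when an equal name was seen before,
        # so thousands of registrations share one object per unique name
        # instead of allocating a fresh MetricName each time.
        key = (group, type_, name, scope)
        cached = cls._interned.get(key)
        if cached is not None:
            return cached
        obj = super().__new__(cls)
        obj.group, obj.type, obj.name, obj.scope = key
        cls._interned[key] = obj
        return obj

a = MetricName("org.apache.cassandra.metrics", "Table", "ReadLatency", "ks.t1")
b = MetricName("org.apache.cassandra.metrics", "Table", "ReadLatency", "ks.t1")
print(a is b)  # True: both registrations resolve to one heap object
```

The actual fix in the ticket may take a different shape; this only illustrates why deduplicating equal name objects shrinks the retained heap.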
[jira] [Commented] (CASSANDRA-19221) CMS: Nodes can restart with new ipaddress already defined in the cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840204#comment-17840204 ] Alex Petrov commented on CASSANDRA-19221: - Addressed your comments [~samt], both failures are timeouts that are unrelated to the patch. I believe we should split the {{MetadataChangeSimulationTest}} since after adding transient tests it seems to sometimes cross the timeout deadline. > CMS: Nodes can restart with new ipaddress already defined in the cluster > > > Key: CASSANDRA-19221 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19221 > Project: Cassandra > Issue Type: Bug > Components: Transactional Cluster Metadata >Reporter: Paul Chandler >Assignee: Alex Petrov >Priority: Normal > Fix For: 5.1-alpha1 > > Attachments: ci_summary-1.html, ci_summary.html > > > I am simulating running a cluster in Kubernetes and testing what happens when > several pods go down and ip addresses are swapped between nodes. In 4.0 this > is blocked and the node cannot be restarted. 
> To simulate this I create a 3 node cluster on a local machine using 3 > loopback addresses > {code} > 127.0.0.1 > 127.0.0.2 > 127.0.0.3 > {code} > The nodes are created correctly and the first node is assigned as a CMS node > as shown: > {code} > bin/nodetool -p 7199 describecms > {code} > Cluster Metadata Service: > {code} > Members: /127.0.0.1:7000 > Is Member: true > Service State: LOCAL > {code} > At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the ip > addresses for the rpc_address and listen_address > > The nodes come back as normal, but the nodeid has now been swapped against > the ip address: > Before: > {code} > Datacenter: datacenter1 > === > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address Load Tokens Owns (effective) Host ID > Rack > UN 127.0.0.3 75.2 KiB 16 76.0% > 6d194555-f6eb-41d0-c000-0003 rack1 > UN 127.0.0.2 86.77 KiB 16 59.3% > 6d194555-f6eb-41d0-c000-0002 rack1 > UN 127.0.0.1 80.88 KiB 16 64.7% > 6d194555-f6eb-41d0-c000-0001 rack1 > {code} > After: > {code} > Datacenter: datacenter1 > === > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address Load Tokens Owns (effective) Host ID > Rack > UN 127.0.0.3 149.62 KiB 16 76.0% > 6d194555-f6eb-41d0-c000-0003 rack1 > UN 127.0.0.2 155.48 KiB 16 59.3% > 6d194555-f6eb-41d0-c000-0002 rack1 > UN 127.0.0.1 75.74 KiB 16 64.7% > 6d194555-f6eb-41d0-c000-0001 rack1 > {code} > On previous tests of this I have created a table with a replication factor of > 1, inserted some data before the swap. After the swap the data on nodes 2 > and 3 is now missing. > One theory I have is that I am using different port numbers for the different > nodes, and I am only swapping the ip addresses and not the port numbers, so > the ip:port still looks unique > i.e. 
127.0.0.2:9043 becomes 127.0.0.2:9044 > and 127.0.0.3:9044 becomes 127.0.0.3:9043
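The reporter's theory in the last paragraph — that keeping distinct ports while swapping only the ip addresses makes every endpoint look new — can be seen with a toy model (hypothetical, purely to illustrate the reasoning; this is not Cassandra's membership code):

```python
# Membership keyed by the full (ip, port) endpoint, as the theory assumes.
before = {
    ("127.0.0.2", 9043): "node-2",
    ("127.0.0.3", 9044): "node-3",
}

# Swap the ips but keep each node's own port: every resulting (ip, port)
# pair is one the cluster has never seen, so a conflict check keyed on
# the full endpoint finds nothing to object to.
after = {
    ("127.0.0.2", 9044): "node-3",
    ("127.0.0.3", 9043): "node-2",
}

print(set(before) & set(after))  # set(): no endpoint overlaps, swap goes undetected
```

If membership were keyed by host id (or by ip alone), the swap would collide with existing entries instead of slipping through.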
[jira] [Commented] (CASSANDRA-19567) Minimize the heap consumption when registering metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-19567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840203#comment-17840203 ] Caleb Rackliffe commented on CASSANDRA-19567: - That looks promising. Reviewing shortly... > Minimize the heap consumption when registering metrics > -- > > Key: CASSANDRA-19567 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19567 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Maxim Muzafarov >Assignee: Maxim Muzafarov >Priority: Normal > Fix For: 5.x > > Attachments: summary.png > > Time Spent: 10m > Remaining Estimate: 0h > > The problem is only reproducible on the x86 machine, the problem is not > reproducible on the arm64. A quick analysis showed a lot of MetricName > objects stored in the heap, although the real cause could be related to > something else, the MetricName object requires extra attention. > To reproduce run the command run locally: > {code} > ant test-jvm-dtest-some > -Dtest.name=org.apache.cassandra.distributed.test.ReadRepairTest > {code} > The error: > {code:java} > [junit-timeout] Exception in thread "main" java.lang.OutOfMemoryError: Java > heap space > [junit-timeout] at > java.base/java.lang.StringLatin1.newString(StringLatin1.java:769) > [junit-timeout] at > java.base/java.lang.StringBuffer.toString(StringBuffer.java:716) > [junit-timeout] at > org.apache.cassandra.CassandraBriefJUnitResultFormatter.endTestSuite(CassandraBriefJUnitResultFormatter.java:191) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.fireEndTestSuite(JUnitTestRunner.java:854) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:578) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1197) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1042) > [junit-timeout] 
Testsuite: > org.apache.cassandra.distributed.test.ReadRepairTest-cassandra.testtag_IS_UNDEFINED > [junit-timeout] Testsuite: > org.apache.cassandra.distributed.test.ReadRepairTest-cassandra.testtag_IS_UNDEFINED > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 sec > [junit-timeout] > [junit-timeout] Testcase: > org.apache.cassandra.distributed.test.ReadRepairTest:readRepairRTRangeMovementTest-cassandra.testtag_IS_UNDEFINED: > Caused an ERROR > [junit-timeout] Forked Java VM exited abnormally. Please note the time in the > report does not reflect the time until the VM exit. > [junit-timeout] junit.framework.AssertionFailedError: Forked Java VM exited > abnormally. Please note the time in the report does not reflect the time > until the VM exit. > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at java.base/java.util.Vector.forEach(Vector.java:1365) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at java.base/java.util.Vector.forEach(Vector.java:1365) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] > [junit-timeout] > [junit-timeout] Test org.apache.cassandra.distributed.test.ReadRepairTest > FAILED (crashed)BUILD FAILED > {code}
[jira] [Updated] (CASSANDRA-19567) Minimize the heap consumption when registering metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-19567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-19567: Reviewers: Caleb Rackliffe (was: Caleb Rackliffe) Status: Review In Progress (was: Patch Available) > Minimize the heap consumption when registering metrics > -- > > Key: CASSANDRA-19567 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19567 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Maxim Muzafarov >Assignee: Maxim Muzafarov >Priority: Normal > Fix For: 5.x > > Attachments: summary.png > > Time Spent: 10m > Remaining Estimate: 0h > > The problem is only reproducible on x86 machines; it is not > reproducible on arm64. A quick analysis showed a lot of MetricName > objects stored in the heap; although the real cause could be related to > something else, the MetricName object requires extra attention.
> To reproduce run the command run locally: > {code} > ant test-jvm-dtest-some > -Dtest.name=org.apache.cassandra.distributed.test.ReadRepairTest > {code} > The error: > {code:java} > [junit-timeout] Exception in thread "main" java.lang.OutOfMemoryError: Java > heap space > [junit-timeout] at > java.base/java.lang.StringLatin1.newString(StringLatin1.java:769) > [junit-timeout] at > java.base/java.lang.StringBuffer.toString(StringBuffer.java:716) > [junit-timeout] at > org.apache.cassandra.CassandraBriefJUnitResultFormatter.endTestSuite(CassandraBriefJUnitResultFormatter.java:191) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.fireEndTestSuite(JUnitTestRunner.java:854) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:578) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1197) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1042) > [junit-timeout] Testsuite: > org.apache.cassandra.distributed.test.ReadRepairTest-cassandra.testtag_IS_UNDEFINED > [junit-timeout] Testsuite: > org.apache.cassandra.distributed.test.ReadRepairTest-cassandra.testtag_IS_UNDEFINED > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 sec > [junit-timeout] > [junit-timeout] Testcase: > org.apache.cassandra.distributed.test.ReadRepairTest:readRepairRTRangeMovementTest-cassandra.testtag_IS_UNDEFINED: > Caused an ERROR > [junit-timeout] Forked Java VM exited abnormally. Please note the time in the > report does not reflect the time until the VM exit. > [junit-timeout] junit.framework.AssertionFailedError: Forked Java VM exited > abnormally. Please note the time in the report does not reflect the time > until the VM exit. 
> [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at java.base/java.util.Vector.forEach(Vector.java:1365) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at java.base/java.util.Vector.forEach(Vector.java:1365) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] > [junit-timeout] > [junit-timeout] Test org.apache.cassandra.distributed.test.ReadRepairTest > FAILED (crashed)BUILD FAILED > {code}
[jira] [Updated] (CASSANDRA-19221) CMS: Nodes can restart with new ipaddress already defined in the cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov updated CASSANDRA-19221: Attachment: ci_summary-1.html > CMS: Nodes can restart with new ipaddress already defined in the cluster > > > Key: CASSANDRA-19221 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19221 > Project: Cassandra > Issue Type: Bug > Components: Transactional Cluster Metadata >Reporter: Paul Chandler >Assignee: Alex Petrov >Priority: Normal > Fix For: 5.1-alpha1 > > Attachments: ci_summary-1.html, ci_summary.html > > > I am simulating running a cluster in Kubernetes and testing what happens when > several pods go down and ip addresses are swapped between nodes. In 4.0 this > is blocked and the node cannot be restarted. > To simulate this I create a 3 node cluster on a local machine using 3 > loopback addresses > {code} > 127.0.0.1 > 127.0.0.2 > 127.0.0.3 > {code} > The nodes are created correctly and the first node is assigned as a CMS node > as shown: > {code} > bin/nodetool -p 7199 describecms > {code} > Cluster Metadata Service: > {code} > Members: /127.0.0.1:7000 > Is Member: true > Service State: LOCAL > {code} > At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the ip > addresses for the rpc_address and listen_address > > The nodes come back as normal, but the nodeid has now been swapped against > the ip address: > Before: > {code} > Datacenter: datacenter1 > === > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address Load Tokens Owns (effective) Host ID > Rack > UN 127.0.0.3 75.2 KiB 16 76.0% > 6d194555-f6eb-41d0-c000-0003 rack1 > UN 127.0.0.2 86.77 KiB 16 59.3% > 6d194555-f6eb-41d0-c000-0002 rack1 > UN 127.0.0.1 80.88 KiB 16 64.7% > 6d194555-f6eb-41d0-c000-0001 rack1 > {code} > After: > {code} > Datacenter: datacenter1 > === > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address Load Tokens Owns (effective) Host ID > Rack > UN 127.0.0.3 149.62 KiB 
16 76.0% > 6d194555-f6eb-41d0-c000-0003 rack1 > UN 127.0.0.2 155.48 KiB 16 59.3% > 6d194555-f6eb-41d0-c000-0002 rack1 > UN 127.0.0.1 75.74 KiB 16 64.7% > 6d194555-f6eb-41d0-c000-0001 rack1 > {code} > On previous tests of this I have created a table with a replication factor of > 1, inserted some data before the swap. After the swap the data on nodes 2 > and 3 is now missing. > One theory I have is that I am using different port numbers for the different > nodes, and I am only swapping the ip addresses and not the port numbers, so > the ip:port still looks unique > i.e. 127.0.0.2:9043 becomes 127.0.0.2:9044 > and 127.0.0.3:9044 becomes 127.0.0.3:9043 >
[jira] [Commented] (CASSANDRA-15439) Token metadata for bootstrapping nodes is lost under temporary failures
[ https://issues.apache.org/jira/browse/CASSANDRA-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840198#comment-17840198 ] Raymond Huffman commented on CASSANDRA-15439: - Here's a PR that implements the additional config option. https://github.com/apache/cassandra/pull/3270 > Token metadata for bootstrapping nodes is lost under temporary failures > --- > > Key: CASSANDRA-15439 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15439 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: Josh Snyder >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > In CASSANDRA-8838, [~pauloricardomg] asked "hints will not be stored to the > bootstrapping node after RING_DELAY, since it will evicted from the TMD > pending ranges. Should we create a ticket to address this?" > CASSANDRA-15264 relates to the most likely cause of such situations, where > the Cassandra daemon on the bootstrapping node completely crashes. Based on > testing with {{kill -STOP}} on a bootstrapping Cassandra JVM, I believe it > also is possible to remove token metadata (and thus pending ranges, and thus > hints) for a bootstrapping node, simply by affecting its status in the > failure detector. 
> A node in the cluster sees the bootstrapping node this way: > {noformat} > INFO [GossipStage:1] 2019-11-27 20:41:41,101 Gossiper.java: - Node > /PUBLIC-IP is now part of the cluster > INFO [GossipStage:1] 2019-11-27 20:41:41,199 Gossiper.java:1073 - > InetAddress /PUBLIC-IP is now UP > INFO [HANDSHAKE-/PRIVATE-IP] 2019-11-27 20:41:41,412 > OutboundTcpConnection.java:565 - Handshaking version with /PRIVATE-IP > INFO [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,019 > StreamResultFuture.java:112 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 > ID#0] Creating new streaming plan for Bootstrap > INFO [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,020 > StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, > ID#0] Received streaming plan for Bootstrap > INFO [STREAM-INIT-/PRIVATE-IP:56003] 2019-11-27 20:42:10,112 > StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, > ID#0] Received streaming plan for Bootstrap > INFO [STREAM-IN-/PUBLIC-IP] 2019-11-27 20:42:10,179 > StreamResultFuture.java:169 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 > ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 833 > files(139744616815 bytes) > INFO [GossipStage:1] 2019-11-27 20:54:47,547 Gossiper.java:1089 - > InetAddress /PUBLIC-IP is now DOWN > INFO [GossipTasks:1] 2019-11-27 20:54:57,551 Gossiper.java:849 - FatClient > /PUBLIC-IP has been silent for 30000ms, removing from gossip > {noformat} > Since the bootstrapping node has no tokens, it is treated like a fat client, > and it is removed from the ring. For correctness purposes, I believe we must > keep storing hints for the downed bootstrapping node until it is either > assassinated or until a replacement attempts to bootstrap for the same token.
[jira] [Comment Edited] (CASSANDRA-19564) MemtablePostFlush deadlock leads to stuck nodes and crashes
[ https://issues.apache.org/jira/browse/CASSANDRA-19564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840197#comment-17840197 ] Jon Haddad edited comment on CASSANDRA-19564 at 4/23/24 6:53 PM: - I've rolled out a small patch (https://github.com/rustyrazorblade/cassandra/tree/jhaddad/4.1.4-extra-logging) that adds a monitoring thread to each flush and have found that in every case I see stacktraces related to the filesystem, which is ZFS: {noformat} "MemtablePostFlush:1" daemon prio=5 Id=429 RUNNABLE at java.base@11.0.22/sun.nio.fs.UnixNativeDispatcher.unlink0(Native Method) at java.base@11.0.22/sun.nio.fs.UnixNativeDispatcher.unlink(UnixNativeDispatcher.java:156) at java.base@11.0.22/sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:236) at java.base@11.0.22/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105) at java.base@11.0.22/java.nio.file.Files.delete(Files.java:1142) at app//org.apache.cassandra.io.util.PathUtils.delete(PathUtils.java:252) at app//org.apache.cassandra.io.util.PathUtils.delete(PathUtils.java:299) at app//org.apache.cassandra.io.util.PathUtils.delete(PathUtils.java:306) ... 
Number of locked synchronizers = 1 - java.util.concurrent.ThreadPoolExecutor$Worker@1fc5251a {noformat} and {noformat} "MemtablePostFlush:1" daemon prio=5 Id=429 RUNNABLE at java.base@11.0.22/sun.nio.fs.UnixNativeDispatcher.lstat0(Native Method) at java.base@11.0.22/sun.nio.fs.UnixNativeDispatcher.lstat(UnixNativeDispatcher.java:332) at java.base@11.0.22/sun.nio.fs.UnixFileAttributes.get(UnixFileAttributes.java:72) at java.base@11.0.22/sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:232) at java.base@11.0.22/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105) at java.base@11.0.22/java.nio.file.Files.delete(Files.java:1142) at app//org.apache.cassandra.io.util.PathUtils.delete(PathUtils.java:252) at app//org.apache.cassandra.io.util.PathUtils.delete(PathUtils.java:299) ... Number of locked synchronizers = 1 - java.util.concurrent.ThreadPoolExecutor$Worker@1fc5251a {noformat} I have several dozen of these and am finding in every case we're either at {{java.base@11.0.22/sun.nio.fs.UnixNativeDispatcher.unlink0}} or {{java.base@11.0.22/sun.nio.fs.UnixNativeDispatcher.lstat0}} I'm moving this cluster off ZFS and onto XFS, if I find the issue goes away I'll close this out. I don't think there's anything we can do about unreliable filesystems other than improving our error reporting around it. 
[jira] [Commented] (CASSANDRA-19564) MemtablePostFlush deadlock leads to stuck nodes and crashes
[ https://issues.apache.org/jira/browse/CASSANDRA-19564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840197#comment-17840197 ] Jon Haddad commented on CASSANDRA-19564: I've rolled out a small patch (https://github.com/rustyrazorblade/cassandra/tree/jhaddad/4.1.4-extra-logging) that adds a monitoring thread to each flush and have found that in every case I see stacktraces related to the filesystem (ZFS), such as this: {noformat} "MemtablePostFlush:1" daemon prio=5 Id=429 RUNNABLE at java.base@11.0.22/sun.nio.fs.UnixNativeDispatcher.unlink0(Native Method) at java.base@11.0.22/sun.nio.fs.UnixNativeDispatcher.unlink(UnixNativeDispatcher.java:156) at java.base@11.0.22/sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:236) at java.base@11.0.22/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105) at java.base@11.0.22/java.nio.file.Files.delete(Files.java:1142) at app//org.apache.cassandra.io.util.PathUtils.delete(PathUtils.java:252) at app//org.apache.cassandra.io.util.PathUtils.delete(PathUtils.java:299) at app//org.apache.cassandra.io.util.PathUtils.delete(PathUtils.java:306) ... 
Number of locked synchronizers = 1 - java.util.concurrent.ThreadPoolExecutor$Worker@1fc5251a {noformat} and {noformat} "MemtablePostFlush:1" daemon prio=5 Id=429 RUNNABLE at java.base@11.0.22/sun.nio.fs.UnixNativeDispatcher.lstat0(Native Method) at java.base@11.0.22/sun.nio.fs.UnixNativeDispatcher.lstat(UnixNativeDispatcher.java:332) at java.base@11.0.22/sun.nio.fs.UnixFileAttributes.get(UnixFileAttributes.java:72) at java.base@11.0.22/sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:232) at java.base@11.0.22/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105) at java.base@11.0.22/java.nio.file.Files.delete(Files.java:1142) at app//org.apache.cassandra.io.util.PathUtils.delete(PathUtils.java:252) at app//org.apache.cassandra.io.util.PathUtils.delete(PathUtils.java:299) ... Number of locked synchronizers = 1 - java.util.concurrent.ThreadPoolExecutor$Worker@1fc5251a {noformat} I have several dozen of these and am finding in every case we're either at {{java.base@11.0.22/sun.nio.fs.UnixNativeDispatcher.unlink0}} or {{java.base@11.0.22/sun.nio.fs.UnixNativeDispatcher.lstat0}} I'm moving this cluster off ZFS and onto XFS, if I find the issue goes away I'll close this out. I don't think there's anything we can do about unreliable filesystems other than improving our error reporting around it. 
> MemtablePostFlush deadlock leads to stuck nodes and crashes > --- > > Key: CASSANDRA-19564 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19564 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction, Local/Memtable >Reporter: Jon Haddad >Priority: Urgent > Fix For: 4.1.x > > Attachments: image-2024-04-16-11-55-54-750.png, > image-2024-04-16-12-29-15-386.png, image-2024-04-16-13-43-11-064.png, > image-2024-04-16-13-53-24-455.png, image-2024-04-17-18-46-29-474.png, > image-2024-04-17-19-13-06-769.png, image-2024-04-17-19-14-34-344.png, > screenshot-1.png > > > I've run into an issue on a 4.1.4 cluster where an entire node has locked up > due to what I believe is a deadlock in memtable flushing. Here's what I know > so far. I've stitched together what happened based on conversations, logs, > and some flame graphs. > *Log reports memtable flushing* > The last successful flush happens at 12:19. > {noformat} > INFO [NativePoolCleaner] 2024-04-16 12:19:53,634 > AbstractAllocatorMemtable.java:286 - Flushing largest CFS(Keyspace='ks', > ColumnFamily='version') to free up room. Used total: 0.24/0.33, live: > 0.16/0.20, flushing: 0.09/0.13, this: 0.13/0.15 > INFO [NativePoolCleaner] 2024-04-16 12:19:53,634 ColumnFamilyStore.java:1012 > - Enqueuing flush of ks.version, Reason: MEMTABLE_LIMIT, Usage: 660.521MiB > (13%) on-heap, 790.606MiB (15%) off-heap > {noformat} > *MemtablePostFlush appears to be blocked* > At this point, MemtablePostFlush completed tasks stops incrementing, active > stays at 1 and pending starts to rise. > {noformat} > MemtablePostFlush 1 1 3446 0 0 > {noformat} > > The flame graph reveals that PostFlush.call is stuck. I don't have the line > number, but I know we're stuck in > {{org.apache.cassandra.db.ColumnFamilyStore.PostFlush#call}} given the visual > below: > *!image-2024-04-16-13-43-11-064.png!* > *Memtable flushing is now blocked.* > All MemtableFlushWriter threads are Parked waiting on > {{{}OpOrder.Barrier.await{}}}. 
A wall clock
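The per-flush monitoring thread described in the comments above can be sketched roughly as follows: a watchdog waits a bounded time for a flush to complete and, if it is still running, dumps the flushing thread's stack trace. This is an illustration with hypothetical names, not the code in the linked branch:

```java
// Rough sketch of a flush watchdog like the one described above: if a flush
// takes longer than a threshold, dump the flushing thread's stack trace so a
// stall in a filesystem call (unlink0, lstat0, ...) becomes visible in logs.
// Hypothetical names; not the code in the linked 4.1.4-extra-logging branch.
public class FlushWatchdogSketch
{
    /**
     * Waits up to timeoutMs for the given thread to finish. If it is still
     * running, prints its current stack trace and returns true ("stuck").
     */
    static boolean reportIfStuck(Thread flushThread, long timeoutMs)
    {
        try
        {
            flushThread.join(timeoutMs);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
        if (!flushThread.isAlive())
            return false;

        System.err.println(flushThread.getName() + " still running after " + timeoutMs + "ms; stack:");
        for (StackTraceElement frame : flushThread.getStackTrace())
            System.err.println("    at " + frame);
        return true;
    }

    public static void main(String[] args)
    {
        // Simulate a flush stuck in a slow filesystem call (here: a sleep).
        Thread slowFlush = new Thread(() -> {
            try { Thread.sleep(5_000); } catch (InterruptedException ignored) {}
        }, "MemtablePostFlush:1");
        slowFlush.start();
        System.out.println(reportIfStuck(slowFlush, 200)); // prints true
        slowFlush.interrupt();
    }
}
```

Sampling the stack from a separate thread is what made the {{UnixNativeDispatcher.unlink0}}/{{lstat0}} frames show up in every capture, pointing at the filesystem rather than at Cassandra's own locking.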
[jira] [Updated] (CASSANDRA-19582) [Analytics] Consume new Sidecar client API to stream SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-19582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francisco Guerrero updated CASSANDRA-19582: --- Status: Ready to Commit (was: Review In Progress) > [Analytics] Consume new Sidecar client API to stream SSTables > - > > Key: CASSANDRA-19582 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19582 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Francisco Guerrero >Assignee: Francisco Guerrero >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > A new client API was recently introduced in Sidecar to stream SSTables. > Cassandra Analytics needs to start consuming the new API in order to take > advantage of the fixes when streaming SSTables from a Cassandra installation > with more than one data directory.
[jira] [Updated] (CASSANDRA-19582) [Analytics] Consume new Sidecar client API to stream SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-19582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francisco Guerrero updated CASSANDRA-19582: --- Reviewers: Yifan Cai Status: Review In Progress (was: Patch Available) > [Analytics] Consume new Sidecar client API to stream SSTables > - > > Key: CASSANDRA-19582 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19582 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Francisco Guerrero >Assignee: Francisco Guerrero >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > A new client API was recently introduced in Sidecar to stream SSTables. > Cassandra Analytics needs to start consuming the new API in order to take > advantage of the fixes when streaming SSTables from a Cassandra installation > with more than one data directory.
[jira] [Commented] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability
[ https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840173#comment-17840173 ] Alex Petrov commented on CASSANDRA-19534: - Sorry for the lack of clarity; today there’s no deadline at all. Tasks will live in the system essentially forever, clogging queues with busy work. I was intending to post a patch, but it is currently in my CI queue; otherwise it is ready to go. I believe that with a 12-second default, users will only see an improvement and there will be no learning curve at all. The configurables are for people who understand their request lifetimes and want an even better profile. > unbounded queues in native transport requests lead to node instability > -- > > Key: CASSANDRA-19534 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19534 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Local Write-Read Paths >Reporter: Jon Haddad >Assignee: Alex Petrov >Priority: Normal > Fix For: 5.0-rc, 5.x > > Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - > QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, > Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg > > > When a node is under pressure, hundreds of thousands of requests can show up > in the native transport queue, and it looks like it can take way longer to > timeout than is configured. We should be shedding load much more > aggressively and use a bounded queue for incoming work. 
This is extremely > evident when we combine a resource consuming workload with a smaller one: > Running 5.0 HEAD on a single node as of today: > {noformat} > # populate only > easy-cass-stress run RandomPartitionAccess -p 100 -r 1 > --workload.rows=10 --workload.select=partition --maxrlat 100 --populate > 10m --rate 50k -n 1 > # workload 1 - larger reads > easy-cass-stress run RandomPartitionAccess -p 100 -r 1 > --workload.rows=10 --workload.select=partition --rate 200 -d 1d > # second workload - small reads > easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat} > It appears our results don't time out at the requested server time either: > > {noformat} > Writes Reads > Deletes Errors > Count Latency (p99) 1min (req/s) | Count Latency (p99) 1min (req/s) | > Count Latency (p99) 1min (req/s) | Count 1min (errors/s) > 950286 70403.93 634.77 | 789524 70442.07 426.02 | > 0 0 0 | 9580484 18980.45 > 952304 70567.62 640.1 | 791072 70634.34 428.36 | > 0 0 0 | 9636658 18969.54 > 953146 70767.34 640.1 | 791400 70767.76 428.36 | > 0 0 0 | 9695272 18969.54 > 956833 71171.28 623.14 | 794009 71175.6 412.79 | > 0 0 0 | 9749377 19002.44 > 959627 71312.58 656.93 | 795703 71349.87 435.56 | > 0 0 0 | 9804907 18943.11{noformat} > > After stopping the load test altogether, it took nearly a minute before the > requests were no longer queued. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
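The deadline idea discussed above — a queued request records its arrival time, and workers discard it unprocessed once a configurable deadline has elapsed, so stale work stops clogging the queue — can be sketched minimally like this. The names and structure are hypothetical, not the actual patch:

```java
// Minimal sketch of deadline-based load shedding for queued requests, as
// discussed above. Hypothetical names and structure; not the actual patch.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class DeadlineSheddingSketch
{
    static final long DEADLINE_MS = 12_000; // the 12-second figure discussed above

    static class Request
    {
        final String id;
        final long enqueuedAtMs;
        Request(String id, long enqueuedAtMs) { this.id = id; this.enqueuedAtMs = enqueuedAtMs; }
    }

    /** Drains the queue, shedding any request whose deadline has already passed. */
    static List<String> drain(Queue<Request> queue, long nowMs)
    {
        List<String> processed = new ArrayList<>();
        for (Request r; (r = queue.poll()) != null; )
        {
            if (nowMs - r.enqueuedAtMs > DEADLINE_MS)
                continue; // shed: the client has already timed out, so the work is wasted
            processed.add(r.id);
        }
        return processed;
    }

    public static void main(String[] args)
    {
        Queue<Request> q = new ArrayDeque<>();
        q.add(new Request("fresh", 95_000)); // enqueued 5s before "now"
        q.add(new Request("stale", 0));      // enqueued 100s before "now"
        System.out.println(drain(q, 100_000)); // prints [fresh]
    }
}
```

This is the behavior users observe indirectly in the ticket: without a deadline, the "stale" request would still be executed long after its client gave up, which is why p99 latencies far exceed the configured timeouts.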
[jira] [Comment Edited] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability
[ https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840171#comment-17840171 ] Brandon Williams edited comment on CASSANDRA-19534 at 4/23/24 5:53 PM: --- I think this all sounds good, though there may be a bit of a learning curve for users. Native request deadline is easy enough to understand, but things get a bit nuanced past that. Regarding native_transport_timeout_in_ms: bq. Default is 100 seconds, which is unreasonably high, but not unbounded. In practice, we should use at most 12 seconds. Do you mean this currently exists at 100? If not, what is the rationale for that default? was (Author: brandon.williams): I think this all sounds good, though there may be a bit of a learning curve for users. Native request deadline is easy enough to understand, but things get a bit nuanced past that. bq. Default is 100 seconds, which is unreasonably high, but not unbounded. In practice, we should use at most 12 seconds. Do you mean this currently exists at 100? If not, what is the rationale for that default? > unbounded queues in native transport requests lead to node instability > -- > > Key: CASSANDRA-19534 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19534 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Local Write-Read Paths >Reporter: Jon Haddad >Assignee: Alex Petrov >Priority: Normal > Fix For: 5.0-rc, 5.x > > Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - > QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, > Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg > > > When a node is under pressure, hundreds of thousands of requests can show up > in the native transport queue, and it looks like it can take way longer to > timeout than is configured. We should be shedding load much more > aggressively and use a bounded queue for incoming work. 
This is extremely > evident when we combine a resource consuming workload with a smaller one: > Running 5.0 HEAD on a single node as of today: > {noformat} > # populate only > easy-cass-stress run RandomPartitionAccess -p 100 -r 1 > --workload.rows=10 --workload.select=partition --maxrlat 100 --populate > 10m --rate 50k -n 1 > # workload 1 - larger reads > easy-cass-stress run RandomPartitionAccess -p 100 -r 1 > --workload.rows=10 --workload.select=partition --rate 200 -d 1d > # second workload - small reads > easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat} > It appears our results don't time out at the requested server time either: > > {noformat} > Writes Reads > Deletes Errors > Count Latency (p99) 1min (req/s) | Count Latency (p99) 1min (req/s) | > Count Latency (p99) 1min (req/s) | Count 1min (errors/s) > 950286 70403.93 634.77 | 789524 70442.07 426.02 | > 0 0 0 | 9580484 18980.45 > 952304 70567.62 640.1 | 791072 70634.34 428.36 | > 0 0 0 | 9636658 18969.54 > 953146 70767.34 640.1 | 791400 70767.76 428.36 | > 0 0 0 | 9695272 18969.54 > 956833 71171.28 623.14 | 794009 71175.6 412.79 | > 0 0 0 | 9749377 19002.44 > 959627 71312.58 656.93 | 795703 71349.87 435.56 | > 0 0 0 | 9804907 18943.11{noformat} > > After stopping the load test altogether, it took nearly a minute before the > requests were no longer queued. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability
[ https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840171#comment-17840171 ] Brandon Williams commented on CASSANDRA-19534: -- I think this all sounds good, though there may be a bit of a learning curve for users. Native request deadline is easy enough to understand, but things get a bit nuanced past that. bq. Default is 100 seconds, which is unreasonably high, but not unbounded. In practice, we should use at most 12 seconds. Do you mean this currently exists at 100? If not, what is the rationale for that default? > unbounded queues in native transport requests lead to node instability > -- > > Key: CASSANDRA-19534 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19534 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Local Write-Read Paths >Reporter: Jon Haddad >Assignee: Alex Petrov >Priority: Normal > Fix For: 5.0-rc, 5.x > > Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - > QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, > Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg > > > When a node is under pressure, hundreds of thousands of requests can show up > in the native transport queue, and it looks like it can take way longer to > timeout than is configured. We should be shedding load much more > aggressively and use a bounded queue for incoming work. 
This is extremely > evident when we combine a resource consuming workload with a smaller one: > Running 5.0 HEAD on a single node as of today: > {noformat} > # populate only > easy-cass-stress run RandomPartitionAccess -p 100 -r 1 > --workload.rows=10 --workload.select=partition --maxrlat 100 --populate > 10m --rate 50k -n 1 > # workload 1 - larger reads > easy-cass-stress run RandomPartitionAccess -p 100 -r 1 > --workload.rows=10 --workload.select=partition --rate 200 -d 1d > # second workload - small reads > easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat} > It appears our results don't time out at the requested server time either: > > {noformat} > Writes Reads > Deletes Errors > Count Latency (p99) 1min (req/s) | Count Latency (p99) 1min (req/s) | > Count Latency (p99) 1min (req/s) | Count 1min (errors/s) > 950286 70403.93 634.77 | 789524 70442.07 426.02 | > 0 0 0 | 9580484 18980.45 > 952304 70567.62 640.1 | 791072 70634.34 428.36 | > 0 0 0 | 9636658 18969.54 > 953146 70767.34 640.1 | 791400 70767.76 428.36 | > 0 0 0 | 9695272 18969.54 > 956833 71171.28 623.14 | 794009 71175.6 412.79 | > 0 0 0 | 9749377 19002.44 > 959627 71312.58 656.93 | 795703 71349.87 435.56 | > 0 0 0 | 9804907 18943.11{noformat} > > After stopping the load test altogether, it took nearly a minute before the > requests were no longer queued. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRASC-123) Add missing method to retrieve the InetSocketAddress to DriverUtils
[ https://issues.apache.org/jira/browse/CASSANDRASC-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francisco Guerrero updated CASSANDRASC-123: --- Fix Version/s: 1.0 Source Control Link: https://github.com/apache/cassandra-sidecar/commit/77c815071a66fb53b97e9e07695417004dd88804 Resolution: Fixed Status: Resolved (was: Ready to Commit) > Add missing method to retrieve the InetSocketAddress to DriverUtils > --- > > Key: CASSANDRASC-123 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-123 > Project: Sidecar for Apache Cassandra > Issue Type: Bug > Components: Rest API >Reporter: Francisco Guerrero >Assignee: Francisco Guerrero >Priority: Normal > Labels: pull-request-available > Fix For: 1.0 > > > Sidecar introduced a shim layer to access the java driver in CASSANDRASC-79, > and later enhanced that access in CASSANDRASC-88. However, the > {{getInetSocketAddress}} was missed in the shim layer.
(cassandra-sidecar) branch trunk updated: CASSANDRASC-123: Add missing method to retrieve the InetSocketAddress to DriverUtils (#114)
This is an automated email from the ASF dual-hosted git repository. frankgh pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra-sidecar.git The following commit(s) were added to refs/heads/trunk by this push: new 77c8150 CASSANDRASC-123: Add missing method to retrieve the InetSocketAddress to DriverUtils (#114) 77c8150 is described below commit 77c815071a66fb53b97e9e07695417004dd88804 Author: Francisco Guerrero AuthorDate: Tue Apr 23 10:44:28 2024 -0700 CASSANDRASC-123: Add missing method to retrieve the InetSocketAddress to DriverUtils (#114) Patch by Francisco Guerrero; Reviewed by Yifan Cai for CASSANDRASC-123 --- .../apache/cassandra/sidecar/common/utils/DriverUtils.java| 11 +++ .../cassandra/sidecar/cluster/CassandraAdapterDelegate.java | 2 +- .../cassandra/sidecar/cluster/SidecarLoadBalancingPolicy.java | 2 +- 3 files changed, 13 insertions(+), 2 deletions(-) diff --git a/common/src/main/java/org/apache/cassandra/sidecar/common/utils/DriverUtils.java b/common/src/main/java/org/apache/cassandra/sidecar/common/utils/DriverUtils.java index b070637..aea1351 100644 --- a/common/src/main/java/org/apache/cassandra/sidecar/common/utils/DriverUtils.java +++ b/common/src/main/java/org/apache/cassandra/sidecar/common/utils/DriverUtils.java @@ -54,4 +54,15 @@ public class DriverUtils { return com.datastax.driver.core.DriverUtils.getHost(metadata, localNativeTransportAddress); } + +/** + * Returns the address that the driver will use to connect to the node. + * + * @param host the host to which reconnect attempts will be made + * @return the address. 
+ */ +public InetSocketAddress getSocketAddress(Host host) +{ +return host.getEndPoint().resolve(); +} } diff --git a/src/main/java/org/apache/cassandra/sidecar/cluster/CassandraAdapterDelegate.java b/src/main/java/org/apache/cassandra/sidecar/cluster/CassandraAdapterDelegate.java index cc0a952..5a1628d 100644 --- a/src/main/java/org/apache/cassandra/sidecar/cluster/CassandraAdapterDelegate.java +++ b/src/main/java/org/apache/cassandra/sidecar/cluster/CassandraAdapterDelegate.java @@ -547,7 +547,7 @@ public class CassandraAdapterDelegate implements ICassandraAdapter, Host.StateLi private void runIfThisHost(Host host, Runnable runnable) { -if (this.localNativeTransportAddress.equals(host.getEndPoint().resolve())) +if (this.localNativeTransportAddress.equals(driverUtils.getSocketAddress(host))) { runnable.run(); } diff --git a/src/main/java/org/apache/cassandra/sidecar/cluster/SidecarLoadBalancingPolicy.java b/src/main/java/org/apache/cassandra/sidecar/cluster/SidecarLoadBalancingPolicy.java index bd6ae95..7bbe6be 100644 --- a/src/main/java/org/apache/cassandra/sidecar/cluster/SidecarLoadBalancingPolicy.java +++ b/src/main/java/org/apache/cassandra/sidecar/cluster/SidecarLoadBalancingPolicy.java @@ -233,6 +233,6 @@ class SidecarLoadBalancingPolicy implements LoadBalancingPolicy private boolean isLocalHost(Host host) { -return localHostAddresses.contains(host.getEndPoint().resolve()); +return localHostAddresses.contains(driverUtils.getSocketAddress(host)); } } - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRASC-123) Add missing method to retrieve the InetSocketAddress to DriverUtils
[ https://issues.apache.org/jira/browse/CASSANDRASC-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRASC-123: -- Reviewers: Yifan Cai, Yifan Cai Status: Review In Progress (was: Patch Available) +1 > Add missing method to retrieve the InetSocketAddress to DriverUtils > --- > > Key: CASSANDRASC-123 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-123 > Project: Sidecar for Apache Cassandra > Issue Type: Bug > Components: Rest API >Reporter: Francisco Guerrero >Assignee: Francisco Guerrero >Priority: Normal > Labels: pull-request-available > > Sidecar introduced a shim layer to access the java driver in CASSANDRASC-79, > and later enhanced that access in CASSANDRASC-88. However, the > {{getInetSocketAddress}} was missed in the shim layer.
[jira] [Updated] (CASSANDRASC-123) Add missing method to retrieve the InetSocketAddress to DriverUtils
[ https://issues.apache.org/jira/browse/CASSANDRASC-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRASC-123: -- Status: Ready to Commit (was: Review In Progress) > Add missing method to retrieve the InetSocketAddress to DriverUtils > --- > > Key: CASSANDRASC-123 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-123 > Project: Sidecar for Apache Cassandra > Issue Type: Bug > Components: Rest API >Reporter: Francisco Guerrero >Assignee: Francisco Guerrero >Priority: Normal > Labels: pull-request-available > > Sidecar introduced a shim layer to access the java driver in CASSANDRASC-79, > and later enhanced that access in CASSANDRASC-88. However, the > {{getInetSocketAddress}} was missed in the shim layer.
[jira] [Updated] (CASSANDRA-19582) [Analytics] Consume new Sidecar client API to stream SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-19582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francisco Guerrero updated CASSANDRA-19582: --- Test and Documentation Plan: Update unit tests to use the new API Status: Patch Available (was: In Progress) PR: https://github.com/apache/cassandra-analytics/pull/54 CI: https://app.circleci.com/pipelines/github/frankgh/cassandra-analytics/178 > [Analytics] Consume new Sidecar client API to stream SSTables > - > > Key: CASSANDRA-19582 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19582 > Project: Cassandra > Issue Type: Improvement > Components: Analytics Library >Reporter: Francisco Guerrero >Assignee: Francisco Guerrero >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > A new client API was recently introduced in Sidecar to stream SSTables. > Cassandra Analytics needs to start consuming the new API in order to take > advantage of the fixes when streaming SSTables from a Cassandra installation > with more than one data directory.
[jira] [Updated] (CASSANDRASC-123) Add missing method to retrieve the InetSocketAddress to DriverUtils
[ https://issues.apache.org/jira/browse/CASSANDRASC-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francisco Guerrero updated CASSANDRASC-123: --- Authors: Francisco Guerrero Test and Documentation Plan: Existing unit tests Status: Patch Available (was: In Progress) PR: https://github.com/apache/cassandra-sidecar/pull/114 CI: https://app.circleci.com/pipelines/github/frankgh/cassandra-sidecar/481 > Add missing method to retrieve the InetSocketAddress to DriverUtils > --- > > Key: CASSANDRASC-123 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-123 > Project: Sidecar for Apache Cassandra > Issue Type: Bug > Components: Rest API >Reporter: Francisco Guerrero >Assignee: Francisco Guerrero >Priority: Normal > Labels: pull-request-available > > Sidecar introduced a shim layer to access the java driver in CASSANDRASC-79, > and later enhanced that access in CASSANDRASC-88. However, the > {{getInetSocketAddress}} was missed in the shim layer.
[jira] [Updated] (CASSANDRASC-123) Add missing method to retrieve the InetSocketAddress to DriverUtils
[ https://issues.apache.org/jira/browse/CASSANDRASC-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francisco Guerrero updated CASSANDRASC-123: --- Bug Category: Parent values: Correctness(12982)Level 1 values: Consistency(12989) Complexity: Low Hanging Fruit Discovered By: User Report Severity: Normal Status: Open (was: Triage Needed) > Add missing method to retrieve the InetSocketAddress to DriverUtils > --- > > Key: CASSANDRASC-123 > URL: https://issues.apache.org/jira/browse/CASSANDRASC-123 > Project: Sidecar for Apache Cassandra > Issue Type: Bug > Components: Rest API >Reporter: Francisco Guerrero >Assignee: Francisco Guerrero >Priority: Normal > > Sidecar introduced a shim layer to access the java driver in CASSANDRASC-79, > and later enhanced that access in CASSANDRASC-88. However, the > {{getInetSocketAddress}} was missed in the shim layer.
[jira] [Created] (CASSANDRASC-123) Add missing method to retrieve the InetSocketAddress to DriverUtils
Francisco Guerrero created CASSANDRASC-123: -- Summary: Add missing method to retrieve the InetSocketAddress to DriverUtils Key: CASSANDRASC-123 URL: https://issues.apache.org/jira/browse/CASSANDRASC-123 Project: Sidecar for Apache Cassandra Issue Type: Bug Components: Rest API Reporter: Francisco Guerrero Assignee: Francisco Guerrero Sidecar introduced a shim layer to access the java driver in CASSANDRASC-79, and later enhanced that access in CASSANDRASC-88. However, the {{getInetSocketAddress}} was missed in the shim layer.
[jira] [Comment Edited] (CASSANDRA-19577) Queries are not visible to the "system_views.queries" virtual table at the coordinator level
[ https://issues.apache.org/jira/browse/CASSANDRA-19577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839848#comment-17839848 ] Caleb Rackliffe edited comment on CASSANDRA-19577 at 4/23/24 4:29 PM: -- The [4.1 patch|https://github.com/apache/cassandra/pull/3268] is up. CI results are clean, modulo a couple environment-specific things, OOMs, and known issues that have nothing to do w/ vtables or {{DebuggableTask}}. was (Author: maedhroz): The [4.1 patch|https://github.com/apache/cassandra/pull/3268] is up. CI results will be posted soon... > Queries are not visible to the "system_views.queries" virtual table at the > coordinator level > > > Key: CASSANDRA-19577 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19577 > Project: Cassandra > Issue Type: Bug > Components: Feature/Virtual Tables >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.1 > > Attachments: ci_summary.html > > Time Spent: 10m > Remaining Estimate: 0h > > There appears to be a hole in the implementation of CASSANDRA-15241 where > {{DebuggableTasks}} at the coordinator are not preserved through the creation > of {{FutureTasks}} in {{TaskFactory}}. This means that {{QueriesTable}} can't > see them when it asks {{SharedExecutorPool}} for running tasks. It should be > possible to fix this in {{TaskFactory}} by making sure to propagate any > {{RunnableDebuggableTask}} we encounter. We already do this in > {{toExecute()}}, but it also needs to happen in the relevant {{toSubmit()}} > method(s). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19577) Queries are not visible to the "system_views.queries" virtual table at the coordinator level
[ https://issues.apache.org/jira/browse/CASSANDRA-19577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-19577: Attachment: ci_summary.html > Queries are not visible to the "system_views.queries" virtual table at the > coordinator level > > > Key: CASSANDRA-19577 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19577 > Project: Cassandra > Issue Type: Bug > Components: Feature/Virtual Tables >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.1 > > Attachments: ci_summary.html > > Time Spent: 10m > Remaining Estimate: 0h > > There appears to be a hole in the implementation of CASSANDRA-15241 where > {{DebuggableTasks}} at the coordinator are not preserved through the creation > of {{FutureTasks}} in {{TaskFactory}}. This means that {{QueriesTable}} can't > see them when it asks {{SharedExecutorPool}} for running tasks. It should be > possible to fix this in {{TaskFactory}} by making sure to propagate any > {{RunnableDebuggableTask}} we encounter. We already do this in > {{toExecute()}}, but it also needs to happen in the relevant {{toSubmit()}} > method(s). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-18112) Add the feature of INDEX HINT for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-18112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-18112: Epic Link: CASSANDRA-19224 > Add the feature of INDEX HINT for CQL > -- > > Key: CASSANDRA-18112 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18112 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Syntax, Feature/SAI, Legacy/CQL >Reporter: Maxwell Guo >Assignee: Maxwell Guo >Priority: Normal > Fix For: 5.0.x > > > It seems that CQL does not support an INDEX HINT. For example, when a data > table has more than one secondary index and a query hits several of them, the > index with more estimated rows will be selected. If we want the query to be > executed the way we intend, we could use a hint to specify an index to use or > an index to ignore. > At first I wanted to open a Jira to add a general hint feature for CQL, but I > think that may be a gigantic task with no clear goal. > Besides, I think there may need to be a DISCUSS thread on the specific > grammatical form before starting the work. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19577) Queries are not visible to the "system_views.queries" virtual table at the coordinator level
[ https://issues.apache.org/jira/browse/CASSANDRA-19577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Caleb Rackliffe updated CASSANDRA-19577: Reviewers: Chris Lohfink > Queries are not visible to the "system_views.queries" virtual table at the > coordinator level > > > Key: CASSANDRA-19577 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19577 > Project: Cassandra > Issue Type: Bug > Components: Feature/Virtual Tables >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.1 > > Time Spent: 10m > Remaining Estimate: 0h > > There appears to be a hole in the implementation of CASSANDRA-15241 where > {{DebuggableTasks}} at the coordinator are not preserved through the creation > of {{FutureTasks}} in {{TaskFactory}}. This means that {{QueriesTable}} can't > see them when it asks {{SharedExecutorPool}} for running tasks. It should be > possible to fix this in {{TaskFactory}} by making sure to propagate any > {{RunnableDebuggableTask}} we encounter. We already do this in > {{toExecute()}}, but it also needs to happen in the relevant {{toSubmit()}} > method(s). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
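The hole described in CASSANDRA-19577 is a general hazard of task wrapping: when a factory wraps submitted work in a generic future, any richer interface on the original task is lost unless the wrapper deliberately preserves it. A language-neutral sketch of the idea (Python here; the real fix is in Cassandra's Java {{TaskFactory}}, and these class names are illustrative, not the actual implementation):

```python
# Sketch: wrapping a task in a generic future hides its "debuggable"
# interface unless the factory checks for it and preserves it.
# Names are illustrative, not Cassandra's actual classes.
import time


class RunnableDebuggableTask:
    """A task exposing extra introspection for a 'running queries' view."""
    def __init__(self, fn, description):
        self.fn = fn
        self.description = description
        self.approx_start_nanos = None

    def run(self):
        self.approx_start_nanos = time.monotonic_ns()
        self.fn()


class FutureTask:
    """Generic wrapper: by itself it hides whatever it wraps."""
    def __init__(self, task):
        self.task = task

    def run(self):
        self.task.run()


class DebuggableFutureTask(FutureTask):
    """Wrapper that keeps the debuggable surface visible after wrapping."""
    @property
    def description(self):
        return self.task.description


def to_submit(task):
    # The essence of the fix: detect the richer interface before wrapping,
    # so the executor pool can still report the task to the queries view.
    if isinstance(task, RunnableDebuggableTask):
        return DebuggableFutureTask(task)
    return FutureTask(task)


wrapped = to_submit(RunnableDebuggableTask(lambda: None, "SELECT ..."))
print(wrapped.description)  # still visible after wrapping
```

Without the `isinstance` check, the plain `FutureTask` path silently drops the introspection surface, which is exactly why the coordinator's queries were invisible.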
[jira] [Assigned] (CASSANDRA-19572) Test failure: org.apache.cassandra.db.ImportTest flakiness
[ https://issues.apache.org/jira/browse/CASSANDRA-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic reassigned CASSANDRA-19572: - Assignee: Stefan Miklosovic > Test failure: org.apache.cassandra.db.ImportTest flakiness > -- > > Key: CASSANDRA-19572 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19572 > Project: Cassandra > Issue Type: Bug > Components: Tool/bulk load >Reporter: Brandon Williams >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x > > > As discovered on CASSANDRA-19401, the tests in this class are flaky, at least > the following: > * testImportCorruptWithoutValidationWithCopying > * testImportInvalidateCache > * testImportCorruptWithCopying > * testImportCacheEnabledWithoutSrcDir > [https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19572) Test failure: org.apache.cassandra.db.ImportTest flakiness
[ https://issues.apache.org/jira/browse/CASSANDRA-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840112#comment-17840112 ] Stefan Miklosovic commented on CASSANDRA-19572: --- I have a clean 5k run here https://app.circleci.com/pipelines/github/instaclustr/cassandra/4223/workflows/a82d0483-a0df-44ed-8127-088b303c78ba/jobs/225432/steps I put SSTableReader.resetTidying() into the "after test" step; I noticed that in "after" there are still uncleared references, and cleaning them up seems to help. I will prepare the other branches. > Test failure: org.apache.cassandra.db.ImportTest flakiness > -- > > Key: CASSANDRA-19572 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19572 > Project: Cassandra > Issue Type: Bug > Components: Tool/bulk load >Reporter: Brandon Williams >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x > > > As discovered on CASSANDRA-19401, the tests in this class are flaky, at least > the following: > * testImportCorruptWithoutValidationWithCopying > * testImportInvalidateCache > * testImportCorruptWithCopying > * testImportCacheEnabledWithoutSrcDir > [https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
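The remedy in the comment above — clearing lingering references in the test's "after" phase — follows a common deflaking pattern for tests that share global state. A generic sketch of the pattern (Python `unittest`; `SSTableReader.resetTidying()` itself is Java, and the registry below is a hypothetical stand-in, not Cassandra code):

```python
import unittest

# Hypothetical global registry standing in for shared reader state,
# analogous to the lingering SSTableReader references in ImportTest.
OPEN_READERS = []


def open_reader(name):
    OPEN_READERS.append(name)
    return name


class ImportLikeTest(unittest.TestCase):
    def tearDown(self):
        # Analogue of calling SSTableReader.resetTidying() after each
        # test: clear leftover references so the next test starts clean
        # instead of inheriting state from a previous (possibly failed) run.
        OPEN_READERS.clear()

    def test_import(self):
        open_reader("sstable-1")
        self.assertEqual(len(OPEN_READERS), 1)


result = ImportLikeTest("test_import").run()
print(result.wasSuccessful(), OPEN_READERS)  # → True []
```

The key property is that cleanup runs unconditionally after every test, so one test's failure cannot poison the shared state seen by the next.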
[jira] [Commented] (CASSANDRA-19498) Error reading data from credential file
[ https://issues.apache.org/jira/browse/CASSANDRA-19498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840086#comment-17840086 ] Brad Schoening commented on CASSANDRA-19498: [~slavavrn] any suggestions regarding the test_legacy_auth.py? No other pytest imports from bin.cqlsh. > Error reading data from credential file > --- > > Key: CASSANDRA-19498 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19498 > Project: Cassandra > Issue Type: Bug > Components: Documentation, Tool/cqlsh >Reporter: Slava >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > The pylib/cqlshlib/cqlshmain.py code reads data from the credentials file, > however, it is immediately ignored. > https://github.com/apache/cassandra/blob/c9625e0102dab66f41d3ef2338c54d499e73a8c5/pylib/cqlshlib/cqlshmain.py#L2070 > {code:java} > if not options.username: > credentials = configparser.ConfigParser() > if options.credentials is not None: > credentials.read(options.credentials) # use the username > from credentials file but fallback to cqlshrc if username is absent from the > command line parameters > options.username = username_from_cqlshrc if not options.password: > rawcredentials = configparser.RawConfigParser() > if options.credentials is not None: > rawcredentials.read(options.credentials) # handling > password in the same way as username, priority cli > credentials > cqlshrc > options.password = option_with_default(rawcredentials.get, > 'plain_text_auth', 'password', password_from_cqlshrc) > options.password = password_from_cqlshrc{code} > These corrections have been made in accordance with > https://issues.apache.org/jira/browse/CASSANDRA-16983 and > https://issues.apache.org/jira/browse/CASSANDRA-16456. > The documentation does not indicate that AuthProviders can be used in the > cqlshrc and credentials files. 
> I propose to return the ability to use the legacy option of specifying the > user and password in the credentials file in the [plain_text_auth] section. > It is also required to describe the rules for using the credentials file in > the documentation. > I can make a corresponding pull request. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
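The precedence the ticket above is arguing for — command-line value first, then the credentials file's `[plain_text_auth]` section, then cqlshrc — can be sketched with `configparser`. This is an illustrative rewrite of the quoted logic, not the actual cqlshmain.py patch:

```python
import configparser


def resolve_password(cli_password, credentials_text, cqlshrc_password):
    """Return the password by priority: CLI > credentials file > cqlshrc.

    Illustrative sketch only; cqlsh reads the credentials file from disk
    rather than from a string.
    """
    if cli_password is not None:
        return cli_password
    if credentials_text is not None:
        creds = configparser.RawConfigParser()
        creds.read_string(credentials_text)
        if creds.has_option("plain_text_auth", "password"):
            return creds.get("plain_text_auth", "password")
    # Fall back to the value parsed earlier from cqlshrc.
    return cqlshrc_password


creds = "[plain_text_auth]\npassword = s3cret\n"
print(resolve_password(None, creds, "fallback"))      # → s3cret
print(resolve_password("cli-pw", creds, "fallback"))  # → cli-pw
print(resolve_password(None, None, "fallback"))       # → fallback
```

The bug report boils down to the quoted code computing the credentials-file value and then unconditionally overwriting it with the cqlshrc value; keeping the fallback chain explicit, as above, avoids that.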
[jira] [Commented] (CASSANDRA-17667) Text value containing "/*" interpreted as multiline comment in cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-17667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840085#comment-17840085 ] Brad Schoening commented on CASSANDRA-17667: [~brandon.williams] thanks, it's not forgotten; it's next up after I close out CASSANDRA-19450 and 19498. > Text value containing "/*" interpreted as multiline comment in cqlsh > > > Key: CASSANDRA-17667 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17667 > Project: Cassandra > Issue Type: Bug > Components: CQL/Interpreter >Reporter: ANOOP THOMAS >Assignee: Brad Schoening >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x > > > I use CQLSH command line utility to load some DDLs. The version of utility I > use is this: > {noformat} > [cqlsh 6.0.0 | Cassandra 4.0.0.47 | CQL spec 3.4.5 | Native protocol > v5]{noformat} > Command that loads DDL.cql: > {noformat} > cqlsh -u username -p password cassandra.example.com 65503 --ssl -f DDL.cql > {noformat} > I have a line in CQL script that breaks the syntax. > {noformat} > INSERT into tablename (key,columnname1,columnname2) VALUES > ('keyName','value1','/value2/*/value3');{noformat} > {{/*}} here is interpreted as start of multi-line comment. It used to work on > older versions of cqlsh. The error I see looks like this: > {noformat} > SyntaxException: line 4:2 mismatched input 'Update' expecting ')' > (...,'value1','/value2INSERT into tablename(INSERT into tablename > (key,columnname1,columnname2)) VALUES ('[Update]-...) SyntaxException: line > 1:0 no viable alternative at input '(' ([(]...) > {noformat} > Same behavior while running in interactive mode too. {{/*}} inside a CQL > statement should not be interpreted as start of multi-line comment. 
> With schema: > {code:java} > CREATE TABLE tablename ( key text primary key, columnname1 text, columnname2 > text);{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
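The bug above is a classic tokenization pitfall: a comment scanner must know whether it is inside a quoted literal before treating {{/*}} as a comment opener. A minimal sketch of the string-aware check (illustrative only; cqlsh's actual lexer also handles `''` escapes, `$$`-quoted strings, and double-quoted identifiers):

```python
def strip_block_comments(stmt):
    """Remove /* ... */ comments, but never inside single-quoted strings.

    Illustrative sketch: real CQL lexing also handles '' escapes,
    $$-quoted strings, and double-quoted identifiers.
    """
    out = []
    i, in_string, in_comment = 0, False, False
    while i < len(stmt):
        ch = stmt[i]
        if in_comment:
            if stmt.startswith("*/", i):
                in_comment = False
                i += 2
                continue
            i += 1
            continue
        if ch == "'":
            in_string = not in_string
        elif not in_string and stmt.startswith("/*", i):
            in_comment = True
            i += 2
            continue
        out.append(ch)
        i += 1
    return "".join(out)


# '/*' inside a string literal is preserved (the reported case):
print(strip_block_comments("VALUES ('/value2/*/value3')"))
# A real block comment outside any string is removed:
print(strip_block_comments("SELECT 1 /* note */ FROM t"))
```

A scanner that skips the `in_string` check behaves like the buggy cqlsh: it opens a comment at the `/*` inside the literal and swallows the rest of the statement, producing exactly the kind of mismatched-input errors quoted above.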
[jira] [Updated] (CASSANDRA-19558) Standalone jenkinsfile first round bug fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-19558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated CASSANDRA-19558: --- Reviewers: Brandon Williams, Michael Semb Wever (was: Brandon Williams) Status: Review In Progress (was: Patch Available) > Standalone jenkinsfile first round bug fixes > > > Key: CASSANDRA-19558 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19558 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > Fix For: 5.0.x, 5.x > > Attachments: CASSANDRA-19558_50_#5_ci_summary.html, > CASSANDRA-19558_50_#5_results_details.tar.xz, > CASSANDRA-19558-5.0_#13_ci_summary.html, > CASSANDRA-19558-5.0_#13_results_details.tar.xz, > CASSANDRA-19558-5.0_#16_ci_summary.html, > CASSANDRA-19558-5.0_#16_results_details.tar.xz, > CASSANDRA-19558_#8_ci_summary.html, CASSANDRA-19558_#8_results_details.tar.xz > > > A few follow up improvements and bug fixes for the standalone jenkinsfile. 
> - add at top a list of test failures in ci_summary.html > - docker scripts always try to login (as base images need to be pulled too) > - move simulator-dtests to large containers (they need 8g just heap) > - in ubuntu2004_test.docker make sure /home/cassandra exists and has correct > perms (from marcuse) > - persist the jenkinsfile parameters from run to run (important for the > post-commit jobs to keep their non-default branch and profile values) (was > CASSANDRA-19536) > - increase jvm-dtest splits from 8 to 12 > - when on ci-cassandra, replace use of copyArtifacts in Jenkinsfile > generateTestReports() with manual wget of test files, allowing the summary > phase to be run on any agent (copyArtifact would take >4hrs otherwise) (was > INFRA-25694) > - copy ci_summary.html and results_details.tar.xz to nightlies -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19558) Standalone jenkinsfile first round bug fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-19558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840060#comment-17840060 ] Michael Semb Wever edited comment on CASSANDRA-19558 at 4/23/24 10:55 AM: -- (1) is committed. wrt (2). hit a few small problems. INFRA-25738 being one (though unrelated to the patch). needed to make it work for the <5 jobs, and against different possible python versions (>=3.6). these are fixed. i'm ready to commit (2) which is now [https://github.com/apache/cassandra-builds/compare/trunk...thelastpickle:cassandra-builds:mck/19558] (excluding the throwaway commit, ofc) Otherwise we have good test runs in #3 to #7 in [https://ci-cassandra.apache.org/job/Cassandra-devbranch-before-5-artifacts/] and in #588 to #594 in [https://ci-cassandra.apache.org/job/Cassandra-4.1-artifacts/] For (3) it is also needed ensuring it works when github is down (or goes down mid build)… was (Author: michaelsembwever): (1) is committed. wrt (2). hit a few small problems. INFRA-25738 being one (though unrelated to the patch). needed to make it work for the <5 jobs, and against different possible python versions (>=3.6). these are fixed. 
i'm ready to commit (2) which is now https://github.com/apache/cassandra-builds/compare/trunk...thelastpickle:cassandra-builds:mck/19558 Otherwise we have good test runs in #3 to #7 in [https://ci-cassandra.apache.org/job/Cassandra-devbranch-before-5-artifacts/] and in #588 to #594 in [https://ci-cassandra.apache.org/job/Cassandra-4.1-artifacts/] For (3) it is also needed ensuring it works when github is down (or goes down mid build)… > Standalone jenkinsfile first round bug fixes > > > Key: CASSANDRA-19558 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19558 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > Fix For: 5.0.x, 5.x > > Attachments: CASSANDRA-19558_50_#5_ci_summary.html, > CASSANDRA-19558_50_#5_results_details.tar.xz, > CASSANDRA-19558-5.0_#13_ci_summary.html, > CASSANDRA-19558-5.0_#13_results_details.tar.xz, > CASSANDRA-19558-5.0_#16_ci_summary.html, > CASSANDRA-19558-5.0_#16_results_details.tar.xz, > CASSANDRA-19558_#8_ci_summary.html, CASSANDRA-19558_#8_results_details.tar.xz > > > A few follow up improvements and bug fixes for the standalone jenkinsfile. 
> - add at top a list of test failures in ci_summary.html > - docker scripts always try to login (as base images need to be pulled too) > - move simulator-dtests to large containers (they need 8g just heap) > - in ubuntu2004_test.docker make sure /home/cassandra exists and has correct > perms (from marcuse) > - persist the jenkinsfile parameters from run to run (important for the > post-commit jobs to keep their non-default branch and profile values) (was > CASSANDRA-19536) > - increase jvm-dtest splits from 8 to 12 > - when on ci-cassandra, replace use of copyArtifacts in Jenkinsfile > generateTestReports() with manual wget of test files, allowing the summary > phase to be run on any agent (copyArtifact would take >4hrs otherwise) (was > INFRA-25694) > - copy ci_summary.html and results_details.tar.xz to nightlies -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19558) Standalone jenkinsfile first round bug fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-19558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated CASSANDRA-19558: --- Status: Ready to Commit (was: Review In Progress) > Standalone jenkinsfile first round bug fixes > > > Key: CASSANDRA-19558 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19558 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > Fix For: 5.0.x, 5.x > > Attachments: CASSANDRA-19558_50_#5_ci_summary.html, > CASSANDRA-19558_50_#5_results_details.tar.xz, > CASSANDRA-19558-5.0_#13_ci_summary.html, > CASSANDRA-19558-5.0_#13_results_details.tar.xz, > CASSANDRA-19558-5.0_#16_ci_summary.html, > CASSANDRA-19558-5.0_#16_results_details.tar.xz, > CASSANDRA-19558_#8_ci_summary.html, CASSANDRA-19558_#8_results_details.tar.xz > > > A few follow up improvements and bug fixes for the standalone jenkinsfile. > - add at top a list of test failures in ci_summary.html > - docker scripts always try to login (as base images need to be pulled too) > - move simulator-dtests to large containers (they need 8g just heap) > - in ubuntu2004_test.docker make sure /home/cassandra exists and has correct > perms (from marcuse) > - persist the jenkinsfile parameters from run to run (important for the > post-commit jobs to keep their non-default branch and profile values) (was > CASSANDRA-19536) > - increase jvm-dtest splits from 8 to 12 > - when on ci-cassandra, replace use of copyArtifacts in Jenkinsfile > generateTestReports() with manual wget of test files, allowing the summary > phase to be run on any agent (copyArtifact would take >4hrs otherwise) (was > INFRA-25694) > - copy ci_summary.html and results_details.tar.xz to nightlies -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19558) Standalone jenkinsfile first round bug fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-19558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840060#comment-17840060 ] Michael Semb Wever edited comment on CASSANDRA-19558 at 4/23/24 10:51 AM: -- (1) is committed. wrt (2). hit a few small problems. INFRA-25738 being one (though unrelated to the patch). needed to make it work for the <5 jobs, and against different possible python versions (>=3.6). these are fixed. i'm ready to commit (2) which is now https://github.com/apache/cassandra-builds/compare/trunk...thelastpickle:cassandra-builds:mck/19558 Otherwise we have good test runs in #3 to #7 in [https://ci-cassandra.apache.org/job/Cassandra-devbranch-before-5-artifacts/] and in #588 to #594 in [https://ci-cassandra.apache.org/job/Cassandra-4.1-artifacts/] For (3) it is also needed ensuring it works when github is down (or goes down mid build)… was (Author: michaelsembwever): (1) is committed. wrt (2). hit a few small problems. INFRA-25738 being one (though unrelated to the patch). needed to make it work for the <5 jobs, and against different possible python versions (>=3.6). these are fixed. i'm ready to commit (2). 
Otherwise we have good test runs in #3 to #7 in [https://ci-cassandra.apache.org/job/Cassandra-devbranch-before-5-artifacts/] and in #588 to #594 in [https://ci-cassandra.apache.org/job/Cassandra-4.1-artifacts/] For (3) it is also needed ensuring it works when github is down (or goes down mid build)… > Standalone jenkinsfile first round bug fixes > > > Key: CASSANDRA-19558 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19558 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > Fix For: 5.0.x, 5.x > > Attachments: CASSANDRA-19558_50_#5_ci_summary.html, > CASSANDRA-19558_50_#5_results_details.tar.xz, > CASSANDRA-19558-5.0_#13_ci_summary.html, > CASSANDRA-19558-5.0_#13_results_details.tar.xz, > CASSANDRA-19558-5.0_#16_ci_summary.html, > CASSANDRA-19558-5.0_#16_results_details.tar.xz, > CASSANDRA-19558_#8_ci_summary.html, CASSANDRA-19558_#8_results_details.tar.xz > > > A few follow up improvements and bug fixes for the standalone jenkinsfile. 
> - add at top a list of test failures in ci_summary.html > - docker scripts always try to login (as base images need to be pulled too) > - move simulator-dtests to large containers (they need 8g just heap) > - in ubuntu2004_test.docker make sure /home/cassandra exists and has correct > perms (from marcuse) > - persist the jenkinsfile parameters from run to run (important for the > post-commit jobs to keep their non-default branch and profile values) (was > CASSANDRA-19536) > - increase jvm-dtest splits from 8 to 12 > - when on ci-cassandra, replace use of copyArtifacts in Jenkinsfile > generateTestReports() with manual wget of test files, allowing the summary > phase to be run on any agent (copyArtifact would take >4hrs otherwise) (was > INFRA-25694) > - copy ci_summary.html and results_details.tar.xz to nightlies -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19558) Standalone jenkinsfile first round bug fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-19558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840060#comment-17840060 ] Michael Semb Wever edited comment on CASSANDRA-19558 at 4/23/24 10:50 AM: -- (1) is committed. wrt (2). hit a few small problems. INFRA-25738 being one (though unrelated to the patch). needed to make it work for the <5 jobs, and against different possible python versions (>=3.6). these are fixed. i'm ready to commit (2). Otherwise we have good test runs in #3 to #7 in [https://ci-cassandra.apache.org/job/Cassandra-devbranch-before-5-artifacts/] and in #588 to #594 in [https://ci-cassandra.apache.org/job/Cassandra-4.1-artifacts/] For (3) it is also needed ensuring it works when github is down (or goes down mid build)… was (Author: michaelsembwever): (1) is committed. working on (2). hit a few small problems. INFRA-25738 being one (though unrelated to the patch). needed to make it work for the <5 jobs, and against different possible python versions (>=3.6), and ensuring it works when github is down (or goes down mid build)… Otherwise we have good test runs in #3 to #7 in [https://ci-cassandra.apache.org/job/Cassandra-devbranch-before-5-artifacts/] and in #588 to #594 in [https://ci-cassandra.apache.org/job/Cassandra-4.1-artifacts/] > Standalone jenkinsfile first round bug fixes > > > Key: CASSANDRA-19558 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19558 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > Fix For: 5.0.x, 5.x > > Attachments: CASSANDRA-19558_50_#5_ci_summary.html, > CASSANDRA-19558_50_#5_results_details.tar.xz, > CASSANDRA-19558-5.0_#13_ci_summary.html, > CASSANDRA-19558-5.0_#13_results_details.tar.xz, > CASSANDRA-19558-5.0_#16_ci_summary.html, > CASSANDRA-19558-5.0_#16_results_details.tar.xz, > CASSANDRA-19558_#8_ci_summary.html, CASSANDRA-19558_#8_results_details.tar.xz > > > A few follow up improvements 
and bug fixes for the standalone jenkinsfile. > - add at top a list of test failures in ci_summary.html > - docker scripts always try to login (as base images need to be pulled too) > - move simulator-dtests to large containers (they need 8g just heap) > - in ubuntu2004_test.docker make sure /home/cassandra exists and has correct > perms (from marcuse) > - persist the jenkinsfile parameters from run to run (important for the > post-commit jobs to keep their non-default branch and profile values) (was > CASSANDRA-19536) > - increase jvm-dtest splits from 8 to 12 > - when on ci-cassandra, replace use of copyArtifacts in Jenkinsfile > generateTestReports() with manual wget of test files, allowing the summary > phase to be run on any agent (copyArtifact would take >4hrs otherwise) (was > INFRA-25694) > - copy ci_summary.html and results_details.tar.xz to nightlies -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19558) Standalone jenkinsfile first round bug fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-19558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840060#comment-17840060 ] Michael Semb Wever edited comment on CASSANDRA-19558 at 4/23/24 10:48 AM: -- (1) is committed. working on (2). hit a few small problems. INFRA-25738 being one (though unrelated to the patch). needed to make it work for the <5 jobs, and against different possible python versions (>=3.6), and ensuring it works when github is down (or goes down mid build)… Otherwise we have good test runs in #3 to #7 in [https://ci-cassandra.apache.org/job/Cassandra-devbranch-before-5-artifacts/] and in #588 to #594 in [https://ci-cassandra.apache.org/job/Cassandra-4.1-artifacts/] was (Author: michaelsembwever): (1) is committed. working on (2). hit a few small problems. INFRA-25738 being one (though unrelated to the patch). needed to make it work for the <5 jobs, and against different possible python versions (>=3.6), and ensuring it works when github is down (or goes down mid build)… > Standalone jenkinsfile first round bug fixes > > > Key: CASSANDRA-19558 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19558 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > Fix For: 5.0.x, 5.x > > Attachments: CASSANDRA-19558_50_#5_ci_summary.html, > CASSANDRA-19558_50_#5_results_details.tar.xz, > CASSANDRA-19558-5.0_#13_ci_summary.html, > CASSANDRA-19558-5.0_#13_results_details.tar.xz, > CASSANDRA-19558-5.0_#16_ci_summary.html, > CASSANDRA-19558-5.0_#16_results_details.tar.xz, > CASSANDRA-19558_#8_ci_summary.html, CASSANDRA-19558_#8_results_details.tar.xz > > > A few follow up improvements and bug fixes for the standalone jenkinsfile. 
> - add at top a list of test failures in ci_summary.html > - docker scripts always try to login (as base images need to be pulled too) > - move simulator-dtests to large containers (they need 8g just heap) > - in ubuntu2004_test.docker make sure /home/cassandra exists and has correct > perms (from marcuse) > - persist the jenkinsfile parameters from run to run (important for the > post-commit jobs to keep their non-default branch and profile values) (was > CASSANDRA-19536) > - increase jvm-dtest splits from 8 to 12 > - when on ci-cassandra, replace use of copyArtifacts in Jenkinsfile > generateTestReports() with manual wget of test files, allowing the summary > phase to be run on any agent (copyArtifact would take >4hrs otherwise) (was > INFRA-25694) > - copy ci_summary.html and results_details.tar.xz to nightlies -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19558) Standalone jenkinsfile first round bug fixes
[ https://issues.apache.org/jira/browse/CASSANDRA-19558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840060#comment-17840060 ] Michael Semb Wever commented on CASSANDRA-19558: (1) is committed. working on (2). hit a few small problems. INFRA-25738 being one (though unrelated to the patch). needed to make it work for the <5 jobs, and against different possible python versions (>=3.6), and ensuring it works when github is down (or goes down mid build)… > Standalone jenkinsfile first round bug fixes > > > Key: CASSANDRA-19558 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19558 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > Fix For: 5.0.x, 5.x > > Attachments: CASSANDRA-19558_50_#5_ci_summary.html, > CASSANDRA-19558_50_#5_results_details.tar.xz, > CASSANDRA-19558-5.0_#13_ci_summary.html, > CASSANDRA-19558-5.0_#13_results_details.tar.xz, > CASSANDRA-19558-5.0_#16_ci_summary.html, > CASSANDRA-19558-5.0_#16_results_details.tar.xz, > CASSANDRA-19558_#8_ci_summary.html, CASSANDRA-19558_#8_results_details.tar.xz > > > A few follow up improvements and bug fixes for the standalone jenkinsfile. 
> - add at top a list of test failures in ci_summary.html > - docker scripts always try to login (as base images need to be pulled too) > - move simulator-dtests to large containers (they need 8g just heap) > - in ubuntu2004_test.docker make sure /home/cassandra exists and has correct > perms (from marcuse) > - persist the jenkinsfile parameters from run to run (important for the > post-commit jobs to keep their non-default branch and profile values) (was > CASSANDRA-19536) > - increase jvm-dtest splits from 8 to 12 > - when on ci-cassandra, replace use of copyArtifacts in Jenkinsfile > generateTestReports() with manual wget of test files, allowing the summary > phase to be run on any agent (copyArtifact would take >4hrs otherwise) (was > INFRA-25694) > - copy ci_summary.html and results_details.tar.xz to nightlies -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-18942) Repeatable java test runs on jenkins
[ https://issues.apache.org/jira/browse/CASSANDRA-18942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839853#comment-17839853 ] Michael Semb Wever edited comment on CASSANDRA-18942 at 4/23/24 10:43 AM: -- dependencies are all done, re-opening. [~bereng] , can we rebase this, and make the changes in the existing scripts (i'll review and help with ensuring the changes don't break compat) was (Author: michaelsembwever): dependencies are all done, re-opening. > Repeatable java test runs on jenkins > > > Key: CASSANDRA-18942 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18942 > Project: Cassandra > Issue Type: New Feature > Components: Build, CI >Reporter: Berenguer Blasi >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 5.0, 5.0.x > > Attachments: jenkins_job.xml, testJava.txt, testJavaDocker.txt, > testJavaSplits.txt > > Time Spent: 1h 20m > Remaining Estimate: 0h > > It is our policy to loop new introduced tests to avoid introducing flakies. > We also want to add the possibility to repeat a test N number of times to > test robustness, debug flakies, etc. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability
[ https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov updated CASSANDRA-19534: Attachment: Scenario 2 - QUEUE + Backpressure.jpg Scenario 2 - QUEUE.jpg Scenario 2 - Stock.jpg > unbounded queues in native transport requests lead to node instability > -- > > Key: CASSANDRA-19534 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19534 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Local Write-Read Paths >Reporter: Jon Haddad >Assignee: Alex Petrov >Priority: Normal > Fix For: 5.0-rc, 5.x > > Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - > QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, > Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg > > > When a node is under pressure, hundreds of thousands of requests can show up > in the native transport queue, and it looks like it can take way longer to > timeout than is configured. We should be shedding load much more > aggressively and use a bounded queue for incoming work. 
This is extremely > evident when we combine a resource consuming workload with a smaller one: > Running 5.0 HEAD on a single node as of today: > {noformat} > # populate only > easy-cass-stress run RandomPartitionAccess -p 100 -r 1 > --workload.rows=10 --workload.select=partition --maxrlat 100 --populate > 10m --rate 50k -n 1 > # workload 1 - larger reads > easy-cass-stress run RandomPartitionAccess -p 100 -r 1 > --workload.rows=10 --workload.select=partition --rate 200 -d 1d > # second workload - small reads > easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat} > It appears our results don't time out at the requested server time either: > > {noformat} > Writes Reads > Deletes Errors > Count Latency (p99) 1min (req/s) | Count Latency (p99) 1min (req/s) | > Count Latency (p99) 1min (req/s) | Count 1min (errors/s) > 950286 70403.93 634.77 | 789524 70442.07 426.02 | > 0 0 0 | 9580484 18980.45 > 952304 70567.62 640.1 | 791072 70634.34 428.36 | > 0 0 0 | 9636658 18969.54 > 953146 70767.34 640.1 | 791400 70767.76 428.36 | > 0 0 0 | 9695272 18969.54 > 956833 71171.28 623.14 | 794009 71175.6 412.79 | > 0 0 0 | 9749377 19002.44 > 959627 71312.58 656.93 | 795703 71349.87 435.56 | > 0 0 0 | 9804907 18943.11{noformat} > > After stopping the load test altogether, it took nearly a minute before the > requests were no longer queued. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
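The fix the ticket argues for, bounded admission with load shedding instead of an unbounded queue, can be sketched as follows. This is a hypothetical illustration (the class and method names are invented, not Cassandra code): new work is rejected when the queue is full, rather than being allowed to pile up by the hundreds of thousands.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Minimal sketch of a bounded native-transport work queue that sheds load
// when full, instead of growing without bound. Hypothetical names throughout.
public class BoundedNativeQueue {
    private final ArrayBlockingQueue<Runnable> queue;

    BoundedNativeQueue(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns false (caller sheds the request, e.g. by replying Overloaded)
     *  instead of queueing unboundedly. offer() never blocks. */
    boolean submit(Runnable request) {
        return queue.offer(request);
    }

    public static void main(String[] args) {
        BoundedNativeQueue q = new BoundedNativeQueue(2);
        System.out.println(q.submit(() -> {})); // true  - accepted
        System.out.println(q.submit(() -> {})); // true  - accepted
        System.out.println(q.submit(() -> {})); // false - full, shed load
    }
}
```

With a bounded queue the failure mode becomes an immediate, visible rejection rather than requests silently aging past their client-side timeout.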
[jira] [Updated] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability
[ https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov updated CASSANDRA-19534: Attachment: Scenario 1 - QUEUE.jpg Scenario 1 - QUEUE + Backpressure.jpg Scenario 1 - Stock.jpg
[jira] [Commented] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability
[ https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840058#comment-17840058 ] Alex Petrov commented on CASSANDRA-19534: - The main change is the introduction of a (currently implicit) configurable {_}native request deadline{_}. No request, read or write, will be allowed to prolong its execution beyond this deadline. Some of the hidden places that would allow requests to stay overdue were local executor runnables, replica-side writes, and hints. The default is 12 seconds, since this is how long the 3.x driver (which I believe is still the most used version in the community) waits before removing its handlers, after which any response from the server is simply ignored. Now, there is an _option_ to enable expiration based on queue time, which will be _disabled_ by default to preserve existing semantics, but my tests have shown that enabling it has only positive effects. We will try it out cautiously in different clusters over the next months and see whether tests match up with real loads before we change any of the defaults. So by default, behaviour will be as follows: # If a request has spent more than 12 seconds in the NATIVE queue, we throw an Overloaded exception back to the client. This timeout used to be the max of the read/write/range/counter rpc timeouts. # If a request has spent less than 12 seconds, it is allowed to execute; any request issued by the coordinator can live: ## _either_ {{Verb.timeout}} number of milliseconds, ## _or_ up to the native request deadline, as measured from the time the request was admitted to the coordinator's NATIVE queue, whichever of these happens earlier. 
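The two deadline rules above can be sketched as a single budget computation. This is a hypothetical helper for illustration only (the names and the -1 sentinel are invented, not Cassandra code):

```java
// Sketch of the default behaviour described in the comment: a request is shed
// if it waited in the NATIVE queue past the 12s deadline; otherwise it runs for
// min(verb timeout, remaining deadline). Hypothetical names throughout.
public class DeadlineSketch {
    static final long NATIVE_DEADLINE_MS = 12_000;

    /** Time budget (ms) for a coordinator-issued request, given the verb
     *  timeout and how long the request already waited in the NATIVE queue.
     *  Returns -1 to signal shedding (OverloadedException to the client). */
    static long budgetMs(long verbTimeoutMs, long queueTimeMs) {
        if (queueTimeMs > NATIVE_DEADLINE_MS)
            return -1; // rule 1: queued too long, shed
        long remaining = NATIVE_DEADLINE_MS - queueTimeMs;
        return Math.min(verbTimeoutMs, remaining); // rule 2: whichever is earlier
    }

    public static void main(String[] args) {
        // Example 1 below: 6s queued, 5s read timeout -> full 5s budget remains
        System.out.println(budgetMs(5000, 6000));   // 5000
        // Example 3 below: 10s queued -> only 2s left before the deadline
        System.out.println(budgetMs(5000, 10_000)); // 2000
        // Past the 12s deadline -> shed
        System.out.println(budgetMs(5000, 13_000)); // -1
    }
}
```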
Example 1, read timeout is 5 seconds: # Client sends a request; request spends 6 seconds in the NATIVE queue # Coordinator issues requests to replicas; two replicas respond within 3 seconds # Coordinator responds to the client with success Example 2, read timeout is 5 seconds: # Client sends a request; request spends 6 seconds in the NATIVE queue # Coordinator issues requests to replicas; one replica responds within 3 seconds; other replicas fail to respond within the 5-second read timeout # Coordinator responds to the client with read timeout (preserves current behaviour) Example 3, read timeout is 5 seconds: # Client sends a request; request spends 10 seconds in the NATIVE queue # Coordinator issues requests to replicas; all replicas fail to respond within 2 seconds # Coordinator responds to the client with read timeout; if messages are still in queue on replicas, they will get dropped before processing There will be a _new_ metric that shows how many of the timeouts would previously have been “blind timeouts”. That is, the client _would_ register them as timeouts, but we as server-side operators would be oblivious to them. This metric will keep us collectively motivated even if we see a slight uptick in timeouts after committing the patch. Lastly, there is an option to limit how much of the 12 seconds client requests are allowed to spend in the native queue. You can say that if a client request has spent 80% of its max 12 seconds in the native queue, we start applying backpressure to the client socket (or throwing an Overloaded exception, depending on the value of {{native_transport_throw_on_overload}}). We have to be careful with enabling this one, since my tests have shown that while we see fewer timeouts server-side, clients see more timeouts, because part of what they consider “request time” is now spent somewhere in TCP queues, which we cannot account for. h3. New Configuration Params
h3. cql_start_time Configures what is considered to be the base for the replica-side timeout. This option has actually existed before; it is now actually safe to enable. It still defaults to {{REQUEST}} (processing start time is taken as the timeout base), and the alternative is {{QUEUE}} (queue admission time is taken as the timeout base). Unfortunately, there is no consistent view of the timeout base in the community: some people think that server-side read/write timeouts are how much time _replicas_ have to respond to the coordinator; others believe they mean how much time the _coordinator_ has to respond to the client. This patch is agnostic to these beliefs. h3. native_transport_throw_on_overload Whether we should apply backpressure to the client (i.e. stop reading from the socket) or throw an Overloaded exception. The default is socket backpressure, and this is probably fine for now. In principle, this can also be set by the client on a per-connection basis via protocol options. However, the 3.x series of the driver does not have this addition implemented, so in practice it is not really used. If used, the setting from the client takes precedence. h3. native_transport_timeout_in_ms The absolute maximum amount of time the server has to respond to
[jira] [Updated] (CASSANDRA-19572) Test failure: org.apache.cassandra.db.ImportTest flakiness
[ https://issues.apache.org/jira/browse/CASSANDRA-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-19572: -- Description: As discovered on CASSANDRA-19401, the tests in this class are flaky, at least the following: * testImportCorruptWithoutValidationWithCopying * testImportInvalidateCache * testImportCorruptWithCopying * testImportCacheEnabledWithoutSrcDir [https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests] was: As discovered on CASSANDRA-19401, the tests in this class are flaky, at least the following: * testImportCorruptWithoutValidationWithCopying * testImportInvalidateCache * testImportCorruptWithCopying * testImportCacheEnabledWithoutSrcDir * testImportInvalidateCache [https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests] > Test failure: org.apache.cassandra.db.ImportTest flakiness > -- > > Key: CASSANDRA-19572 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19572 > Project: Cassandra > Issue Type: Bug > Components: Tool/bulk load >Reporter: Brandon Williams >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x > > > As discovered on CASSANDRA-19401, the tests in this class are flaky, at least > the following: > * testImportCorruptWithoutValidationWithCopying > * testImportInvalidateCache > * testImportCorruptWithCopying > * testImportCacheEnabledWithoutSrcDir > [https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19190) ForceSnapshot transformations should not be persisted in the local log table
[ https://issues.apache.org/jira/browse/CASSANDRA-19190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-19190: Status: Ready to Commit (was: Review In Progress) > ForceSnapshot transformations should not be persisted in the local log table > > > Key: CASSANDRA-19190 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19190 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Marcus Eriksson >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1-alpha1 > > Attachments: ci_summary-2.html > > Time Spent: 10m > Remaining Estimate: 0h > > Per its inline comments, ForceSnapshot is a synthetic transformation whose > purpose is to enable the local log to jump missing epochs. A common use for > this is when replaying persisted events from the metadata log at startup. The > log is initialised with {{Epoch.EMPTY}}, but rather than replaying every > single entry since the beginning of history, we select the most recent > snapshot held locally and start the replay from that point. Likewise, when > catching up from a peer, a node may receive a snapshot plus subsequent log > entries. In order to bring local metadata to the same state as the snapshot, > a {{ForceSnapshot}} with the same epoch as the snapshot is inserted into the > {{LocalLog}} and enacted like any other transformation. These synthetic > transformations should not be persisted in {{system.local_metadata_log}}, > as they do not exist in the distributed metadata log. We _should_ persist the > snapshot itself in {{system.metadata_snapshots}} so that we can avoid having > to re-fetch remote snapshots (i.e. if a node were to restart shortly after > receiving a catchup from a peer). 
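The replay rule described above can be sketched as follows. This is a hypothetical illustration only (epochs modelled as longs, entries as strings; the real logic lives in Cassandra's {{LocalLog}}): replay starts from the most recent local snapshot's epoch rather than from {{Epoch.EMPTY}}, and the synthetic jump itself is never persisted.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of snapshot-based log replay: jump to the snapshot's
// epoch (the ForceSnapshot step, not persisted), then enact only the
// persisted entries that come strictly after it. Not actual Cassandra code.
public class ReplaySketch {
    static long replay(NavigableMap<Long, String> persistedLog, long snapshotEpoch) {
        // Synthetic ForceSnapshot: adopt the snapshot's epoch directly,
        // writing nothing to the local log table for the jump itself.
        long enacted = snapshotEpoch;
        // Replay only entries strictly after the snapshot, not from epoch 0.
        for (long epoch : persistedLog.tailMap(snapshotEpoch, false).keySet())
            enacted = epoch; // enact(entry) would happen here
        return enacted;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> log = new TreeMap<>();
        log.put(1L, "a"); log.put(2L, "b"); log.put(5L, "c"); log.put(6L, "d");
        System.out.println(replay(log, 5L)); // 6: only epoch 6 is replayed
    }
}
```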
[jira] [Updated] (CASSANDRA-19190) ForceSnapshot transformations should not be persisted in the local log table
[ https://issues.apache.org/jira/browse/CASSANDRA-19190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-19190: Reviewers: Marcus Eriksson Status: Review In Progress (was: Patch Available)
[jira] [Updated] (CASSANDRA-19190) ForceSnapshot transformations should not be persisted in the local log table
[ https://issues.apache.org/jira/browse/CASSANDRA-19190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-19190: Source Control Link: https://github.com/apache/cassandra/commit/17ecece5437ab39aaeaa0eb4b42434cddd9960b5 Resolution: Fixed Status: Resolved (was: Ready to Commit)
(cassandra) branch trunk updated: ForceSnapshot transformations should not be persisted in the local log table
This is an automated email from the ASF dual-hosted git repository. marcuse pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git

The following commit(s) were added to refs/heads/trunk by this push:
     new 17ecece543 ForceSnapshot transformations should not be persisted in the local log table
17ecece543 is described below

commit 17ecece5437ab39aaeaa0eb4b42434cddd9960b5
Author: Sam Tunnicliffe
AuthorDate: Thu Dec 14 17:55:05 2023 +

    ForceSnapshot transformations should not be persisted in the local log table

    Patch by Sam Tunnicliffe; reviewed by marcuse for CASSANDRA-19190
---
 .../apache/cassandra/schema/DistributedSchema.java |  11 +-
 .../org/apache/cassandra/tcm/ClusterMetadata.java  |   2 +-
 .../cassandra/tcm/StubClusterMetadataService.java  |  83 -
 .../tcm/listeners/MetadataSnapshotListener.java    |  10 +-
 .../org/apache/cassandra/tcm/log/LocalLog.java     |   6 +-
 .../test/log/ClusterMetadataTestHelper.java        |  19 ++-
 .../listeners/MetadataSnapshotListenerTest.java    | 133 +
 .../org/apache/cassandra/tcm/log/LocalLogTest.java |  54 +
 8 files changed, 310 insertions(+), 8 deletions(-)

diff --git a/src/java/org/apache/cassandra/schema/DistributedSchema.java b/src/java/org/apache/cassandra/schema/DistributedSchema.java
index 86dd1d5117..a837b0773d 100644
--- a/src/java/org/apache/cassandra/schema/DistributedSchema.java
+++ b/src/java/org/apache/cassandra/schema/DistributedSchema.java
@@ -58,9 +58,16 @@ public class DistributedSchema implements MetadataValue
         return new DistributedSchema(Keyspaces.none(), Epoch.EMPTY);
     }

-    public static DistributedSchema first()
+    public static DistributedSchema first(Set knownDatacenters)
     {
-        return new DistributedSchema(Keyspaces.of(DistributedMetadataLogKeyspace.initialMetadata(Collections.singleton(DatabaseDescriptor.getLocalDataCenter(, Epoch.FIRST);
+        if (knownDatacenters.isEmpty())
+        {
+            if (DatabaseDescriptor.getLocalDataCenter() != null)
+                knownDatacenters = Collections.singleton(DatabaseDescriptor.getLocalDataCenter());
+            else
+                knownDatacenters = Collections.singleton("DC1");
+        }
+        return new DistributedSchema(Keyspaces.of(DistributedMetadataLogKeyspace.initialMetadata(knownDatacenters)), Epoch.FIRST);
     }

     private final Keyspaces keyspaces;

diff --git a/src/java/org/apache/cassandra/tcm/ClusterMetadata.java b/src/java/org/apache/cassandra/tcm/ClusterMetadata.java
index 33886bec40..fdf4942c13 100644
--- a/src/java/org/apache/cassandra/tcm/ClusterMetadata.java
+++ b/src/java/org/apache/cassandra/tcm/ClusterMetadata.java
@@ -107,7 +107,7 @@ public class ClusterMetadata
     @VisibleForTesting
     public ClusterMetadata(IPartitioner partitioner, Directory directory)
     {
-        this(partitioner, directory, DistributedSchema.first());
+        this(partitioner, directory, DistributedSchema.first(directory.knownDatacenters()));
     }

     @VisibleForTesting

diff --git a/src/java/org/apache/cassandra/tcm/StubClusterMetadataService.java b/src/java/org/apache/cassandra/tcm/StubClusterMetadataService.java
index 475e8ef21b..8e191307d1 100644
--- a/src/java/org/apache/cassandra/tcm/StubClusterMetadataService.java
+++ b/src/java/org/apache/cassandra/tcm/StubClusterMetadataService.java
@@ -20,15 +20,24 @@ package org.apache.cassandra.tcm;

 import java.util.Collections;

+import com.google.common.collect.ImmutableMap;
+
 import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.dht.IPartitioner;
 import org.apache.cassandra.schema.DistributedMetadataLogKeyspace;
 import org.apache.cassandra.schema.DistributedSchema;
 import org.apache.cassandra.schema.KeyspaceMetadata;
 import org.apache.cassandra.schema.Keyspaces;
+import org.apache.cassandra.tcm.Commit.Replicator;
 import org.apache.cassandra.tcm.log.Entry;
 import org.apache.cassandra.tcm.log.LocalLog;
 import org.apache.cassandra.tcm.membership.Directory;
+import org.apache.cassandra.tcm.ownership.DataPlacements;
+import org.apache.cassandra.tcm.ownership.PlacementProvider;
+import org.apache.cassandra.tcm.ownership.TokenMap;
 import org.apache.cassandra.tcm.ownership.UniformRangePlacement;
+import org.apache.cassandra.tcm.sequences.InProgressSequences;
+import org.apache.cassandra.tcm.sequences.LockedRanges;

 public class StubClusterMetadataService extends ClusterMetadataService
 {
@@ -73,12 +82,24 @@ public class StubClusterMetadataService extends ClusterMetadataService
                .withInitialState(initial)
                .createLog(),
               new StubProcessor(),
-              Commit.Replicator.NO_OP,
+              Replicator.NO_OP,
               false);
         this.metadata = initial;
         this.log().readyUnchecked();
     }

+    private StubClusterMetadataService(PlacementProvider
[jira] [Commented] (CASSANDRA-19190) ForceSnapshot transformations should not be persisted in the local log table
[ https://issues.apache.org/jira/browse/CASSANDRA-19190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840045#comment-17840045 ] Marcus Eriksson commented on CASSANDRA-19190: - attaching new ci run, two failures, CASSANDRA-17339 and a counter mismatch, so I'm +1 here, will get it committed
[jira] [Updated] (CASSANDRA-19190) ForceSnapshot transformations should not be persisted in the local log table
[ https://issues.apache.org/jira/browse/CASSANDRA-19190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-19190: Attachment: (was: ci_summary-1.html)
[jira] [Updated] (CASSANDRA-19190) ForceSnapshot transformations should not be persisted in the local log table
[ https://issues.apache.org/jira/browse/CASSANDRA-19190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-19190: Attachment: (was: ci_summary.html)
[jira] [Updated] (CASSANDRA-19190) ForceSnapshot transformations should not be persisted in the local log table
[ https://issues.apache.org/jira/browse/CASSANDRA-19190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-19190: Attachment: ci_summary-2.html
(cassandra-website) branch asf-staging updated (916d1569 -> a9e9af59)
This is an automated email from the ASF dual-hosted git repository. git-site-role pushed a change to branch asf-staging in repository https://gitbox.apache.org/repos/asf/cassandra-website.git

 discard 916d1569 generate docs for cc1c7113
     new a9e9af59 generate docs for cc1c7113

This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this:

 * -- * -- B -- O -- O -- O (916d1569)
            \
             N -- N -- N refs/heads/asf-staging (a9e9af59)

You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
 site-ui/build/ui-bundle.zip | Bin 4883646 -> 4883646 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
[jira] [Commented] (CASSANDRA-19572) Test failure: org.apache.cassandra.db.ImportTest flakiness
[ https://issues.apache.org/jira/browse/CASSANDRA-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840013#comment-17840013 ] Stefan Miklosovic commented on CASSANDRA-19572: --- [~marcuse] There is already something about "releasing" SSTables here (1); I wonder what your thought process was behind that, as that is the part of the functionality where it is failing. What's the context? (1) https://github.com/apache/cassandra/blob/cassandra-4.0/test/unit/org/apache/cassandra/db/ImportTest.java#L235 > Test failure: org.apache.cassandra.db.ImportTest flakiness > -- > > Key: CASSANDRA-19572 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19572 > Project: Cassandra > Issue Type: Bug > Components: Tool/bulk load > Reporter: Brandon Williams > Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x > > > As discovered on CASSANDRA-19401, the tests in this class are flaky, at least > the following: > * testImportCorruptWithoutValidationWithCopying > * testImportInvalidateCache > * testImportCorruptWithCopying > * testImportCacheEnabledWithoutSrcDir > [https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests]
[jira] [Commented] (CASSANDRA-19572) Test failure: org.apache.cassandra.db.ImportTest flakiness
[ https://issues.apache.org/jira/browse/CASSANDRA-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839994#comment-17839994 ] Marcus Eriksson commented on CASSANDRA-19572: - sorry, don't remember seeing these errors > Test failure: org.apache.cassandra.db.ImportTest flakiness > -- > > Key: CASSANDRA-19572 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19572 > Project: Cassandra > Issue Type: Bug > Components: Tool/bulk load > Reporter: Brandon Williams > Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x > > > As discovered on CASSANDRA-19401, the tests in this class are flaky, at least > the following: > * testImportCorruptWithoutValidationWithCopying > * testImportInvalidateCache > * testImportCorruptWithCopying > * testImportCacheEnabledWithoutSrcDir > [https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests]
[jira] [Updated] (CASSANDRA-19191) Optimisations to PlacementForRange, improve lookup on r/w path
[ https://issues.apache.org/jira/browse/CASSANDRA-19191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-19191: Source Control Link: https://github.com/apache/cassandra/commit/34d999c47a4da6d43a67910354fb9888184b23ab Resolution: Fixed Status: Resolved (was: Ready to Commit) and committed, thanks > Optimisations to PlacementForRange, improve lookup on r/w path > -- > > Key: CASSANDRA-19191 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19191 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata > Reporter: Marcus Eriksson > Assignee: Marcus Eriksson > Priority: Normal > Fix For: 5.1-alpha1 > > Attachments: ci_summary-1.html, ci_summary.html, result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > The lookup used when selecting the appropriate replica group for a range or > token while performing reads and writes is extremely simplistic and > inefficient. There is plenty of scope to improve {{PlacementsForRange}} by > replacing the current naive iteration with a more efficient lookup.
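The kind of optimisation the ticket describes, replacing a naive linear scan with an efficient lookup over sorted, non-overlapping ranges, commonly comes down to indexing each range by its start token in a sorted map. The sketch below illustrates that idea under invented names (TokenRangeIndex, addRange, lookup); it is not the actual ReplicaGroups implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: ranges keyed by start token in a TreeMap give an
// O(log n) floorEntry lookup instead of an O(n) scan over all ranges.
public class TokenRangeIndex {
    // Maps a range's start token to the replica group owning tokens from
    // that start up to the next range's start (wrapping at the ends).
    private final TreeMap<Long, String> replicasByStart = new TreeMap<>();

    public void addRange(long startToken, String replicaGroup) {
        replicasByStart.put(startToken, replicaGroup);
    }

    // The owning group is the entry with the greatest start token <= token;
    // tokens below the smallest start wrap around to the last range.
    public String lookup(long token) {
        Map.Entry<Long, String> e = replicasByStart.floorEntry(token);
        return e != null ? e.getValue() : replicasByStart.lastEntry().getValue();
    }

    public static void main(String[] args) {
        TokenRangeIndex index = new TokenRangeIndex();
        index.addRange(0L, "groupA");
        index.addRange(100L, "groupB");
        index.addRange(200L, "groupC");
        System.out.println(index.lookup(150)); // prints groupB
    }
}
```

On the read/write hot path this lookup runs for every request, which is why the difference between a linear scan and a logarithmic sorted-map probe matters even for modest numbers of ranges.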
(cassandra) branch trunk updated: Optimisations to PlacementForRange, improve lookup on r/w path
This is an automated email from the ASF dual-hosted git repository. marcuse pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git

The following commit(s) were added to refs/heads/trunk by this push:
     new 34d999c47a Optimisations to PlacementForRange, improve lookup on r/w path
34d999c47a is described below

commit 34d999c47a4da6d43a67910354fb9888184b23ab
Author: Marcus Eriksson
AuthorDate: Wed Mar 20 15:53:50 2024 +0100

    Optimisations to PlacementForRange, improve lookup on r/w path

    Patch by marcuse and Sam Tunnicliffe; reviewed by Sam Tunnicliffe for CASSANDRA-19191

    Co-authored-by: Sam Tunnicliffe
    Co-authored-by: Marcus Eriksson
---
 .../apache/cassandra/locator/LocalStrategy.java    |   6 +-
 .../cassandra/locator/NetworkTopologyStrategy.java |   6 +-
 .../apache/cassandra/locator/SimpleStrategy.java   |   6 +-
 .../org/apache/cassandra/tcm/ClusterMetadata.java  |  16 +-
 .../cassandra/tcm/ownership/DataPlacement.java     |  68 +++
 .../cassandra/tcm/ownership/DataPlacements.java    |  14 +-
 .../{PlacementForRange.java => ReplicaGroups.java} | 206 +
 .../org/apache/cassandra/tcm/sequences/Move.java   |   4 +-
 .../cassandra/tcm/sequences/RemoveNodeStreams.java |   4 +-
 .../cassandra/distributed/shared/ClusterUtils.java |   4 +-
 .../test/log/MetadataChangeSimulationTest.java     |  26 +--
 .../test/log/OperationalEquivalenceTest.java       |   4 +-
 .../distributed/test/log/SimulatedOperation.java   |   4 +-
 .../distributed/test/ring/RangeVersioningTest.java |   4 +-
 .../test/microbench/ReplicaGroupsBench.java        | 138 ++
 .../tcm/compatibility/GossipHelperTest.java        |   6 +-
 .../tcm/ownership/UniformRangePlacementTest.java   |  68 ---
 .../InProgressSequenceCancellationTest.java        |  18 +-
 .../cassandra/tcm/sequences/SequencesUtils.java    |   2 +-
 19 files changed, 392 insertions(+), 212 deletions(-)

diff --git a/src/java/org/apache/cassandra/locator/LocalStrategy.java b/src/java/org/apache/cassandra/locator/LocalStrategy.java
index 69193090c4..4032ce1594 100644
---
a/src/java/org/apache/cassandra/locator/LocalStrategy.java
+++ b/src/java/org/apache/cassandra/locator/LocalStrategy.java
@@ -26,7 +26,7 @@
 import org.apache.cassandra.dht.Token;
 import org.apache.cassandra.tcm.ClusterMetadata;
 import org.apache.cassandra.tcm.Epoch;
 import org.apache.cassandra.tcm.ownership.DataPlacement;
-import org.apache.cassandra.tcm.ownership.PlacementForRange;
+import org.apache.cassandra.tcm.ownership.ReplicaGroups;
 import org.apache.cassandra.tcm.ownership.VersionedEndpoints;
 import org.apache.cassandra.utils.FBUtilities;
@@ -65,7 +65,7 @@ public class LocalStrategy extends SystemStrategy
 {
     public static final Range entireRange = new Range<>(DatabaseDescriptor.getPartitioner().getMinimumToken(), DatabaseDescriptor.getPartitioner().getMinimumToken());
     public static final EndpointsForRange localReplicas = EndpointsForRange.of(new Replica(FBUtilities.getBroadcastAddressAndPort(), entireRange, true));
-    public static final DataPlacement placement = new DataPlacement(PlacementForRange.builder().withReplicaGroup(VersionedEndpoints.forRange(Epoch.FIRST, localReplicas)).build(),
-                                                                    PlacementForRange.builder().withReplicaGroup(VersionedEndpoints.forRange(Epoch.FIRST, localReplicas)).build());
+    public static final DataPlacement placement = new DataPlacement(ReplicaGroups.builder().withReplicaGroup(VersionedEndpoints.forRange(Epoch.FIRST, localReplicas)).build(),
+                                                                    ReplicaGroups.builder().withReplicaGroup(VersionedEndpoints.forRange(Epoch.FIRST, localReplicas)).build());
 }
}

diff --git a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java
index d48ee31610..05bfcfb9ed 100644
--- a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java
+++ b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java
@@ -49,7 +49,7 @@
 import org.apache.cassandra.tcm.membership.Directory;
 import org.apache.cassandra.tcm.membership.Location;
 import org.apache.cassandra.tcm.membership.NodeId;
 import org.apache.cassandra.tcm.ownership.DataPlacement;
-import org.apache.cassandra.tcm.ownership.PlacementForRange;
+import org.apache.cassandra.tcm.ownership.ReplicaGroups;
 import org.apache.cassandra.tcm.ownership.TokenMap;
 import org.apache.cassandra.tcm.ownership.VersionedEndpoints;
 import org.apache.cassandra.utils.FBUtilities;
@@ -194,7 +194,7 @@ public class NetworkTopologyStrategy extends AbstractReplicationStrategy
                                            Directory directory,
                                            TokenMap tokenMap)
     {
-PlacementForRange.Builder
Re: [PR] Minor fix to unit test [cassandra-java-driver]
absurdfarce merged PR #1930: URL: https://github.com/apache/cassandra-java-driver/pull/1930
(cassandra-java-driver) branch 4.x updated: Initial fix to unit tests
This is an automated email from the ASF dual-hosted git repository. absurdfarce pushed a commit to branch 4.x in repository https://gitbox.apache.org/repos/asf/cassandra-java-driver.git

The following commit(s) were added to refs/heads/4.x by this push:
     new 07265b4a6 Initial fix to unit tests
07265b4a6 is described below

commit 07265b4a6830a47752bf31eb4f631b9917863da2
Author: absurdfarce
AuthorDate: Tue Apr 23 00:38:48 2024 -0500

    Initial fix to unit tests

    patch by Bret McGuire; reviewed by Bret McGuire for PR 1930
---
 .../oss/driver/internal/core/session/DefaultSession.java | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/core/src/main/java/com/datastax/oss/driver/internal/core/session/DefaultSession.java b/core/src/main/java/com/datastax/oss/driver/internal/core/session/DefaultSession.java
index cb1271c9c..6f063ae9a 100644
--- a/core/src/main/java/com/datastax/oss/driver/internal/core/session/DefaultSession.java
+++ b/core/src/main/java/com/datastax/oss/driver/internal/core/session/DefaultSession.java
@@ -39,6 +39,7 @@
 import com.datastax.oss.driver.internal.core.metadata.MetadataManager;
 import com.datastax.oss.driver.internal.core.metadata.MetadataManager.RefreshSchemaResult;
 import com.datastax.oss.driver.internal.core.metadata.NodeStateEvent;
 import com.datastax.oss.driver.internal.core.metadata.NodeStateManager;
+import com.datastax.oss.driver.internal.core.metrics.NodeMetricUpdater;
 import com.datastax.oss.driver.internal.core.metrics.SessionMetricUpdater;
 import com.datastax.oss.driver.internal.core.pool.ChannelPool;
 import com.datastax.oss.driver.internal.core.util.Loggers;
@@ -549,10 +550,11 @@ public class DefaultSession implements CqlSession {
       // clear metrics to prevent memory leak
       for (Node n : metadataManager.getMetadata().getNodes().values()) {
-        ((DefaultNode) n).getMetricUpdater().clearMetrics();
+        NodeMetricUpdater updater = ((DefaultNode) n).getMetricUpdater();
+        if (updater != null) updater.clearMetrics();
       }
-      DefaultSession.this.metricUpdater.clearMetrics();
+      if (metricUpdater != null) metricUpdater.clearMetrics();

       List> childrenCloseStages = new ArrayList<>();
       for (AsyncAutoCloseable closeable : internalComponentsToClose()) {
@@ -575,10 +577,11 @@
       // clear metrics to prevent memory leak
       for (Node n : metadataManager.getMetadata().getNodes().values()) {
-        ((DefaultNode) n).getMetricUpdater().clearMetrics();
+        NodeMetricUpdater updater = ((DefaultNode) n).getMetricUpdater();
+        if (updater != null) updater.clearMetrics();
       }
-      DefaultSession.this.metricUpdater.clearMetrics();
+      if (metricUpdater != null) metricUpdater.clearMetrics();

       if (closeWasCalled) {
         // onChildrenClosed has already been scheduled
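The guards this commit adds follow a simple defensive pattern: a node's metric updater may legitimately be absent during session close (for instance when metrics are disabled or the node was never fully initialised), so each clearMetrics() call is checked rather than assumed safe. A minimal, hypothetical sketch of the pattern (MetricsCleanupSketch, MetricUpdater, and clearAll are invented names, not the driver's API):

```java
import java.util.Arrays;
import java.util.List;

// Sketch: clear every non-null updater instead of dereferencing blindly,
// which would throw a NullPointerException on the first absent one.
public class MetricsCleanupSketch {
    interface MetricUpdater { void clearMetrics(); }

    // Clears each non-null updater; returns how many were actually cleared.
    static int clearAll(List<MetricUpdater> updaters) {
        int cleared = 0;
        for (MetricUpdater u : updaters) {
            if (u != null) {      // the fix: guard against absent updaters
                u.clearMetrics();
                cleared++;
            }
        }
        return cleared;
    }

    public static void main(String[] args) {
        MetricUpdater real = () -> System.out.println("metrics cleared");
        // One initialised node and one whose updater was never created:
        System.out.println(clearAll(Arrays.asList(real, null))); // no NPE; prints 1
    }
}
```

The same reasoning applies to both call sites in the diff above: the per-node updaters and the session-level updater are cleared only when present.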
[PR] Minor fix to unit test [cassandra-java-driver]
absurdfarce opened a new pull request, #1930: URL: https://github.com/apache/cassandra-java-driver/pull/1930 The recent metrics changes to prevent session leakage ([this PR](https://github.com/apache/cassandra-java-driver/pull/1916)) introduced a small issue in one of the unit tests. This PR addresses that issue. A combo branch containing this fix + [the fix for CASSANDRA-19292](https://github.com/apache/cassandra-java-driver/pull/1924) passed all unit and integration tests in a local run using Cassandra 4.1.