[jira] [Commented] (CASSANDRA-17804) AutoSnapshotTtlTest#testAutoSnapshotTTlOnDropAfterRestart fails sporadically on missing stdout contents
[ https://issues.apache.org/jira/browse/CASSANDRA-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598340#comment-17598340 ]

Andres de la Peña commented on CASSANDRA-17804:
---

Committed to 4.1 as [1ee5df02b1f98cf38f126d47a7f3fb153f790d52|https://github.com/apache/cassandra/commit/1ee5df02b1f98cf38f126d47a7f3fb153f790d52] and merged up to [trunk|https://github.com/apache/cassandra/commit/b39596b65c6bdbed5f2be35008291c69b41f1675].

> AutoSnapshotTtlTest#testAutoSnapshotTTlOnDropAfterRestart fails sporadically on missing stdout contents
> ---
>
> Key: CASSANDRA-17804
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17804
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Snapshots
> Reporter: Caleb Rackliffe
> Assignee: Paulo Motta
> Priority: Normal
> Fix For: 4.1-beta, 4.x
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> See [https://app.circleci.com/pipelines/github/maedhroz/cassandra/487/workflows/0ad42210-2979-4c5d-a4e8-d8cedf9285a7/jobs/4686/tests#failed-test-0]
>
> My guess is that in some resource-constrained environment, even the first {{nodetool listsnapshots}} invocation doesn’t have “dropped” in the stdout. In other words, we skip to the state of the world the last assertion in the test is checking.

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17804) AutoSnapshotTtlTest#testAutoSnapshotTTlOnDropAfterRestart fails sporadically on missing stdout contents
[ https://issues.apache.org/jira/browse/CASSANDRA-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598301#comment-17598301 ]

Andres de la Peña commented on CASSANDRA-17804:
---

Sure, starting commit.
[jira] [Commented] (CASSANDRA-17804) AutoSnapshotTtlTest#testAutoSnapshotTTlOnDropAfterRestart fails sporadically on missing stdout contents
[ https://issues.apache.org/jira/browse/CASSANDRA-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597984#comment-17597984 ]

Paulo Motta commented on CASSANDRA-17804:
---

Thanks for the test runs [~adelapena]. Would you mind committing this? Unfortunately I won't be able to commit it for a few days.
[jira] [Commented] (CASSANDRA-17804) AutoSnapshotTtlTest#testAutoSnapshotTTlOnDropAfterRestart fails sporadically on missing stdout contents
[ https://issues.apache.org/jira/browse/CASSANDRA-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597983#comment-17597983 ]

Andres de la Peña commented on CASSANDRA-17804:
---

It seems that all the runs above have succeeded :)
[jira] [Commented] (CASSANDRA-17804) AutoSnapshotTtlTest#testAutoSnapshotTTlOnDropAfterRestart fails sporadically on missing stdout contents
[ https://issues.apache.org/jira/browse/CASSANDRA-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597969#comment-17597969 ]

Andres de la Peña commented on CASSANDRA-17804:
---

Here are some repeated runs of the fixed flaky test, still running:

||branch||vnodes||CI||
|4.1|false|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2023/workflows/942564a7-5d9f-4c12-b2f0-4ca5cc490749] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/2023/workflows/978ca93d-b464-478c-a3df-6628f0f58545]|
|4.1|true|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2026/workflows/32cd730b-e299-4889-8835-770156956ac7] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/2026/workflows/9702e9e6-fc1f-4981-8d83-4ce7d3ef65f2]|
|trunk|false|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2024/workflows/8e78ad06-c453-4bb4-ad32-4ebce1ddc2be] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/2024/workflows/bdd5cf85-5766-4352-9a20-05b01feb0aa2]|
|trunk|true|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2024/workflows/8e78ad06-c453-4bb4-ad32-4ebce1ddc2be] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/2025/workflows/f3ef33b3-37ee-44fb-b5c3-274dc24a338b]|

The CircleCI config file for these runs has been generated with:

{code:java}
.circleci/generate.sh -m \
  -e REPEATED_UTEST_TARGET=test-jvm-dtest-some \
  -e REPEATED_UTEST_COUNT=500 \
  -e REPEATED_UTEST_CLASS=org.apache.cassandra.distributed.test.AutoSnapshotTtlTest \
  -e REPEATED_UTEST_METHODS=testAutoSnapshotTTlOnDropAfterRestart \
  -e REPEATED_UTEST_VNODES=true
{code}
[jira] [Commented] (CASSANDRA-17804) AutoSnapshotTtlTest#testAutoSnapshotTTlOnDropAfterRestart fails sporadically on missing stdout contents
[ https://issues.apache.org/jira/browse/CASSANDRA-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597287#comment-17597287 ]

Caleb Rackliffe commented on CASSANDRA-17804:
---

+1
[jira] [Commented] (CASSANDRA-17804) AutoSnapshotTtlTest#testAutoSnapshotTTlOnDropAfterRestart fails sporadically on missing stdout contents
[ https://issues.apache.org/jira/browse/CASSANDRA-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585331#comment-17585331 ]

Benjamin Lerer commented on CASSANDRA-17804:
---

[~maedhroz] are you happy with the latest changes?
[jira] [Commented] (CASSANDRA-17804) AutoSnapshotTtlTest#testAutoSnapshotTTlOnDropAfterRestart fails sporadically on missing stdout contents
[ https://issues.apache.org/jira/browse/CASSANDRA-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581029#comment-17581029 ]

Paulo Motta commented on CASSANDRA-17804:
---

Thanks for the feedback. Addressed nit comments and resubmitted CI for 4.1/trunk:

|[trunk|https://github.com/apache/cassandra/pull/1596]|[CI|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1651/]|
|[4.1|https://github.com/pauloricardomg/cassandra/tree/CASSANDRA-17804-4.1]|[CI|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1884/]|
[jira] [Commented] (CASSANDRA-17804) AutoSnapshotTtlTest#testAutoSnapshotTTlOnDropAfterRestart fails sporadically on missing stdout contents
[ https://issues.apache.org/jira/browse/CASSANDRA-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579888#comment-17579888 ]

Caleb Rackliffe commented on CASSANDRA-17804:
---

I dropped a couple of nits in the PR, but LGTM overall. I guess the difficulty here is that you can't reliably predict how long the restart will take, and therefore can't know when the first check on {{listsnapshots}} is irrelevant. The only alternative I could see is inspecting the stdout, pulling a timestamp of some kind from it, and ignoring the check on the drop prefix in the first assert if too much time has passed.
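The alternative mentioned above (skip the drop-prefix check when too much time has passed) could look roughly like the sketch below. This is only an illustration, not code from the PR: the class and method names are hypothetical, and it assumes a creation timestamp has already been extracted from the {{listsnapshots}} output by some means, since the output format is not shown in this thread. The only detail taken from the ticket is that the first assertion looks for "dropped" in the stdout.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of AutoSnapshotTtlTest.
final class DropSnapshotCheck {
    enum Expectation { MUST_BE_LISTED, MAY_BE_EXPIRED }

    /** Decides whether the snapshot can still be expected in the listing at check time. */
    static Expectation expectationFor(Instant createdAt, Instant checkedAt, Duration ttl) {
        Duration elapsed = Duration.between(createdAt, checkedAt);
        return elapsed.compareTo(ttl) < 0 ? Expectation.MUST_BE_LISTED
                                          : Expectation.MAY_BE_EXPIRED;
    }

    /**
     * First assertion with the timing guard: the "dropped" prefix is only
     * required while the TTL has not yet elapsed (createdAt is assumed to be
     * parsed elsewhere).
     */
    static boolean firstAssertionHolds(String stdout, Instant createdAt,
                                       Instant checkedAt, Duration ttl) {
        if (expectationFor(createdAt, checkedAt, ttl) == Expectation.MAY_BE_EXPIRED) {
            return true; // restart outlived the TTL, so the check is irrelevant
        }
        return stdout.contains("dropped");
    }
}
```

The trade-off is that a slow restart silently turns the first assertion into a no-op, so such a run only exercises the final "snapshot is gone" check.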
[jira] [Commented] (CASSANDRA-17804) AutoSnapshotTtlTest#testAutoSnapshotTTlOnDropAfterRestart fails sporadically on missing stdout contents
[ https://issues.apache.org/jira/browse/CASSANDRA-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579469#comment-17579469 ]

Paulo Motta commented on CASSANDRA-17804:
---

This test makes the following assertions to ensure snapshots of dropped tables are properly expired after a node restart when auto_snapshot_ttl=20s:
a) that a snapshot of a dropped table exists after a node restart
b) that the snapshot is gone after expiration (20s)

This test assumes that the restart will take less than auto_snapshot_ttl=20s. If a restart takes longer than that, the snapshot is cleared as soon as the node starts and the first assertion fails.

The trivial fix is to increase auto_snapshot_ttl to 60s, in the hope that a restart will never take longer than a minute and the first assertion will never fail. Unfortunately this makes the test take longer, but it also makes it a bit more deterministic.

* PR: https://github.com/apache/cassandra/pull/1789
* CI: https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1871/
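To make the timing assumption above concrete, here is a minimal sketch of the race. It is a simplified model, not the test code: the class name, method names, and the 25s restart figure are made up for illustration, and only the 20s and 60s TTL values come from the comment above.

```java
import java.time.Duration;

// Hypothetical model of the race; not part of AutoSnapshotTtlTest.
final class SnapshotTtlRace {
    /** The first assertion can only pass if the check happens before the TTL elapses. */
    static boolean firstAssertionCanPass(Duration ttl, Duration dropToFirstCheck) {
        return dropToFirstCheck.compareTo(ttl) < 0;
    }

    /** Time left to observe the snapshot after the restart; zero if it already expired. */
    static Duration observationWindow(Duration ttl, Duration dropToFirstCheck) {
        Duration left = ttl.minus(dropToFirstCheck);
        return left.isNegative() ? Duration.ZERO : left;
    }

    public static void main(String[] args) {
        Duration slowRestart = Duration.ofSeconds(25); // assumed slow CI restart
        for (long ttlSeconds : new long[] {20, 60}) {
            Duration ttl = Duration.ofSeconds(ttlSeconds);
            System.out.printf("ttl=%ds restart=%ds canPass=%b window=%ds%n",
                              ttlSeconds, slowRestart.getSeconds(),
                              firstAssertionCanPass(ttl, slowRestart),
                              observationWindow(ttl, slowRestart).getSeconds());
        }
    }
}
```

A larger TTL widens the window but does not remove the race: any restart slower than the TTL would still fail the first assertion.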
[jira] [Commented] (CASSANDRA-17804) AutoSnapshotTtlTest#testAutoSnapshotTTlOnDropAfterRestart fails sporadically on missing stdout contents
[ https://issues.apache.org/jira/browse/CASSANDRA-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577541#comment-17577541 ]

Paulo Motta commented on CASSANDRA-17804:
---

Thanks for checking [~maedhroz], I'll take a look.