[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15481241#comment-15481241 ]
Stefania commented on CASSANDRA-11465:
Will do, thanks! :)

> dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
>
> Key: CASSANDRA-11465
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11465
> Project: Cassandra
> Issue Type: Bug
> Components: Observability
> Reporter: Philip Thompson
> Assignee: Stefania
> Labels: dtest
> Fix For: 2.2.8, 3.0.9, 3.8
>
> Failing on the following assert, on trunk only:
> {{self.assertEqual(len(errs[0]), 1)}}
> Is not failing consistently.
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test
> Failed on CassCI build trunk_dtest #1087

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15479205#comment-15479205 ]
mck commented on CASSANDRA-11465:
Only just found and read this today! Feel free to shout out for tracing related stuff :-)
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1539#comment-1539 ]
Stefania commented on CASSANDRA-11465:
Committed to 2.2 as 7bd65a129c63091d6885f92afe77a41c4fc46a6f and merged upwards. Thank you for the review and running the tests Paulo!
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398808#comment-15398808 ]
Stefania commented on CASSANDRA-11465:
3.0 and 3.9 dtest runs just completed and they are both good. I will commit this soon.
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398607#comment-15398607 ]
Paulo Motta commented on CASSANDRA-11465:
The latest 3.9 run finished, but there are a couple of additional [failures|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-3.9-dtest/4/testReport/] in {{materialized_views_test}} :(. They were already present in a previous [3.9 branch run|http://cassci.datastax.com/view/cassandra-3.9/job/cassandra-3.9_dtest/17/testReport/junit/materialized_views_test/], so I'm quite confident they're unrelated to this ticket. I submitted a [new run|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-3.9-dtest/5/] just in case. Feel free to mark as ready to commit if the remaining 3.0/3.9 CI results look good.
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398511#comment-15398511 ]
Stefania commented on CASSANDRA-11465:
I've restarted the 3.0 dtests as well; the failure addressed by CASSANDRA-11811 is quite different from ours. Also, I forgot to mention yesterday that the tests do pass locally.
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398441#comment-15398441 ]
Paulo Motta commented on CASSANDRA-11465:
bq. There are a number of failures, all accounted for except for this in 3.0 and this in 3.9.

{{snapshot_test.TestArchiveCommitlog.test_archive_commitlog}} is being addressed on CASSANDRA-11811. {{delete_insert_search_test}} does look like a new failure, but it seems unrelated.

bq. perhaps to be on the safe side we should set the default value of WAIT_FOR_PENDING_EVENTS_TIMEOUT_SECS to zero rather than 1 second so that this patch only affects the tests that set -Dcassandra.wait_for_tracing_events_timeout_secs?

I think we can keep it. This issue may also happen in ordinary tracing sessions, so the default wait may make tracing a bit more robust, and AFAIK tracing is only used for debugging/troubleshooting, so the extra wait is not particularly critical.

In any case, I submitted an additional [3.9 dtest run|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-3.9-dtest/4/] to verify the previous failures were indeed unrelated. I will mark as ready to commit if this last test run looks good. Thanks!
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397245#comment-15397245 ]
Stefania commented on CASSANDRA-11465:
This makes sense, thanks for the explanation!
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397244#comment-15397244 ]
Stefania commented on CASSANDRA-11465:
Thanks for restarting the tests. There are a number of failures, all accounted for except for [this|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-3.0-dtest/lastCompletedBuild/testReport/snapshot_test/TestArchiveCommitlog/test_archive_commitlog] in 3.0 and [this|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-3.9-dtest/lastCompletedBuild/testReport/delete_insert_test/DeleteInsertTest/delete_insert_search_test/] in 3.9. I'm not sure they are related given that both commitlog and secondary indexes are local, but perhaps to be on the safe side we should set the default value of {{WAIT_FOR_PENDING_EVENTS_TIMEOUT_SECS}} to zero rather than 1 second so that this patch only affects the tests that set {{-Dcassandra.wait_for_tracing_events_timeout_secs}}?
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396736#comment-15396736 ]
Paulo Motta commented on CASSANDRA-11465:
The previous runs failed for the same reason. I resubmitted them separately (with a delay between them so as not to overload the Apache repos), and they seem to be running. Please mark as ready to commit if CI executes successfully this time.
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15395781#comment-15395781 ]
Joel Knighton commented on CASSANDRA-11465:
I think you're right regarding the build timeout; this has surfaced a few times in the last few days. On the Apache infra page, you can see that [repository.apache.org|https://status.apache.org/] (under the Nexus section) has been down several times in the last few days. Local builds probably work because of your cached m2.
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15395328#comment-15395328 ]
Stefania commented on CASSANDRA-11465:
I managed to get the 2.2 dtests to run but the other dtests are still failing; see an example [here|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-dtest/2/console]. I think the problem is that the build is timing out: {{Build timed out (after 20 minutes). Marking the build as aborted.}} Locally, however, I can build these branches without problems. They were all rebased this morning BTW. I've relaunched the aborted dtests one more time.

There were also many timeouts for the trunk utests, but the same problem happened in 3.9 [yesterday|http://cassci.datastax.com/view/cassandra-3.9/job/cassandra-3.9_testall/41/testReport/]. I've rebased and relaunched the trunk utests as well.
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15395031#comment-15395031 ]
Stefania commented on CASSANDRA-11465:
I've tried following [~mshuler]'s advice on IRC and renamed the branches. They should be running on Openstack now.

||2.2||3.0||3.9||trunk||
|[patch|https://github.com/stef1927/cassandra/commits/11465-2.2]|[patch|https://github.com/stef1927/cassandra/commits/11465-3.0]|[patch|https://github.com/stef1927/cassandra/commits/11465-3.9]|[patch|https://github.com/stef1927/cassandra/commits/11465]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-3.9-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-3.9-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-dtest/]|
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394971#comment-15394971 ]
Paulo Motta commented on CASSANDRA-11465:
The patch and multiplexer results look good, but for some reason the dtests don't seem to be running. I tried resubmitting a few times without success, and I wasn't able to find the root cause from a quick glance at the logs. Could you maybe try rebasing the dtest repo?
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394014#comment-15394014 ]
Paulo Motta commented on CASSANDRA-11465:
Submitted new multiplexer runs:
* [2.2|https://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/188/]
* [trunk|https://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/187/]
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393116#comment-15393116 ]
Stefania commented on CASSANDRA-11465:
Thanks for the additional commit. I've incorporated it into the trunk patch and discarded the futures commits on all branches except for trunk. I've also rebased, squashed, prepared the patches for commit and relaunched the tests. The multiplexer runs were aborted unfortunately, and I am not able to restart them with the up-to-date commits, so would you mind doing that as well as taking another quick look at the final patches?
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392611#comment-15392611 ]
Paulo Motta commented on CASSANDRA-11465:
New future-based approach LGTM. I added a [new commit|https://github.com/pauloricardomg/cassandra/commit/7246dfbffcb99a6a3f54fef9f84e84072fa90e3c] to the trunk version simplifying {{waitForPendingEvents}} to wait on a list of {{CompletableFuture}} instead.

bq. Launched the tests again, the latest changes are in a separate commit so we can choose which approach to use. I would prefer to use futures since it is more accurate.

Since the no-op event solution is a bit simpler and less risky, I propose we commit that for 2.2+ and commit the future-based version only to trunk as an improvement. WDYT?

Submitted 20x multiplexer runs of {{cql_tracing_test.py:TestCqlTracing}} for 2.2 (no-op version) and trunk (future version):
* [2.2|https://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/186/]
* [trunk|https://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/185/]
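To illustrate the future-based idea discussed in this comment, here is a minimal sketch of waiting on a list of {{CompletableFuture}} before ending a tracing session. It is not the committed patch: the class {{TraceSessionSketch}} and its {{trace}} method are hypothetical names invented for the example, and only {{waitForPendingEvents}} and {{CompletableFuture}} come from the thread.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a per-session trace state; not Cassandra's TraceStateImpl.
final class TraceSessionSketch
{
    private final ExecutorService tracingExecutor;

    // One future per trace event submitted by this session.
    private final List<CompletableFuture<Void>> pendingEvents = new CopyOnWriteArrayList<>();

    TraceSessionSketch(ExecutorService tracingExecutor)
    {
        this.tracingExecutor = tracingExecutor;
    }

    // Queue an event write on the executor and remember its future.
    void trace(Runnable persistEvent)
    {
        pendingEvents.add(CompletableFuture.runAsync(persistEvent, tracingExecutor));
    }

    // Returns true only if every event of this session completed normally within the timeout.
    boolean waitForPendingEvents(long timeout, TimeUnit unit)
    {
        CompletableFuture<Void> all =
            CompletableFuture.allOf(pendingEvents.toArray(new CompletableFuture<?>[0]));
        try
        {
            all.get(timeout, unit);
            return true;
        }
        catch (TimeoutException | ExecutionException e)
        {
            return false; // timed out, or at least one event write failed
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args)
    {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        TraceSessionSketch session = new TraceSessionSketch(executor);
        session.trace(() -> System.out.println("event 1 persisted"));
        session.trace(() -> System.out.println("event 2 persisted"));
        System.out.println("all persisted: " + session.waitForPendingEvents(5, TimeUnit.SECONDS));
        executor.shutdown();
    }
}
```

Because each session waits only on its own futures, the wait does not depend on what other concurrent sessions have queued, which is the determinism argument made in the review.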
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391376#comment-15391376 ]
Stefania commented on CASSANDRA-11465:
There's a static method that we won't be able to capture with futures, but even the approach based on adding a no-op event could have the same race; see CASSANDRA-5668. Basically the state may be deleted before the coordinator receives all responses or even sends all requests, but this is not applicable to CL.ALL. Also, the 3.9 patch becomes a little more involved due to the abstractions introduced by CASSANDRA-10392.

Launched the tests again; the latest changes are in a separate commit so we can choose which approach to use. I would prefer to use futures since it is more accurate.

||2.2||3.0||3.9||trunk||
|[patch|https://github.com/stef1927/cassandra/commits/11465-cqlsh-2.2]|[patch|https://github.com/stef1927/cassandra/commits/11465-cqlsh-3.0]|[patch|https://github.com/stef1927/cassandra/commits/11465-cqlsh-3.9]|[patch|https://github.com/stef1927/cassandra/commits/11465-cqlsh]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-3.9-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-3.9-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-dtest/]|

Only the 2.2 and 3.9 commits need reviewing; the 3.0 and trunk commits are identical to the 2.2 and 3.9 commits respectively.

I've also created the pull request for dtests [here|https://github.com/riptano/cassandra-dtest/pull/1125].
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390264#comment-15390264 ]
Paulo Motta commented on CASSANDRA-11465:
While this looks good and seems to solve the problem, it may introduce timing dependencies between concurrent tracing sessions and the timeout would still not guarantee all messages are persisted before the tracing session completes. While this is probably not a big deal and much better than what we currently have, I think a more robust and deterministic approach would be to keep tracing futures for each {{TraceStateImpl}} and wait for them to complete before stopping the session. WDYT of this alternative approach?

If the alternative is not feasible for some reason I'm not aware of, adds too much complexity, or requires a change of interfaces, I'm also fine with keeping the current approach and perhaps opening another ticket for improvement later if feasible.
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385171#comment-15385171 ]
Stefania commented on CASSANDRA-11465:
Thanks for running the tests. I've backported the patch to 2.2 and 3.0 as well, since {{tracing_simple_test}} may fail there too. Here is a recap:

||2.2||3.0||3.9||trunk||
|[patch|https://github.com/stef1927/cassandra/commits/11465-cqlsh-2.2]|[patch|https://github.com/stef1927/cassandra/commits/11465-cqlsh-3.0]|[patch|https://github.com/stef1927/cassandra/commits/11465-cqlsh-3.9]|[patch|https://github.com/stef1927/cassandra/commits/11465-cqlsh]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-3.9-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-3.9-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-dtest/]|

Here are the [dtest changes|https://github.com/stef1927/cassandra-dtest/commits/11465]. Both dtest and c* patches are based on the CASSANDRA-11850 patch.
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384761#comment-15384761 ]
Jim Witschey commented on CASSANDRA-11465:
I ran the following script:
{code}
for i in `seq 30` ; do
    echo iteration $i
    PRINT_DEBUG=true DEBUG=true CASSANDRA_VERSION=github:stef1927/11465-cqlsh-3.9 TRACE=org.apache.cassandra.tracing.TraceStateImpl nosetests cql_tracing_test.py
    killall -9 java
done
for i in `seq 30` ; do
    echo iteration $i
    PRINT_DEBUG=true DEBUG=true CASSANDRA_VERSION=github:stef1927/11465-cqlsh TRACE=org.apache.cassandra.tracing.TraceStateImpl nosetests cql_tracing_test.py
    killall -9 java
done
{code}
and saw no failures: https://gist.github.com/072692334800285afc8d362165cb37a6
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383715#comment-15383715 ]
Stefania commented on CASSANDRA-11465:
Latest test round looks good, can we run a batch of {{TRACE=org.apache.cassandra.tracing.TraceStateImpl nosetests cql_tracing_test.py}}, e.g. 20x, [~mambocab], [~philipthompson]?
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383505#comment-15383505 ] Stefania commented on CASSANDRA-11465: -- Yesterday's run reproduced the [same failure|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-3.9-dtest/lastCompletedBuild/testReport/cql_tracing_test/TestCqlTracing/tracing_default_impl_test/] unfortunately. After investigating, I discovered that the tracing executor was rejecting calls to submit with UnsupportedOperation exceptions. I've fixed this and I've also increased the maximum time we wait for a queued event to execute on the tracing executor when releasing the tracing state. By default it is still 1 second, and could be zero if required, but the tests can override it via a system property and are currently using up to 15 seconds. I've also added some trace messages to debug further if the problem persists, and relaunched the tests. > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Stefania > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
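The bounded wait described in the comment above can be sketched in Python. This is only an analogy, not the Cassandra patch: the variable name {{TRACE_WAIT_SECS}} and the helper {{wait_for_pending}} are hypothetical, and an environment-style mapping stands in for the Java system property. The default of 1 second and the "zero disables waiting" behaviour follow the comment.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

WAIT_VAR = "TRACE_WAIT_SECS"  # hypothetical name, stands in for the system property
DEFAULT_WAIT_SECS = 1.0       # default mentioned in the comment


def wait_for_pending(executor, environ=os.environ):
    """Wait (bounded) for tasks already queued on a single-threaded executor.

    Returns True only if a marker task submitted behind the queued work was
    seen to complete within the configured timeout.
    """
    timeout = float(environ.get(WAIT_VAR, DEFAULT_WAIT_SECS))
    if timeout <= 0:
        return False  # waiting disabled, nothing was confirmed
    marker = executor.submit(lambda: None)  # runs after everything queued earlier
    try:
        marker.result(timeout=timeout)
        return True
    except TimeoutError:
        return False


if __name__ == "__main__":
    pool = ThreadPoolExecutor(max_workers=1)
    pool.submit(time.sleep, 0.2)  # pretend this is a queued trace event
    print(wait_for_pending(pool, {WAIT_VAR: "15"}))  # tests may raise the limit
    pool.shutdown()
```

A test harness would raise the limit (the comment mentions up to 15 seconds) while production keeps the short default, so a slow tracing stage cannot stall request completion for long.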
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381669#comment-15381669 ] Stefania commented on CASSANDRA-11465: -- The last 2 runs of dtests were successful. I've rebased and launched both dtests and utests one more time. If the results are again successful, we can move on to repeating a few runs of only {{cql_tracing_test.TestCqlTracing}} on Jenkins (e.g. 20x) and reviewing the patch. > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Stefania > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379194#comment-15379194 ] Stefania commented on CASSANDRA-11465: -- The failures were caused by a compilation error in {{build-test}}, relaunched. > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Stefania > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379017#comment-15379017 ] Stefania commented on CASSANDRA-11465: -- Many dtests failed; I will check on Monday what is going on. However, {{cql_tracing_test.TestCqlTracing}} passed in both runs, so you could multiplex it maybe 20x to see if at least we are on the right track. > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Stefania > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378937#comment-15378937 ] Stefania commented on CASSANDRA-11465: -- I've also launched testall since we've modified java code: ||3.9||trunk|| |[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-3.9-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-testall/]| > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Stefania > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378934#comment-15378934 ] Stefania commented on CASSANDRA-11465: -- So, the problem is not just the consistency level but also the fact that {{TraceStateImpl.executeMutation()}} is completely asynchronous; I should have read more carefully what [~thobbs] wrote above. I have another version of the patch: in addition to reading at CL.ALL, we post a no-op event to the TRACING stage before closing the tracing state. Because the tracing executor only has one thread in its pool, this should guarantee that the tracing mutations for this query have been applied to at least one replica. I'm not sure if this is going to introduce slowness or deadlocks, and we may have to put it behind a system flag to play it safe. Let's see what the dtest results are first. > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Stefania > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
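The no-op idea in the comment above relies on a single-threaded executor running tasks in submission order. A minimal Python sketch of that property follows; it is an illustration only, the function name {{run_with_barrier}} is made up here, and nothing in it is Cassandra's TRACING stage.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_barrier(num_events):
    """Queue events on a one-thread executor, then block on a no-op task.

    With a single worker, tasks run in the order they were submitted, so the
    no-op can only finish after every earlier task has been applied.
    """
    applied = []
    executor = ThreadPoolExecutor(max_workers=1)
    for i in range(num_events):
        executor.submit(applied.append, i)  # stands in for a trace mutation
    executor.submit(lambda: None).result()  # the no-op barrier
    seen = list(applied)                    # snapshot taken right after the barrier
    executor.shutdown()
    return seen


if __name__ == "__main__":
    print(run_with_barrier(5))
```

The barrier only shows that the queued tasks ran locally; whether a write they triggered has reached a remote replica is a separate question, which is why the comment pairs it with reading at CL.ALL.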
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378857#comment-15378857 ] Stefania commented on CASSANDRA-11465: -- Test results are very disappointing: I reproduced [the issue|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-dtest/lastCompletedBuild/testReport/cql_tracing_test/TestCqlTracing/tracing_simple_test/] despite the patch. Looking into it. BTW, the other 3 failing cqlsh tests are expected; there is a companion dtest PR that will be committed with CASSANDRA-11850 and will fix those failures. > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Stefania > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378760#comment-15378760 ] Stefania commented on CASSANDRA-11465: -- Since Paulo is back on Monday and it is already Friday, I prefer to leave the test code as it is. I've prepared a patch based on 11850, and launched the tests: ||3.9||trunk|| |[patch|https://github.com/stef1927/cassandra/commits/11465-cqlsh-3.9]|[patch|https://github.com/stef1927/cassandra/commits/11465-cqlsh]| |[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-3.9-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-dtest/]| |[cqlsh dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-3.9-cqlsh-tests/]|[cqlsh dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11465-cqlsh-cqlsh-tests/]| If the tests are OK, could we arrange a multiplexed job? I think 20x of {{cql_tracing_test.py:TestCqlTracing}} should be sufficient. bq. FWIW, I spent a little time trying to make tracing more reliable for tests in CASSANDRA-11928 by doing synchronous CL.ALL writes when a system flag was present. Unfortunately, this appeared to cause some kind of deadlock, and it didn't seem worth it to investigate further. However, if this is a problem across many tests, we may want to spend more time looking into that. Let's see if we have more luck with the driver reading at CL.ALL. > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Stefania > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. 
> example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378426#comment-15378426 ] Joshua McKenzie commented on CASSANDRA-11465: - I'm also fine w/us leaving it failing until Paulo gets back and reviews CASSANDRA-11850. This should be done well in time for 3.9 so shouldn't block release. > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Stefania > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378125#comment-15378125 ] Jim Witschey commented on CASSANDRA-11465: -- bq. we may want to spend more time looking into [doing synchronous CL.ALL writes] This is worth looking into, but with this caveat: in the dtest environment, timeouts on {{CL.ALL}} calls can actually make tests flakier. I don't have a concrete example of this, but that's my memory. > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Stefania > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15377622#comment-15377622 ] Tyler Hobbs commented on CASSANDRA-11465: - FWIW, I spent a little time trying to make tracing more reliable for tests in CASSANDRA-11928 by doing synchronous CL.ALL writes when a system flag was present. Unfortunately, this appeared to cause some kind of deadlock, and it didn't seem worth it to investigate further. However, if this is a problem across many tests, we may want to spend more time looking into that. > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Stefania > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15377102#comment-15377102 ] Philip Thompson commented on CASSANDRA-11465: - So, I would argue we aren't actually introducing "known pending another change", right? That seems to be the state that every failing test is in, where the cause of the failure is a C* limitation? As long as we do this in a way that doesn't accidentally lead to reverting the new coverage and forgetting to restore it, I don't feel too strongly. > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Stefania > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376814#comment-15376814 ] Joshua McKenzie commented on CASSANDRA-11465: - I'd say revert and then re-commit w/11850. That way we can continue forward with just "working | flaky" as our two test states rather than potentially introducing "known pending another change". Seem reasonable? > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Stefania > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376683#comment-15376683 ] Stefania commented on CASSANDRA-11465: -- I found the reason for the sudden failures: it's [PR #1052|https://github.com/riptano/cassandra-dtest/pull/1052], specifically [here|https://github.com/riptano/cassandra-dtest/pull/1052/files#diff-b866cba7cf982d53e6406cca014e659eR75], merged on June 23. Previously we were just checking if "127.0.0.1", etc. were present; now we check if "/127.0.0.1", etc. are present. Note the forward slash. So my analysis above is correct; the only reason it did not fail before is that we were not testing it. I don't think we can easily fix the race in tracing, it's probably a known limitation; what we can do is query with a higher consistency level in cqlsh. Since [PYTHON-435|https://datastax-oss.atlassian.net/browse/PYTHON-435], released as part of driver version 3.3.0, the consistency level for trace queries can be set by the caller. I propose to use the same consistency level that is used for executing the statement in cqlsh; in this case it would be CL.ALL. However, we need to wait for CASSANDRA-11850 to upgrade the driver first. [~JoshuaMcKenzie]: I don't think this is a show stopper for the release; we can either change the test back or leave the known failure, and once 11850 is available test this cqlsh patch [here|https://github.com/stef1927/cassandra/commit/8118c3dca34120b78f29a66c073208ee0de91063]. > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Stefania > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. 
> example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
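The effect of the stricter check described in the comment above can be illustrated with a small Python sketch. The trace rows are abbreviated from the failure output quoted elsewhere in this thread; the helper {{missing_sources}} is hypothetical and is not the dtest code. The sketch assumes, as that output suggests, that a coordinator-only trace never mentions the coordinator with a leading slash.

```python
# Abbreviated coordinator-only trace rows (activity | source | elapsed | client).
COORDINATOR_ONLY = "\n".join([
    "Execute CQL3 query | 127.0.0.1 | 0 | 127.0.0.1",
    "Sending MUTATION message to /127.0.0.3 | 127.0.0.1 | 4094 | 127.0.0.1",
    "Sending MUTATION message to /127.0.0.2 | 127.0.0.1 | 4117 | 127.0.0.1",
    "Request complete | 127.0.0.1 | 19557 | 127.0.0.1",
])
NODES = ["127.0.0.1", "127.0.0.2", "127.0.0.3"]


def missing_sources(trace_output, nodes, with_slash):
    """Return the node addresses the substring check would report as absent."""
    prefix = "/" if with_slash else ""
    return [node for node in nodes if prefix + node not in trace_output]


if __name__ == "__main__":
    print(missing_sources(COORDINATOR_ONLY, NODES, with_slash=False))  # old check
    print(missing_sources(COORDINATOR_ONLY, NODES, with_slash=True))   # new check
```

The old check finds every address because the coordinator's own rows mention all three nodes; the slash-prefixed check flags 127.0.0.1, matching the {{'/127.0.0.1' not found}} failure.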
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376518#comment-15376518 ] Stefania commented on CASSANDRA-11465: -- The additional warnings are a problem with ccm code, here is an extract of the log file taken from the last seen [failing test|http://cassci.datastax.com/job/trunk_dtest/1276/testReport/junit/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test/] on June 15: {code} ERROR [ScheduledTasks:1] 2016-06-15 03:06:41,111 Tracing.java:106 - Cannot use class junk for tracing (Unable to find Tracing class 'junk'), ignoring by defaulting on normal tracing WARN [main] 2016-06-15 03:06:41,119 StartupChecks.java:123 - jemalloc shared library could not be preloaded to speed up memory allocations WARN [main] 2016-06-15 03:06:41,119 StartupChecks.java:156 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info. {code} PR [#518|https://github.com/pcmanus/ccm/pull/518] should have fixed this problem; it was merged on June 24. -- The missing {{'/127.0.0.1'}} happens at least on 3.8, 3.9 and trunk, I couldn't see any similar failures on 3.0 or 3.7. It seems the [oldest test|http://cassci.datastax.com/view/trunk/job/trunk_novnode_dtest/409/testReport/junit/cql_tracing_test/TestCqlTracing/tracing_default_impl_test/] that failed is on Jun 27. Locally, I can reproduce it about once every 10 times, so I am currently trying to bisect between cassandra-3.7 and cassandra-3.8. What's happening is that we are missing the tracing events of nodes 2 and 3, we only have the coordinator events. Looking at how tracing works, technically this can happen if the coordinator is very slow and pre-empted, because {{TraceStateImpl}} mutates at CL.ANY and the driver queries {{system_traces.events}} at CL.LOCAL_ONE. 
The coordinator writes a session-end entry in {{system_traces.sessions}} just before the response is sent to the client; the driver waits for this entry before querying {{system_traces.events}}. Although the replicas send a tracing mutation request before sending the response to the actual request with tracing enabled, which in this case is at CL.ALL, since the coordinator is operating in a multi-threaded environment, there is no guarantee that the mutations of the tracing events will be inserted before inserting the session-end mutation. This is a race but I cannot find anything in the code that suggests that this was changed recently. It should only be a problem with a very slow C* coordinator, or if the tracing mutations were dropped due to overload, which clearly isn't the case. Therefore I am hoping the bisect may shed some light. > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Bug >Reporter: Philip Thompson >Assignee: Stefania > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
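The race described above can be modelled with a deliberately simplified, single-threaded Python sketch. It is a toy model under stated assumptions, not Cassandra or driver code: the list {{writes}} stands for the order in which trace mutations become visible, and the client is assumed to read the events exactly once, as soon as the session-end entry is visible.

```python
def events_visible_at_session_end(writes):
    """Return the event sources a client reads once it first sees session end.

    `writes` is the order in which trace mutations are applied; each item is
    ("event", source_address) or ("session_end", None).
    """
    events = []
    for kind, source in writes:
        if kind == "session_end":
            # The client stops polling here and reads the events table once.
            return list(events)
        events.append(source)
    return None  # session end never became visible


# Replica events applied before the session-end entry: the full trace is read.
IN_ORDER = [("event", "127.0.0.1"), ("event", "127.0.0.2"),
            ("event", "127.0.0.3"), ("session_end", None)]
# Session-end entry applied first: replica events arrive too late to be read.
RACED = [("event", "127.0.0.1"), ("session_end", None),
         ("event", "127.0.0.2"), ("event", "127.0.0.3")]

if __name__ == "__main__":
    print(events_visible_at_session_end(IN_ORDER))
    print(events_visible_at_session_end(RACED))
```

In the raced ordering only the coordinator's event is returned, which corresponds to the observation that the events of nodes 2 and 3 are missing.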
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375148#comment-15375148 ] Jim Witschey commented on CASSANDRA-11465: -- I believe this is a bug. Marking as such and unassigning to add it to the dev queue. > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Test >Reporter: Philip Thompson >Assignee: Jim Witschey > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375145#comment-15375145 ] Jim Witschey commented on CASSANDRA-11465: -- Another failure we're seeing: http://cassci.datastax.com/job/trunk_dtest/1288/testReport/junit/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test/ {code} '/127.0.0.1' not found in "Consistency level set to ALL.\nNow Tracing is enabled\n\nTracing session: 54778be0-3c80-11e6-a333-ef4c703100c3\n\n activity | timestamp | source| source_elapsed | client\n--++---++---\n Execute CQL3 query | 2016-06-27 16:00:55.198000 | 127.0.0.1 | 0 | 127.0.0.1\n Parsing INSERT INTO ks.users (userid, firstname, lastname, age) VALUES (550e8400-e29b-41d4-a716-44665544, 'Frodo', 'Baggins', 32); [Native-Transport-Requests-2] | 2016-06-27 16:00:55.198000 | 127.0.0.1 | 462 | 127.0.0.1\n Preparing statement [Native-Transport-Requests-2] | 2016-06-27 16:00:55.199000 | 127.0.0.1 | 953 | 127.0.0.1\n Determining replicas for mutation [Native-Transport-Requests-2] | 2016-06-27 16:00:55.20 | 127.0.0.1 | 1773 | 127.0.0.1\n Sending MUTATION message to /127.0.0.3 [MessagingService-Outgoing-/127.0.0.3] | 2016-06-27 16:00:55.202000 | 127.0.0.1 | 4094 | 127.0.0.1\n Sending MUTATION message to /127.0.0.2 [MessagingService-Outgoing-/127.0.0.2] | 2016-06-27 16:00:55.202000 | 127.0.0.1 | 4117 | 127.0.0.1\n Appending to commitlog [Native-Transport-Requests-2] | 2016-06-27 16:00:55.202000 | 127.0.0.1 | 4270 | 127.0.0.1\n Adding to users memtable [Native-Transport-Requests-2] | 2016-06-27 16:00:55.203000 | 127.0.0.1 | 4758 | 127.0.0.1\n REQUEST_RESPONSE message received from /127.0.0.3 [MessagingService-Incoming-/127.0.0.3] | 2016-06-27 16:00:55.213000 | 127.0.0.1 | 14833 | 127.0.0.1\n Processing response from /127.0.0.3 [RequestResponseStage-4] | 2016-06-27 16:00:55.213000 | 127.0.0.1 | 15059 | 127.0.0.1\n REQUEST_RESPONSE message received from /127.0.0.2 [MessagingService-Incoming-/127.0.0.2] | 2016-06-27 
16:00:55.217000 | 127.0.0.1 | 19203 | 127.0.0.1\n Processing response from /127.0.0.2 [RequestResponseStage-1] | 2016-06-27 16:00:55.217000 | 127.0.0.1 | 19379 | 127.0.0.1\n Request complete | 2016-06-27 16:00:55.217557 | 127.0.0.1 | 19557 | 127.0.0.1\n\n\n" >> begin captured logging << dtest: DEBUG: cluster ccm directory: /tmp/dtest-RJbUwP dtest: DEBUG: Custom init_config not found. Setting defaults. dtest: DEBUG: Done setting configuration options: { 'initial_token': None, 'num_tokens': '32', 'phi_convict_threshold': 5, 'range_request_timeout_in_ms': 1, 'read_request_timeout_in_ms': 1, 'request_timeout_in_ms': 1, 'truncate_request_timeout_in_ms': 1, 'write_request_timeout_in_ms': 1} dtest: DEBUG: Consistency level set to ALL. Now Tracing is enabled Tracing session: 54778be0-3c80-11e6-a333-ef4c703100c3 activity | timestamp
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346620#comment-15346620 ] Philip Thompson commented on CASSANDRA-11465: - Here's the problem: {code} dtest: DEBUG: Errors after attempted trace with unknown tracing class: [["ERROR [ScheduledTasks:1] 2016-06-15 03:06:41,111 Tracing.java:106 - Cannot use class junk for tracing (Unable to find Tracing class 'junk'), ignoring by defaulting on normal tracing", 'WARN [main] 2016-06-15 03:06:41,119 StartupChecks.java:123 - jemalloc shared library could not be preloaded to speed up memory allocations', 'WARN [main] 2016-06-15 03:06:41,119 StartupChecks.java:156 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.']] {code} We shouldn't be getting those WARNs > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Test >Reporter: Philip Thompson >Assignee: Jim Witschey > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
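To make the failing assert concrete: the captured group in the debug output above holds three lines, so {{len(errs[0])}} is 3 rather than 1. The Python sketch below uses the log lines quoted in the comment; the helper {{only_errors}} is hypothetical and is not how ccm or the dtest collects errors.

```python
# The three lines captured in errs[0], copied from the debug output above.
CAPTURED = [
    "ERROR [ScheduledTasks:1] 2016-06-15 03:06:41,111 Tracing.java:106 - "
    "Cannot use class junk for tracing (Unable to find Tracing class 'junk'), "
    "ignoring by defaulting on normal tracing",
    "WARN [main] 2016-06-15 03:06:41,119 StartupChecks.java:123 - "
    "jemalloc shared library could not be preloaded to speed up memory allocations",
    "WARN [main] 2016-06-15 03:06:41,119 StartupChecks.java:156 - "
    "JMX is not enabled to receive remote connections. "
    "Please see cassandra-env.sh for more info.",
]


def only_errors(lines):
    """Keep only lines logged at ERROR level (illustrative filter)."""
    return [line for line in lines if line.startswith("ERROR ")]


if __name__ == "__main__":
    print(len(CAPTURED))               # what the assert actually sees
    print(len(only_errors(CAPTURED)))  # what the test expects to see
```

Filtering by level is just one way to express the expectation; the thread later attributes the extra WARN lines to ccm's log grouping rather than to the test itself.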
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226528#comment-15226528 ] Jim Witschey commented on CASSANDRA-11465: -- Filed https://github.com/riptano/cassandra-dtest/pull/905 for debugging help. > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Test >Reporter: Philip Thompson >Assignee: Jim Witschey > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220227#comment-15220227 ] Philip Thompson commented on CASSANDRA-11465: - Also seeing the failure in tracing_default_impl_test http://cassci.datastax.com/job/trunk_novnode_dtest/337/testReport/cql_tracing_test/TestCqlTracing/tracing_default_impl_test/ > dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test > -- > > Key: CASSANDRA-11465 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11465 > Project: Cassandra > Issue Type: Test >Reporter: Philip Thompson >Assignee: DS Test Eng > Labels: dtest > > Failing on the following assert, on trunk only: > {{self.assertEqual(len(errs[0]), 1)}} > Is not failing consistently. > example failure: > http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test > Failed on CassCI build trunk_dtest #1087 -- This message was sent by Atlassian JIRA (v6.3.4#6332)