[jira] [Comment Edited] (CASSANDRA-16843) List snapshots of dropped tables
[ https://issues.apache.org/jira/browse/CASSANDRA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529609#comment-17529609 ] Paulo Motta edited comment on CASSANDRA-16843 at 4/28/22 11:12 PM: ---
To provide some context before going into the implementation details, here is a short summary of the end-user changes introduced by this patch.

This is the output of {{nodetool listsnapshots}} before this patch:
{noformat}
Snapshot Details:
Snapshot name Keyspace name Column family name True size Size on disk Creation time            Expiration time
test          ks            indexed_table      9.83 KiB  21.22 KiB    2022-04-26T19:13:20.102Z
test          ks            my_table           9.83 KiB  10.76 KiB    2022-04-26T19:13:20.102Z
Total TrueDiskSpaceUsed: 19.65 KiB
{noformat}
*The main problem solved by this patch is that snapshots of dropped tables are omitted from this output.*

There are also two additional issues with the previous output:
1) The snapshot "true size" column does not include the {{manifest.json}} and {{schema.cql}} file sizes. This can be observed in the mismatch between the "true size" (9.83 KiB) and "size on disk" (10.76 KiB) columns of {{my_table}}.
2) The snapshot "true size" of a table with a secondary index ({{indexed_table}}) does not include the secondary index files (CASSANDRA-17357). This can be observed in the "true size" being 9.83 KiB while the "size on disk" is 21.22 KiB.
After this patch, the following output is displayed for the same data:
{noformat}
Snapshot Details:
Snapshot name                  Keyspace name Column family name True size Size on disk Creation time            Expiration time
test                           ks            indexed_table      21.22 KiB 21.22 KiB    2022-04-26T19:13:20.102Z
test                           ks            my_table           10.76 KiB 10.76 KiB    2022-04-26T19:13:20.102Z
dropped-1650997415751-my_table ks            my_table           989 bytes 989 bytes    2022-04-26T18:23:35.751Z
Total TrueDiskSpaceUsed: 32.95 KiB
{noformat}
The new output shows the snapshot "true size" equal to the "size on disk" when there are no live sstables.

(will follow up with implementation details in the next comment)
[jira] [Comment Edited] (CASSANDRA-16843) List snapshots of dropped tables
[ https://issues.apache.org/jira/browse/CASSANDRA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529680#comment-17529680 ] Paulo Motta edited comment on CASSANDRA-16843 at 4/28/22 11:08 PM: ---
Snapshots of dropped tables are omitted from the {{nodetool listsnapshots}} output above because the prior implementation relied on the mechanics of [ColumnFamilyStore|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L2240=] and [Directories|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Directories.java#L964=] to list snapshots. Since dropped tables no longer have an associated {{ColumnFamilyStore}} object, the current implementation cannot list their snapshots.

[This patch|https://github.com/apache/cassandra/pull/1595] re-architects the snapshot listing logic so that it is fully decoupled from {{ColumnFamilyStore}}/{{Directories}} and relies solely on the snapshot directory structure, which currently has this format:
* {{$data_dir/$ks_name/$table_name-$table_uuid/snapshots/$tag}}

The new snapshot discovery logic is mostly contained in the [SnapshotLoader|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/src/java/org/apache/cassandra/service/snapshot/SnapshotLoader.java] class, which traverses the data directory [looking for snapshot directories matching the pattern above|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/src/java/org/apache/cassandra/service/snapshot/SnapshotLoader.java#L102=]. I updated [StorageService.getSnapshotDetails|https://github.com/apache/cassandra/pull/1595/files#diff-9bf2c26bc294ef9085e16bf287490223665eaa2eb8ec24bcf5bd8653c713644bR4131], which is used by {{nodetool listsnapshots}}, to use the new {{SnapshotLoader}} class to list snapshots.
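For illustration only, here is a minimal Java sketch of discovering snapshots purely from the directory layout above. It is not the actual {{SnapshotLoader}} code: the class and method names ({{SnapshotDirScanner}}, {{parse}}, {{findSnapshots}}) are hypothetical, and it assumes table directories are named {{<table_name>-<table id as 32 hex chars>}}.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class SnapshotDirScanner {
    // Assumption: table directories are named "<table_name>-<table id as 32 hex chars>"
    private static final Pattern TABLE_DIR = Pattern.compile("^(\\w+)-([0-9a-f]{32})$");

    /** Hypothetical value type describing one snapshot directory found on disk. */
    public static final class SnapshotRef {
        public final String keyspace, table, tableId, tag;

        SnapshotRef(String keyspace, String table, String tableId, String tag) {
            this.keyspace = keyspace;
            this.table = table;
            this.tableId = tableId;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return keyspace + "." + table + " (" + tableId + ") tag=" + tag;
        }
    }

    /** Matches $data_dir/$ks_name/$table_name-$table_uuid/snapshots/$tag relative to dataDir. */
    public static Optional<SnapshotRef> parse(Path dataDir, Path candidate) {
        Path rel = dataDir.relativize(candidate);
        if (rel.getNameCount() != 4 || !"snapshots".equals(rel.getName(2).toString()))
            return Optional.empty();
        Matcher m = TABLE_DIR.matcher(rel.getName(1).toString());
        if (!m.matches())
            return Optional.empty();
        return Optional.of(new SnapshotRef(rel.getName(0).toString(), m.group(1), m.group(2),
                                           rel.getName(3).toString()));
    }

    /** Walks the data directory only; no table metadata is consulted, so dropped tables are found too. */
    public static List<SnapshotRef> findSnapshots(Path dataDir) throws IOException {
        List<SnapshotRef> found = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(dataDir, 4)) {
            paths.filter(Files::isDirectory)
                 .forEach(p -> parse(dataDir, p).ifPresent(found::add));
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway data directory containing a single snapshot
        Path dataDir = Files.createTempDirectory("data");
        Files.createDirectories(dataDir.resolve(
            "ks/my_table-0123456789abcdef0123456789abcdef/snapshots/dropped-1650997415751-my_table"));
        findSnapshots(dataDir).forEach(System.out::println);
    }
}
```

Because only the file system is consulted, a snapshot whose table has been dropped is still reported as long as its directory exists, which is the property the patch relies on.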
The snapshot true size computation previously depended on logic from [Directories|https://github.com/apache/cassandra/blob/bb3749f2bb8282f67375c67712d8e3ca1f085879/src/java/org/apache/cassandra/db/Directories.java#L1153], so in order to fully decouple snapshot listing from {{Directories}}, I [simplified the computation of snapshot true size|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/src/java/org/apache/cassandra/service/snapshot/TableSnapshot.java#L282] to only include files that do not have a corresponding "live" file in {{$data_dir/$ks_name/$table_name-$table_uuid}}. This simplification fixed two additional issues with the previous implementation (illustrated with examples in the previous comment):
1) Snapshot true size did not include the "schema.cql" and "manifest.json" sizes.
2) Snapshot true size did not include secondary indexes (CASSANDRA-17357).

I performed other simplifications and refactorings along the way, but given the proximity of the 4.1 freeze, I prepared a leaner version of the original patch to facilitate review. After this is merged, I will prepare a set of follow-up patches (for the next release) with the further refactorings and simplifications to the snapshot management module that this change enables.
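The true size rule described above can be sketched in Java as follows. This assumes (our assumption, not necessarily what {{getLiveFileFromSnapshotFile}} does) that the live counterpart of {{snapshots/$tag/<path>}} is {{$table_dir/<path>}}; the class name {{SnapshotTrueSize}} is hypothetical, and the demo writes plain copies where a real snapshot would contain hard links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class SnapshotTrueSize {
    /**
     * Sums the sizes of the files under snapshotDir that have no "live" counterpart.
     * Assumption: the live counterpart of snapshots/$tag/<relative path> is
     * $table_dir/<relative path>, which also covers secondary index subdirectories.
     */
    public static long trueSize(Path tableDir, Path snapshotDir) throws IOException {
        try (Stream<Path> files = Files.walk(snapshotDir)) {
            return files.filter(Files::isRegularFile)
                        .filter(f -> !Files.exists(tableDir.resolve(snapshotDir.relativize(f))))
                        .mapToLong(SnapshotTrueSize::size)
                        .sum();
        }
    }

    private static long size(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tableDir = Files.createTempDirectory("my_table");
        Path snapshotDir = Files.createDirectories(tableDir.resolve("snapshots/test"));
        Files.write(tableDir.resolve("nb-1-big-Data.db"), new byte[100]);    // live sstable
        Files.write(snapshotDir.resolve("nb-1-big-Data.db"), new byte[100]); // has a live counterpart: not counted
        Files.write(snapshotDir.resolve("nb-2-big-Data.db"), new byte[40]);  // compacted away: counted
        Files.write(snapshotDir.resolve("manifest.json"), new byte[2]);      // never live: counted
        System.out.println(trueSize(tableDir, snapshotDir)); // prints 42
    }
}
```

Under this rule {{manifest.json}} and {{schema.cql}} are always counted, since they never have a live counterpart, and once a dropped table's live sstables are gone every snapshot file is counted, so true size equals size on disk.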
Testing:
- [dtest to check that snapshots of dropped tables are included in listsnapshots|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/test/distributed/org/apache/cassandra/distributed/test/SnapshotsTest.java#L195=]
- [SnapshotLoaderTest|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/test/unit/org/apache/cassandra/service/snapshot/SnapshotLoaderTest.java]
- [Test to check that manifest and schema file sizes are included in the true size computation|https://github.com/apache/cassandra/pull/1595/files#diff-ef5be0b69d0440b76021282c4b24bad69770ef9419be260df2169f49921db377R291]
- [Update DirectoriesTest.testSecondaryIndexDirectories to include 2i in the true size computation|https://github.com/apache/cassandra/pull/1595/files#diff-1948a455b59a97d8d1ab3d2cb5388190c1cbb8e8081e3ac97bfc0c51a7ef64e3R421]
- [testGetLiveFileFromSnapshotFile (used by the new true size computation)|https://github.com/apache/cassandra/pull/1595/files#diff-d349fb289ec10bece5531f1630cd2bcc55665b5cf3cd59cfcfb4dc93f288a571R233]
[jira] [Comment Edited] (CASSANDRA-16843) List snapshots of dropped tables
[ https://issues.apache.org/jira/browse/CASSANDRA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529680#comment-17529680 ] Paulo Motta edited comment on CASSANDRA-16843 at 4/28/22 11:04 PM: ---
Snapshots of dropped tables are omitted from the {{nodetool listsnapshots}} output above because the prior implementation relied on the mechanics of [ColumnFamilyStore|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L2240=] and [Directories|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Directories.java#L964=] to list snapshots. Since dropped tables no longer have an associated {{ColumnFamilyStore}} object, the current implementation cannot list their snapshots.

[This patch|https://github.com/apache/cassandra/pull/1595] re-architects the snapshot listing logic so that it is fully decoupled from {{ColumnFamilyStore}}/{{Directories}} and relies solely on the snapshot directory structure, which currently has this format:
* {{$data_dir/$ks_name/$table_name-$table_uuid/snapshots/$tag}}

The new snapshot discovery logic is mostly contained in the [SnapshotLoader|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/src/java/org/apache/cassandra/service/snapshot/SnapshotLoader.java] class, which traverses the data directory [looking for snapshot directories matching the pattern above|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/src/java/org/apache/cassandra/service/snapshot/SnapshotLoader.java#L102=]. I updated [StorageService.getSnapshotDetails|https://github.com/apache/cassandra/pull/1595/files#diff-9bf2c26bc294ef9085e16bf287490223665eaa2eb8ec24bcf5bd8653c713644bR4131], which is used by {{nodetool listsnapshots}}, to use the new {{SnapshotLoader}} class to load snapshots.
The snapshot true size computation previously depended on logic from [Directories|https://github.com/apache/cassandra/blob/bb3749f2bb8282f67375c67712d8e3ca1f085879/src/java/org/apache/cassandra/db/Directories.java#L1153], so in order to fully decouple snapshot listing from {{Directories}}, I [simplified the computation of snapshot true size|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/src/java/org/apache/cassandra/service/snapshot/TableSnapshot.java#L282] to only include files that do not have a corresponding "live" file in {{$data_dir/$ks_name/$table_name-$table_uuid}}. This simplification fixed two additional issues with the previous implementation (illustrated with examples in the previous comment):
1) Snapshot true size did not include the "schema.cql" and "manifest.json" sizes.
2) Snapshot true size did not include secondary indexes (CASSANDRA-17357).

I performed other simplifications and refactorings along the way, but given the proximity of the 4.1 freeze, I prepared a leaner version of the original patch to facilitate review. After this is merged, I will prepare a set of follow-up patches (for the next release) with the further refactorings and simplifications to the snapshot management module that this change enables.
Testing:
- [dtest to check that snapshots of dropped tables are included in listsnapshots|https://github.com/apache/cassandra/pull/1595/files#diff-35dcc7dbb180da51d4f548e79f31ba45fb7beb7dbeec27663053817619efff1bR195]
- [SnapshotLoaderTest|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/test/unit/org/apache/cassandra/service/snapshot/SnapshotLoaderTest.java]
- [Test to check that manifest and schema file sizes are included in the true size computation|https://github.com/apache/cassandra/pull/1595/files#diff-ef5be0b69d0440b76021282c4b24bad69770ef9419be260df2169f49921db377R291]
- [Update DirectoriesTest.testSecondaryIndexDirectories to include 2i in the true size computation|https://github.com/apache/cassandra/pull/1595/files#diff-1948a455b59a97d8d1ab3d2cb5388190c1cbb8e8081e3ac97bfc0c51a7ef64e3R421]
- [testGetLiveFileFromSnapshotFile (used by the new true size computation)|https://github.com/apache/cassandra/pull/1595/files#diff-d349fb289ec10bece5531f1630cd2bcc55665b5cf3cd59cfcfb4dc93f288a571R233]
[jira] [Updated] (CASSANDRA-16843) List snapshots of dropped tables
[ https://issues.apache.org/jira/browse/CASSANDRA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-16843: Status: Patch Available (was: In Progress) > List snapshots of dropped tables > > > Key: CASSANDRA-16843 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16843 > Project: Cassandra > Issue Type: Bug > Components: Local/Snapshots >Reporter: James Brown >Assignee: Paulo Motta >Priority: Normal > Fix For: 4.1 > > Time Spent: 10m > Remaining Estimate: 0h > > Auto snapshots from dropped tables don't seem to show up in {{nodetool > listsnapshots}} (even though they do get cleared by {{nodetool > clearsnapshot}}). This makes them kind of annoying to clean up, since you > need to muck about in the data directory to find them. > Erick on the mailing list said that this seems to be an oversight and that > clearsnapshot was fixed by > [CASSANDRA-6418|https://issues.apache.org/jira/browse/CASSANDRA-6418]. > I reproduced this both on 3.11.11 and 4.0.0. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16843) List snapshots of dropped tables
[ https://issues.apache.org/jira/browse/CASSANDRA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-16843: Source Control Link: https://github.com/apache/cassandra/pull/1595
[jira] [Commented] (CASSANDRA-16843) List snapshots of dropped tables
[ https://issues.apache.org/jira/browse/CASSANDRA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529681#comment-17529681 ] Paulo Motta commented on CASSANDRA-16843: - [~brandon.williams] [~smiklosovic] This is finally ready for a final round of review and I apologize for the delay. Please check the 2 previous comments for context. Even though I'd like to get this in, I will understand if you're not able to get to this before the 4.1 freeze. |[trunk|https://github.com/apache/cassandra/pull/1595]|[tests|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1645/]|
[jira] [Commented] (CASSANDRA-16843) List snapshots of dropped tables
[ https://issues.apache.org/jira/browse/CASSANDRA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529680#comment-17529680 ] Paulo Motta commented on CASSANDRA-16843: -
Snapshots of dropped tables are omitted from the {{nodetool listsnapshots}} output above because the prior implementation relied on the mechanics of [ColumnFamilyStore|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L2240=] and [Directories|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Directories.java#L964=] to list snapshots. Since dropped tables no longer have an associated {{ColumnFamilyStore}} object, the current implementation cannot list their snapshots.

[This patch|https://github.com/apache/cassandra/pull/1595] re-architects the snapshot listing logic so that it is fully decoupled from {{ColumnFamilyStore}}/{{Directories}} and relies solely on the snapshot directory structure, which currently has this format:
* {{$data_dir/$ks_name/$table_name-$table_uuid/snapshots/$tag}}

The new snapshot discovery logic is mostly contained in the [SnapshotLoader|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/src/java/org/apache/cassandra/service/snapshot/SnapshotLoader.java] class, which traverses the data directory [looking for snapshot directories matching the pattern above|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/src/java/org/apache/cassandra/service/snapshot/SnapshotLoader.java#L102=]. I updated [StorageService.getSnapshotDetails|https://github.com/apache/cassandra/pull/1595/files#diff-9bf2c26bc294ef9085e16bf287490223665eaa2eb8ec24bcf5bd8653c713644bR4131], which is used by {{nodetool listsnapshots}}, to use the new {{SnapshotLoader}} class to load snapshots.
The snapshot true size computation previously depended on logic from [Directories|https://github.com/apache/cassandra/blob/bb3749f2bb8282f67375c67712d8e3ca1f085879/src/java/org/apache/cassandra/db/Directories.java#L1153], so in order to fully decouple snapshot listing from {{Directories}}, I [simplified the computation of snapshot true size|https://github.com/apache/cassandra/pull/1595/files#diff-7d6d1bafcad95c5715c91c9065a4a8c58c3d5c98d0699d9c913717f5c0086bb7L114] to only include files that do not have a corresponding "live" file in {{$data_dir/$ks_name/$table_name-$table_uuid}}. This simplification fixed two additional issues with the previous implementation (illustrated with examples in the previous comment):
1) Snapshot true size did not include the "schema.cql" and "manifest.json" sizes.
2) Snapshot true size did not include secondary indexes (CASSANDRA-17357).

I performed other simplifications and refactorings along the way, but given the proximity of the 4.1 freeze, I prepared a leaner version of the original patch to facilitate review. After this is merged, I will prepare a set of follow-up patches (for the next release) with the further refactorings and simplifications to the snapshot management module that this change enables.
Testing:
- [dtest to check that snapshots of dropped tables are included in listsnapshots|https://github.com/apache/cassandra/pull/1595/files#diff-35dcc7dbb180da51d4f548e79f31ba45fb7beb7dbeec27663053817619efff1bR195]
- [SnapshotLoaderTest|https://github.com/apache/cassandra/blob/993190ada5b65b79c5b7ca707d436a6ceff7abcf/test/unit/org/apache/cassandra/service/snapshot/SnapshotLoaderTest.java]
- [Test to check that manifest and schema file sizes are included in the true size computation|https://github.com/apache/cassandra/pull/1595/files#diff-ef5be0b69d0440b76021282c4b24bad69770ef9419be260df2169f49921db377R291]
- [Update DirectoriesTest.testSecondaryIndexDirectories to include 2i in the true size computation|https://github.com/apache/cassandra/pull/1595/files#diff-1948a455b59a97d8d1ab3d2cb5388190c1cbb8e8081e3ac97bfc0c51a7ef64e3R421]
- [testGetLiveFileFromSnapshotFile (used by the new true size computation)|https://github.com/apache/cassandra/pull/1595/files#diff-d349fb289ec10bece5531f1630cd2bcc55665b5cf3cd59cfcfb4dc93f288a571R233]
[jira] [Commented] (CASSANDRA-16843) List snapshots of dropped tables
[ https://issues.apache.org/jira/browse/CASSANDRA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529609#comment-17529609 ] Paulo Motta commented on CASSANDRA-16843: -
To provide some context before going into the implementation details, here is a short summary of the end-user changes introduced by this patch.

This is the output of {{nodetool listsnapshots}} before this patch:
{noformat}
Snapshot Details:
Snapshot name Keyspace name Column family name True size Size on disk Creation time            Expiration time
test          ks            indexed_table      9.83 KiB  21.22 KiB    2022-04-26T19:13:20.102Z
test          ks            my_table           9.83 KiB  10.76 KiB    2022-04-26T19:13:20.102Z
Total TrueDiskSpaceUsed: 19.65 KiB
{noformat}
*The main problem solved by this patch is that snapshots of dropped tables are omitted from this output.*

There are also two additional issues with the previous output:
1) The snapshot "true size" column does not include the {{manifest.json}} and {{schema.cql}} file sizes. This can be observed in the mismatch between the "true size" and "size on disk" columns of {{my_table}}.
2) The snapshot "true size" of a table with a secondary index ({{indexed_table}}) does not include the secondary index files (CASSANDRA-17357). This can be observed in the "true size" being 9.83 KiB while the "size on disk" is 21.22 KiB.

After this patch, the following output is displayed for the same data:
{noformat}
Snapshot Details:
Snapshot name                  Keyspace name Column family name True size Size on disk Creation time            Expiration time
test                           ks            indexed_table      21.22 KiB 21.22 KiB    2022-04-26T19:13:20.102Z
test                           ks            my_table           10.76 KiB 10.76 KiB    2022-04-26T19:13:20.102Z
dropped-1650997415751-my_table ks            my_table           989 bytes 989 bytes    2022-04-26T18:23:35.751Z
Total TrueDiskSpaceUsed: 32.95 KiB
{noformat}
The new output shows the snapshot "true size" equal to the "size on disk" when there are no live sstables.
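In the output above, the number embedded in the {{dropped-1650997415751-my_table}} tag matches that row's creation time when read as epoch milliseconds. Assuming (inferred from this output) that auto-snapshots taken when a table is dropped are tagged {{dropped-<epoch millis>-<table name>}}, a hypothetical helper can recover the drop time from the tag alone:

```java
import java.time.Instant;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DroppedSnapshotTag {
    // Assumed tag shape for auto-snapshots of dropped tables: "dropped-<epoch millis>-<table name>"
    private static final Pattern DROPPED = Pattern.compile("^dropped-(\\d+)-(.+)$");

    /** Returns the drop time encoded in the tag, or empty if the tag does not have that shape. */
    public static Optional<Instant> droppedAt(String tag) {
        Matcher m = DROPPED.matcher(tag);
        if (!m.matches())
            return Optional.empty();
        return Optional.of(Instant.ofEpochMilli(Long.parseLong(m.group(1))));
    }

    public static void main(String[] args) {
        // Tag copied from the listsnapshots output above
        System.out.println(droppedAt("dropped-1650997415751-my_table").orElse(null));
        // prints 2022-04-26T18:23:35.751Z
    }
}
```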
(will follow up with implementation details in the next comment)
[jira] [Commented] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data
[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529588#comment-17529588 ] Paulo Motta commented on CASSANDRA-17180: - I plan to take a final look at this today. > Implement startup check to prevent Cassandra start to spread zombie data > > > Key: CASSANDRA-17180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17180 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 4.1 > > Time Spent: 11.5h > Remaining Estimate: 0h > > As already discussed on the mailing list, it would be nice to have a service that periodically writes a timestamp to a file, signalling that the node is up and running. > Then, on startup, we would read this file and determine whether there is a table whose gc grace period has elapsed since that timestamp; if so, we would fail the startup, preventing zombie data from likely being spread around the cluster. > https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw
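A minimal Java sketch of the check proposed in this ticket, assuming a heartbeat file holding epoch milliseconds and a map of table name to {{gc_grace_seconds}}. All names are hypothetical, and skipping tables with {{gc_grace_seconds = 0}} is our own assumption, not something the ticket specifies.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ZombieDataCheck {
    /** Hypothetical heartbeat writer: stores the given time as epoch millis. */
    public static void writeHeartbeat(Path file, Instant now) throws IOException {
        Files.write(file, Long.toString(now.toEpochMilli()).getBytes(StandardCharsets.UTF_8));
    }

    /** Hypothetical heartbeat reader, the counterpart of writeHeartbeat. */
    public static Instant readHeartbeat(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return Instant.ofEpochMilli(Long.parseLong(text));
    }

    /**
     * Returns the tables whose gc_grace_seconds is shorter than the time elapsed since the
     * last heartbeat. Assumption: tables with gc_grace_seconds = 0 are skipped.
     */
    public static List<String> violatingTables(Instant lastHeartbeat, Instant now,
                                               Map<String, Integer> gcGraceSecondsByTable) {
        Duration downtime = Duration.between(lastHeartbeat, now);
        return gcGraceSecondsByTable.entrySet().stream()
                .filter(e -> e.getValue() > 0)
                .filter(e -> downtime.compareTo(Duration.ofSeconds(e.getValue())) > 0)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path heartbeat = Files.createTempFile("heartbeat", ".txt");
        Instant now = Instant.now();
        writeHeartbeat(heartbeat, now.minus(Duration.ofDays(11))); // simulate 11 days of downtime

        Map<String, Integer> gcGrace = new TreeMap<>();
        gcGrace.put("ks.t1", 864000);  // 10 days, the default gc_grace_seconds
        gcGrace.put("ks.t2", 1728000); // 20 days

        List<String> violating = violatingTables(readHeartbeat(heartbeat), now, gcGrace);
        if (!violating.isEmpty())
            System.out.println("Refusing to start, gc_grace_seconds exceeded for: " + violating);
    }
}
```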
[jira] [Created] (CASSANDRA-17586) keyspace nodetool compact does not compact 2i
Paulo Motta created CASSANDRA-17586: --- Summary: keyspace nodetool compact does not compact 2i Key: CASSANDRA-17586 URL: https://issues.apache.org/jira/browse/CASSANDRA-17586 Project: Cassandra Issue Type: Bug Components: Feature/2i Index, Tool/nodetool Reporter: Paulo Motta Not sure if this is a bug or working as intended, but at least on {{trunk}} {{nodetool compact ks}} does not compact secondary indexes.
[jira] [Commented] (CASSANDRA-16843) List snapshots of dropped tables
[ https://issues.apache.org/jira/browse/CASSANDRA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529043#comment-17529043 ] Paulo Motta commented on CASSANDRA-16843: - It's in progress, not ready for review yet.
[jira] [Updated] (CASSANDRA-14582) Add a system property to set the cassandra hostId if not yet initialized
[ https://issues.apache.org/jira/browse/CASSANDRA-14582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-14582: Reviewers: Brandon Williams (was: beobal) > Add a system property to set the cassandra hostId if not yet initialized > > > Key: CASSANDRA-14582 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14582 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: vincent royer >Assignee: Stefan Miklosovic >Priority: Low > Labels: lhf > Fix For: 4.1 > > Time Spent: 4h > Remaining Estimate: 0h > > Add a system property *cassandra.host_id* to set the cassandra hostId if not yet initialized. > This allows pushing the Cassandra host ID when provisioning new Cassandra nodes, rather than retrieving it after the first start.
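A hypothetical Java sketch of the resolution order this ticket describes: an already-initialized host ID is kept, and the {{cassandra.host_id}} system property is only consulted when none exists yet. The class and method names and the random-UUID fallback are assumptions for illustration, not Cassandra's actual startup code.

```java
import java.util.UUID;

public class HostIdResolver {
    // Property name taken from this ticket's description
    static final String HOST_ID_PROPERTY = "cassandra.host_id";

    /**
     * Hypothetical resolution order: an already-initialized host ID always wins; otherwise use
     * the system property if it is set; otherwise fall back to a random UUID.
     */
    public static UUID resolveHostId(UUID persisted, String propertyValue) {
        if (persisted != null)
            return persisted;                             // already initialized: property ignored
        if (propertyValue != null && !propertyValue.trim().isEmpty())
            return UUID.fromString(propertyValue.trim()); // may throw IllegalArgumentException if malformed
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        // e.g. java -Dcassandra.host_id=123e4567-e89b-12d3-a456-426614174000 HostIdResolver
        UUID persisted = null; // stands in for "no host ID stored locally yet"
        System.out.println(resolveHostId(persisted, System.getProperty(HOST_ID_PROPERTY)));
    }
}
```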
[jira] [Updated] (CASSANDRA-14582) Add a system property to set the cassandra hostId if not yet initialized
[ https://issues.apache.org/jira/browse/CASSANDRA-14582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-14582: Reviewers: beobal (was: Paulo Motta)
[jira] [Commented] (CASSANDRA-14582) Add a system property to set the cassandra hostId if not yet initialized
[ https://issues.apache.org/jira/browse/CASSANDRA-14582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529012#comment-17529012 ] Paulo Motta commented on CASSANDRA-14582: - I think you could have kept this ticket closed and created a new one for the revert, to keep the original history. I will not have time to review this revert soon, so I'll assign it to [~beobal].
[jira] [Commented] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528408#comment-17528408 ] Paulo Motta commented on CASSANDRA-16456: - bq. [~paulo] if you want to do a final pass too. Thanks for the ping. Unfortunately I won't be able to review this before it's merged, so don't wait on me. > Add Plugin Support for CQLSH > > > Key: CASSANDRA-16456 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16456 > Project: Cassandra > Issue Type: New Feature > Components: Tool/cqlsh >Reporter: Brian Houser >Assignee: Brian Houser >Priority: Normal > Labels: gsoc2021, mentor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently the Cassandra drivers offer a plugin authenticator architecture for > the support of different authentication methods. This has been leveraged to > provide support for LDAP, Kerberos, and Sigv4 authentication. Unfortunately, > cqlsh, the included CLI tool, does not offer such support. Switching to a new > enhanced authentication scheme thus means being cut off from using cqlsh in > normal operation. > We should have a means of using the same plugins and authentication providers > as the Python Cassandra driver. > Here's a link to an initial draft of > [CEP|https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit?usp=sharing].
[jira] [Updated] (CASSANDRA-16456) Add Plugin Support for CQLSH
[ https://issues.apache.org/jira/browse/CASSANDRA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-16456: Reviewers: Brandon Williams, Dinesh Joshi, Stefan Miklosovic (was: Brandon Williams, Dinesh Joshi, Paulo Motta, Stefan Miklosovic) > Add Plugin Support for CQLSH > > > Key: CASSANDRA-16456 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16456 > Project: Cassandra > Issue Type: New Feature > Components: Tool/cqlsh >Reporter: Brian Houser >Assignee: Brian Houser >Priority: Normal > Labels: gsoc2021, mentor > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently the Cassandra drivers offer a plugin authenticator architecture for > the support of different authentication methods. This has been leveraged to > provide support for LDAP, Kerberos, and Sigv4 authentication. Unfortunately, > cqlsh, the included CLI tool, does not offer such support. Switching to a new > enhanced authentication scheme thus means being cut off from using cqlsh in > normal operation. > We should have a means of using the same plugins and authentication providers > as the Python Cassandra driver. > Here's a link to an initial draft of > [CEP|https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit?usp=sharing].
[jira] [Commented] (CASSANDRA-16843) List snapshots of dropped tables
[ https://issues.apache.org/jira/browse/CASSANDRA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528209#comment-17528209 ] Paulo Motta commented on CASSANDRA-16843: - Submitted CI with an intermediate patch to gather initial results: https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1631/ > List snapshots of dropped tables > > > Key: CASSANDRA-16843 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16843 > Project: Cassandra > Issue Type: Bug > Components: Local/Snapshots >Reporter: James Brown >Assignee: Paulo Motta >Priority: Normal > Fix For: 4.1 > > Time Spent: 10m > Remaining Estimate: 0h > > Auto snapshots from dropped tables don't seem to show up in {{nodetool > listsnapshots}} (even though they do get cleared by {{nodetool > clearsnapshot}}). This makes them kind of annoying to clean up, since you > need to muck about in the data directory to find them. > Erick on the mailing list said that this seems to be an oversight and that > clearsnapshot was fixed by > [CASSANDRA-6418|https://issues.apache.org/jira/browse/CASSANDRA-6418]. > I reproduced this on both 3.11.11 and 4.0.0.
[jira] [Updated] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-16325: Test and Documentation Plan: changes.txt Status: Patch Available (was: Open) > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Dejan Gvozdenac >Priority: Normal > Labels: lhf > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, which doesn't reflect the current number of > bytes streamed, since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed.
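A minimal sketch of the incremental approach, assuming a plain chunked copy loop: the metric is bumped after each chunk rather than once per file, so a long transfer of a large file shows progress while it runs. {{IncrementalStreamCopy}} and the {{AtomicLong}} standing in for the streamed-bytes metric are hypothetical; Cassandra's actual streaming and metrics classes are not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: the counter is updated per chunk, not once per file.
final class IncrementalStreamCopy
{
    /** Stand-in for the inbound/outbound streamed-bytes metric. */
    static final AtomicLong streamedBytes = new AtomicLong();

    /** Copies in to out in chunks of chunkSize bytes and returns the bytes copied. */
    static long copy(InputStream in, OutputStream out, int chunkSize) throws IOException
    {
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1)
        {
            out.write(buffer, 0, read);
            total += read;
            // Increment after every chunk so the metric reflects progress
            // while a large file is still being streamed.
            streamedBytes.addAndGet(read);
        }
        return total;
    }
}
```

With a per-file update, an observer polling the metric during a multi-gigabyte transfer would see no change until the file completes; here it advances by at most {{chunkSize}} bytes at a time.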
[jira] [Updated] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data
[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17180: Fix Version/s: 4.1 > Implement startup check to prevent Cassandra start to spread zombie data > > > Key: CASSANDRA-17180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17180 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 4.1 > > Time Spent: 9.5h > Remaining Estimate: 0h > > As already discussed on ML, it would be nice to have a service which would > periodically write a timestamp to a file signalling that the node is up and running. > Then, on startup, we would read this file and determine whether there > is any table whose gc grace period has elapsed since that time; if so, we would fail the start > to prevent zombie data from being spread around the cluster. > https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw
[jira] [Updated] (CASSANDRA-16843) List snapshots of dropped tables
[ https://issues.apache.org/jira/browse/CASSANDRA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-16843: Fix Version/s: 4.1 (was: 3.11.x) (was: 4.0.x) > List snapshots of dropped tables > > > Key: CASSANDRA-16843 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16843 > Project: Cassandra > Issue Type: Bug > Components: Local/Snapshots >Reporter: James Brown >Assignee: Paulo Motta >Priority: Normal > Fix For: 4.1 > > > Auto snapshots from dropped tables don't seem to show up in {{nodetool > listsnapshots}} (even though they do get cleared by {{nodetool > clearsnapshot}}). This makes them kind of annoying to clean up, since you > need to muck about in the data directory to find them. > Erick on the mailing list said that this seems to be an oversight and that > clearsnapshot was fixed by > [CASSANDRA-6418|https://issues.apache.org/jira/browse/CASSANDRA-6418]. > I reproduced this on both 3.11.11 and 4.0.0.
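The core idea of this ticket, discovering snapshots by walking the data directories instead of only the tables present in the schema, can be sketched as below. It assumes the usual on-disk layout {{<data_dir>/<keyspace>/<table>-<table_id>/snapshots/<tag>/}}. The class and method names are hypothetical, and the sketch ignores multiple data directories, size computation and manifest parsing; it is not the {{SnapshotFinder}} class from the patch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: list every snapshot on disk, whether or not the table
// still exists in the schema, so snapshots of dropped tables are included.
final class SnapshotDirectoryScanner
{
    /** Returns sorted "keyspace/tableDir/tag" entries found under dataDir. */
    static List<String> findSnapshots(Path dataDir) throws IOException
    {
        List<String> found = new ArrayList<>();
        try (DirectoryStream<Path> keyspaces = Files.newDirectoryStream(dataDir, Files::isDirectory))
        {
            for (Path keyspace : keyspaces)
            {
                try (DirectoryStream<Path> tables = Files.newDirectoryStream(keyspace, Files::isDirectory))
                {
                    for (Path table : tables)
                    {
                        Path snapshots = table.resolve("snapshots");
                        if (!Files.isDirectory(snapshots))
                            continue; // table directory without any snapshots
                        try (DirectoryStream<Path> tags = Files.newDirectoryStream(snapshots, Files::isDirectory))
                        {
                            for (Path tag : tags)
                                found.add(keyspace.getFileName() + "/" + table.getFileName() + "/" + tag.getFileName());
                        }
                    }
                }
            }
        }
        Collections.sort(found);
        return found;
    }
}
```

Because the walk never consults the schema, a directory left behind by a dropped table is reported just like a live one; telling them apart requires comparing the table ID suffix with the live schema.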
[jira] [Updated] (CASSANDRA-16790) Add auto_snapshot_ttl configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-16790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-16790: Fix Version/s: 4.1 > Add auto_snapshot_ttl configuration > --- > > Key: CASSANDRA-16790 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16790 > Project: Cassandra > Issue Type: Sub-task > Components: Local/Config, Local/Snapshots >Reporter: Paulo Motta >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 4.1 > > > This property should take a human-readable parameter (e.g. 6h, 3days). When > specified and {{auto_snapshot: true}}, auto snapshots created should use the > specified TTL.
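A hypothetical parser for such a human-readable TTL value is sketched below. The accepted unit spellings ({{s}}, {{m}}, {{h}}, {{d}} and their long forms) are assumptions for illustration, not the syntax Cassandra actually accepts for {{auto_snapshot_ttl}}.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the unit spellings below are assumptions.
final class TtlParser
{
    // An integer amount followed by a unit, with optional surrounding whitespace.
    private static final Pattern TTL = Pattern.compile("\\s*(\\d+)\\s*([a-z]+)\\s*");

    static Duration parse(String value)
    {
        Matcher m = TTL.matcher(value.toLowerCase(Locale.ROOT));
        if (!m.matches())
            throw new IllegalArgumentException("Invalid TTL: " + value);
        long amount = Long.parseLong(m.group(1));
        switch (m.group(2))
        {
            case "s": case "second": case "seconds": return Duration.ofSeconds(amount);
            case "m": case "minute": case "minutes": return Duration.ofMinutes(amount);
            case "h": case "hour": case "hours":     return Duration.ofHours(amount);
            case "d": case "day": case "days":       return Duration.ofDays(amount);
            default: throw new IllegalArgumentException("Unknown TTL unit in: " + value);
        }
    }
}
```

With {{auto_snapshot: true}}, the parsed duration would be added to each auto snapshot's creation time to obtain its expiration time.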
[jira] [Commented] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data
[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526431#comment-17526431 ] Paulo Motta commented on CASSANDRA-17180: - {quote}Can we just use File.setLastModified and File.lastModified to read/write the heartbeat instead? {quote} Alternatively, we can just write a JSON file similar to the snapshot manifest, since we can use existing JSON utilities to read/write the heartbeat file without needing to implement a custom parser. Something like this: {noformat} {"last_heartbeat": "2022-04-22T13:33:41Z"} {noformat} We could later augment this JSON with more info if the need arises. WDYT? > Implement startup check to prevent Cassandra start to spread zombie data > > > Key: CASSANDRA-17180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17180 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 9.5h > Remaining Estimate: 0h > > As already discussed on ML, it would be nice to have a service which would > periodically write a timestamp to a file signalling that the node is up and running. > Then, on startup, we would read this file and determine whether there > is any table whose gc grace period has elapsed since that time; if so, we would fail the start > to prevent zombie data from being spread around the cluster. > https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw
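The heartbeat payload proposed above can be produced directly from {{java.time.Instant}}, whose {{toString()}} yields the ISO-8601 UTC form used in the example once the value is truncated to whole seconds. In this sketch the read side uses a small regular expression only to stay dependency-free; it is a placeholder for the existing JSON utilities the comment refers to, and {{HeartbeatJson}} is a hypothetical name.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; real code would use an existing JSON library for parsing.
final class HeartbeatJson
{
    // Placeholder extraction of the "last_heartbeat" string field.
    private static final Pattern LAST_HEARTBEAT =
        Pattern.compile("\"last_heartbeat\"\\s*:\\s*\"([^\"]+)\"");

    /** Serializes the heartbeat, truncated to seconds, e.g. 2022-04-22T13:33:41Z. */
    static String toJson(Instant heartbeat)
    {
        return "{\"last_heartbeat\": \"" + heartbeat.truncatedTo(ChronoUnit.SECONDS) + "\"}";
    }

    /** Reads the heartbeat back; Instant.parse accepts the ISO-8601 form written above. */
    static Instant fromJson(String json)
    {
        Matcher m = LAST_HEARTBEAT.matcher(json);
        if (!m.find())
            throw new IllegalArgumentException("No last_heartbeat field in: " + json);
        return Instant.parse(m.group(1));
    }
}
```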
[jira] [Updated] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data
[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17180: Status: Open (was: Patch Available) > Implement startup check to prevent Cassandra start to spread zombie data > > > Key: CASSANDRA-17180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17180 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 9.5h > Remaining Estimate: 0h > > As already discussed on ML, it would be nice to have a service which would > periodically write a timestamp to a file signalling that the node is up and running. > Then, on startup, we would read this file and determine whether there > is any table whose gc grace period has elapsed since that time; if so, we would fail the start > to prevent zombie data from being spread around the cluster. > https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw
[jira] [Commented] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data
[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526115#comment-17526115 ] Paulo Motta commented on CASSANDRA-17180: - bq. After spending more time on this, I identified an issue Nice catch! bq. I have not detected this by my unit tests because I was, more or less, mocking it but once I actually tried it on the running node, to my surprise it was not detecting the tables which should be causing violations. Can we create an (in-jvm or Python) dtest to ensure this is properly tested and that any future regressions are caught? bq. I think it is viable to do via "SchemaKeyspace.fetchNonSystemKeyspaces()". Sounds good to me. bq. I am not sure I can make this method publicly visible without any consequences yet. I think this should be fine. bq. On the other hand, it will check tables in "system_distributed" as well as "system_auth". These tables do not have gc = 0 and they are not excluded from fetchNonSystemKeyspaces call. That's OK; it's probably a good idea to check these tables anyway. > Implement startup check to prevent Cassandra start to spread zombie data > > > Key: CASSANDRA-17180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17180 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 9.5h > Remaining Estimate: 0h > > As already discussed on ML, it would be nice to have a service which would > periodically write a timestamp to a file signalling that the node is up and running. > Then, on startup, we would read this file and determine whether there > is any table whose gc grace period has elapsed since that time; if so, we would fail the start > to prevent zombie data from being spread around the cluster.
> https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw
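The check described in this ticket reduces to comparing the node's downtime (now minus the last heartbeat) against each table's {{gc_grace_seconds}}: if the node was down longer than that window, tombstones may already have been purged on other replicas, and this node could bring deleted data back. A hypothetical sketch follows. The class name, the {{keyspace.table}} map input and the exclusion sets (named after the {{excluded_keyspaces}}/{{excluded_tables}} options mentioned in this thread) are assumptions, and skipping tables with {{gc_grace_seconds = 0}} is a choice made only for this sketch.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a data-resurrection startup check.
final class DataResurrectionCheckSketch
{
    /**
     * @param gcGraceByTable map from "keyspace.table" to gc_grace_seconds
     * @return sorted tables whose gc_grace_seconds window elapsed while the node was down
     */
    static List<String> violations(Instant lastHeartbeat, Instant now,
                                   Map<String, Integer> gcGraceByTable,
                                   Set<String> excludedKeyspaces,
                                   Set<String> excludedTables)
    {
        Duration downtime = Duration.between(lastHeartbeat, now);
        List<String> violating = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : gcGraceByTable.entrySet())
        {
            String table = entry.getKey();
            String keyspace = table.substring(0, table.indexOf('.'));
            int gcGraceSeconds = entry.getValue();
            if (excludedKeyspaces.contains(keyspace) || excludedTables.contains(table))
                continue;
            if (gcGraceSeconds == 0)
                continue; // assumption for this sketch: gc_grace_seconds = 0 tables are skipped
            if (downtime.compareTo(Duration.ofSeconds(gcGraceSeconds)) > 0)
                violating.add(table);
        }
        Collections.sort(violating);
        return violating;
    }
}
```

A non-empty result would make the startup check fail, and the error message could list the offending tables together with the options for excluding them.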
[jira] [Comment Edited] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data
[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526105#comment-17526105 ] Paulo Motta edited comment on CASSANDRA-17180 at 4/21/22 10:01 PM: --- Thanks for addressing the initial comments. I finally found some time to look into this more deeply. Please find some follow-up comments below: * I think safety checks should be enabled by default, as long as people can disable them easily. Should we enable this startup check by default? Could we also improve the error message when the check fails, mentioning the properties to disable the check ({{startup_checks.check_data_resurrection.enabled=false}}) or to ignore specific keyspaces/tables ({{excluded_tables}}/{{excluded_keyspaces}})? * I didn't like the [check-specific logic|https://github.com/apache/cassandra/pull/1351/files#diff-957f2fa6365cb92f19b74347fee7a9f310a07e32c3112f35196dc17462ec7269R511] in CassandraDaemon to schedule the heartbeat. I implemented this [suggestion|https://github.com/apache/cassandra/commit/0b3557dd43255538942a86f63dec4c36272f25e9] to move the check post-action to the StartupCheck class - what do you think? * Can we rename the {{GcGraceSecondsOnStartupCheck}} class to {{CheckDataResurrection}} to be consistent with the check name? * Can we store the default heartbeat file in the storage directory (i.e. {{DD.getLocalSystemKeyspacesDataFileLocations()}})? In some deployments the cassandra directory is non-writable. * I don't like adding [custom logic|https://github.com/apache/cassandra/pull/1351/files#diff-f375982492d2426d26da68e105a44d397568be76361e8156fe299e875b8041ffR214] to read/write the heartbeat file, since this is error-prone and we're only interested in the timestamp value, not the file format.
Can we just use [File.setLastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#setLastModified(long)] and [File.lastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#lastModified()] to read/write the heartbeat instead? was (Author: paulo): Thanks for addressing the initial comments. I finally found some time to look into this more deeply. Please find some follow-up comments below: * I think safety checks should be enabled by default, as long as people can disable them easily. Should we enable this startup check by default? Could we also improve the error message when the check fails, mentioning the properties to disable the check ({{startup_checks.check_data_resurrection.enabled=false}}) or to ignore specific keyspaces/tables ({{excluded_tables}}/{{excluded_keyspaces}})? * I didn't like the check-specific logic in CassandraDaemon to schedule the heartbeat. I implemented this [suggestion|https://github.com/apache/cassandra/commit/0b3557dd43255538942a86f63dec4c36272f25e9] to move the check post-action to the StartupCheck class - what do you think? * Can we rename the {{GcGraceSecondsOnStartupCheck}} class to {{CheckDataResurrection}} to be consistent with the check name? * Can we store the default heartbeat file in the storage directory (i.e. {{DD.getLocalSystemKeyspacesDataFileLocations()}})? In some deployments the cassandra directory is non-writable. * I don't like adding [custom logic|https://github.com/apache/cassandra/pull/1351/files#diff-f375982492d2426d26da68e105a44d397568be76361e8156fe299e875b8041ffR214] to read/write the heartbeat file, since this is error-prone and we're only interested in the timestamp value, not the file format. Can we just use [File.setLastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#setLastModified(long)] and [File.lastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#lastModified()] to read/write the heartbeat instead?
> Implement startup check to prevent Cassandra start to spread zombie data > > > Key: CASSANDRA-17180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17180 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 9.5h > Remaining Estimate: 0h > > As already discussed on ML, it would be nice to have a service which would > periodically write a timestamp to a file signalling that the node is up and running. > Then, on startup, we would read this file and determine whether there > is any table whose gc grace period has elapsed since that time; if so, we would fail the start > to prevent zombie data from being spread around the cluster. > https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw
[jira] [Commented] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data
[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526105#comment-17526105 ] Paulo Motta commented on CASSANDRA-17180: - Thanks for addressing the initial comments. I finally found some time to look into this more deeply. Please find some follow-up comments below: * I think safety checks should be enabled by default, as long as people can disable them easily. Should we enable this startup check by default? Could we also improve the error message when the check fails, mentioning the properties to disable the check ({{startup_checks.check_data_resurrection.enabled=false}}) or to ignore specific keyspaces/tables ({{excluded_tables}}/{{excluded_keyspaces}})? * I didn't like the check-specific logic in CassandraDaemon to schedule the heartbeat. I implemented this [suggestion|https://github.com/apache/cassandra/commit/0b3557dd43255538942a86f63dec4c36272f25e9] to move the check post-action to the StartupCheck class - what do you think? * Can we rename the {{GcGraceSecondsOnStartupCheck}} class to {{CheckDataResurrection}} to be consistent with the check name? * Can we store the default heartbeat file in the storage directory (i.e. {{DD.getLocalSystemKeyspacesDataFileLocations()}})? In some deployments the cassandra directory is non-writable. * I don't like adding [custom logic|https://github.com/apache/cassandra/pull/1351/files#diff-f375982492d2426d26da68e105a44d397568be76361e8156fe299e875b8041ffR214] to read/write the heartbeat file, since this is error-prone and we're only interested in the timestamp value, not the file format. Can we just use [File.setLastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#setLastModified(long)] and [File.lastModified|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#lastModified()] to read/write the heartbeat instead?
> Implement startup check to prevent Cassandra start to spread zombie data > > > Key: CASSANDRA-17180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17180 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 9.5h > Remaining Estimate: 0h > > As already discussed on ML, it would be nice to have a service which would > periodically write a timestamp to a file signalling that the node is up and running. > Then, on startup, we would read this file and determine whether there > is any table whose gc grace period has elapsed since that time; if so, we would fail the start > to prevent zombie data from being spread around the cluster. > https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw
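The alternative suggested here, storing the heartbeat in the file's modification time instead of its contents, could look like the sketch below. {{File.setLastModified}} and {{File.lastModified}} are standard {{java.io.File}} methods; the {{MtimeHeartbeat}} wrapper is hypothetical. Per the {{File.setLastModified}} Javadoc, all platforms support modification times to the nearest second and finer values may be truncated, so the heartbeat should be treated as second-granular.

```java
import java.io.File;
import java.io.IOException;
import java.time.Instant;

// Hypothetical wrapper: the heartbeat is the file's last-modified time.
final class MtimeHeartbeat
{
    private final File file;

    MtimeHeartbeat(File file)
    {
        this.file = file;
    }

    /** Records a heartbeat at the given instant. */
    void beat(Instant now) throws IOException
    {
        file.createNewFile(); // returns false (no-op) if the file already exists
        if (!file.setLastModified(now.toEpochMilli()))
            throw new IOException("Could not update last-modified time of " + file);
    }

    /** Returns the last heartbeat, or null if none was ever recorded. */
    Instant lastHeartbeat()
    {
        long millis = file.lastModified(); // 0L if the file does not exist
        return millis == 0L ? null : Instant.ofEpochMilli(millis);
    }
}
```

No parsing is involved: the file's content is irrelevant, which is the point of the suggestion.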
[jira] [Commented] (CASSANDRA-17180) Implement startup check to prevent Cassandra start to spread zombie data
[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525316#comment-17525316 ] Paulo Motta commented on CASSANDRA-17180: - I cannot review this today, but I *hope* to follow up on this by tomorrow or Friday if nobody else gets to it first. > Implement startup check to prevent Cassandra start to spread zombie data > > > Key: CASSANDRA-17180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17180 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 9.5h > Remaining Estimate: 0h > > As already discussed on ML, it would be nice to have a service which would > periodically write a timestamp to a file signalling that the node is up and running. > Then, on startup, we would read this file and determine whether there > is any table whose gc grace period has elapsed since that time; if so, we would fail the start > to prevent zombie data from being spread around the cluster. > https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw
[jira] [Commented] (CASSANDRA-17568) Tool to list data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525314#comment-17525314 ] Paulo Motta commented on CASSANDRA-17568: - Btw, this can probably be worked on in parallel with CASSANDRA-16843, given I don't expect many changes to the published patch. > Tool to list data directories > - > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may find it challenging to work out > which directories belong to existing tables and which may be subject to > removal. Although the information is available in CQL as well as in MBeans > via JMX, convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
[jira] [Commented] (CASSANDRA-17568) Tool to list data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525313#comment-17525313 ] Paulo Motta commented on CASSANDRA-17568: - {quote}I am not completely sure we will manage to get this one in in the foreseeable future. {quote} I don't see why we can't get this into 4.1 if [~rtib] addresses the outstanding review comments and it does not conflict with CASSANDRA-16843. Even though we have a feature freeze on May 1st, we still have 10 days left to get things in. > Tool to list data directories > - > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may find it challenging to work out > which directories belong to existing tables and which may be subject to > removal. Although the information is available in CQL as well as in MBeans > via JMX, convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables. > {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
[jira] [Commented] (CASSANDRA-17568) Tool to list data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-17568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525290#comment-17525290 ] Paulo Motta commented on CASSANDRA-17568: - bq. The refactorisation he was doing was also done due to the fact that right now you can not list snapshots of dropped tables because Cassandra does not "see" it anymore when they are dropped. Hence I think we need to first move Paulo's work forward and once done, we would expose the information what tables are not meant to be there anymore - which would be your list. This logic is available in this [SnapshotFinder class|https://github.com/apache/cassandra/blob/2b1ec31885908b1199a93127668b2a4fd422a2c6/src/java/org/apache/cassandra/service/snapshot/SnapshotFinder.java] from CASSANDRA-16843 (planning to wrap this up soon for 4.1). I'm not sure whether this is a blocker for this ticket or whether the two efforts can proceed without conflicting with each other. > Tool to list data directories > - > > Key: CASSANDRA-17568 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17568 > Project: Cassandra > Issue Type: New Feature > Components: Tool/nodetool >Reporter: Tibor Repasi >Assignee: Tibor Repasi >Priority: Normal > Fix For: 4.x > > > When a table is created, dropped and re-created with the same name, > directories remain within data paths. Operators may find it challenging to work out > which directories belong to existing tables and which may be subject to > removal. Although the information is available in CQL as well as in MBeans > via JMX, convenient access to it is still missing. > My proposal is a new nodetool subcommand that lists the data paths of all > existing tables.
> {code} > % bin/nodetool datapaths -- example > Keyspace : example > Table : test > Paths : > > /var/lib/cassandra/data/example/test-02f5b8d0c0e311ecb327ff24df5ab301 > > {code}
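The directory name in the example above ends with the table ID written as 32 hex digits without dashes ({{test-02f5b8d0c0e311ecb327ff24df5ab301}}). A hypothetical sketch of how a tool could flag leftover directories: parse that suffix back into a UUID and compare it with the IDs of the live tables, which could come, for example, from the {{id}} column of {{system_schema.tables}}. The class and method names are assumptions; this is not the proposed {{nodetool datapaths}} implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch: table directories are assumed to be named <table>-<32 hex digit id>.
final class TableDirectories
{
    /** Converts the 32-hex-digit suffix of a table directory name into a UUID. */
    static UUID tableIdOf(String dirName)
    {
        int dash = dirName.lastIndexOf('-');
        if (dash < 0 || dirName.length() - dash - 1 != 32)
            throw new IllegalArgumentException("Not a table directory: " + dirName);
        String hex = dirName.substring(dash + 1);
        // Re-insert the dashes of the canonical 8-4-4-4-12 UUID form.
        String dashed = hex.substring(0, 8) + "-" + hex.substring(8, 12) + "-"
                        + hex.substring(12, 16) + "-" + hex.substring(16, 20) + "-"
                        + hex.substring(20);
        return UUID.fromString(dashed);
    }

    /** Returns the directory names whose table ID is not among the live table IDs. */
    static List<String> orphans(List<String> dirNames, Set<UUID> liveTableIds)
    {
        List<String> result = new ArrayList<>();
        for (String name : dirNames)
            if (!liveTableIds.contains(tableIdOf(name)))
                result.add(name);
        return result;
    }
}
```

A dropped and re-created table gets a new ID, so its old directory shows up as an orphan even though a table with the same name exists.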
[jira] [Commented] (CASSANDRA-17493) Shutdown all ScheduledExecutors as part of node drainage
[ https://issues.apache.org/jira/browse/CASSANDRA-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17520546#comment-17520546 ] Paulo Motta commented on CASSANDRA-17493: - Is this error [found here|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1590/testReport/org.apache.cassandra.distributed.test/CasCriticalSectionTest/criticalSectionTest/] related to this change? I don't think so, but I'm just checking.
{code:none}
ERROR 22:18:29 Exception in thread Thread[MutationStage-1,5,SharedPool]
java.lang.RuntimeException: java.lang.IllegalStateException: HintsService is shut down and can't accept new hints
    at org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2577)
    at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:81)
    at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
    at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:120)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: HintsService is shut down and can't accept new hints
    at org.apache.cassandra.hints.HintsService.write(HintsService.java:165)
    at org.apache.cassandra.service.StorageProxy$7.runMayThrow(StorageProxy.java:2656)
    at org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2573)
    ... 6 common frames omitted
ERROR [MutationStage-1] node1 2022-04-09 22:18:29,622 JVMStabilityInspector.java:68 - Exception in thread Thread[MutationStage-1,5,SharedPool]
java.lang.RuntimeException: java.lang.IllegalStateException: HintsService is shut down and can't accept new hints
    at org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2577)
    at org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:81)
    at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:47)
    at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:57)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:120)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: HintsService is shut down and can't accept new hints
    at org.apache.cassandra.hints.HintsService.write(HintsService.java:165)
    at org.apache.cassandra.service.StorageProxy$7.runMayThrow(StorageProxy.java:2656)
    at org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2573)
    ... 6 common frames omitted
{code}
> Shutdown all ScheduledExecutors as part of node drainage > > > Key: CASSANDRA-17493 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17493 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h > > We are currently shutting down only non-periodic executors in > StorageService#drain. We should shut down all of them. As of now, there does > not seem to be any reason why these executors should be active.
[jira] [Commented] (CASSANDRA-17493) Shutdown all ScheduledExecutors as part of node drainage
[ https://issues.apache.org/jira/browse/CASSANDRA-17493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17520529#comment-17520529 ] Paulo Motta commented on CASSANDRA-17493: - Looks good to me - can you just clarify why this was changed?
{code:java}
 if (isShutDown)
-    throw new IllegalStateException("HintsService has already been shut down");
+{
+    logger.warn("HintsService has already been shut down");
+    return;
+}
{code}
> Shutdown all ScheduledExecutors as part of node drainage > > > Key: CASSANDRA-17493 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17493 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h > > We are currently shutting down only non-periodic executors in > StorageService#drain. We should shut down all of them. As of now, there does > not seem to be any reason why these executors should be active.
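The diff under review turns a repeated shutdown call into a warning instead of an exception, while the stack trace in the previous comment shows that writes arriving after shutdown are still rejected. A minimal sketch of those two behaviours side by side, assuming a hypothetical `ShutdownOnceService` class (not Cassandra's `HintsService`) and using `System.err` in place of a real logger:

```java
// Hypothetical stand-in for a service such as HintsService; not Cassandra code.
public class ShutdownOnceService
{
    private boolean isShutDown = false; // guarded by this
    private int accepted = 0;           // guarded by this

    // New work is still rejected after shutdown, mirroring the
    // "is shut down and can't accept new hints" error.
    public synchronized void write(String item)
    {
        if (isShutDown)
            throw new IllegalStateException("Service is shut down and can't accept new work: " + item);
        accepted++;
    }

    // Idempotent shutdown: only the first call does the work; later calls
    // log a warning and return instead of throwing.
    public synchronized boolean shutdownBlocking()
    {
        if (isShutDown)
        {
            System.err.println("WARN: service has already been shut down");
            return false;
        }
        isShutDown = true;
        // Executors, open files, etc. would be released here.
        return true;
    }

    public synchronized int acceptedCount()
    {
        return accepted;
    }

    public static void main(String[] args)
    {
        ShutdownOnceService service = new ShutdownOnceService();
        service.write("hint-1");
        System.out.println(service.shutdownBlocking()); // true
        System.out.println(service.shutdownBlocking()); // false, warning only
        try
        {
            service.write("hint-2");
        }
        catch (IllegalStateException e)
        {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The design choice being questioned is exactly this split: making shutdown idempotent lets several drain paths call it safely, whereas rejecting late writes still surfaces callers that race with shutdown.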
[jira] [Commented] (CASSANDRA-17499) Remove global Guardrails Enable flag
[ https://issues.apache.org/jira/browse/CASSANDRA-17499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17514636#comment-17514636 ] Paulo Motta commented on CASSANDRA-17499: - bq. However, we don't have that set of recommended defaults and, instead, every guardrail is disabled by default, so currently the global flag doesn't add that much value.
This makes sense. Thanks for the detailed explanation! > Remove global Guardrails Enable flag > > > Key: CASSANDRA-17499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17499 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Guardrails >Reporter: Savni Nagarkar >Assignee: Savni Nagarkar >Priority: Normal > Fix For: 4.x > > Time Spent: 20m > Remaining Estimate: 0h > > This ticket removes the global Guardrails enable flag. Currently the flag > turns all Guardrails on and off regardless of the individual setting of the > guardrail property. This presents a problem for maximum replication factor > and minimum replication factor configurations which will soon be moved to > guardrails. Those configurations will always need to be enabled so no > problems arise as Cassandra users create keyspaces. This ensures all > Guardrail properties follow the same enable / disable procedure.
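The problem the ticket describes can be illustrated with a small model. The sketch below is hypothetical (the class, method names and the `minimum_replication_factor` label are not Cassandra's Guardrails API) and assumes, for illustration only, that a negative threshold means "disabled". With a global flag, the replication factor check is skipped whenever the flag is off; without it, each guardrail's own setting alone decides.

```java
// Hypothetical model of global vs. per-guardrail enablement; not Cassandra's API.
public class GuardrailSketch
{
    // A minimum-value guardrail; a negative threshold means "disabled" (assumption).
    public static final class MinThreshold
    {
        final String name;
        final int failBelow;

        public MinThreshold(String name, int failBelow)
        {
            this.name = name;
            this.failBelow = failBelow;
        }

        public boolean enabled()
        {
            return failBelow >= 0;
        }

        public void guard(int value)
        {
            if (enabled() && value < failBelow)
                throw new IllegalArgumentException(name + " " + value + " is below the minimum of " + failBelow);
        }
    }

    // Before: a global switch overrides each guardrail's own setting.
    public static void guardWithGlobalFlag(boolean guardrailsEnabled, MinThreshold guardrail, int value)
    {
        if (guardrailsEnabled)
            guardrail.guard(value);
    }

    // After: only the guardrail's own setting decides.
    public static void guardPerGuardrail(MinThreshold guardrail, int value)
    {
        guardrail.guard(value);
    }

    public static void main(String[] args)
    {
        MinThreshold minRf = new MinThreshold("minimum_replication_factor", 2);
        guardWithGlobalFlag(false, minRf, 1); // silently accepted: the RF check is bypassed
        try
        {
            guardPerGuardrail(minRf, 1);
        }
        catch (IllegalArgumentException e)
        {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```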
[jira] [Commented] (CASSANDRA-17499) Remove global Guardrails Enable flag
[ https://issues.apache.org/jira/browse/CASSANDRA-17499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17514371#comment-17514371 ] Paulo Motta commented on CASSANDRA-17499: - Can you elaborate on why we need to remove the ability to globally disable guardrails? If someone chooses to disable guardrails, then it would also disable the maximum replication factor settings. Why is this a problem? > Remove global Guardrails Enable flag > > > Key: CASSANDRA-17499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17499 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Guardrails >Reporter: Savni Nagarkar >Assignee: Savni Nagarkar >Priority: Normal > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h > > This ticket removes the global Guardrails enable flag. Currently the flag > turns all Guardrails on and off regardless of the individual setting of the > guardrail property. This presents a problem for maximum replication factor > and minimum replication factor configurations which will soon be moved to > guardrails. Those configurations will always need to be enabled so no > problems arise as Cassandra users create keyspaces. This ensures all > Guardrail properties follow the same enable / disable procedure.
[jira] [Commented] (CASSANDRA-17380) Add support for EXPLAIN statements
[ https://issues.apache.org/jira/browse/CASSANDRA-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511343#comment-17511343 ] Paulo Motta commented on CASSANDRA-17380: - Ok, thanks for checking - I wasn't aware. Sent an invite to your email. > Add support for EXPLAIN statements > -- > > Key: CASSANDRA-17380 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17380 > Project: Cassandra > Issue Type: Improvement >Reporter: Benjamin Lerer >Priority: Normal > Labels: gsoc, gsoc2022 > > We should provide users a way to understand how their query will be executed > and some information on the amount of work that will be performed. > Explain statements are the most common way to do that. > A CEP Draft has been opened for that: [(DRAFT) CEP-4: > Explain|https://docs.google.com/document/d/1s_gc4TDYdDbHnYHHVxxjqVVUn3MONUqG6W2JehnC11g/edit]. > This draft proposes to add support for {{EXPLAIN}} and {{EXPLAIN ANALYZE}} > but I believe that we should split the work into 2 parts because a simple > {{EXPLAIN}} would already provide relevant information. > To complete this work I believe that the following steps will be required: > * Rework and submit the CEP > * Add missing statistics > * Implement the logic behind the EXPLAIN statements
[jira] [Commented] (CASSANDRA-17380) Add support for EXPLAIN statements
[ https://issues.apache.org/jira/browse/CASSANDRA-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511337#comment-17511337 ] Paulo Motta commented on CASSANDRA-17380: - Did you try using a Google account on [https://the-asf.slack.com/signup]? > Add support for EXPLAIN statements > -- > > Key: CASSANDRA-17380 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17380 > Project: Cassandra > Issue Type: Improvement >Reporter: Benjamin Lerer >Priority: Normal > Labels: gsoc, gsoc2022 > > We should provide users a way to understand how their query will be executed > and some information on the amount of work that will be performed. > Explain statements are the most common way to do that. > A CEP Draft has been opened for that: [(DRAFT) CEP-4: > Explain|https://docs.google.com/document/d/1s_gc4TDYdDbHnYHHVxxjqVVUn3MONUqG6W2JehnC11g/edit]. > This draft proposes to add support for {{EXPLAIN}} and {{EXPLAIN ANALYZE}} > but I believe that we should split the work into 2 parts because a simple > {{EXPLAIN}} would already provide relevant information. > To complete this work I believe that the following steps will be required: > * Rework and submit the CEP > * Add missing statistics > * Implement the logic behind the EXPLAIN statements
[jira] [Commented] (CASSANDRA-17380) Add support for EXPLAIN statements
[ https://issues.apache.org/jira/browse/CASSANDRA-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511326#comment-17511326 ] Paulo Motta commented on CASSANDRA-17380: - [~gimhana.ds] You can register on the-asf.slack.com, no invitation needed. Say hello on #cassandra-dev or #cassandra-gsoc. > Add support for EXPLAIN statements > -- > > Key: CASSANDRA-17380 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17380 > Project: Cassandra > Issue Type: Improvement >Reporter: Benjamin Lerer >Priority: Normal > Labels: gsoc, gsoc2022 > > We should provide users a way to understand how their query will be executed > and some information on the amount of work that will be performed. > Explain statements are the most common way to do that. > A CEP Draft has been opened for that: [(DRAFT) CEP-4: > Explain|https://docs.google.com/document/d/1s_gc4TDYdDbHnYHHVxxjqVVUn3MONUqG6W2JehnC11g/edit]. > This draft proposes to add support for {{EXPLAIN}} and {{EXPLAIN ANALYZE}} > but I believe that we should split the work into 2 parts because a simple > {{EXPLAIN}} would already provide relevant information. > To complete this work I believe that the following steps will be required: > * Rework and submit the CEP > * Add missing statistics > * Implement the logic behind the EXPLAIN statements
[jira] [Updated] (CASSANDRA-16790) Add auto_snapshot_ttl configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-16790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-16790: Reviewers: Paulo Motta (was: Stefan Miklosovic) > Add auto_snapshot_ttl configuration > --- > > Key: CASSANDRA-16790 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16790 > Project: Cassandra > Issue Type: Sub-task > Components: Local/Config, Local/Snapshots >Reporter: Paulo Motta >Assignee: Stefan Miklosovic >Priority: Normal > > This property should take a human-readable parameter (e.g. 6h, 3days). When > specified and {{auto_snapshot: true}}, auto snapshots created should use the > specified TTL.
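A human-readable TTL such as `6h` or `3days` has to be turned into a duration before it can be applied to an auto snapshot. A minimal sketch, assuming a hypothetical `SnapshotTtl` helper and unit spellings chosen for illustration (this is not the parser Cassandra ships), with the expiration computed as creation time plus TTL:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the accepted unit spellings are an assumption for illustration.
public final class SnapshotTtl
{
    // An integer amount followed by a unit, e.g. "6h", "3days", "30 m".
    private static final Pattern TTL = Pattern.compile("^(\\d+)\\s*([a-zA-Z]+)$");

    private SnapshotTtl() {}

    public static Duration parse(String value)
    {
        Matcher m = TTL.matcher(value.trim());
        if (!m.matches())
            throw new IllegalArgumentException("Invalid TTL: " + value);

        long amount = Long.parseLong(m.group(1));
        switch (m.group(2).toLowerCase(Locale.ROOT))
        {
            case "s": case "second": case "seconds": return Duration.ofSeconds(amount);
            case "m": case "minute": case "minutes": return Duration.ofMinutes(amount);
            case "h": case "hour": case "hours":     return Duration.ofHours(amount);
            case "d": case "day": case "days":       return Duration.ofDays(amount);
            default: throw new IllegalArgumentException("Unknown TTL unit in: " + value);
        }
    }

    // An auto snapshot taken at createdAt would expire at createdAt + TTL.
    public static Instant expiresAt(Instant createdAt, String ttl)
    {
        return createdAt.plus(parse(ttl));
    }
}
```

For example, a snapshot created at `2022-04-26T19:13:20.102Z` with a TTL of `3d` would expire at `2022-04-29T19:13:20.102Z`.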
[jira] [Commented] (CASSANDRA-17267) Snapshot true size is miscalculated
[ https://issues.apache.org/jira/browse/CASSANDRA-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507173#comment-17507173 ] Paulo Motta commented on CASSANDRA-17267: - [~jmckenzie] I've checked and don't think these failures are related to this change. Did you trigger a re-run? > Snapshot true size is miscalculated > --- > > Key: CASSANDRA-17267 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17267 > Project: Cassandra > Issue Type: Bug > Components: Local/Snapshots >Reporter: Paulo Motta >Assignee: Paulo Motta >Priority: Normal > Fix For: 3.11.13, 4.1, 4.0.4 > > > As far as I understand, the snapshot "size on disk" is the total size of the > snapshot, while the "true size" is the (size_on_disk - size_of_live_sstables). > I created a snapshot on a 3.11 node without traffic and I expected the "true > size" to be 0KB since the original sstables were still present, but this > didn't seem to be the case: > {noformat} > $ nodetool listsnapshots > Snapshot Details: > Snapshot name Keyspace name Column family name True size Size on disk > test ks1 tbl1 4.86 KiB 5.69 KiB > Total TrueDiskSpaceUsed: 4.86 KiB > {noformat}
[jira] [Commented] (CASSANDRA-17440) Report compacted bytes metric periodically
[ https://issues.apache.org/jira/browse/CASSANDRA-17440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17506923#comment-17506923 ] Paulo Motta commented on CASSANDRA-17440: - I think the best place to implement this is in [ActiveCompactions|https://github.com/apache/cassandra/blob/bf96367f4d55692017e144980cf17963e31df127/src/java/org/apache/cassandra/db/compaction/ActiveCompactions.java]. Instead of updating compaction progress [only at the end|https://github.com/apache/cassandra/blob/bf96367f4d55692017e144980cf17963e31df127/src/java/org/apache/cassandra/db/compaction/ActiveCompactions.java#L48], we would schedule a periodic reporter task for each active compaction and stop the task when the compaction is completed. Something along these lines:
{code:java}
class ActiveCompactions
{
    void beginCompactions(CompactionInfo.Holder ci)
    {
        future = schedule(() -> reportInfo(ci), getProperty("cassandra.compacted_bytes_reporter_period", 1), TimeUnit.MINUTES);
    }

    void reportInfo(CompactionInfo.Holder ci)
    {
        CompactionInfo info = ci.getCompactionInfo();
        long compactedBytes = computeDelta(info);
        CompactionManager.instance.getMetrics().bytesCompacted.inc(compactedBytes);
    }

    void endCompaction(CompactionInfo.Holder ci)
    {
        reportInfo(ci);
        future.cancel(false);
    }
}
{code}
Please let me know what you think of this approach. > Report compacted bytes metric periodically > -- > > Key: CASSANDRA-17440 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17440 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction >Reporter: Paulo Motta >Priority: Normal > > Currently the compacted bytes metrics are incremented only at the end of the > compaction, which can take a long time to update for long-running compactions. > We should periodically (e.g. every 1 minute, configurable) update the > compacted bytes metric to improve compaction throughput observability. > This issue is analogous to CASSANDRA-16325 but for compaction instead of > streaming.
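The proposal above hinges on reporting only the bytes compacted since the previous report, so the metric is never double-counted between periodic runs and the final flush. A self-contained sketch using `ScheduledExecutorService.scheduleAtFixedRate`, assuming a hypothetical `PeriodicCompactedBytesReporter` class, an `AtomicLong` standing in for the `bytesCompacted` metric, and a `LongSupplier` standing in for compaction progress (none of these are Cassandra APIs); the property name is the one suggested in the comment, not an existing setting:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch of the proposal; not Cassandra's ActiveCompactions.
public class PeriodicCompactedBytesReporter
{
    private final AtomicLong bytesCompactedMetric = new AtomicLong(); // stand-in for the metric
    private final Map<String, Tracked> active = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final long periodMillis;

    private static final class Tracked
    {
        final LongSupplier progress; // total bytes processed so far by one compaction
        long lastReported;           // guarded by this
        volatile ScheduledFuture<?> future;

        Tracked(LongSupplier progress)
        {
            this.progress = progress;
        }

        // Bytes processed since the previous call, so each byte is counted exactly once.
        synchronized long takeDelta()
        {
            long current = progress.getAsLong();
            long delta = current - lastReported;
            lastReported = current;
            return delta;
        }
    }

    public PeriodicCompactedBytesReporter(long periodMillis)
    {
        this.periodMillis = periodMillis;
    }

    // Period from the system property suggested in the comment (default: 1 minute).
    public static long periodFromSystemProperty()
    {
        return TimeUnit.MINUTES.toMillis(Long.getLong("cassandra.compacted_bytes_reporter_period", 1));
    }

    public void beginCompaction(String id, LongSupplier progress)
    {
        Tracked tracked = new Tracked(progress);
        active.put(id, tracked);
        tracked.future = scheduler.scheduleAtFixedRate(() -> report(tracked),
                                                       periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    public void finishCompaction(String id)
    {
        Tracked tracked = active.remove(id);
        if (tracked == null)
            return;
        tracked.future.cancel(false);
        report(tracked); // flush whatever was compacted since the last periodic report
    }

    private void report(Tracked tracked)
    {
        bytesCompactedMetric.addAndGet(tracked.takeDelta());
    }

    public long bytesCompacted()
    {
        return bytesCompactedMetric.get();
    }

    public void shutdown()
    {
        scheduler.shutdownNow();
    }
}
```

Because `takeDelta()` is synchronized, a periodic run racing with the final flush in `finishCompaction` can only split the remaining bytes between them, never count them twice.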
[jira] [Created] (CASSANDRA-17440) Report compacted bytes metric periodically
Paulo Motta created CASSANDRA-17440: --- Summary: Report compacted bytes metric periodically Key: CASSANDRA-17440 URL: https://issues.apache.org/jira/browse/CASSANDRA-17440 Project: Cassandra Issue Type: Improvement Components: Local/Compaction Reporter: Paulo Motta Currently the compacted bytes metrics are incremented only at the end of the compaction, which can take a long time to update for long-running compactions. We should periodically (e.g. every 1 minute, configurable) update the compacted bytes metric to improve compaction throughput observability. This issue is analogous to CASSANDRA-16325 but for compaction instead of streaming.
[jira] [Updated] (CASSANDRA-17267) Snapshot true size is miscalculated
[ https://issues.apache.org/jira/browse/CASSANDRA-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17267: Reviewers: Benjamin Lerer, Brandon Williams (was: Benjamin Lerer) > Snapshot true size is miscalculated > --- > > Key: CASSANDRA-17267 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17267 > Project: Cassandra > Issue Type: Bug > Components: Local/Snapshots >Reporter: Paulo Motta >Assignee: Paulo Motta >Priority: Normal > Fix For: 3.11.13, 4.1, 4.0.4 > > > As far as I understand, the snapshot "size on disk" is the total size of the > snapshot, while the "true size" is the (size_on_disk - size_of_live_sstables). > I created a snapshot on a 3.11 node without traffic and I expected the "true > size" to be 0KB since the original sstables were still present, but this > didn't seem to be the case: > {noformat} > $ nodetool listsnapshots > Snapshot Details: > Snapshot name Keyspace name Column family name True size Size on disk > test ks1 tbl1 4.86 KiB 5.69 KiB > Total TrueDiskSpaceUsed: 4.86 KiB > {noformat}
[jira] [Updated] (CASSANDRA-17267) Snapshot true size is miscalculated
[ https://issues.apache.org/jira/browse/CASSANDRA-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17267: Fix Version/s: 3.11.13 4.1 4.0.4 Source Control Link: https://github.com/apache/cassandra/commit/95a622305722889c321204c4bca68a3517a29aab Resolution: Fixed Status: Resolved (was: Ready to Commit) Committed to cassandra-3.11 branch and merged up to {{trunk}} as {{95a622305722889c321204c4bca68a3517a29aab}}. > Snapshot true size is miscalculated > --- > > Key: CASSANDRA-17267 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17267 > Project: Cassandra > Issue Type: Bug > Components: Local/Snapshots >Reporter: Paulo Motta >Assignee: Paulo Motta >Priority: Normal > Fix For: 3.11.13, 4.1, 4.0.4 > > > As far as I understand, the snapshot "size on disk" is the total size of the > snapshot, while the "true size" is the (size_on_disk - size_of_live_sstables). > I created a snapshot on a 3.11 node without traffic and I expected the "true > size" to be 0KB since the original sstables were still present, but this > didn't seem to be the case: > {noformat} > $ nodetool listsnapshots > Snapshot Details: > Snapshot name Keyspace name Column family name True size Size on disk > test ks1 tbl1 4.86 KiB 5.69 KiB > Total TrueDiskSpaceUsed: 4.86 KiB > {noformat}
[cassandra] 01/01: Merge branch 'cassandra-4.0' into trunk
This is an automated email from the ASF dual-hosted git repository. paulo pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git commit 951645a1fc4486f2b941c4eb72c429780badca07 Merge: 3b1ce69 5f50c79 Author: Paulo Motta AuthorDate: Mon Mar 14 17:21:51 2022 -0300 Merge branch 'cassandra-4.0' into trunk CHANGES.txt| 1 + src/java/org/apache/cassandra/db/Directories.java | 6 +-- .../apache/cassandra/db/ColumnFamilyStoreTest.java | 41 ++ .../apache/cassandra/index/sasi/SASIIndexTest.java | 48 +- 4 files changed, 74 insertions(+), 22 deletions(-) diff --cc CHANGES.txt index c57742f,86eae2b..ea1826a --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -133,26 -46,15 +133,27 @@@ Merged from 4.0 * Avoid rewriting all sstables during cleanup when transient replication is enabled (CASSANDRA-16966) * Prevent CQLSH from failure on Python 3.10 (CASSANDRA-16987) * Avoid trying to acquire 0 permits from the rate limiter when taking snapshot (CASSANDRA-16872) - * Upgrade Caffeine to 2.5.6 (CASSANDRA-15153) - * Include SASI components to snapshots (CASSANDRA-15134) - * Fix missed wait latencies in the output of `nodetool tpstats -F` (CASSANDRA-16938) * Remove all the state pollution between tests in SSTableReaderTest (CASSANDRA-16888) * Delay auth setup until after gossip has settled to avoid unavailables on startup (CASSANDRA-16783) - * Fix clustering order logic in CREATE MATERIALIZED VIEW (CASSANDRA-16898) * org.apache.cassandra.db.rows.ArrayCell#unsharedHeapSizeExcludingData includes data twice (CASSANDRA-16900) + * Fix clustering order logic in CREATE MATERIALIZED VIEW (CASSANDRA-16898) * Exclude Jackson 1.x transitive dependency of hadoop* provided dependencies (CASSANDRA-16854) + * Tolerate missing DNS entry when completing a host replacement (CASSANDRA-16873) + * Harden PrunableArrayQueue against Pruner implementations that might throw exceptions (CASSANDRA-16866) + * Move RepairedDataInfo to the execution controller rather than the ReadCommand to 
avoid unintended sharing (CASSANDRA-16721) + * Bump zstd-jni version to 1.5.0-4 (CASSANDRA-16884) + * Remove assumption that all urgent messages are small (CASSANDRA-16877) + * ArrayClustering.unsharedHeapSize does not include the data so undercounts the heap size (CASSANDRA-16845) + * Improve help, doc and error messages about sstabledump -k and -x arguments (CASSANDRA-16818) + * Add repaired/unrepaired bytes back to nodetool (CASSANDRA-15282) + * Upgrade lz4-java to 1.8.0 to add RH6 support back (CASSANDRA-16753) + * Improve DiagnosticEventService.publish(event) logging message of events (CASSANDRA-16749) + * Cleanup dependency scopes (CASSANDRA-16704) + * Make JmxHistogram#getRecentValues() and JmxTimer#getRecentValues() thread-safe (CASSANDRA-16707) Merged from 3.11: ++ * Fix snapshot true size calculation (CASSANDRA-17267) + * dropping of a materialized view creates a snapshot with dropped- prefix (CASSANDRA-17415) + * Validate existence of DCs when repairing (CASSANDRA-17407) * Add key validation to ssstablescrub (CASSANDRA-16969) * Update Jackson from 2.9.10 to 2.12.5 (CASSANDRA-16851) * Make assassinate more resilient to missing tokens (CASSANDRA-16847) diff --cc src/java/org/apache/cassandra/db/Directories.java index 5a9c563,f09cdae..1cc350d --- a/src/java/org/apache/cassandra/db/Directories.java +++ b/src/java/org/apache/cassandra/db/Directories.java @@@ -1224,7 -1162,7 +1224,7 @@@ public class Directorie SSTableSizeSummer(File path, List files) { super(path); - toSkip = new HashSet<>(files); -toSkip = files.stream().map(f -> f.getName()).collect(Collectors.toSet()); ++toSkip = files.stream().map(f -> f.name()).collect(Collectors.toSet()); } @Override @@@ -1235,7 -1173,7 +1235,7 @@@ return desc != null && desc.ksname.equals(metadata.keyspace) && desc.cfname.equals(metadata.name) - && !toSkip.contains(file); -&& !toSkip.contains(file.getName()); ++&& !toSkip.contains(file.name()); } } diff --cc test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java 
index e7c7e22,266b37d..d970b12 --- a/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java +++ b/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java @@@ -29,10 -31,16 +29,11 @@@ import org.junit.Before import org.junit.BeforeClass; import org.junit.Test; + import org.apache.cassandra.db.lifecycle.LifecycleTransaction; -import org.apache.cassandra.utils.Pair; -import org.json.simple.JSONArray; -import org.json.simple.JSONObject; -import org.json.simple.parser.JSONParser; - -import static org.assertj.core.api.Assertions.assertThat; -i
[cassandra] branch cassandra-4.0 updated (d9bd035 -> 5f50c79)
This is an automated email from the ASF dual-hosted git repository. paulo pushed a change to branch cassandra-4.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git. from d9bd035 Merge branch 'cassandra-3.11' into cassandra-4.0 new 95a6223 Fix snapshot true size calculation new 5f50c79 Merge branch 'cassandra-3.11' into cassandra-4.0 The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: CHANGES.txt| 1 + src/java/org/apache/cassandra/db/Directories.java | 8 ++-- .../apache/cassandra/db/ColumnFamilyStoreTest.java | 43 .../apache/cassandra/index/sasi/SASIIndexTest.java | 46 +- 4 files changed, 77 insertions(+), 21 deletions(-)
[cassandra] branch trunk updated (3b1ce69 -> 951645a)
This is an automated email from the ASF dual-hosted git repository. paulo pushed a change to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git. from 3b1ce69 Merge branch 'cassandra-4.0' into trunk new 95a6223 Fix snapshot true size calculation new 5f50c79 Merge branch 'cassandra-3.11' into cassandra-4.0 new 951645a Merge branch 'cassandra-4.0' into trunk The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: CHANGES.txt| 1 + src/java/org/apache/cassandra/db/Directories.java | 6 +-- .../apache/cassandra/db/ColumnFamilyStoreTest.java | 41 ++ .../apache/cassandra/index/sasi/SASIIndexTest.java | 48 +- 4 files changed, 74 insertions(+), 22 deletions(-)
[cassandra] 01/01: Merge branch 'cassandra-3.11' into cassandra-4.0
This is an automated email from the ASF dual-hosted git repository. paulo pushed a commit to branch cassandra-4.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git commit 5f50c797bc2746c6e5d37389fa020e708d55b0b4 Merge: d9bd035 95a6223 Author: Paulo Motta AuthorDate: Mon Mar 14 17:18:15 2022 -0300 Merge branch 'cassandra-3.11' into cassandra-4.0 CHANGES.txt| 1 + src/java/org/apache/cassandra/db/Directories.java | 8 ++-- .../apache/cassandra/db/ColumnFamilyStoreTest.java | 43 .../apache/cassandra/index/sasi/SASIIndexTest.java | 46 +- 4 files changed, 77 insertions(+), 21 deletions(-) diff --cc CHANGES.txt index e9b5b43,df9d4e1..86eae2b --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,12 -1,7 +1,13 @@@ -3.11.13 +4.0.4 + * Fix ObjectSizes implementation and usages (CASSANDRA-17402) + * Fix race condition bug during local session repair (CASSANDRA-17335) + * Fix ignored streaming encryption settings in sstableloader (CASSANDRA-17367) + * Streaming tasks handle empty SSTables correctly (CASSANDRA-16349) + * Prevent SSTableLoader from doing unnecessary work (CASSANDRA-16349) +Merged from 3.11: + * Fix snapshot true size calculation (CASSANDRA-17267) - * Validate existence of DCs when repairing (CASSANDRA-17407) * dropping of a materialized view creates a snapshot with dropped- prefix (CASSANDRA-17415) + * Validate existence of DCs when repairing (CASSANDRA-17407) Merged from 3.0: * Require ant >= 1.10 (CASSANDRA-17428) * Disallow CONTAINS for UPDATE and DELETE (CASSANDRA-15266) diff --cc src/java/org/apache/cassandra/db/Directories.java index aa2881f,b37afa5..f09cdae --- a/src/java/org/apache/cassandra/db/Directories.java +++ b/src/java/org/apache/cassandra/db/Directories.java @@@ -17,15 -17,22 +17,17 @@@ */ package org.apache.cassandra.db; -import java.io.File; -import java.io.FileFilter; -import java.io.IOError; -import java.io.IOException; -import java.nio.file.Files; -import java.nio.file.Path; -import java.nio.file.Paths; +import java.io.*; +import 
java.nio.file.*; import java.util.*; import java.util.concurrent.ThreadLocalRandom; -import java.util.function.BiFunction; +import java.util.function.BiPredicate; + import java.util.stream.Collectors; + -import com.google.common.annotations.VisibleForTesting; -import com.google.common.base.Predicate; import com.google.common.collect.ImmutableMap; import com.google.common.collect.Iterables; +import com.google.common.collect.Maps; +import com.google.common.util.concurrent.RateLimiter; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; @@@ -1154,9 -1037,24 +1156,9 @@@ public class Directorie return StringUtils.join(s, File.separator); } -@VisibleForTesting -static void overrideDataDirectoriesForTest(String loc) -{ -for (int i = 0; i < dataDirectories.length; ++i) -dataDirectories[i] = new DataDirectory(new File(loc)); -} - -@VisibleForTesting -static void resetDataDirectoriesAfterTest() -{ -String[] locations = DatabaseDescriptor.getAllDataFileLocations(); -for (int i = 0; i < locations.length; ++i) -dataDirectories[i] = new DataDirectory(new File(locations[i])); -} - private class SSTableSizeSummer extends DirectorySizeCalculator { - private final HashSet toSkip; + private final Set toSkip; SSTableSizeSummer(File path, List files) { super(path); @@@ -1167,39 -1065,11 +1169,39 @@@ public boolean isAcceptable(Path path) { File file = path.toFile(); -Pair pair = SSTable.tryComponentFromFilename(path.getParent().toFile(), file.getName()); -return pair != null -&& pair.left.ksname.equals(metadata.ksName) -&& pair.left.cfname.equals(metadata.cfName) -&& !toSkip.contains(file.getName()); +Descriptor desc = SSTable.tryDescriptorFromFilename(file); +return desc != null +&& desc.ksname.equals(metadata.keyspace) +&& desc.cfname.equals(metadata.name) - && !toSkip.contains(file); ++&& !toSkip.contains(file.getName()); +} +} + +public static class SnapshotSizeDetails +{ +public final long sizeOnDiskBytes; +public final long dataSizeBytes; + +private 
SnapshotSizeDetails(long sizeOnDiskBytes, long dataSizeBytes) +{ +this.sizeOnDiskBytes = sizeOnDiskBytes; +this.dataSizeBytes = dataSizeBytes; +} + +@Override +public final int hashCode() +{ +int hashCode = (int) sizeOnDiskBytes ^ (int) (sizeOnDiskBytes >>&
[cassandra] branch cassandra-3.11 updated: Fix snapshot true size calculation
This is an automated email from the ASF dual-hosted git repository.

paulo pushed a commit to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git

The following commit(s) were added to refs/heads/cassandra-3.11 by this push:
     new 95a6223  Fix snapshot true size calculation
95a6223 is described below

commit 95a622305722889c321204c4bca68a3517a29aab
Author: Paulo Motta
AuthorDate: Mon Mar 14 17:17:37 2022 -0300

    Fix snapshot true size calculation

    Patch by Paulo Motta; Reviewed by Brandon Williams and Benjamin Lerer for CASSANDRA-17267
---
 CHANGES.txt                                        |  1 +
 src/java/org/apache/cassandra/db/Directories.java  |  7 +--
 .../apache/cassandra/db/ColumnFamilyStoreTest.java | 46 +++
 .../apache/cassandra/index/sasi/SASIIndexTest.java | 51 +-
 4 files changed, 81 insertions(+), 24 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index a8954b7..df9d4e1 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.11.13
+ * Fix snapshot true size calculation (CASSANDRA-17267)
  * Validate existence of DCs when repairing (CASSANDRA-17407)
  * dropping of a materialized view creates a snapshot with dropped- prefix (CASSANDRA-17415)
 Merged from 3.0:
diff --git a/src/java/org/apache/cassandra/db/Directories.java b/src/java/org/apache/cassandra/db/Directories.java
index b5be69b..b37afa5 100644
--- a/src/java/org/apache/cassandra/db/Directories.java
+++ b/src/java/org/apache/cassandra/db/Directories.java
@@ -27,6 +27,7 @@ import java.nio.file.Paths;
 import java.util.*;
 import java.util.concurrent.ThreadLocalRandom;
 import java.util.function.BiFunction;
+import java.util.stream.Collectors;
 
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Predicate;
@@ -1053,11 +1054,11 @@ public class Directories
 
     private class SSTableSizeSummer extends DirectorySizeCalculator
     {
-        private final HashSet<File> toSkip;
+        private final Set<String> toSkip;
         SSTableSizeSummer(File path, List<File> files)
         {
             super(path);
-            toSkip = new HashSet<>(files);
+            toSkip = files.stream().map(f -> f.getName()).collect(Collectors.toSet());
         }
 
         @Override
@@ -1068,7 +1069,7 @@ public class Directories
             return pair != null
                    && pair.left.ksname.equals(metadata.ksName)
                    && pair.left.cfname.equals(metadata.cfName)
-                   && !toSkip.contains(file);
+                   && !toSkip.contains(file.getName());
         }
     }
 }
diff --git a/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java b/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java
index a3564bb..3987a29 100644
--- a/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java
+++ b/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java
@@ -30,10 +30,12 @@ import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.runner.RunWith;
 
+import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
 import org.json.simple.JSONArray;
 import org.json.simple.JSONObject;
 import org.json.simple.parser.JSONParser;
 
+import static org.assertj.core.api.Assertions.assertThat;
 import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertTrue;
 
@@ -347,6 +349,50 @@ public class ColumnFamilyStoreTest
     }
 
     @Test
+    public void testSnapshotSize()
+    {
+        // cleanup any previous test gargbage
+        ColumnFamilyStore cfs = Keyspace.open(KEYSPACE1).getColumnFamilyStore(CF_STANDARD1);
+        cfs.clearSnapshot("");
+
+        // Add row
+        new RowUpdateBuilder(cfs.metadata, 0, "key1")
+        .clustering("Column1")
+        .add("val", "asdf")
+        .build()
+        .applyUnsafe();
+        cfs.forceBlockingFlush();
+
+        // snapshot
+        cfs.snapshot("basic", null, false, false);
+
+        // check snapshot was created
+        Map<String, Pair<Long, Long>> snapshotDetails = cfs.getSnapshotDetails();
+        assertThat(snapshotDetails).hasSize(1);
+        assertThat(snapshotDetails).containsKey("basic");
+
+        // check that sizeOnDisk > trueSize = 0
+        Pair<Long, Long> details = snapshotDetails.get("basic");
+        long sizeOnDisk = details.left;
+        long trueSize = details.right;
+        assertThat(sizeOnDisk).isGreaterThan(trueSize);
+        assertThat(trueSize).isZero();
+
+        // compact base table to make trueSize > 0
+        cfs.forceMajorCompaction();
+        LifecycleTransaction.waitForDeletions();
+
+        // sizeOnDisk > trueSize because trueSize does not include manifest.json
+        // Check that truesize now is > 0
+        snapshotDetails = cfs.getSnapshotDetails();
+        details = snapshotDetails.get("basic");
+        sizeOnDisk =
[jira] (CASSANDRA-17267) Snapshot true size is miscalculated
[ https://issues.apache.org/jira/browse/CASSANDRA-17267 ] Paulo Motta deleted comment on CASSANDRA-17267: - was (Author: paulo): [~smiklosovic] I would appreciate if you can merge this. > Snapshot true size is miscalculated > --- > > Key: CASSANDRA-17267 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17267 > Project: Cassandra > Issue Type: Bug > Components: Local/Snapshots >Reporter: Paulo Motta >Assignee: Paulo Motta >Priority: Normal > > As far as I understand, the snapshot "size on disk" is the total size of the > snapshot, while the "true size" is the (size_on_disk - size_of_live_sstables). > I created a snapshot on a 3.11 node without traffic and I expected the "true > size" to be 0KB since the original sstables were still present, but this > didn't seem to be the case: > {noformat} > $ nodetool listsnapshots > Snapshot Details: > Snapshot name Keyspace name Column family name True size Size on disk > test ks1 tbl1 4.86 KiB 5.69 KiB > Total TrueDiskSpaceUsed: 4.86 KiB > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
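The ticket defines true size as (size_on_disk - size_of_live_sstables), and the 3.11 patch changed {{SSTableSizeSummer}} to compare file *names* rather than {{File}} objects, since a snapshot file and the live sstable it mirrors sit in different directories and so never compare equal as paths. The following is a minimal, hypothetical sketch of that idea over plain name-to-size maps; it is not Cassandra's {{Directories}} code, and the file names and sizes are made up. It follows the ticket's definition, so non-sstable files such as manifest.json count toward true size here, whereas the 3.11 patch only sums sstable components (as its test comment notes).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not Cassandra's Directories/SSTableSizeSummer code.
public class TrueSizeSketch
{
    // "Size on disk": every file in the snapshot directory.
    public static long sizeOnDisk(Map<String, Long> snapshotFiles)
    {
        long total = 0;
        for (long size : snapshotFiles.values())
            total += size;
        return total;
    }

    // "True size": only snapshot files whose NAME does not match a live sstable file.
    // Comparing names (not full paths) is what lets live sstables be skipped at all,
    // because the snapshot copy and the live file are in different directories.
    public static long trueSize(Map<String, Long> snapshotFiles, Set<String> liveFileNames)
    {
        long total = 0;
        for (Map.Entry<String, Long> e : snapshotFiles.entrySet())
            if (!liveFileNames.contains(e.getKey()))
                total += e.getValue();
        return total;
    }

    public static void main(String[] args)
    {
        Map<String, Long> snapshot = new HashMap<>();
        snapshot.put("nb-1-big-Data.db", 4000L);
        snapshot.put("nb-1-big-Index.db", 976L);
        snapshot.put("manifest.json", 60L);

        // Right after the snapshot: nb-1 is still live, so only manifest.json is exclusive to it.
        Set<String> live = new HashSet<>(Arrays.asList("nb-1-big-Data.db", "nb-1-big-Index.db"));
        System.out.println("true size " + trueSize(snapshot, live) + " of " + sizeOnDisk(snapshot));

        // After a major compaction replaces nb-1 with nb-2, the whole snapshot is exclusive.
        Set<String> compacted = new HashSet<>(Arrays.asList("nb-2-big-Data.db", "nb-2-big-Index.db"));
        System.out.println("true size " + trueSize(snapshot, compacted) + " of " + sizeOnDisk(snapshot));
    }
}
```

With no traffic and the original sstables still present, only the snapshot-exclusive files contribute, which is the 0 KB expectation described in the ticket once metadata files are left out.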
[jira] [Commented] (CASSANDRA-17267) Snapshot true size is miscalculated
[ https://issues.apache.org/jira/browse/CASSANDRA-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17506513#comment-17506513 ] Paulo Motta commented on CASSANDRA-17267: - [~smiklosovic] I would appreciate if you can merge this.
[jira] [Commented] (CASSANDRA-17428) Fail build when Ant is not of a certain version
[ https://issues.apache.org/jira/browse/CASSANDRA-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17506456#comment-17506456 ] Paulo Motta commented on CASSANDRA-17428: - +1 > Fail build when Ant is not of a certain version > --- > > Key: CASSANDRA-17428 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17428 > Project: Cassandra > Issue Type: Improvement > Components: Build >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Low > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x > > > As of writing this ticket, trunk is known to be buildable with Ant 1.10 and > users have reported build failures on Ant 1.9 (see CASSANDRA-16831). > There should be a check which fails the build if the Ant version used to build the Cassandra > source code is not at least 1.10.
[jira] [Comment Edited] (CASSANDRA-17428) Fail build when Ant is not of a certain version
[ https://issues.apache.org/jira/browse/CASSANDRA-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504226#comment-17504226 ] Paulo Motta edited comment on CASSANDRA-17428 at 3/10/22, 12:45 PM: Did you test this locally? Can you paste a snippet of the check running for documentation purposes? -LGTM after CI passes.- Edit: the check does not work for versions higher than 1.10. was (Author: paulo): Did you test this locally? Can you paste a snippet of the check running for documentation purposes? LGTM after CI passes.
[jira] [Commented] (CASSANDRA-17428) Fail build when Ant is not of a certain version
[ https://issues.apache.org/jira/browse/CASSANDRA-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504229#comment-17504229 ] Paulo Motta commented on CASSANDRA-17428: - Can you try the *lower-than* operator instead of {*}contains{*}?
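The review point is that a "contains 1.10" test rejects newer releases such as 1.11, and a plain string comparison is also wrong because "1.9.16" sorts after "1.10" character by character. A component-wise numeric comparison avoids both. The following is a hypothetical Java sketch of that ordering only; it assumes the bare numeric version (e.g. "1.10.11") has already been extracted from Ant's version banner and that every component is an integer. In build.xml itself, Ant's {{antversion}} condition with its {{atleast}} attribute is likely the more natural tool.

```java
// Hypothetical sketch: numeric, component-by-component version ordering.
public class AntVersionCheck
{
    // Returns <0, 0 or >0 like Comparator.compare; missing components count as 0,
    // so "1.10" and "1.10.0" compare equal. Assumes purely numeric components.
    public static int compareVersions(String a, String b)
    {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++)
        {
            int va = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int vb = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (va != vb)
                return Integer.compare(va, vb);
        }
        return 0;
    }

    // "At least minimum" rather than "contains minimum".
    public static boolean isSupported(String antVersion, String minimum)
    {
        return compareVersions(antVersion, minimum) >= 0;
    }

    public static void main(String[] args)
    {
        for (String v : new String[]{ "1.9.16", "1.10.11", "1.11" })
            System.out.println(v + " supported: " + isSupported(v, "1.10")
                               + " (contains \"1.10\": " + v.contains("1.10") + ")");
    }
}
```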
[jira] [Updated] (CASSANDRA-17428) Fail build when Ant is not of a certain version
[ https://issues.apache.org/jira/browse/CASSANDRA-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17428: Reviewers: Paulo Motta
[jira] [Commented] (CASSANDRA-17428) Fail build when Ant is not of a certain version
[ https://issues.apache.org/jira/browse/CASSANDRA-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504226#comment-17504226 ] Paulo Motta commented on CASSANDRA-17428: - Did you test this locally? Can you paste a snippet of the check running for documentation purposes? LGTM after CI passes.
[jira] [Commented] (CASSANDRA-16831) Unable to build from source - generate-jflex-java task fails with exception regarding JFlex
[ https://issues.apache.org/jira/browse/CASSANDRA-16831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503864#comment-17503864 ] Paulo Motta commented on CASSANDRA-16831: - Upgrading from ant 1.9.16 to ant 1.10.11 fixed this issue. > Unable to build from source - generate-jflex-java task fails with exception > regarding JFlex > --- > > Key: CASSANDRA-16831 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16831 > Project: Cassandra > Issue Type: Bug > Components: Build >Reporter: Jonathon Henderson >Priority: Normal > > I'm unable to build Cassandra from source by running {{ant}} in the root > directory on Windows 10, using AdoptOpenJDK 8 and Ant 1.10.11. The build > fails with the following: > {code:bash} > generate-jflex-java: > BUILD FAILED > C:\Users\Jonathon\IdeaProjects\cassandra\build.xml:438: > java.lang.NoSuchMethodError: > jflex.core.LexParse.getSymbolFactory()Ljava_cup/runtime/SymbolFactory; > at > jflex.core.LexParse$CUP$LexParse$actions.CUP$LexParse$do_action_part(LexParse.java:1087) > at > jflex.core.LexParse$CUP$LexParse$actions.CUP$LexParse$do_action(LexParse.java:2257) > at jflex.core.LexParse.do_action(LexParse.java:598) > at java_cup.runtime.lr_parser.parse(lr_parser.java:569) > at jflex.generator.LexGenerator.generate(LexGenerator.java:74) > at jflex.anttask.JFlexTask.execute(JFlexTask.java:78) > at > org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:299) > at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:99) > at org.apache.tools.ant.Task.perform(Task.java:350) > at org.apache.tools.ant.Target.execute(Target.java:449) > at org.apache.tools.ant.Target.performTasks(Target.java:470) > at > org.apache.tools.ant.Project.executeSortedTargets(Project.java:1401) > at 
org.apache.tools.ant.Project.executeTarget(Project.java:1374) > at > org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41) > at org.apache.tools.ant.Project.executeTargets(Project.java:1264) > at org.apache.tools.ant.Main.runBuild(Main.java:818) > at org.apache.tools.ant.Main.startAnt(Main.java:223) > at org.apache.tools.ant.launch.Launcher.run(Launcher.java:284) > at org.apache.tools.ant.launch.Launcher.main(Launcher.java:101) > {code} > I've also tried doing it through Windows Subsystem for Linux (WSL 2) with the > same versions of Java and Ant. I get the same error. > My environment: > * Windows 10 Pro (Version 20H2, build 19042.1110) > * > OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_292-b10) > * > Apache Ant(TM) version 1.10.11 compiled on July 10 2021 > Note: I should probably mention that I'm new to Cassandra and I've never been > able to build it before. I'm trying to run an initial build before I run the > script to generate IntelliJ IDEA files (since generate-jflex-java is executed > during the generate-idea-files task, it fails there with the error above > too). If I'm doing something wrong, please let me know.
[jira] [Commented] (CASSANDRA-16325) Update streaming metrics incrementally
[ https://issues.apache.org/jira/browse/CASSANDRA-16325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503042#comment-17503042 ] Paulo Motta commented on CASSANDRA-16325: - Hi [~dgvozdenac]. Thanks for taking this up. It seems like a similar issue happens with compaction metrics. There were some code changes in these areas in recent versions. I think a good start would be to inspect the code in {{trunk}} to check where streaming and compaction size metrics are updated, and whether they are updated incrementally or only at the end of the operation. > Update streaming metrics incrementally > -- > > Key: CASSANDRA-16325 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16325 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Paulo Motta >Assignee: Dejan Gvozdenac >Priority: Normal > Labels: lhf > > Currently the inbound and outbound streamed bytes metrics are incremented > after each file is streamed, which doesn't represent the current number of > bytes streamed, since it can take a long time for a large file to be streamed. > We should update the metric incrementally as data is streamed.
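The difference between per-file and incremental metric updates can be sketched with a chunked copy loop that bumps a counter after every chunk, so a reader polling the metric sees progress while a large file is still in flight. This is a hypothetical illustration: the {{outgoingBytes}} counter stands in for the real streaming metric, and the class is not Cassandra's streaming code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicLong;

public class IncrementalStreamMetric
{
    // Hypothetical stand-in for an "outgoing streamed bytes" metric.
    public static final AtomicLong outgoingBytes = new AtomicLong();

    // Copies in chunks and updates the metric after every chunk,
    // instead of adding the whole file size once the transfer completes.
    public static long copy(InputStream in, OutputStream out, int chunkSize)
    {
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        try
        {
            int read;
            while ((read = in.read(buffer)) != -1)
            {
                out.write(buffer, 0, read);
                total += read;
                outgoingBytes.addAndGet(read); // visible before the file is finished
            }
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args)
    {
        long copied = copy(new ByteArrayInputStream(new byte[10_000]), new ByteArrayOutputStream(), 4096);
        System.out.println(copied + " bytes copied, metric now " + outgoingBytes.get());
    }
}
```

An {{AtomicLong}} is used so that concurrent readers and multiple streaming threads can update and read the counter without extra locking.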
[jira] [Updated] (CASSANDRA-17415) dropping of a materialized view does not create a snapshot with dropped- prefix
[ https://issues.apache.org/jira/browse/CASSANDRA-17415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17415: Reviewers: Paulo Motta, Paulo Motta Paulo Motta, Paulo Motta (was: Paulo Motta) Status: Review In Progress (was: Patch Available) > dropping of a materialized view does not create a snapshot with dropped- > prefix > --- > > Key: CASSANDRA-17415 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17415 > Project: Cassandra > Issue Type: Bug > Components: Feature/Materialized Views >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 3.11.x > > > When auto_snapshot is true and an MV is dropped, the name of the snapshot does not > start with the "dropped-" prefix, as it would for a normal table. This is an issue for > 3.11.x only. In 4.x, the code was refactored substantially and the issue does not occur > there.
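The snapshot listing in the CASSANDRA-16843 comment at the start of this archive shows a dropped table's snapshot named dropped-1650997415751-my_table; the number appears to be the drop time in epoch milliseconds, since it matches the listed creation time 2022-04-26T18:23:35.751Z. The following hypothetical sketch builds and recognizes names in that shape, applying the same rule whether a table or a materialized view is dropped; the helper names are illustrative and not Cassandra APIs.

```java
// Hypothetical helpers illustrating the "dropped-<millis>-<name>" snapshot naming.
public class DroppedSnapshotName
{
    static final String DROPPED_PREFIX = "dropped-";

    // Same rule for tables and materialized views, which is what the 3.11 MV path was missing.
    public static String droppedSnapshotName(long droppedAtMillis, String tableOrViewName)
    {
        return DROPPED_PREFIX + droppedAtMillis + "-" + tableOrViewName;
    }

    public static boolean isDroppedSnapshot(String snapshotName)
    {
        return snapshotName.startsWith(DROPPED_PREFIX);
    }

    public static void main(String[] args)
    {
        String name = droppedSnapshotName(1650997415751L, "my_table");
        System.out.println(name + " dropped? " + isDroppedSnapshot(name));
    }
}
```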
[jira] [Updated] (CASSANDRA-17415) dropping of a materialized view does not create a snapshot with dropped- prefix
[ https://issues.apache.org/jira/browse/CASSANDRA-17415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17415: Status: Patch Available (was: In Progress)
[jira] [Updated] (CASSANDRA-17415) dropping of a materialized view does not create a snapshot with dropped- prefix
[ https://issues.apache.org/jira/browse/CASSANDRA-17415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17415: Status: Ready to Commit (was: Review In Progress)
[jira] [Commented] (CASSANDRA-17415) dropping of a materialized view does not create a snapshot with dropped- prefix
[ https://issues.apache.org/jira/browse/CASSANDRA-17415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502937#comment-17502937 ] Paulo Motta commented on CASSANDRA-17415: - +1
[jira] [Updated] (CASSANDRA-17415) dropping of a materialized view does not create a snapshot with dropped- prefix
[ https://issues.apache.org/jira/browse/CASSANDRA-17415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17415: Status: In Progress (was: Patch Available)
[jira] [Comment Edited] (CASSANDRA-17410) WEBSITE - March 2022 blog "Apache Cassandra’s Google Summer of Code program"
[ https://issues.apache.org/jira/browse/CASSANDRA-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502672#comment-17502672 ] Paulo Motta edited comment on CASSANDRA-17410 at 3/8/22, 2:32 AM: -- Added some suggestions on [this commit|https://github.com/pauloricardomg/cassandra-website/commit/7b1ee69e6424efffb9d564a009e005982cf3c563]. Can you take a look [~erickramirezau]? was (Author: paulo): Added some suggestions on [this commit|https://github.com/apache/cassandra-website/commit/0f8456601bf0b661b8e307b822962dfe730d383d]. Can you take a look [~erickramirezau]? > WEBSITE - March 2022 blog "Apache Cassandra’s Google Summer of Code program" > > > Key: CASSANDRA-17410 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17410 > Project: Cassandra > Issue Type: Task > Components: Documentation/Blog >Reporter: Diogenese Topper >Assignee: Paulo Motta >Priority: Normal > Labels: pull-request-available > > This ticket is to capture the work associated with publishing the March 2022 > blog "Apache Cassandra’s Google Summer of Code program" > If this blog cannot be published by the *March 7, 2022 publish date*, please > contact me, suggest changes, or correct the date when possible in the pull > request for the appropriate time that the blog will go live (on both the > blog.adoc and the blog post's file).
[jira] [Comment Edited] (CASSANDRA-17410) WEBSITE - March 2022 blog "Apache Cassandra’s Google Summer of Code program"
[ https://issues.apache.org/jira/browse/CASSANDRA-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502672#comment-17502672 ] Paulo Motta edited comment on CASSANDRA-17410 at 3/8/22, 2:23 AM: -- Added some suggestions on [this commit|https://github.com/apache/cassandra-website/commit/0f8456601bf0b661b8e307b822962dfe730d383d]. Can you take a look [~erickramirezau]? was (Author: paulo): Added some suggestions on [this commit](https://github.com/apache/cassandra-website/commit/0f8456601bf0b661b8e307b822962dfe730d383d). Can you take a look [~erickramirezau]?
[jira] [Commented] (CASSANDRA-17410) WEBSITE - March 2022 blog "Apache Cassandra’s Google Summer of Code program"
[ https://issues.apache.org/jira/browse/CASSANDRA-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502672#comment-17502672 ] Paulo Motta commented on CASSANDRA-17410: - Added some suggestions on [this commit](https://github.com/apache/cassandra-website/commit/0f8456601bf0b661b8e307b822962dfe730d383d). Can you take a look [~erickramirezau]?
[jira] [Updated] (CASSANDRA-17410) WEBSITE - March 2022 blog "Apache Cassandra’s Google Summer of Code program"
[ https://issues.apache.org/jira/browse/CASSANDRA-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17410: Change Category: Semantic Complexity: Normal Status: Open (was: Triage Needed) > WEBSITE - March 2022 blog "Apache Cassandra’s Google Summer of Code program" > > > Key: CASSANDRA-17410 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17410 > Project: Cassandra > Issue Type: Task > Components: Documentation/Blog >Reporter: Diogenese Topper >Priority: Normal > > This ticket is to capture the work associated with publishing the March 2022 > blog "Apache Cassandra’s Google Summer of Code program" > If this blog cannot be published by the *March 7, 2022 publish date*, please > contact me, suggest changes, or correct the date when possible in the pull > request for the appropriate time that the blog will go live (on both the > blog.adoc and the blog post's file).
[jira] [Updated] (CASSANDRA-17410) WEBSITE - March 2022 blog "Apache Cassandra’s Google Summer of Code program"
[ https://issues.apache.org/jira/browse/CASSANDRA-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17410: Reviewers: Paulo Motta
[jira] [Updated] (CASSANDRA-17380) Add support for EXPLAIN statements
[ https://issues.apache.org/jira/browse/CASSANDRA-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17380: Labels: gsoc gsoc2022 (was: gsoc gsoc22) > Add support for EXPLAIN statements > -- > > Key: CASSANDRA-17380 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17380 > Project: Cassandra > Issue Type: Improvement >Reporter: Benjamin Lerer >Priority: Normal > Labels: gsoc, gsoc2022 > > We should provide users a way to understand how their query will be executed > and some information on the amount of work that will be performed. > Explain statements are the most common way to do that. > A CEP draft has been opened for that: [(DRAFT) CEP-4: > Explain|https://docs.google.com/document/d/1s_gc4TDYdDbHnYHHVxxjqVVUn3MONUqG6W2JehnC11g/edit]. > This draft proposes to add support for {{EXPLAIN}} and {{EXPLAIN ANALYZE}} > but I believe that we should split the work into 2 parts because a simple > {{EXPLAIN}} would already provide relevant information. > To complete this work I believe that the following steps will be required: > * Rework and submit the CEP > * Add missing statistics > * Implement the logic behind the EXPLAIN statements
[jira] [Updated] (CASSANDRA-17220) Make startup checks configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17220: Status: Review In Progress (was: Patch Available) > Make startup checks configurable > > > Key: CASSANDRA-17220 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17220 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > > This ticket was created from needs discovered in CASSANDRA-17180. We want to > be able to configure a startup check, so we figured out that it is necessary > to treat all startup checks the same way, so that all of them can be configured. This ticket > is about making startup checks configurable. > Once this ticket is done, we can continue with the implementation of > CASSANDRA-17180, where the gc grace check will be implemented. > We have identified that there is one check currently in place which needs to > be changed to reflect this configuration implementation, and that is > FileSystemOwnershipCheck. > Because startup checks were previously not configurable via a > configuration file, they were configured via system properties. This ticket > does not aim to remove the system properties configuration mechanism; system > properties will take precedence over settings in the configuration file. Then, in > the next release, I am aiming to remove the system properties configuration > mechanism.
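The precedence rule in the description (a system property, when set, wins over the configuration file, which in turn wins over a built-in default) can be sketched as a small resolver. This is a hypothetical illustration: the property name "example.check.enabled" and the "enabled" key in the per-check settings map are assumptions, not the option names that CASSANDRA-17220 actually introduced. The {{Properties}} object is passed in (e.g. {{System.getProperties()}}) so the rule can be exercised without touching global JVM state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical resolver: system property > configuration file > default.
public class StartupCheckConfig
{
    public static boolean isCheckEnabled(Properties systemProperties,
                                         String systemPropertyName,
                                         Map<String, Object> checkConfigFromFile,
                                         boolean defaultValue)
    {
        // 1. A system property, when present, takes precedence.
        String fromSysProp = systemProperties.getProperty(systemPropertyName);
        if (fromSysProp != null)
            return Boolean.parseBoolean(fromSysProp);

        // 2. Otherwise use the per-check settings parsed from the configuration file.
        Object fromFile = checkConfigFromFile.get("enabled");
        if (fromFile != null)
            return Boolean.parseBoolean(fromFile.toString());

        // 3. Fall back to the built-in default.
        return defaultValue;
    }

    public static void main(String[] args)
    {
        Map<String, Object> fileSettings = new HashMap<>();
        fileSettings.put("enabled", false);
        System.out.println(isCheckEnabled(System.getProperties(), "example.check.enabled", fileSettings, true));
    }
}
```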
[jira] [Commented] (CASSANDRA-17220) Make startup checks configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498298#comment-17498298 ] Paulo Motta commented on CASSANDRA-17220: - {quote} Thanks mate, I am for simple configs, it is obvious it is a check. {quote} Ok, even though I still prefer {{check_dc}} and {{check_rack}} ;) Can you attach some CI results?
[jira] [Updated] (CASSANDRA-17220) Make startup checks configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17220: Status: Ready to Commit (was: Review In Progress)
[jira] [Updated] (CASSANDRA-17220) Make startup checks configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17220: Status: Patch Available (was: Ready to Commit)
[jira] [Updated] (CASSANDRA-17220) Make startup checks configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17220: Status: Needs Committer (was: Patch Available)
[jira] [Updated] (CASSANDRA-17220) Make startup checks configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17220: Status: Review In Progress (was: Needs Committer)
[jira] [Commented] (CASSANDRA-17220) Make startup checks configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498256#comment-17498256 ] Paulo Motta commented on CASSANDRA-17220: - We probably need to create a documentation ticket to describe what each startup check does.
[jira] [Commented] (CASSANDRA-17220) Make startup checks configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498255#comment-17498255 ] Paulo Motta commented on CASSANDRA-17220: - This looks great, +1. If you agree, can you incorporate [this commit|https://github.com/pauloricardomg/cassandra/commit/956f6def4f12622efa914f3f08d6747bfc278953], which updates the {{cassandra.yaml}} wording? I tried to reduce the amount of text between the properties. Also, one final nitpick before we merge:
* Should we prefix startup check names with {{check_}}, i.e. {{check_dc}}, {{check_rack}}, etc. (instead of simply {{dc}} or {{rack}})? This would make it easier to figure out what each startup check does. The renamed checks would be:
{code:yaml}
startup_checks:
  check_filesystem_ownership:
    enabled: false
    ownership_token: "sometoken" # (overridden by "CassandraOwnershipToken" system property)
    ownership_filename: ".cassandra_fs_ownership" # (overridden by "cassandra.fs_ownership_filename" system property)
  check_dc:
    enabled: true # (overridden by cassandra.ignore_dc system property)
  check_rack:
    enabled: true # (overridden by cassandra.ignore_rack system property)
{code}
What do you think, [~smiklosovic] [~dcapwell]?
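The YAML comments above state that a system property overrides the corresponding {{enabled}} flag. A minimal sketch of that precedence rule follows; the method and its parameters are hypothetical, and it assumes that setting an "ignore" property such as {{cassandra.ignore_dc}} to {{true}} means the check is skipped:

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of the precedence rule: when the override system property
// is set, it wins over the "enabled" flag from the startup_checks section.
class StartupCheckPrecedence
{
    /**
     * @param options        the check's section from startup_checks (may be empty)
     * @param sysProps       system properties, e.g. System.getProperties()
     * @param ignoreProperty e.g. "cassandra.ignore_dc", where "true" means skip the check
     */
    static boolean isEnabled(Map<String, Object> options, Properties sysProps, String ignoreProperty)
    {
        String ignore = sysProps.getProperty(ignoreProperty);
        if (ignore != null)
            return !Boolean.parseBoolean(ignore);   // system property takes precedence

        Object enabled = options.get("enabled");
        // Assumption: a check without an explicit "enabled" flag runs.
        return enabled == null || Boolean.TRUE.equals(enabled);
    }
}
```

In a running node this would be called with {{System.getProperties()}}; passing a {{Properties}} object explicitly keeps the rule easy to test.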
[jira] [Updated] (CASSANDRA-17390) Expose streaming as a vtable
[ https://issues.apache.org/jira/browse/CASSANDRA-17390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17390: Status: Patch Available (was: Ready to Commit)
> Expose streaming as a vtable
>
> Key: CASSANDRA-17390
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17390
> Project: Cassandra
> Issue Type: Sub-task
> Components: Consistency/Streaming, Feature/Virtual Tables
> Reporter: David Capwell
> Assignee: David Capwell
> Priority: Normal
> Fix For: 4.x
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> CASSANDRA-15399 is exposing repair state as a vtable, but repair relies on streaming, so it needs a streaming table as well.
[jira] [Updated] (CASSANDRA-17390) Expose streaming as a vtable
[ https://issues.apache.org/jira/browse/CASSANDRA-17390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17390: Status: In Progress (was: Patch Available)
[jira] [Commented] (CASSANDRA-17390) Expose streaming as a vtable
[ https://issues.apache.org/jira/browse/CASSANDRA-17390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497396#comment-17497396 ] Paulo Motta commented on CASSANDRA-17390: - Hi David, can you paste a snippet of what the vtable output will look like? This would help give a feel for what information will be displayed, for future reference (i.e. documentation).
[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts
[ https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496352#comment-17496352 ] Paulo Motta commented on CASSANDRA-17292: - {quote}I think the main point of contention then is incremental vs. non-incremental migration of existing configuration. {quote} I think we can support the new layout for new configurations added before 5.x. For existing (legacy) configurations, I see the following options:
a) Non-incrementally migrate all legacy properties to the new layout in 5.x.
b) Incrementally migrate in 4.x while allowing users to opt in to the new configuration, and switch that to opt-out in 5.x.
I'm slightly in favor of b), since it splits the work into bite-sized chunks and makes the new layout available incrementally and earlier, but I'm also OK with a). {quote}I think the thought that's hard for me to escape around this is that we really want a coherent design for the whole configuration up-front, given the lack of one is at least partially to blame for the current mess. {quote} This is my main motivation for chiming in here with this feature-centric proposal: it lets anyone easily decide where a particular configuration belongs by applying the following heuristic when adding a new configuration option:
* Does this configuration belong to an existing {{FeatureConfiguration}}?
** If yes, add the new property to the existing {{FeatureConfiguration}}.
** If not, create a new {{FeatureConfiguration}} subclass for the particular feature that you're adding.

No prior knowledge of the "domain model" is needed to apply the heuristic above when deciding where a configuration should go. {quote}Then, if we have that, and we can work out whatever small inconsistencies exist, we can present operators with a clean v2 config file format in 5.0 (that requires us to do very little thinking about compatibility, outside checking the version element).
{quote} Migrating "legacy configuration" to the new feature-centric layout is also straightforward using the same heuristic, whenever we decide to perform a "big bang" switch to the new configuration layout.
> Move cassandra.yaml toward a nested structure around major database concepts
>
> Key: CASSANDRA-17292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17292
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Config
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 5.x
>
> Recent mailing list conversation (see "[DISCUSS] Nested YAML configs for new features") has made it clear we will gravitate toward appropriately nested structures for new parameters in {{cassandra.yaml}}, but from the scattered conversation across a few Guardrails tickets (see CASSANDRA-17212 and CASSANDRA-17148) and CASSANDRA-15234, there is also a general desire to eventually extend this to the rest of {{cassandra.yaml}}. The benefits of this change include those we gain by doing it for new features (single point of interest for feature documentation, typed configuration objects, logical grouping for additional parameters added over time, discoverability, etc.), but on a larger scale.
> This may overlap with ongoing work, including the Guardrails epic. Ideally, even a rough cut of a design here would allow that to move forward in a timely and coherent manner (with less long-term refactoring pain).
> Current proposals:
> From [~benedict] - https://github.com/belliottsmith/cassandra/commits/CASSANDRA-15234-grouping-ideas
> From [~maedhroz] - https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a
> From [~paulo] - https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05
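Option b) implies a transition period in which a property may be set either under its legacy flat name or under its new feature section. A minimal lookup sketch for that period follows; the class and method are hypothetical, the key names are taken from the hints example in this thread, and rejecting a property set in both layouts is an assumption rather than something the proposal specifies:

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of option b): while both layouts are accepted, a value may
// come from the new nested feature section or from the legacy flat key.
class LegacyOrNestedLookup
{
    @SuppressWarnings("unchecked")
    static Optional<Object> lookup(Map<String, Object> yaml, String feature, String key, String legacyKey)
    {
        Object section = yaml.get(feature);
        Object nested = (section instanceof Map) ? ((Map<String, Object>) section).get(key) : null;
        Object legacy = yaml.get(legacyKey);

        // Assumption: setting the same property in both layouts is ambiguous, so reject it.
        if (nested != null && legacy != null)
            throw new IllegalArgumentException(
                "'" + legacyKey + "' and '" + feature + "." + key + "' are both set; use only one");

        return Optional.ofNullable(nested != null ? nested : legacy);
    }
}
```

Switching from opt-in to opt-out in 5.x would then only change which layout the shipped {{cassandra.yaml}} uses, not how values are resolved.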
[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts
[ https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496330#comment-17496330 ] Paulo Motta commented on CASSANDRA-17292: - Added an example of the new feature-centric layout mixed with legacy configuration in a single "cassandra.yaml", for illustration: https://gist.github.com/pauloricardomg/4369f4b0dd8b84421a11ae61bf2d2c7e
[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts
[ https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496307#comment-17496307 ] Paulo Motta commented on CASSANDRA-17292: - One additional thing I would like to note is that my proposal consciously abstains from attempting to predefine a full domain model up front. Instead, it favors an incremental, feature-centric approach: we migrate properties from the legacy flat format to the new feature-centric format gradually, while new features can already start using the new format based on the {{FeatureConfiguration}} abstraction, as exemplified above in the migration of the "hints" configuration from the old to the new model.
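The gradual migration described above can be pictured as a table that grows one feature at a time: keys listed in it move under their feature section, and every other key stays in the flat layout. A minimal conversion sketch follows; the class and the {{MIGRATED}} table are hypothetical, and the key names are an illustrative subset based on the hints example rather than an agreed mapping:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of gradual migration: only keys listed in MIGRATED are
// moved under their feature section; every other key stays in the flat layout.
class FlatToFeatureMigration
{
    // legacy flat key -> "feature.key" in the new layout (illustrative subset only)
    static final Map<String, String> MIGRATED = Map.of(
        "hinted_handoff_enabled", "hinted_handoff.enabled",
        "max_hint_window", "hinted_handoff.max_hint_window",
        "max_hints_delivery_threads", "hinted_handoff.max_hints_delivery_threads");

    @SuppressWarnings("unchecked")
    static Map<String, Object> migrate(Map<String, Object> flat)
    {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : flat.entrySet())
        {
            String target = MIGRATED.get(e.getKey());
            if (target == null)
            {
                result.put(e.getKey(), e.getValue()); // not migrated yet: keep the legacy key
                continue;
            }
            int dot = target.indexOf('.');
            String feature = target.substring(0, dot);
            // Create the feature section on first use, then store the renamed key in it.
            Map<String, Object> section = (Map<String, Object>)
                result.computeIfAbsent(feature, f -> new LinkedHashMap<String, Object>());
            section.put(target.substring(dot + 1), e.getValue());
        }
        return result;
    }
}
```

Migrating another feature later only means adding its entries to the table; already-migrated features and unmigrated keys are unaffected.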
[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts
[ https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496302#comment-17496302 ] Paulo Motta edited comment on CASSANDRA-17292 at 2/22/22, 8:29 PM: --- Thanks for the additional context [~maedhroz], that is very helpful for understanding the reasoning behind the proposed nesting. {quote}For a moment, let's ignore the fact that there's any kind of textual configuration file at all for the project, but we still have all the knobs/systems/etc. The very first thing I would do is create a "domain model" for C* configuration on the Java side, a hierarchy rooted in a Configuration container class, which would contain members w/ types like ClusterConfiguration, NetworkConfiguration, StorageConfiguration, etc. These would be easy to navigate, would provide reasonable points for inline documentation, could encapsulate validation logic for relationships between parameters within subsystems and features, and could be passed as little "kernels" of configuration around the codebase, allowing for better mocking, etc. {quote} I think our views of what the end result should look like from the developer's perspective are not very far apart. My proposal is just a simplification of yours: instead of a multi-level hierarchy rooted in physical resources (cluster/network/storage), I'm proposing a feature-centric domain model hierarchy with a single level, where each feature defines its own configuration subtree. The basic construct for creating new feature configurations is the following class:
{code:java}
public abstract class FeatureConfiguration
{
    // is the feature enabled by default?
    boolean enabled = false;

    // the feature name to be used in the YAML/JSON
    public abstract String getFeatureName();

    // whether this feature can be disabled
    public boolean isOptional()
    {
        return true;
    }
}
{code}
This would make it easy to create typed configuration for each feature:
* CommitlogConfiguration
* HintsConfiguration
* MaterializedViewsConfiguration

For example, this is what "HintsConfiguration" would look like:
{code:java}
public class HintsConfiguration extends FeatureConfiguration
{
    public HintsConfiguration()
    {
        this.enabled = true;
    }

    public String getFeatureName()
    {
        return "hinted_handoff";
    }

    boolean auto_hints_cleanup = false;
    Duration max_hint_window = "3h";
    Throttle hinted_handoff_throttle = "1024KiB";
    int max_hints_delivery_threads = 2;
    Duration hints_flush_period = "1ms";
    Size max_hints_file_size = "128MiB";
}
{code}
It would be represented as follows in {{cassandra.yaml}}:
{code:yaml}
# Commit log (cannot be disabled because isOptional()=false)
commit_log:
  commitlog_sync: periodic
  commitlog_sync_period: 1ms
  commitlog_segment_size: 32MiB

# Hinted Handoff
hinted_handoff:
  enabled: true
  auto_hints_cleanup: false
  max_hint_window: 3h
  hinted_handoff_throttle: 1024KiB
  max_hints_delivery_threads: 2
  hints_flush_period: 1ms
  max_hints_file_size: 128MiB

# MVs are experimental and not recommended for production use
materialized_views:
  enabled: false
{code}
The approach above provides a very simple user experience while allowing typed configuration on the developer's side. I think we can easily fit most database configurations into this feature-centric view, but if some do not fit into an existing feature, we could create a new type, {{ResourceConfiguration}}, for configuring a resource not tied to a particular feature. {quote}I'm still pretty strongly in support of a versioned but intact single configuration file. {quote} Perhaps I should have made it clearer, but splitting the configuration into multiple files is merely an optional convenience of my proposal, which also supports configuration in a single file for backward compatibility. For instance, moving the configuration from {{features.yaml}} to {{core.yaml}} would still render the same global configuration. I think the optional splitting of configuration into different files provides an organizational benefit by grouping together properties that belong to a similar category (i.e. core features which cannot be disabled, optional features, and guardrails). My original proposal of starting with 3 initial categories (core.yaml/features.yaml/guardrails.yaml) is mostly meant to facilitate the transition to the new configuration model:
- cassandra.yaml (previously core.yaml): all legacy configurations would initially go here, separated by section headers
- features.yaml: all configurations compatible with the new {{FeatureConfiguration}} model would go here (including new features and "migrated" legacy features)
- guardrails.yaml: all guardrails are collocated in the same fi
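The {{FeatureConfiguration}} and {{HintsConfiguration}} classes in the comment are illustrative and do not compile as written (for example, {{Duration}} fields are assigned strings). A minimal runnable sketch of the same idea follows, under stated assumptions: only two hints fields are kept, {{java.time.Duration}} stands in for the proposal's duration type, and the {{apply}}/{{applyAll}} methods are hypothetical stand-ins for whatever YAML binding would actually be used. It shows each feature reading its own subtree by feature name, and a non-optional feature refusing {{enabled: false}}:

```java
import java.time.Duration;
import java.util.Map;

// Minimal runnable sketch of the feature-centric idea; apply()/applyAll() are
// hypothetical stand-ins for a real YAML binding, and field types are simplified.
class FeatureConfigSketch
{
    static abstract class FeatureConfiguration
    {
        // is the feature enabled by default?
        boolean enabled = false;

        // the feature name used as the key of this feature's YAML subtree
        abstract String getFeatureName();

        // whether this feature can be disabled
        boolean isOptional()
        {
            return true;
        }

        // Only the common "enabled" key is handled here.
        void apply(Map<String, Object> section)
        {
            Object value = section.get("enabled");
            if (value == null)
                return;
            boolean requested = (Boolean) value;
            if (!requested && !isOptional())
                throw new IllegalArgumentException(getFeatureName() + " cannot be disabled");
            enabled = requested;
        }
    }

    static class HintsConfiguration extends FeatureConfiguration
    {
        // Typed defaults; parsing strings such as "3h" is left out of this sketch.
        Duration max_hint_window = Duration.ofHours(3);
        int max_hints_delivery_threads = 2;

        HintsConfiguration()
        {
            enabled = true;
        }

        String getFeatureName()
        {
            return "hinted_handoff";
        }

        @Override
        void apply(Map<String, Object> section)
        {
            super.apply(section);
            Object threads = section.get("max_hints_delivery_threads");
            if (threads != null)
                max_hints_delivery_threads = (Integer) threads;
        }
    }

    static class CommitlogConfiguration extends FeatureConfiguration
    {
        CommitlogConfiguration()
        {
            enabled = true;
        }

        String getFeatureName()
        {
            return "commit_log";
        }

        @Override
        boolean isOptional()
        {
            return false; // the commit log cannot be disabled
        }
    }

    // Hands each feature its own subtree of the parsed YAML, keyed by feature name.
    @SuppressWarnings("unchecked")
    static void applyAll(Map<String, Object> yaml, FeatureConfiguration... features)
    {
        for (FeatureConfiguration feature : features)
        {
            Object section = yaml.get(feature.getFeatureName());
            if (section instanceof Map)
                feature.apply((Map<String, Object>) section);
        }
    }
}
```

Keeping the {{enabled}}/{{isOptional()}} handling in the base class means each new feature only declares its own fields, which is the single-level structure the comment argues for.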
[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts
[ https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496302#comment-17496302 ] Paulo Motta edited comment on CASSANDRA-17292 at 2/22/22, 8:23 PM: --- Thanks for the additional context [~maedhroz], that is very helpful for understanding the reasoning behind the proposed nesting. {quote}For a moment, let's ignore the fact that there's any kind of textual configuration file at all for the project, but we still have all the knobs/systems/etc. The very first thing I would do is create a "domain model" for C* configuration on the Java side, a hierarchy rooted in a Configuration container class, which would contain members w/ types like ClusterConfiguration, NetworkConfiguration, StorageConfiguration, etc. These would be easy to navigate, would provide reasonable points for inline documentation, could encapsulate validation logic for relationships between parameters within subsystems and features, and could be passed as little "kernels" of configuration around the codebase, allowing for better mocking, etc. {quote} I think our views of what the end result should look like from the developer's perspective are not very far apart. My proposal is just a simplification of yours: instead of a multi-level hierarchy rooted in physical resources (cluster/network/storage), I'm proposing a feature-centric domain model hierarchy with a single level, where each feature defines its own configuration subtree. The basic construct for creating new feature configurations is the following class:
{code:java}
public abstract class FeatureConfiguration
{
    // is the feature enabled by default?
    boolean enabled = false;

    // the feature name to be used in the YAML/JSON
    public abstract String getFeatureName();

    // whether this feature can be disabled
    public boolean isOptional()
    {
        return true;
    }
}
{code}
This would make it easy to create typed configuration for each feature:
* CommitlogConfiguration
* HintsConfiguration
* MaterializedViewsConfiguration

For example, this is what "HintsConfiguration" would look like:
{code:java}
public class HintsConfiguration extends FeatureConfiguration
{
    public HintsConfiguration()
    {
        this.enabled = true;
    }

    public String getFeatureName()
    {
        return "hinted_handoff";
    }

    boolean auto_hints_cleanup = false;
    Duration max_hint_window = "3h";
    Throttle hinted_handoff_throttle = "1024KiB";
    int max_hints_delivery_threads = 2;
    Duration hints_flush_period = "1ms";
    Size max_hints_file_size = "128MiB";
}
{code}
It would be represented as follows in {{cassandra.yaml}}:
{code:yaml}
# Commit log (cannot be disabled because isOptional()=false)
commit_log:
  commitlog_sync: periodic
  commitlog_sync_period: 1ms
  commitlog_segment_size: 32MiB

# Hinted Handoff
hinted_handoff:
  enabled: true
  auto_hints_cleanup: false
  max_hint_window: 3h
  hinted_handoff_throttle: 1024KiB
  max_hints_delivery_threads: 2
  hints_flush_period: 1ms
  max_hints_file_size: 128MiB

# MVs are experimental and not recommended for production use
materialized_views:
  enabled: false
{code}
The approach above provides a very simple user experience while allowing typed configuration on the developer's side. I think we can easily fit most database configurations into this feature-centric view, but if some do not fit into an existing feature, we could create a new type, {{ResourceConfiguration}}, for configuring a resource not tied to a particular feature. {quote}I'm still pretty strongly in support of a versioned but intact single configuration file. {quote} Perhaps I should have made it clearer, but splitting the configuration into multiple files is merely an optional convenience of my proposal, which also supports configuration in a single file for backward compatibility. For instance, moving the configuration from {{features.yaml}} to {{core.yaml}} would still render the same global configuration. I think the optional splitting of configuration into different files provides an organizational benefit by grouping together properties that belong to a similar category (i.e. core features which cannot be disabled, optional features, and guardrails). My original proposal of starting with 3 initial categories (core.yaml/features.yaml/guardrails.yaml) is mostly meant to facilitate the transition to the new configuration model:
- cassandra.yaml (previously core.yaml): all legacy configurations would initially go here, separated by section headers
- features.yaml: all configurations compatible with the new {{FeatureConfiguration}} model would go here (including new features and "migrated" legacy features)
- guardrails.yaml: all guardrails are collocated in the same fi
[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts
[ https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496302#comment-17496302 ] Paulo Motta commented on CASSANDRA-17292: - Thanks for the additional context [~maedhroz], that is very helpful for understanding the reasoning behind the proposed nesting. {quote}For a moment, let's ignore the fact that there's any kind of textual configuration file at all for the project, but we still have all the knobs/systems/etc. The very first thing I would do is create a "domain model" for C* configuration on the Java side, a hierarchy rooted in a Configuration container class, which would contain members w/ types like ClusterConfiguration, NetworkConfiguration, StorageConfiguration, etc. These would be easy to navigate, would provide reasonable points for inline documentation, could encapsulate validation logic for relationships between parameters within subsystems and features, and could be passed as little "kernels" of configuration around the codebase, allowing for better mocking, etc. {quote} I think we're not very far apart on what we want the end result to look like from the developer's perspective. My proposal is just a simplification of yours: instead of a multi-level hierarchy rooted on physical resources (cluster/network/storage), I'm proposing a feature-centric domain model hierarchy with a single level, where each feature defines its own configuration subtree. The basic construct to create new feature configurations is the following class:
{code:java}
public abstract class FeatureConfiguration
{
    // is the feature enabled by default?
    protected boolean enabled = false;

    // the feature name to be used in the YAML/JSON
    public abstract String getFeatureName();

    // whether this feature can be disabled
    public boolean isOptional()
    {
        return true;
    }
}
{code}
This would make it easy to create typed configuration for each feature:
* CommitlogConfiguration
* HintsConfiguration
* MaterializedViewsConfiguration

For example, this is what "HintsConfiguration" would look like:
{code:java}
public class HintsConfiguration extends FeatureConfiguration
{
    public HintsConfiguration()
    {
        this.enabled = true;
    }

    public String getFeatureName()
    {
        return "hinted_handoff";
    }

    // Duration, Throttle and Size stand for typed values parsed from the literals below
    boolean auto_hints_cleanup = false;
    Duration max_hint_window = "3h";
    Throttle hinted_handoff_throttle = "1024KiB";
    int max_hints_delivery_threads = 2;
    Duration hints_flush_period = "1ms";
    Size max_hints_file_size = "128MiB";
}
{code}
And it would be represented as follows in {{cassandra.yaml}}:
{code:yaml}
# Commit log (cannot be disabled because isOptional()=false)
commit_log:
  commitlog_sync: periodic
  commitlog_sync_period: 1ms
  commitlog_segment_size: 32MiB

# Hinted Handoff
hinted_handoff:
  enabled: true
  auto_hints_cleanup: false
  max_hint_window: 3h
  hinted_handoff_throttle: 1024KiB
  max_hints_delivery_threads: 2
  hints_flush_period: 1ms
  max_hints_file_size: 128MiB

# MVs are experimental and not recommended for production use
materialized_views:
  enabled: false
{code}
The approach above provides a very simple user experience while allowing typed configuration on the developer's side. I think that we can easily fit most database configurations into this feature-centric view, but if some of them do not fit into an existing feature we could create a new type {{ResourceConfiguration}} which would allow configuring a resource not tied to a particular feature. {quote}I'm still pretty strongly in support of a versioned but intact single configuration file.
{quote} Perhaps I should've made it clear, but splitting the configuration into multiple files is merely an optional convenience of my proposal, which also supports configuration in a single file for backward compatibility. For instance, moving the configuration from {{features.yaml}} to {{core.yaml}} would still render the same global configuration. I think that optionally splitting the configuration into different files provides an organizational benefit by grouping together properties belonging to a similar category (i.e. core features which cannot be disabled, optional features and guardrails). My original proposal of starting with 3 initial categories (core.yaml/features.yaml/guardrails.yaml) is mostly to facilitate the transition to the new configuration model:
- cassandra.yaml (previously core.yaml): all legacy configurations would initially go here, separated by section headers
- features.yaml: all configurations compatible with the new {{FeatureConfiguration}} model would go here (including new features and "migrated" legacy features)
- guardrails.yaml: all guardrails are collocated in the same file for operational simplicity

For instance, the hints
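A minimal sketch of how the "cannot be disabled" rule could be enforced when the configuration is loaded. It assumes a hypothetical {{validate()}} hook and {{enabledFeatures()}} helper that are not part of the proposal above, plus a simplified {{CommitlogConfiguration}} (the proposal only names that class).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: validate() and enabledFeatures() are not part of the proposal.
class FeatureConfigurationSketch {
    // Mirrors the proposed base class, plus a validation hook.
    abstract static class FeatureConfiguration {
        protected boolean enabled = false;

        abstract String getFeatureName();

        boolean isOptional() { return true; }

        // A core (non-optional) feature must not be disabled by the user.
        void validate() {
            if (!isOptional() && !enabled)
                throw new IllegalStateException(getFeatureName() + " cannot be disabled");
        }
    }

    // Core feature: enabled by default and not optional.
    static class CommitlogConfiguration extends FeatureConfiguration {
        CommitlogConfiguration() { this.enabled = true; }
        @Override String getFeatureName() { return "commit_log"; }
        @Override boolean isOptional() { return false; }
    }

    // Optional feature: stays disabled unless explicitly enabled.
    static class MaterializedViewsConfiguration extends FeatureConfiguration {
        @Override String getFeatureName() { return "materialized_views"; }
    }

    // Validates every feature and returns the names of the enabled ones, in order.
    static List<String> enabledFeatures(List<? extends FeatureConfiguration> features) {
        List<String> names = new ArrayList<>();
        for (FeatureConfiguration f : features) {
            f.validate();
            if (f.enabled)
                names.add(f.getFeatureName());
        }
        return names;
    }
}
```

Keeping the rule in the base class means every feature gets the same check for free; a subclass only has to override {{isOptional()}}.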
[jira] [Commented] (CASSANDRA-17267) Snapshot true size is miscalculated
[ https://issues.apache.org/jira/browse/CASSANDRA-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496100#comment-17496100 ] Paulo Motta commented on CASSANDRA-17267: - In the [previous test run|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1422/], {{org.apache.cassandra.index.sasi.SASIIndexTest.testSASIComponentsAddedToSnapshot}} was getting stuck when run within the suite (it passed when executed individually). I tracked the cause down to the {{ReadExecutionController}} not being closed properly in other tests, causing operations to block indefinitely on the {{OpOrder}}. Fixed [in this commit|https://github.com/apache/cassandra/commit/77f688e75ff403875755f34dc31ab75401bcaa3d] on all branches. I created CASSANDRA-17400 to add a check that verifies resources are properly closed, to avoid stuck tests in the future. Resubmitted CI:
|[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...pauloricardomg:CASSANDRA-17267-3.11]|[tests|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1440/]|
|[4.0|https://github.com/apache/cassandra/compare/cassandra-4.0...pauloricardomg:CASSANDRA-17267-4.0]|[tests|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1441/]|
|[trunk|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:CASSANDRA-17267-trunk]|[tests|https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1442/]|
> Snapshot true size is miscalculated > --- > > Key: CASSANDRA-17267 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17267 > Project: Cassandra > Issue Type: Bug > Components: Local/Snapshots >Reporter: Paulo Motta >Assignee: Paulo Motta >Priority: Normal > > As far as I understand, the snapshot "size on disk" is the total size of the > snapshot, while the "true size" is the (size_on_disk - size_of_live_sstables).
> I created a snapshot on a 3.11 node without traffic and I expected the "true > size" to be 0KB since the original sstables were still present, but this > didn't seem to be the case: > {noformat} > $ nodetool listsnapshots > Snapshot Details: > Snapshot name Keyspace name Column family name True size Size on disk > test ks1 tbl1 4.86 KiB 5.69 KiB > Total TrueDiskSpaceUsed: 4.86 KiB > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
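To make the definition above concrete, a minimal sketch of the expected calculation: snapshot files are hard links, so only files that are no longer shared with live sstables occupy extra space. It assumes snapshot files can be matched to live sstable components by file name, which is a simplification; {{SnapshotSizes}} and its methods are hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper: maps are file name -> size in bytes.
class SnapshotSizes {
    // Total size of every file in the snapshot directory.
    static long sizeOnDisk(Map<String, Long> snapshotFiles) {
        long total = 0;
        for (long size : snapshotFiles.values())
            total += size;
        return total;
    }

    // Space that clearing the snapshot would free: files still present as live
    // sstable components are shared hard links and cost nothing extra, so only
    // the remaining files (including manifest.json and schema.cql) are counted.
    static long trueSize(Map<String, Long> snapshotFiles, Set<String> liveFiles) {
        long total = 0;
        for (Map.Entry<String, Long> file : snapshotFiles.entrySet())
            if (!liveFiles.contains(file.getKey()))
                total += file.getValue();
        return total;
    }
}
```

With all data files still live, only {{manifest.json}} and {{schema.cql}} count toward the true size; with no live data files, the true size equals the size on disk.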
[jira] [Updated] (CASSANDRA-17400) Fail build or warn when closeable reference is not closed in tests
[ https://issues.apache.org/jira/browse/CASSANDRA-17400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17400: Labels: lhf (was: ) > Fail build or warn when closeable reference is not closed in tests > -- > > Key: CASSANDRA-17400 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17400 > Project: Cassandra > Issue Type: Task > Reporter: Paulo Motta >Priority: Normal > Labels: lhf > > I came across a recent stuck-test issue which was caused by an > {{AutoCloseable}} object not being closed properly, leaking some references > and ultimately causing a deadlock. > To prevent similar issues in the future we should add a check that fails or > warns when references are not closed during tests. > If such a check already exists, we should look into fixing violations.
[jira] [Created] (CASSANDRA-17400) Fail build or warn when closeable reference is not closed in tests
Paulo Motta created CASSANDRA-17400: --- Summary: Fail build or warn when closeable reference is not closed in tests Key: CASSANDRA-17400 URL: https://issues.apache.org/jira/browse/CASSANDRA-17400 Project: Cassandra Issue Type: Task Reporter: Paulo Motta I came across a recent stuck-test issue which was caused by an {{AutoCloseable}} object not being closed properly, leaking some references and ultimately causing a deadlock. To prevent similar issues in the future we should add a check that fails or warns when references are not closed during tests. If such a check already exists, we should look into fixing violations.
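One possible shape for such a check, sketched under assumptions: a hypothetical test-only {{LeakTracker}} that registers each {{AutoCloseable}} until it is closed and fails the test if anything is still open afterwards. It is an illustration only, not an existing Cassandra utility.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical test-only helper: every tracked resource stays registered until it
// is closed, so a harness can fail (or just warn) when a test leaves something open.
class LeakTracker {
    private final Set<AutoCloseable> open = Collections.synchronizedSet(
            Collections.newSetFromMap(new IdentityHashMap<AutoCloseable, Boolean>()));

    // Returns a wrapper whose close() closes the resource and stops tracking it.
    AutoCloseable track(AutoCloseable resource) {
        open.add(resource);
        return () -> {
            try {
                resource.close();
            } finally {
                open.remove(resource);
            }
        };
    }

    int openCount() {
        return open.size();
    }

    // Intended to run after each test, e.g. from a JUnit @After method.
    void assertAllClosed(String testName) {
        int leaked = open.size();
        if (leaked > 0)
            throw new AssertionError(testName + " left " + leaked + " resource(s) unclosed");
    }
}
```

Throwing an {{AssertionError}} fails the build; a softer variant could log a warning instead, matching the "fail or warn" choice in the ticket.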
[jira] [Updated] (CASSANDRA-17220) Make startup checks configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17220: Status: Changes Suggested (was: Review In Progress) > Make startup checks configurable > > > Key: CASSANDRA-17220 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17220 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > > This ticket was created from needs discovered in CASSANDRA-17180. We want to > be able to configure a startup check, so we figured out that it is necessary > to treat all startup checks the same way, so that all of them can be configured. This ticket > is about making startup checks configurable. > Once this ticket is done, we can continue with the implementation of > CASSANDRA-17180, where the gc grace check will be implemented. > We have identified that there is one check currently in place which needs to > be changed to reflect this configuration implementation, and that is > FileSystemOwnershipCheck. > Because startup checks were not previously configurable via a > configuration file, they were configurable via system properties. This ticket > does not aim to get rid of the system properties configuration mechanism; system > properties will take precedence over settings in the configuration file. Then, in > the next release, I am aiming to get rid of the system properties configuration > mechanism.
[jira] [Commented] (CASSANDRA-17220) Make startup checks configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495695#comment-17495695 ] Paulo Motta commented on CASSANDRA-17220: - This is looking very good! I really like that we're making the legacy checks {{non_configurable_check}} while allowing new "configurable" checks to be easily added by registering a new {{StartupCheckType}}. I think that for this to be ready we need to make the checks with deprecated properties configurable via the {{startup_checks}} yaml section (i.e. {{cassandra.ignore_dc}}, {{cassandra.ignore_rack}}). Also, can you please update {{cassandra.yaml}} with a commented-out example of how to configure startup checks? It would be nice to have a decision on CASSANDRA-17292 before merging this to ensure it will be consistent with the new property grouping, even though I think {{startup_checks}} makes sense as its own macro-group.
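A minimal sketch of the precedence rule described in the ticket: a system property, then the {{startup_checks}} configuration, then the check's own default. Check names and property names are hypothetical. Real legacy properties such as {{cassandra.ignore_dc}} express the inverse ("ignore" rather than "enabled") and would need to be negated.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical resolver: a system property, if set, overrides the check's "enabled"
// flag in the startup_checks section, which in turn overrides the built-in default.
class StartupCheckOptions {
    static boolean isEnabled(String checkName,
                             String systemPropertyName,
                             Map<String, Map<String, Object>> startupChecks,
                             Properties systemProperties,
                             boolean defaultValue) {
        // 1. The legacy system property has precedence.
        String fromProperty = systemProperties.getProperty(systemPropertyName);
        if (fromProperty != null)
            return Boolean.parseBoolean(fromProperty);

        // 2. Then the configuration file.
        Map<String, Object> options = startupChecks.get(checkName);
        if (options != null && options.get("enabled") instanceof Boolean)
            return (Boolean) options.get("enabled");

        // 3. Otherwise the check's own default.
        return defaultValue;
    }
}
```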
[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts
[ https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495648#comment-17495648 ] Paulo Motta edited comment on CASSANDRA-17292 at 2/21/22, 5:52 PM: --- Migrating from the previous to the new configuration layout with the approach proposed above would involve:
* Deciding what macro-categories to start with (i.e. core.yaml, guardrails.yaml, features.yaml)
* Assigning existing properties to the corresponding macro-category "bucket" and grouping them into feature groups separated by a "section header".

The above would already provide a good starting point for new features moving forward:
* Any new feature must be added to {{features.yaml}}, guarded by a feature flag, unless it's a core feature (must go in {{core.yaml}}) or a guardrail (must go in {{guardrails.yaml}}).

After the new initial grouping is delivered, we can make incremental changes to the legacy properties via extraction and re-grouping while keeping most of the other new configurations unchanged. was (Author: paulo): Migrating from the previous to the new configuration layout with the approach proposed above would involve:
* Deciding what macro-categories to start with (i.e. core.yaml, guardrails.yaml, features.yaml)
* Assigning existing properties to the corresponding macro-category "bucket" and grouping them into feature groups separated by a "section header".

The above would already provide a good starting point for new features moving forward:
* Any new feature must be added to {{features.yaml}}, guarded by a feature flag, unless it's a core feature (must go in {{core.yaml}}) or a guardrail (must go in {{guardrails.yaml}}).

After the new initial grouping is delivered, we can make incremental changes to the legacy categories via extraction and re-grouping while keeping most of the other new configurations unchanged.
> Move cassandra.yaml toward a nested structure around major database concepts > > > Key: CASSANDRA-17292 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17292 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Fix For: 5.x > > > Recent mailing list conversation (see "[DISCUSS] Nested YAML configs for new > features") has made it clear we will gravitate toward appropriately nested > structures for new parameters in {{cassandra.yaml}}, but from the scattered > conversation across a few Guardrails tickets (see CASSANDRA-17212 and > CASSANDRA-17148) and CASSANDRA-15234, there is also a general desire to > eventually extend this to the rest of {{cassandra.yaml}}. The benefits of > this change include those we gain by doing it for new features (single point > of interest for feature documentation, typed configuration objects, logical > grouping for additional parameters added over time, discoverability, etc.), > but on a larger scale. > This may overlap with ongoing work, including the Guardrails epic. Ideally, > even a rough cut of a design here would allow that to move forward in a > timely and coherent manner (with less long-term refactoring pain). > While these would have to be adjusted to CASSANDRA-15234 (probably after it > merges), there have been two proposals floated already for what this might > look like: > From [~maedhroz] - > https://github.com/maedhroz/cassandra/commit/450b920e0ac072cec635e0ebcb63538ee7f1fc5a > From [~benedict] - > https://github.com/belliottsmith/cassandra/commits/CASSANDRA-15234-grouping-ideas
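The feature-flag rule for {{features.yaml}} could be enforced mechanically. A minimal sketch, assuming the file has already been parsed into nested maps by some YAML loader; {{FeatureFlagLint}} is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical lint: every top-level section of features.yaml must be a mapping
// that contains a boolean "enabled" flag. Returns the offending section names.
class FeatureFlagLint {
    static List<String> sectionsWithoutFlag(Map<String, Object> featuresYaml) {
        List<String> offending = new ArrayList<>();
        for (Map.Entry<String, Object> section : featuresYaml.entrySet()) {
            Object body = section.getValue();
            boolean guarded = body instanceof Map
                              && ((Map<?, ?>) body).get("enabled") instanceof Boolean;
            if (!guarded)
                offending.add(section.getKey());
        }
        return offending;
    }
}
```

Such a check could run in the build so that a new feature cannot land in {{features.yaml}} without its flag.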
[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts
[ https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495648#comment-17495648 ] Paulo Motta commented on CASSANDRA-17292: - Migrating from the previous to the new configuration layout with the approach proposed above would involve:
* Deciding what macro-categories to start with (i.e. core.yaml, guardrails.yaml, features.yaml)
* Assigning existing properties to the corresponding macro-category "bucket" and grouping them into feature groups separated by a "section header".

The above would already provide a good starting point for new features moving forward:
* Any new feature must be added to {{features.yaml}}, guarded by a feature flag, unless it's a core feature (must go in {{core.yaml}}) or a guardrail (must go in {{guardrails.yaml}}).

After the new initial grouping is delivered, we can make incremental changes to the legacy categories via extraction and re-grouping while keeping most of the other new configurations unchanged.
[jira] [Comment Edited] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts
[ https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495629#comment-17495629 ] Paulo Motta edited comment on CASSANDRA-17292 at 2/21/22, 4:51 PM: --- I took a look at the proposed layout, and while I think it is a great improvement over the status quo, the intermingling of feature/subsystem/resource in the YAML structure can get a little counterintuitive and does not provide a consistent framework for extending the properties. Furthermore, nesting across too many levels can get tricky pretty fast. Why do we have to encode the subsystem/resource information in the YAML hierarchy? I think we can achieve a similar improvement in discoverability by grouping related properties into different files and into subsections within the same file. I created an alternative proposal [on this gist|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05] that groups properties along two dimensions: category and feature group. The category axis is represented by the name of the property file ("core.yaml", "guardrails.yaml", "features.yaml") and the feature group is represented by a comment header separating distinct feature groups within the same category. One initial example of categories [from the gist|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05] would be:
* [core.yaml|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05#file-core-yaml]: core DB parameters
* [guardrails.yaml|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05#file-guardrails-yaml]: any fail/warn thresholds
* [features.yaml|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05#file-features-yaml]: any (experimental/prod-ready) feature that can be enabled/disabled.

For instance, adding a new feature is basically adding a new section to [features.yaml|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05#file-features-yaml].
This layout makes it easy to extract subsections into a new file if the number of properties in a particular section grows too big. For instance, we could extract the {{encryption}} section of [core.yaml|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05#file-core-yaml] into a new file {{encryption.yaml}} if the need for more specialization arises. Other macro-categories that we could add if necessary:
* {{repair.yaml}}: all things repair
* {{network.yaml}}: all things network

What do you guys think of this alternative? The proposed gist is by no means a complete example; it's just an initial draft to get a feel for how it would look. was (Author: paulo): I took a look at the proposed layout, and while I think it is a great improvement over the status quo, the intermingling of feature/subsystem/resource in the YAML structure can get a little counterintuitive and does not provide a consistent framework for extending the properties. Furthermore, nesting across too many levels can get tricky pretty fast. Why do we have to encode the subsystem/resource information in the YAML hierarchy? I think we can achieve a similar improvement in discoverability by grouping related properties into different files and into subsections within the same file. I created an alternative proposal [on this gist|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05] that groups properties along two dimensions: category and feature group. The category axis is represented by the name of the property file ("core.yaml", "guardrails.yaml", "features.yaml") and the feature group is represented by a comment header separating distinct feature groups within the same category.
One initial example of categories [from the gist|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05] would be:
* {{core.yaml}}: core DB parameters
* {{guardrails.yaml}}: any fail/warn thresholds
* {{features.yaml}}: any (experimental/prod-ready) feature that can be enabled/disabled.

For instance, adding a new feature is basically adding a new section to {{features.yaml}}. This layout makes it easy to extract subsections into a new file if the number of properties in a particular section grows too big. For instance, we could extract the {{encryption}} section of {{core.yaml}} into a new file {{encryption.yaml}} if the need for more specialization arises. Other macro-categories that we could add if necessary:
* {{repair.yaml}}: all things repair
* {{network.yaml}}: all things network

What do you guys think of this alternative? The proposed gist is by no means a complete example; it's just an initial draft to get a feel for how it would look.
[jira] [Commented] (CASSANDRA-17292) Move cassandra.yaml toward a nested structure around major database concepts
[ https://issues.apache.org/jira/browse/CASSANDRA-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495629#comment-17495629 ] Paulo Motta commented on CASSANDRA-17292: - I took a look at the proposed layout, and while I think it is a great improvement over the status quo, the intermingling of feature/subsystem/resource in the YAML structure can get a little counterintuitive and does not provide a consistent framework for extending the properties. Furthermore, nesting across too many levels can get tricky pretty fast. Why do we have to encode the subsystem/resource information in the YAML hierarchy? I think we can achieve a similar improvement in discoverability by grouping related properties into different files and into subsections within the same file. I created an alternative proposal [on this gist|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05] that groups properties along two dimensions: category and feature group. The category axis is represented by the name of the property file ("core.yaml", "guardrails.yaml", "features.yaml") and the feature group is represented by a comment header separating distinct feature groups within the same category. One initial example of categories [from the gist|https://gist.github.com/pauloricardomg/e9e23feea1b172b4f084cb01d7a89b05] would be:
* {{core.yaml}}: core DB parameters
* {{guardrails.yaml}}: any fail/warn thresholds
* {{features.yaml}}: any (experimental/prod-ready) feature that can be enabled/disabled.

For instance, adding a new feature is basically adding a new section to {{features.yaml}}. This layout makes it easy to extract subsections into a new file if the number of properties in a particular section grows too big. For instance, we could extract the {{encryption}} section of {{core.yaml}} into a new file {{encryption.yaml}} if the need for more specialization arises.
Other macro-categories that we could add if necessary:
* {{repair.yaml}}: all things repair
* {{network.yaml}}: all things network

What do you guys think of this alternative? The proposed gist is by no means a complete example; it's just an initial draft to get a feel for how it would look.
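Since a section's category only comes from the file it lives in, loading the split layout reduces to merging top-level sections. A minimal sketch, assuming each file has already been parsed into a map of sections; {{SplitConfigLoader}} is hypothetical, and rejecting duplicate sections is a design choice of this sketch rather than part of the proposal.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical loader: merges the top-level sections of each category file
// (core.yaml, features.yaml, guardrails.yaml, encryption.yaml, ...) into one
// global configuration. Which file a section lives in does not affect the result.
class SplitConfigLoader {
    static Map<String, Object> merge(List<Map<String, Object>> parsedFiles) {
        Map<String, Object> global = new LinkedHashMap<>();
        for (Map<String, Object> file : parsedFiles) {
            for (Map.Entry<String, Object> section : file.entrySet()) {
                // A section defined in two files is ambiguous, so reject it.
                if (global.containsKey(section.getKey()))
                    throw new IllegalArgumentException("duplicate section: " + section.getKey());
                global.put(section.getKey(), section.getValue());
            }
        }
        return global;
    }
}
```

This is what makes extracting {{encryption}} from {{core.yaml}} into {{encryption.yaml}} a purely organizational change: the merged result is identical.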
[jira] [Created] (CASSANDRA-17391) Allow specifying PARTITION and CLUSTERING keys explicitly on DDL
Paulo Motta created CASSANDRA-17391: --- Summary: Allow specifying PARTITION and CLUSTERING keys explicitly on DDL Key: CASSANDRA-17391 URL: https://issues.apache.org/jira/browse/CASSANDRA-17391 Project: Cassandra Issue Type: Improvement Reporter: Paulo Motta The distinction between primary key and partition/clustering key may be confusing to users, so I would like to propose an addition to the *CREATE TABLE* DDL to specify partition and clustering keys explicitly:
CREATE TABLE events (
  sensor_id UUID,
  event_time TIMESTAMP,
  event TEXT,
  PARTITION KEY (sensor_id),
  CLUSTERING KEY (event_time)
)
This would be optional and supported in addition to the current *PRIMARY KEY ((partition_key), clustering_key)* construct. One downside is that it could confuse users if we provide multiple ways to specify partition and clustering keys.
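The proposed clauses map directly onto the existing construct. A minimal sketch of that translation; {{ExplicitKeyTranslator}} is a hypothetical helper for illustration, not part of the CQL parser.

```java
import java.util.List;

// Hypothetical helper illustrating the proposed equivalence:
//   PARTITION KEY (p1, p2), CLUSTERING KEY (c1, c2)  ==  PRIMARY KEY ((p1, p2), c1, c2)
class ExplicitKeyTranslator {
    static String toPrimaryKey(List<String> partitionKey, List<String> clusteringKey) {
        if (partitionKey.isEmpty())
            throw new IllegalArgumentException("a partition key needs at least one column");
        StringBuilder clause = new StringBuilder("PRIMARY KEY ((")
                .append(String.join(", ", partitionKey))
                .append(')');
        // Clustering columns follow the partition key, in declaration order.
        for (String column : clusteringKey)
            clause.append(", ").append(column);
        return clause.append(')').toString();
    }
}
```

For the {{events}} table above, {{toPrimaryKey(List.of("sensor_id"), List.of("event_time"))}} returns {{PRIMARY KEY ((sensor_id), event_time)}}.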
[jira] [Commented] (CASSANDRA-15399) Add ability to track state in repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494139#comment-17494139 ] Paulo Motta commented on CASSANDRA-15399: - +1 to a virtual table exposing the state of currently running repairs; this would be a great addition to improve repair observability, and we can keep the {{repairHistory}} tables for historical data. > Add ability to track state in repair > > > Key: CASSANDRA-15399 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15399 > Project: Cassandra > Issue Type: Improvement > Components: Consistency/Repair >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > To enhance the visibility in repair, we should expose internal state via > virtual tables; the state should include coordinator as well as participant > state (validation, sync, etc.) > I propose the following tables: > repairs - high level summary of the global state of repair; this should be > called on the coordinator. > {code:sql} > CREATE TABLE repairs ( > id uuid, > keyspace_name text, > table_names frozen>, > ranges frozen>, > coordinator text, > participants frozen>, > state text, > progress_percentage float, > last_updated_at_millis bigint, > duration_micro bigint, > failure_cause text, > PRIMARY KEY ( (id) ) > ) > {code} > repair_tasks - represents RepairJob and participants state. This will show > if validations are running on participants and the progress they are making; > this should be called on the coordinator.
> {code:sql} > CREATE TABLE repair_tasks ( > id uuid, > session_id uuid, > keyspace_name text, > table_name text, > ranges frozen>, > coordinator text, > participant text, > state text, > state_description text, > progress_percentage float, -- between 0.0 and 100.0 > last_updated_at_millis bigint, > duration_micro bigint, > failure_cause text, > PRIMARY KEY ( (id), session_id, table_name, participant ) > ) > {code} > repair_validations - shows the state of the validation task and updated > periodically while validation is running; this should be called on the > participants. > {code:sql} > CREATE TABLE repair_validations ( > id uuid, > session_id uuid, > ranges frozen>, > keyspace_name text, > table_name text, > initiator text, > state text, > progress_percentage float, > queue_duration_ms bigint, > runtime_duration_ms bigint, > total_duration_ms bigint, > estimated_partitions bigint, > partitions_processed bigint, > estimated_total_bytes bigint, > failure_cause text, > PRIMARY KEY ( (id), session_id, table_name ) > ) > {code} > The main reason for exposing virtual tables rather than exposing through > durable tables is to make sure what is exposed is accurate. In cases of > write failures or node failures, the durable tables could become in-accurate > and could add edge cases where the repair is not running but the tables say > it is; by relying on repair's internal in-memory bookkeeping, these problems > go away. > This jira does not try to solve the following: > 1) repair resiliency - there are edge cases where repair hits an error and > runs forever (at least from nodetool's perspective). > 2) repair stream tracking - I have not learned the streaming side yet and > what I see is multiple implementations exist, so seems like high scope. My > hope is to punt from this jira and tackle separately. 
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-5108) expose overall progress of cleanup tasks in jmx
[ https://issues.apache.org/jira/browse/CASSANDRA-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492708#comment-17492708 ] Paulo Motta commented on CASSANDRA-5108: [~tejavadali] I think it displays the operation completion percentage in {{nodetool compactionstats}}, which tracks a completion percentage per operation, but apparently this is not working for {{CLEANUP}}. Perhaps you can try reproducing this by running a cleanup manually on a {{ccm}} cluster after bootstrap and checking how the progress is displayed in {{nodetool compactionstats}}? > expose overall progress of cleanup tasks in jmx > --- > > Key: CASSANDRA-5108 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5108 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability, Local/Compaction >Affects Versions: 1.2.0 >Reporter: Michael Kjellman >Assignee: Krishna Vadali >Priority: Low > Labels: lhf > Fix For: 4.x > > > It would be nice if, upon starting a cleanup operation, Cassandra could > maintain a Set of pending sstables (I assume this already exists, as we have to know which file to > act on next) and a new set of "completed" sstables. When each is compacted, > remove it from the pending list. That way C* could report the overall completion > of the long-running and pending cleanup tasks.
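The pending/completed bookkeeping suggested in the description could look roughly like the following Java sketch. It assumes sstables are identified by plain strings and that progress is counted per sstable rather than per byte; the class `CleanupProgress` is hypothetical and is not part of Cassandra's compaction or JMX code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: tracks one cleanup run's pending and completed sstables.
class CleanupProgress {
    private final Set<String> pending = new LinkedHashSet<>();
    private final Set<String> completed = new LinkedHashSet<>();

    CleanupProgress(Set<String> sstablesToClean) {
        pending.addAll(sstablesToClean);
    }

    // Called when cleanup of one sstable finishes: move it from pending to completed.
    // Unknown or already-completed sstables are ignored.
    void markCompleted(String sstable) {
        if (pending.remove(sstable)) {
            completed.add(sstable);
        }
    }

    // Overall completion across the whole cleanup, from 0.0 to 100.0.
    double percentComplete() {
        int total = pending.size() + completed.size();
        return total == 0 ? 100.0 : 100.0 * completed.size() / total;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Counting sstables treats a tiny file and a very large one equally; weighting each sstable by its size in bytes would give a more representative percentage when file sizes vary widely.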
[jira] [Updated] (CASSANDRA-17180) Implement heartbeat service to know last time Cassandra node was up
[ https://issues.apache.org/jira/browse/CASSANDRA-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17180: Status: Patch Available (was: Review In Progress) > Implement heartbeat service to know last time Cassandra node was up > --- > > Key: CASSANDRA-17180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17180 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Observability >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > As already discussed on the ML, it would be nice to have a service which would > periodically write a timestamp to a file, signalling that the node is up and running. > Then, on startup, we would read this file and determine whether, for some table, > more than its gc grace period has elapsed since this time; if so, we would fail the startup > to prevent zombie data from likely being spread around the cluster. > https://lists.apache.org/thread/w4w5t2hlcrvqhgdwww61hgg58qz13glw
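A minimal sketch of the heartbeat idea, assuming a plain text file holding epoch milliseconds and a caller-supplied map from table name to gc_grace_seconds. `HeartbeatCheck` and its methods are hypothetical names chosen for illustration, not the API added by this ticket.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a heartbeat file plus a startup check against gc_grace_seconds.
class HeartbeatCheck {
    // Write to a temp file and rename it over the target, so a crash mid-write
    // cannot leave a truncated heartbeat file behind.
    static void writeHeartbeat(Path file, long nowMillis) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.writeString(tmp, Long.toString(nowMillis), StandardCharsets.UTF_8);
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
    }

    // Returns empty if no heartbeat has ever been written (e.g. a brand-new node).
    static Optional<Long> readHeartbeat(Path file) throws IOException {
        if (!Files.exists(file)) {
            return Optional.empty();
        }
        return Optional.of(Long.parseLong(Files.readString(file, StandardCharsets.UTF_8).trim()));
    }

    // Tables whose gc_grace_seconds window elapsed while the node was down;
    // a non-empty result is where startup would be refused.
    static List<String> tablesPastGcGrace(long lastHeartbeatMillis, long nowMillis,
                                          Map<String, Integer> gcGraceSecondsByTable) {
        long downSeconds = (nowMillis - lastHeartbeatMillis) / 1000;
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : gcGraceSecondsByTable.entrySet()) {
            if (downSeconds > e.getValue()) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    // Periodic liveness writes; the caller owns the executor and must shut it down.
    static ScheduledExecutorService startHeartbeat(Path file, long periodSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(() -> {
            try {
                writeHeartbeat(file, System.currentTimeMillis());
            } catch (IOException e) {
                // A real service would log this; the sketch just skips this tick.
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

Because the file is only rewritten once per period, the recorded time can lag the real shutdown by up to one period, which slightly overestimates downtime; that errs on the side of refusing to start.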
[jira] [Updated] (CASSANDRA-17380) Add support for EXPLAIN statements
[ https://issues.apache.org/jira/browse/CASSANDRA-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-17380: Labels: gsoc gsoc22 (was: gsoc) > Add support for EXPLAIN statements > -- > > Key: CASSANDRA-17380 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17380 > Project: Cassandra > Issue Type: Improvement >Reporter: Benjamin Lerer >Priority: Normal > Labels: gsoc, gsoc22 > > We should provide users a way to understand how their query will be executed > and some information on the amount of work that will be performed. > Explain statements are the most common way to do that. > A CEP draft has been opened for that: [(DRAFT) CEP-4: Explain|https://docs.google.com/document/d/1s_gc4TDYdDbHnYHHVxxjqVVUn3MONUqG6W2JehnC11g/edit]. > This draft proposes adding support for {{EXPLAIN}} and {{EXPLAIN ANALYZE}}, > but I believe that we should split the work into 2 parts because a simple > {{EXPLAIN}} would already provide relevant information. > To complete this work I believe that the following steps will be required: > * Rework and submit the CEP > * Add missing statistics > * Implement the logic behind the EXPLAIN statements
[jira] [Commented] (CASSANDRA-17381) Produce and verify BoundedReadCompactionStrategy as a unified general purpose compaction algorithm
[ https://issues.apache.org/jira/browse/CASSANDRA-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492105#comment-17492105 ] Paulo Motta commented on CASSANDRA-17381: - Hi [~gimhana.ds], I think Joey added some starting instructions in his previous comment: > Warm-up tasks include pulling the branch, rebasing it against 3.0, getting it > to compile if there are issues, and starting up a local Cassandra node with a > table configured to use the new compaction strategy with DEBUG logging on to > observe the choices. > Produce and verify BoundedReadCompactionStrategy as a unified general purpose > compaction algorithm > -- > > Key: CASSANDRA-17381 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17381 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction >Reporter: Joey Lynch >Assignee: Joey Lynch >Priority: Normal > Labels: gsoc, gsoc2022 > > The existing compaction strategies have a number of drawbacks that make all > three unsuitable as a general-purpose compaction strategy. For example, STCS > creates giant files that are hard to back up, interfere with read performance and > the page cache, and led to many of the early re-open bugs. LCS improved > dramatically on this but also has various issues, e.g. the lack of a performant full > compaction, or problems caused by the strict leveling, e.g. with bulk loading when writes > exceed the rate at which we can do the L0 to L1 promotion. > In this > [talk|https://github.com/ngcc/ngcc2019/blob/master/NextGenerationCassandraCompactionGoingBeyondLCS.pdf] > I proposed a novel compaction strategy that aims to expose a single tunable > that the user can control for the read amplification. Raise the > min_threshold_levels and you trade off read/space performance for write > performance.
Since then a proof of concept [patch|https://github.com/jolynch/cassandra/tree/jolynch_bounded_read_final] has > been published along with some rudimentary [documentation|https://gist.github.com/jolynch/9118465f32ad5298b4e39d03ccd4370e], but this > is still not tracked in Jira. > The remaining work here is: > 1. Validate that the algorithm is correct via test suites, performance and > stress testing, and benchmarking with OSS tools (e.g. cassandra-stress, > [tlp-stress|https://github.com/thelastpickle/tlp-stress], or > [ndbench|https://github.com/Netflix/ndbench]). When issues are found (there > likely will be issues, as the patch is a PoC), devise how to adjust the > algorithm and implementation appropriately. The key metric of success is that we can > run Cassandra stably for more than 24 hours while applying sustained load, > with minimal compaction load (and compaction can also keep up). > 2. Do more in-depth experiments measuring performance across a wide range of > workloads (e.g. write heavy, read heavy, balanced, time series, register > update, etc.) and in comparison with LCS (leveled), STCS (size tiered), > and TWCS (time window). Key metrics of success are establishing that as we > tune max_read_per_read we should get more predictable read latency under low > system load (ρ < 30%) while not degrading at high system load (ρ > 70%), and > that we should match LCS performance under low load while doing better at high > load. > Once this is validated, a Cassandra blog post reporting on the findings > (positive or negative) may be advisable.
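To make the single-tunable trade-off concrete, here is a deliberately simplified toy model in Java. It is not BoundedReadCompactionStrategy and does not follow the PoC patch: it greedily merges the two smallest sorted runs whenever the run count exceeds a hypothetical `maxRunsPerRead` bound (loosely analogous in spirit to max_read_per_read), treats every flush as one unit of data, and counts the units rewritten by merges as a proxy for write amplification.

```java
import java.util.PriorityQueue;

// Toy model only: NOT the BoundedReadCompactionStrategy algorithm.
class ReadBoundToyModel {
    // unitsRewritten: data rewritten by merges (write-amplification proxy).
    // maxRunsSeen: most sorted runs a read would ever have to consult.
    record Result(long unitsRewritten, int maxRunsSeen) {}

    static Result simulate(int flushes, int maxRunsPerRead) {
        if (maxRunsPerRead < 1) {
            throw new IllegalArgumentException("maxRunsPerRead must be >= 1");
        }
        PriorityQueue<Long> runs = new PriorityQueue<>(); // run sizes, smallest first
        long rewritten = 0;
        int maxRuns = 0;
        for (int i = 0; i < flushes; i++) {
            runs.add(1L); // each flush creates one new sorted run of one unit
            // Merge the two smallest runs until a read touches at most maxRunsPerRead runs.
            while (runs.size() > maxRunsPerRead) {
                long merged = runs.poll() + runs.poll();
                rewritten += merged;
                runs.add(merged);
            }
            maxRuns = Math.max(maxRuns, runs.size());
        }
        return new Result(rewritten, maxRuns);
    }
}
```

For example, with 4 flushes a bound of 1 rewrites 9 units, a bound of 2 rewrites 4, and a bound of 4 rewrites none, while reads never consult more runs than the bound: tightening the single knob buys read predictability with extra write work, which is the shape of trade-off the description asks the experiments to measure.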