[jira] [Comment Edited] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time
[ https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822210#comment-17822210 ] Jeremy Hanna edited comment on CASSANDRA-19448 at 2/29/24 4:28 PM: --- It's currently unassigned so feel free to take a look. Thanks! I've thought about it as a bug just because Cassandra stores update times as milliseconds or microseconds and there is nothing in the description that says that you can't use that granularity. It's just that the example is in seconds. It's not clear, and there's no warning or error if you give it something with a granularity finer than seconds; it just ignores it. What to do about that could be either to: # be clearer in the docs and have a warning/error when users try to use a granularity finer than seconds. # make it respect finer granularities, which aligns better with the C* write timestamp formats. I think 2 is the better outcome. So I think it could be argued as a bug or an improvement. [~brandon.williams] do you have any thoughts on bug or improvement designation? was (Author: jeromatron): It's currently unassigned so feel free to take a look. Thanks! I've thought about it as a bug just because Cassandra stores update times as milliseconds or microseconds and there is nothing in the description that says that you can't use that granularity. It's just that the example is in seconds. Since it's not clear and there's no warning or error if you give it something with a granularity finer than seconds, it just ignores it. What to do about that could be either to: # be clearer in the docs and have a warning/error when users try to use a granularity finer than seconds. # make it respect finer granularities, which aligns better with the C* write timestamp formats. I think 2 is the better outcome. So I think it could be argued as a bug or an improvement. [~brandon.williams] do you have any thoughts on bug or improvement designation? 
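Option 2 could, for instance, be built on java.time instead of SimpleDateFormat. The sketch below is illustrative only (not the actual patch, and the class name is made up): a formatter that accepts restore_point_in_time with or without a fractional-seconds part, preserving up to microsecond precision while still accepting existing seconds-only values.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

public class FlexibleRestoreTime {
    // Sketch only: same base pattern as the current restore_point_in_time
    // format, plus an optional fraction of up to six digits (microseconds).
    static final DateTimeFormatter FORMAT = new DateTimeFormatterBuilder()
            .appendPattern("yyyy:MM:dd HH:mm:ss")
            .optionalStart()
            .appendFraction(ChronoField.NANO_OF_SECOND, 0, 6, true)
            .optionalEnd()
            .toFormatter();

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.parse("2024:01:18 17:01:01.623392", FORMAT);
        // The fractional part is preserved instead of being silently dropped
        System.out.println(t.getNano()); // 623392000
        // Seconds-only input still parses, so existing configs keep working
        System.out.println(LocalDateTime.parse("2024:01:18 17:01:01", FORMAT));
    }
}
```

Because the fraction is optional, this direction would not break users who already specify seconds-granularity restore points.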
> CommitlogArchiver only has granularity to seconds for restore_point_in_time > --- > > Key: CASSANDRA-19448 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19448 > Project: Cassandra > Issue Type: Bug > Components: Local/Commit Log >Reporter: Jeremy Hanna >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x > > > Commitlog archiver allows users to back up commitlog files for the purpose of > doing point-in-time restores. The [configuration > file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties] > gives an example down to the seconds granularity but then asks > whether the timestamps are microseconds or milliseconds - defaulting to > microseconds. Because the [CommitLogArchiver uses a second-based date > format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52], > if a user specifies a restore point at a finer granularity like > milliseconds or microseconds, it will truncate everything > after the second and restore to that second. So say you specify a > restore_point_in_time like this: > restore_point_in_time=2024:01:18 17:01:01.623392 > it will silently truncate everything after the 01 seconds. So effectively to > the user, it is missing updates between 01 and 01.623392. > This appears to be a bug in the intent. We should allow users to specify > down to the millisecond or even microsecond level. If we allow them to > specify down to microseconds for the restore point in time, then it may > internally need to change from a long. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
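The silent truncation described above is easy to reproduce with a seconds-only SimpleDateFormat pattern like the one the archiver uses. This is a standalone sketch, not the Cassandra code itself:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class RestorePointTruncation {
    public static void main(String[] args) throws ParseException {
        // Seconds-granularity pattern, matching the restore_point_in_time example
        SimpleDateFormat format = new SimpleDateFormat("yyyy:MM:dd HH:mm:ss");
        Date restorePoint = format.parse("2024:01:18 17:01:01.623392");
        // parse() stops after the seconds field: ".623392" is silently ignored,
        // so any updates in that sub-second window fall outside the restore.
        System.out.println(format.format(restorePoint)); // 2024:01:18 17:01:01
        System.out.println(restorePoint.getTime() % 1000); // 0
    }
}
```

No exception or warning is raised for the extra input, which is exactly why the user only discovers the truncation by noticing missing updates.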
[jira] [Commented] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time
[ https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822210#comment-17822210 ] Jeremy Hanna commented on CASSANDRA-19448: -- It's currently unassigned so feel free to take a look. Thanks! I've thought about it as a bug just because Cassandra stores update times as milliseconds or microseconds and there is nothing in the description that says that you can't use that granularity. It's just that the example is in seconds. It's not clear, and there's no warning or error if you give it something with a granularity finer than seconds; it just ignores it. What to do about that could be either to: # be clearer in the docs and have a warning/error when users try to use a granularity finer than seconds. # make it respect finer granularities, which aligns better with the C* write timestamp formats. I think 2 is the better outcome. So I think it could be argued as a bug or an improvement. [~brandon.williams] do you have any thoughts on bug or improvement designation? > CommitlogArchiver only has granularity to seconds for restore_point_in_time > --- > > Key: CASSANDRA-19448 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19448 > Project: Cassandra > Issue Type: Bug > Components: Local/Commit Log >Reporter: Jeremy Hanna >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x > > > Commitlog archiver allows users to back up commitlog files for the purpose of > doing point-in-time restores. The [configuration > file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties] > gives an example down to the seconds granularity but then asks > whether the timestamps are microseconds or milliseconds - defaulting to > microseconds. 
Because the [CommitLogArchiver uses a second-based date > format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52], > if a user specifies a restore point at a finer granularity like > milliseconds or microseconds, it will truncate everything > after the second and restore to that second. So say you specify a > restore_point_in_time like this: > restore_point_in_time=2024:01:18 17:01:01.623392 > it will silently truncate everything after the 01 seconds. So effectively to > the user, it is missing updates between 01 and 01.623392. > This appears to be a bug in the intent. We should allow users to specify > down to the millisecond or even microsecond level. If we allow them to > specify down to microseconds for the restore point in time, then it may > internally need to change from a long.
[jira] [Updated] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time
[ https://issues.apache.org/jira/browse/CASSANDRA-19448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-19448: - Description: Commitlog archiver allows users to back up commitlog files for the purpose of doing point-in-time restores. The [configuration file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties] gives an example down to the seconds granularity but then asks whether the timestamps are microseconds or milliseconds - defaulting to microseconds. Because the [CommitLogArchiver uses a second-based date format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52], if a user specifies a restore point at a finer granularity like milliseconds or microseconds, it will truncate everything after the second and restore to that second. So say you specify a restore_point_in_time like this: restore_point_in_time=2024:01:18 17:01:01.623392 it will silently truncate everything after the 01 seconds. So effectively to the user, it is missing updates between 01 and 01.623392. This appears to be a bug in the intent. We should allow users to specify down to the millisecond or even microsecond level. If we allow them to specify down to microseconds for the restore point in time, then it may internally need to change from a long. was: Commitlog archiver allows users to back up commitlog files for the purpose of doing point-in-time restores. The [configuration file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties] gives an example down to the seconds granularity but then asks whether the timestamps are microseconds or milliseconds - defaulting to microseconds. 
Because the [CommitLogArchiver uses a second-based date format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52], if a user specifies a restore point at a finer granularity like milliseconds or microseconds, it will truncate everything after the second and restore to that second. So say you specify a restore_point_in_time like this: restore_point_in_time=2024:01:18 17:01:01.623392 it will silently truncate everything after the 01 seconds. So effectively to the user, it is missing updates between 01 and 01.623392. This appears to be a bug in the intent. We should allow users to specify down to the millisecond or even microsecond level. If we allow them to specify down to microseconds for the restore point in time, then it may internally need to change from a long. > CommitlogArchiver only has granularity to seconds for restore_point_in_time > --- > > Key: CASSANDRA-19448 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19448 > Project: Cassandra > Issue Type: Bug > Components: Local/Commit Log >Reporter: Jeremy Hanna >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x > > > Commitlog archiver allows users to back up commitlog files for the purpose of > doing point-in-time restores. The [configuration > file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties] > gives an example down to the seconds granularity but then asks > whether the timestamps are microseconds or milliseconds - defaulting to > microseconds. Because the [CommitLogArchiver uses a second-based date > format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52], > if a user specifies a restore point at a finer granularity like > milliseconds or microseconds, it will truncate everything > after the second and restore to that second. 
So say you specify a > restore_point_in_time like this: > restore_point_in_time=2024:01:18 17:01:01.623392 > it will silently truncate everything after the 01 seconds. So effectively to > the user, it is missing updates between 01 and 01.623392. > This appears to be a bug in the intent. We should allow users to specify > down to the millisecond or even microsecond level. If we allow them to > specify down to microseconds for the restore point in time, then it may > internally need to change from a long.
[jira] [Created] (CASSANDRA-19448) CommitlogArchiver only has granularity to seconds for restore_point_in_time
Jeremy Hanna created CASSANDRA-19448: Summary: CommitlogArchiver only has granularity to seconds for restore_point_in_time Key: CASSANDRA-19448 URL: https://issues.apache.org/jira/browse/CASSANDRA-19448 Project: Cassandra Issue Type: Bug Reporter: Jeremy Hanna Commitlog archiver allows users to back up commitlog files for the purpose of doing point-in-time restores. The [configuration file|https://github.com/apache/cassandra/blob/trunk/conf/commitlog_archiving.properties] gives an example down to the seconds granularity but then asks whether the timestamps are microseconds or milliseconds - defaulting to microseconds. Because the [CommitLogArchiver uses a second-based date format|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java#L52], if a user specifies a restore point at a finer granularity like milliseconds or microseconds, it will truncate everything after the second and restore to that second. So say you specify a restore_point_in_time like this: restore_point_in_time=2024:01:18 17:01:01.623392 it will silently truncate everything after the 01 seconds. So effectively to the user, it is missing updates between 01 and 01.623392. This appears to be a bug in the intent. We should allow users to specify down to the millisecond or even microsecond level. If we allow them to specify down to microseconds for the restore point in time, then it may internally need to change from a long.
[jira] [Created] (CASSANDRA-19362) An "include" is broken on the Storage Engine documentation page
Jeremy Hanna created CASSANDRA-19362: Summary: An "include" is broken on the Storage Engine documentation page Key: CASSANDRA-19362 URL: https://issues.apache.org/jira/browse/CASSANDRA-19362 Project: Cassandra Issue Type: Bug Components: Documentation Reporter: Jeremy Hanna The example code at the bottom of the "Storage Engine" page doesn't appear to be including the code properly. See https://cassandra.apache.org/doc/stable/cassandra/architecture/storage_engine.html#example-code
[jira] [Commented] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812359#comment-17812359 ] Jeremy Hanna commented on CASSANDRA-9328: - See CASSANDRA-15350 for a separated out exception type in native protocol v5. > WriteTimeoutException thrown when LWT concurrency > 1, despite the query > duration taking MUCH less than cas_contention_timeout_in_ms > > > Key: CASSANDRA-9328 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9328 > Project: Cassandra > Issue Type: Bug > Components: Feature/Lightweight Transactions, Legacy/Coordination >Reporter: Aaron Whiteside >Priority: Normal > Labels: LWT > Attachments: CassandraLWTTest.java, CassandraLWTTest2.java > > > WriteTimeoutException thrown when LWT concurrency > 1, despite the query > duration taking MUCH less than cas_contention_timeout_in_ms. > Unit test attached, run against a 3 node cluster running 2.1.5. > If you reduce the threadCount to 1, you never see a WriteTimeoutException. If > the WTE is due to not being able to communicate with other nodes, why does > the concurrency >1 cause inter-node communication to fail?
[jira] [Commented] (CASSANDRA-8110) Make streaming forward & backwards compatible
[ https://issues.apache.org/jira/browse/CASSANDRA-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770086#comment-17770086 ] Jeremy Hanna commented on CASSANDRA-8110: - [~yukim] is this done with what [~Bereng] did on CASSANDRA-14227? Specifically, the {{storage_compatibility_mode}} as described in the latest cassandra.yaml ([here|https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L2090-L2115] is the block from cassandra.yaml): {code:java} # This property indicates what Cassandra major version the storage format will be compatible with. # # The chosen storage compatibility mode will determine the versions of the written sstables, commitlogs, hints, # etc. Those storage elements will use the higher minor versions of the major version that corresponds to the # Cassandra version we want to stay compatible with. For example, if we want to stay compatible with Cassandra 4.0 # or 4.1, the value of this property should be 4, and that will make us use 'nc' sstables. # # This will also determine if certain features depending on newer formats are available. For example, extended TTLs # up to 2106 depend on the sstable, commitlog, hints and messaging versions that were introduced by Cassandra 5.0, # so that feature won't be available if this property is set to CASSANDRA_4. See upgrade guides for details. Currently # the only supported major is CASSANDRA_4. # # Possible values are in the StorageCompatibilityMode.java file accessible online. At the time of writing these are: # - CASSANDRA_4: Stays compatible with the 4.x line in features, formats and component versions. # - UPGRADING: The cluster monitors node versions during this interim stage. _This has a cost_ but ensures any new features, # formats, versions, etc are enabled safely. # - NONE: Start with all the new features and formats enabled. # # A typical upgrade would be: # - Do a rolling upgrade starting all nodes in CASSANDRA_Y compatibility mode. 
# - Once the new binary is rendered stable do a rolling restart with UPGRADING. The cluster will enable new features in a safe way # until all nodes are started in UPGRADING, then all new features are enabled. # - Do a rolling restart with all nodes starting with NONE. This sheds the extra cost of checking node versions and ensures # a stable cluster. If a node from a previous version is started by accident, we will no longer toggle behaviors as when UPGRADING. # storage_compatibility_mode: CASSANDRA_4 {code} > Make streaming forward & backwards compatible > - > > Key: CASSANDRA-8110 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8110 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Streaming and Messaging >Reporter: Marcus Eriksson >Priority: Normal > Labels: gsoc2016, mentor > > To be able to seamlessly upgrade clusters we need to make it possible to > stream files between nodes with different StreamMessage.CURRENT_VERSION
[jira] [Updated] (CASSANDRA-18837) Tab complete datacenter values in cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-18837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-18837: - Complexity: Low Hanging Fruit > Tab complete datacenter values in cqlsh > --- > > Key: CASSANDRA-18837 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18837 > Project: Cassandra > Issue Type: Task > Components: CQL/Interpreter >Reporter: Jeremy Hanna >Priority: Normal > > cqlsh has a number of great tab completions. For example, when creating a > keyspace it will tab complete the syntax for options and give you options for > the replication strategy. It doesn't show options for the data centers, > which would be nice to have. The server has access to the list of data > centers in the cluster. So there shouldn't be a reason why that couldn't tab > complete.
[jira] [Created] (CASSANDRA-18837) Tab complete datacenter values in cqlsh
Jeremy Hanna created CASSANDRA-18837: Summary: Tab complete datacenter values in cqlsh Key: CASSANDRA-18837 URL: https://issues.apache.org/jira/browse/CASSANDRA-18837 Project: Cassandra Issue Type: Task Components: CQL/Interpreter Reporter: Jeremy Hanna cqlsh has a number of great tab completions. For example, when creating a keyspace it will tab complete the syntax for options and give you options for the replication strategy. It doesn't show options for the data centers, which would be nice to have. The server has access to the list of data centers in the cluster. So there shouldn't be a reason why that couldn't tab complete.
[jira] [Updated] (CASSANDRA-18473) Storage Attached Indexes (Phase 2)
[ https://issues.apache.org/jira/browse/CASSANDRA-18473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-18473: - Labels: SAI (was: ) > Storage Attached Indexes (Phase 2) > -- > > Key: CASSANDRA-18473 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18473 > Project: Cassandra > Issue Type: Epic > Components: Feature/2i Index >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Labels: SAI > > At the completion of CASSANDRA-16052, we should be able to release the core > capabilities of SAI in a stable, production-ready package. Once that begins > to gain traction, we'll be able to make improvements and add features for the > next major release. The major initial theme of this epic is likely to be > performance, but it will likely expand to include features like basic text > analysis, etc.
[jira] [Commented] (CASSANDRA-8110) Make streaming forward & backwards compatible
[ https://issues.apache.org/jira/browse/CASSANDRA-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754185#comment-17754185 ] Jeremy Hanna commented on CASSANDRA-8110: - Is there any update on the approach or implementation, as we're getting closer to finalizing 5.0? > Make streaming forward & backwards compatible > - > > Key: CASSANDRA-8110 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8110 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/Streaming and Messaging >Reporter: Marcus Eriksson >Priority: Normal > Labels: gsoc2016, mentor > > To be able to seamlessly upgrade clusters we need to make it possible to > stream files between nodes with different StreamMessage.CURRENT_VERSION
[jira] [Created] (CASSANDRA-18269) Update the client drivers list
Jeremy Hanna created CASSANDRA-18269: Summary: Update the client drivers list Key: CASSANDRA-18269 URL: https://issues.apache.org/jira/browse/CASSANDRA-18269 Project: Cassandra Issue Type: Task Components: Documentation Reporter: Jeremy Hanna Currently, the docs have a page that lists client drivers by language. It's got a lot of entries that, on further investigation, haven't been updated in several years. It would be good to either indicate the activity on the driver/project or remove the older ones so that people don't get the wrong impression and use something that won't serve them well. https://cassandra.apache.org/doc/latest/cassandra/getting_started/drivers.html
[jira] [Comment Edited] (CASSANDRA-11721) Have a per operation truncate ddl "no snapshot" option
[ https://issues.apache.org/jira/browse/CASSANDRA-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644885#comment-17644885 ] Jeremy Hanna edited comment on CASSANDRA-11721 at 12/8/22 4:06 PM: --- I think CASSANDRA-10383 solves the production use cases for this and I'm very happy that it got implemented there. There are cases in test and dev environments where I could still see a per operation setting being useful, but the majority of the use cases are covered by a table level setting. I'm happy to "won't do" this one as updating CQL is a pain for just those use cases. was (Author: jeromatron): I think CASSANDRA-10383 solves the production use cases for this and I'm very happy that it got implemented there. There are cases in test and dev environments where I could still see a per operation setting being useful, but the majority of the use cases are covered by a table level setting. I'm happy to "won't fix" this one as updating CQL is a pain for just those use cases. > Have a per operation truncate ddl "no snapshot" option > -- > > Key: CASSANDRA-11721 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11721 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/CQL, Local/Snapshots >Reporter: Jeremy Hanna >Priority: Low > Labels: AdventCalendar2021 > > Right now with truncate, it will always create a snapshot. That is the right > thing to do most of the time. 'auto_snapshot' exists as an option to disable > that but it is server wide and requires a restart to change. There are data > models, however, that require rotating through a handful of tables and > periodically truncating them. Currently you either have to operate with no > safety net (some actually do this) or manually clear those snapshots out > periodically. Both are less than optimal. > In HDFS, when you delete something it generally goes to the trash. 
If you > don't want that safety net, you can do something like 'rm -rf -skiptrash > /jeremy/stuff' in one command. > It would be nice to have something in the truncate ddl to skip the snapshot > on a per operation basis. Perhaps 'TRUNCATE solarsystem.earth NO SNAPSHOT'. > This might also be useful in those situations where you're just playing with > data and you don't want something to take a snapshot in a development system. > If that's the case, this would also be useful for the DROP operation, but > that convenience is not the main reason for this option. > +Additional information for newcomers:+ > This test is a bit more complex than normal LHF tickets but is still > reasonably easy. > The idea is to support disabling snapshots when performing a Truncate as > follows: > {code}TRUNCATE x WITH OPTIONS = { 'snapshot' : false }{code} > In order to implement that feature several changes are required: > * A new class {{TruncateAttributes}} inheriting from {{PropertyDefinitions}} > must be created in a similar way to {{KeyspaceAttributes}} or > {{TableAttributes}} > * This class should be passed to the {{TruncateStatement}} constructor and > stored as a field > * The ANTLR parser logic should be changed to retrieve the options and pass > them to the constructor (see {{createKeyspaceStatement}} for an example) > * The {{TruncateStatement}} will then need to be modified to take into > account the new option. Locally it will need to call > {{ColumnFamilyStore#truncateBlockingWithoutSnapshot}} if no snapshot should > be done instead of {{ColumnFamilyStore#truncateBlocking}}. For non-local > calls it will need to pass a new parameter to > {{StorageProxy#truncateBlocking}}. That parameter will then need to be passed > to the other nodes through the {{TruncateRequest}}. 
> * As a new field needs to be added to {{TruncateRequest}}, this field will need > to be serialized and deserialized, and a new {{MessagingService.Version}} will > need to be created and set as the current version; the new version should be > 50 (and yes, it means that the next release will be a major one, 5.0) > * In {{TruncateVerbHandler}} the new field should be used to determine if > {{ColumnFamilyStore#truncateBlockingWithoutSnapshot}} or > {{ColumnFamilyStore#truncateBlocking}} should be called. > * An in-jvm test should be added in > {{test/distributed/org/apache/cassandra/distributed/test}} to test that > truncate does not generate snapshots when the new option is specified. > Do not hesitate to ping the mentor for more information.
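As a toy illustration of the options-map handling described in the steps above (the class and method here are hypothetical, not Cassandra's actual {{TruncateAttributes}} API), the snapshot flag would sensibly default to true so the safety net stays on unless the user explicitly opts out:

```java
import java.util.Map;

public class TruncateOptionsSketch {
    // Hypothetical helper: mirrors how a 'snapshot' option from
    // TRUNCATE x WITH OPTIONS = { 'snapshot' : false } might be read.
    public static boolean snapshotEnabled(Map<String, String> options) {
        // Default to true: truncate keeps its snapshot safety net
        // unless the user explicitly disables it.
        return Boolean.parseBoolean(options.getOrDefault("snapshot", "true"));
    }

    public static void main(String[] args) {
        System.out.println(snapshotEnabled(Map.of()));                    // true
        System.out.println(snapshotEnabled(Map.of("snapshot", "false"))); // false
    }
}
```

Defaulting to the safe behavior matches the existing 'auto_snapshot' semantics, where skipping the snapshot is the exceptional case.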
[jira] [Updated] (CASSANDRA-11721) Have a per operation truncate ddl "no snapshot" option
[ https://issues.apache.org/jira/browse/CASSANDRA-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-11721: - Resolution: Won't Do Status: Resolved (was: Open) As discussed previously, CASSANDRA-10383 solves the majority of what this covers. > Have a per operation truncate ddl "no snapshot" option > -- > > Key: CASSANDRA-11721 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11721 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/CQL, Local/Snapshots >Reporter: Jeremy Hanna >Priority: Low > Labels: AdventCalendar2021 > > Right now with truncate, it will always create a snapshot. That is the right > thing to do most of the time. 'auto_snapshot' exists as an option to disable > that but it is server wide and requires a restart to change. There are data > models, however, that require rotating through a handful of tables and > periodically truncating them. Currently you either have to operate with no > safety net (some actually do this) or manually clear those snapshots out > periodically. Both are less than optimal. > In HDFS, when you delete something it generally goes to the trash. If you > don't want that safety net, you can do something like 'rm -rf -skiptrash > /jeremy/stuff' in one command. > It would be nice to have something in the truncate ddl to skip the snapshot > on a per operation basis. Perhaps 'TRUNCATE solarsystem.earth NO SNAPSHOT'. > This might also be useful in those situations where you're just playing with > data and you don't want something to take a snapshot in a development system. > If that's the case, this would also be useful for the DROP operation, but > that convenience is not the main reason for this option. > +Additional information for newcomers:+ > This test is a bit more complex than normal LHF tickets but is still > reasonably easy. 
> The idea is to support disabling snapshots when performing a Truncate as > follows: > {code}TRUNCATE x WITH OPTIONS = { 'snapshot' : false }{code} > In order to implement that feature several changes are required: > * A new class {{TruncateAttributes}} inheriting from {{PropertyDefinitions}} > must be created in a similar way to {{KeyspaceAttributes}} or > {{TableAttributes}} > * This class should be passed to the {{TruncateStatement}} constructor and > stored as a field > * The ANTLR parser logic should be changed to retrieve the options and pass > them to the constructor (see {{createKeyspaceStatement}} for an example) > * The {{TruncateStatement}} will then need to be modified to take into > account the new option. Locally it will need to call > {{ColumnFamilyStore#truncateBlockingWithoutSnapshot}} if no snapshot should > be done instead of {{ColumnFamilyStore#truncateBlocking}}. For non-local > calls it will need to pass a new parameter to > {{StorageProxy#truncateBlocking}}. That parameter will then need to be passed > to the other nodes through the {{TruncateRequest}}. > * As a new field needs to be added to {{TruncateRequest}}, this field will need > to be serialized and deserialized, and a new {{MessagingService.Version}} will > need to be created and set as the current version; the new version should be > 50 (and yes, it means that the next release will be a major one, 5.0) > * In {{TruncateVerbHandler}} the new field should be used to determine if > {{ColumnFamilyStore#truncateBlockingWithoutSnapshot}} or > {{ColumnFamilyStore#truncateBlocking}} should be called. > * An in-jvm test should be added in > {{test/distributed/org/apache/cassandra/distributed/test}} to test that > truncate does not generate snapshots when the new option is specified. > Do not hesitate to ping the mentor for more information. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-11721) Have a per operation truncate ddl "no snapshot" option
[ https://issues.apache.org/jira/browse/CASSANDRA-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644885#comment-17644885 ] Jeremy Hanna commented on CASSANDRA-11721: -- I think CASSANDRA-10383 solves the production use cases for this and I'm very happy that it got implemented there. There are cases in test and dev environments where I could still see a per operation setting being useful, but the majority of the use cases are covered by a table level setting. I'm happy to "won't fix" this one as updating CQL is a pain for just those use cases. > Have a per operation truncate ddl "no snapshot" option > -- > > Key: CASSANDRA-11721 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11721 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/CQL, Local/Snapshots >Reporter: Jeremy Hanna >Priority: Low > Labels: AdventCalendar2021 > > Right now with truncate, it will always create a snapshot. That is the right > thing to do most of the time. 'auto_snapshot' exists as an option to disable > that but it is server wide and requires a restart to change. There are data > models, however, that require rotating through a handful of tables and > periodically truncating them. Currently you either have to operate with no > safety net (some actually do this) or manually clear those snapshots out > periodically. Both are less than optimal. > In HDFS, you generally delete something where it goes to the trash. If you > don't want that safety net, you can do something like 'rm -rf -skiptrash > /jeremy/stuff' in one command. > It would be nice to have something in the truncate ddl to skip the snapshot > on a per operation basis. Perhaps 'TRUNCATE solarsystem.earth NO SNAPSHOT'. > This might also be useful in those situations where you're just playing with > data and you don't want something to take a snapshot in a development system. 
> If that's the case, this would also be useful for the DROP operation, but > that convenience is not the main reason for this option. > +Additional information for newcomers:+ > This task is a bit more complex than normal LHF tickets but is still > reasonably easy. > The idea is to support disabling snapshots when performing a Truncate as > follows: > {code}TRUNCATE x WITH OPTIONS = { 'snapshot' : false }{code} > In order to implement that feature, several changes are required: > * A new class {{TruncateAttributes}} inheriting from {{PropertyDefinitions}} > must be created in a similar way to {{KeyspaceAttributes}} or > {{TableAttributes}} > * This class should be passed to the {{TruncateStatement}} constructor and > stored as a field > * The ANTLR parser logic should be changed to retrieve the options and pass > them to the constructor (see {{createKeyspaceStatement}} for an example) > * The {{TruncateStatement}} will then need to be modified to take into > account the new option. Locally it will need to call > {{ColumnFamilyStore#truncateBlockingWithoutSnapshot}} if no snapshot should > be taken instead of {{ColumnFamilyStore#truncateBlocking}}. For a non-local > call it will need to pass a new parameter to > {{StorageProxy#truncateBlocking}}. That parameter will then need to be passed > to the other nodes through the {{TruncateRequest}}. > * As a new field needs to be added to {{TruncateRequest}}, this field will need > to be serialized and deserialized, and a new {{MessagingService.Version}} will > need to be created and set as the current version; the new version should be > 50 (and yes, it means that the next release will be a major one, 5.0) > * In {{TruncateVerbHandler}} the new field should be used to determine whether > {{ColumnFamilyStore#truncateBlockingWithoutSnapshot}} or > {{ColumnFamilyStore#truncateBlocking}} should be called. 
> * An in-jvm test should be added in > {{test/distributed/org/apache/cassandra/distributed/test}} to test that > truncate does not generate snapshots when the new option is specified. > Do not hesitate to ping the mentor for more information. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17352) CVE-2021-44521: Apache Cassandra: Remote code execution for scripted UDFs
[ https://issues.apache.org/jira/browse/CASSANDRA-17352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610206#comment-17610206 ] Jeremy Hanna commented on CASSANDRA-17352: -- [~marcuse] do you have any thoughts on the flags that were used here? Am I misunderstanding intent of having two flags? > CVE-2021-44521: Apache Cassandra: Remote code execution for scripted UDFs > - > > Key: CASSANDRA-17352 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17352 > Project: Cassandra > Issue Type: Bug > Components: Feature/UDF >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 3.0.26, 3.11.12, 4.0.2 > > > When running Apache Cassandra with the following configuration: > enable_user_defined_functions: true > enable_scripted_user_defined_functions: true > enable_user_defined_functions_threads: false > it is possible for an attacker to execute arbitrary code on the host. The > attacker would need to have enough permissions to create user defined > functions in the cluster to be able to exploit this. Note that this > configuration is documented as unsafe, and will continue to be considered > unsafe after this CVE. > This issue is being tracked as CASSANDRA-17352 > Mitigation: > Set `enable_user_defined_functions_threads: true` (this is default) > or > 3.0 users should upgrade to 3.0.26 > 3.11 users should upgrade to 3.11.12 > 4.0 users should upgrade to 4.0.2 > Credit: > This issue was discovered by Omer Kaspi of the JFrog Security vulnerability > research team. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-17352) CVE-2021-44521: Apache Cassandra: Remote code execution for scripted UDFs
[ https://issues.apache.org/jira/browse/CASSANDRA-17352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601997#comment-17601997 ] Jeremy Hanna commented on CASSANDRA-17352: -- I just want to make sure the settings have the practical outcomes that are intended. I can use UDFs with just the following setting: {{enable_user_defined_functions: true}} However, if I want to enable multi-threaded behavior in the UDFs, I would need to set: {{enable_user_defined_functions: true}} {{enable_user_defined_functions_threads: false}} {{allow_insecure_udfs: true}} If I don't do the last one, {{allow_insecure_udfs: true}}, then the server doesn't start and it gives the warning/recommendation but also says that it would require that field to be set to true to continue. Once these fields are set, I can start the server (in my case 3.11.13). However, according to the [code|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/security/ThreadAwareSecurityManager.java#L186], it looks like the {{allow_extra_insecure_udfs}} setting should also be set to true for the server to start up. Otherwise it should throw an AccessDenied exception. So my question is: is there a bug in the implementation where we allow it to start without setting {{allow_extra_insecure_udfs: true}}? Also, if it does throw an AccessDenied exception, shouldn't it fail earlier when parsing the configuration with a log message that it is required? That leads to another question: if it does require both flags to start the server, why do we have two flags? Why not just {{allow_insecure_udfs}}, if there is no effective difference between setting {{allow_insecure_udfs}} and setting both of them? 
I know the intent from the ticket was that the {{allow_extra_insecure_udfs}} was to further relax security for those wanting to use the java.lang.System package in the UDF, but the line of code from the ThreadAwareSecurityManager seems to suggest that there is no difference. > CVE-2021-44521: Apache Cassandra: Remote code execution for scripted UDFs > - > > Key: CASSANDRA-17352 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17352 > Project: Cassandra > Issue Type: Bug > Components: Feature/UDF >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 3.0.26, 3.11.12, 4.0.2 > > > When running Apache Cassandra with the following configuration: > enable_user_defined_functions: true > enable_scripted_user_defined_functions: true > enable_user_defined_functions_threads: false > it is possible for an attacker to execute arbitrary code on the host. The > attacker would need to have enough permissions to create user defined > functions in the cluster to be able to exploit this. Note that this > configuration is documented as unsafe, and will continue to be considered > unsafe after this CVE. > This issue is being tracked as CASSANDRA-17352 > Mitigation: > Set `enable_user_defined_functions_threads: true` (this is default) > or > 3.0 users should upgrade to 3.0.26 > 3.11 users should upgrade to 3.11.12 > 4.0 users should upgrade to 4.0.2 > Credit: > This issue was discovered by Omer Kaspi of the JFrog Security vulnerability > research team. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
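The gating the comment reasons about can be written out as a small decision function. This is an illustrative sketch of the commenter's reading of the startup checks (function name and return strings are invented; it is not the actual Cassandra configuration or ThreadAwareSecurityManager code), showing where the two flags would each come into play if both were enforced:

```python
# Sketch of the flag gating under discussion; assumptions, not actual behavior.
def check_udf_startup(conf):
    """Model the startup checks for UDF-related flags as the comment reads them."""
    if not conf.get("enable_user_defined_functions", False):
        return "udfs disabled"
    if conf.get("enable_user_defined_functions_threads", True):
        # Default, safe path: UDFs run in isolated threads.
        return "ok: udfs run in isolated threads"
    # Threads disabled: the config check requires allow_insecure_udfs...
    if not conf.get("allow_insecure_udfs", False):
        raise RuntimeError("cannot start: allow_insecure_udfs must be true")
    # ...while the security-manager reading suggests this flag is also required,
    # which is the apparent redundancy the comment is asking about.
    if not conf.get("allow_extra_insecure_udfs", False):
        raise PermissionError("AccessDenied: allow_extra_insecure_udfs must be true")
    return "ok: insecure udfs"
```

If both branches really are required to start, the two flags collapse into one effective switch, which is exactly the question posed to the ticket authors.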
[jira] [Commented] (CASSANDRA-15803) Separate out allow filtering scanning through a partition versus scanning over the table
[ https://issues.apache.org/jira/browse/CASSANDRA-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17568052#comment-17568052 ] Jeremy Hanna commented on CASSANDRA-15803: -- I could see this getting added to the guardrails framework - separating out cluster scanning from partition scanning as two separate guardrails. > Separate out allow filtering scanning through a partition versus scanning > over the table > > > Key: CASSANDRA-15803 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15803 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Syntax >Reporter: Jeremy Hanna >Priority: Normal > > Currently allow filtering can mean two things in the spirit of "avoid > operations that don't seek to a specific row or sequential rows of data." > First, it can mean scanning across the entire table to meet the criteria of > the query. That's almost always a bad thing and should be discouraged or > disabled (see CASSANDRA-8303). Second, it can mean filtering within a > specific partition. For example, in a query you could specify the full > partition key and if you specify a criterion on a non-key field, it requires > allow filtering. > The second reason to require allow filtering is significantly less work to > scan through a partition. It is still extra work over seeking to a specific > row and getting N sequential rows though. So while an application developer > and/or operator needs to be cautious about this second type, it's not > necessarily a bad thing, depending on the table and the use case. > I propose that we separate the way to specify allow filtering across an > entire table from specifying allow filtering across a partition in a > backwards compatible way. One idea that was brought up in Slack in the > cassandra-dev room was to have allow filtering mean the superset - scanning > across the table. 
Then if you want to specify that you *only* want to scan > within a partition you would use something like > {{ALLOW FILTERING [WITHIN PARTITION]}} > So it will succeed if you specify non-key criteria within a single partition, > but fail with a message to say it requires the full allow filtering. This > would allow for a backwards compatible full allow filtering while allowing a > user to specify that they want to just scan within a partition, but error out > if trying to scan a full table. > This is potentially also related to the capability limitation framework by > which operators could more granularly specify what features are allowed or > disallowed per user, discussed in CASSANDRA-8303. This way an operator could > disallow the more general allow filtering while allowing the partition scan > (or disallow them both at their discretion). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
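The proposed superset/subset semantics can be sketched as a validation rule. This is a hypothetical illustration of the idea in the ticket (the function, its arguments, and the rejection messages are invented): plain {{ALLOW FILTERING}} permits everything, while {{ALLOW FILTERING WITHIN PARTITION}} only permits filtering when the full partition key is bound, and errors out on a would-be table scan.

```python
# Illustrative sketch of the proposed ALLOW FILTERING semantics; not real CQL validation.
def validate_filtering(full_partition_key_bound, clause=None):
    """Decide whether a query with a non-key predicate is allowed to execute.

    clause is None, "ALLOW FILTERING", or "ALLOW FILTERING WITHIN PARTITION".
    """
    if clause == "ALLOW FILTERING":
        return "allowed"  # superset: table scan or partition scan
    if clause == "ALLOW FILTERING WITHIN PARTITION":
        if full_partition_key_bound:
            return "allowed"
        # Backwards-compatible failure mode described in the ticket.
        return "rejected: full-table scan requires plain ALLOW FILTERING"
    return "rejected: query requires ALLOW FILTERING"
```

This keeps existing queries working unchanged while letting an application opt into the cheaper, bounded form and fail fast if it accidentally requires a table scan.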
[jira] [Updated] (CASSANDRA-17707) Clarify intent when replaying hint files "partially"
[ https://issues.apache.org/jira/browse/CASSANDRA-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-17707: - Description: As part of CASSANDRA-6230, hints were redesigned to come from files. As part of this, we log when the hint files are dispatched. See https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java#L318 {code} logger.info("Finished hinted handoff of file {} to endpoint {}: {}, partially", descriptor.fileName(), address, hostId); {code} This has caused some confusion among some users who wonder whether their files were only partially replayed and whether data is consistent. This ticket is to clarify in the log statement itself or document in the official docs what is meant by {{partially}}. My understanding is that it's really that sometimes when shutting down, all of the file metadata isn't written so it replays the file anyway. Is that right? I wasn't sure about the dispatch failure and what that means in practice. CC [~aleksey] was: As part of CASSANDRA-6230, hints were redesigned to come from files. As part of this, we log when the hint files are dispatched. See https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java#L318 {code} logger.info("Finished hinted handoff of file {} to endpoint {}: {}, partially", descriptor.fileName(), address, hostId); {code} This has caused some confusion among some users who wonder whether their files were only partially replayed and whether data is consistent. This ticket is to clarify in the log statement itself or document in the official docs what is meant by `partially`. My understanding is that it's really that sometimes when shutting down, all of the file metadata isn't written so it replays the file anyway. Is that right? I wasn't sure about the dispatch failure and what that means in practice. 
CC [~aleksey] > Clarify intent when replaying hint files "partially" > > > Key: CASSANDRA-17707 > URL: https://issues.apache.org/jira/browse/CASSANDRA-17707 > Project: Cassandra > Issue Type: Task >Reporter: Jeremy Hanna >Priority: Normal > > As part of CASSANDRA-6230, hints were redesigned to come from files. As part > of this, we log when the hint files are dispatched. > See > https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java#L318 > {code} > logger.info("Finished hinted handoff of file {} to endpoint {}: {}, > partially", descriptor.fileName(), address, hostId); > {code} > This has caused some confusion among some users who wonder whether their > files were only partially replayed and whether data is consistent. > This ticket is to clarify in the log statement itself or document in the > official docs what is meant by {{partially}}. > My understanding is that it's really that sometimes when shutting down, all > of the file metadata isn't written so it replays the file anyway. Is that > right? I wasn't sure about the dispatch failure and what that means in > practice. > CC [~aleksey] -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-17707) Clarify intent when replaying hint files "partially"
Jeremy Hanna created CASSANDRA-17707: Summary: Clarify intent when replaying hint files "partially" Key: CASSANDRA-17707 URL: https://issues.apache.org/jira/browse/CASSANDRA-17707 Project: Cassandra Issue Type: Task Reporter: Jeremy Hanna As part of CASSANDRA-6230, hints were redesigned to come from files. As part of this, we log when the hint files are dispatched. See https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/hints/HintsDispatchExecutor.java#L318 {code} logger.info("Finished hinted handoff of file {} to endpoint {}: {}, partially", descriptor.fileName(), address, hostId); {code} This has caused some confusion among some users who wonder whether their files were only partially replayed and whether data is consistent. This ticket is to clarify in the log statement itself or document in the official docs what is meant by `partially`. My understanding is that it's really that sometimes when shutting down, all of the file metadata isn't written so it replays the file anyway. Is that right? I wasn't sure about the dispatch failure and what that means in practice. CC [~aleksey] -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9753) LOCAL_QUORUM reads can block cross-DC if there is a digest mismatch
[ https://issues.apache.org/jira/browse/CASSANDRA-9753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412304#comment-17412304 ] Jeremy Hanna commented on CASSANDRA-9753: - Is it fair to say that temporarily disabling (dc_local_)read_repair_chance and speculative retry while adding a new data center will mean that all LOCAL_* consistency level based queries will stay in the origin data center? > LOCAL_QUORUM reads can block cross-DC if there is a digest mismatch > --- > > Key: CASSANDRA-9753 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9753 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination >Reporter: Richard Low >Priority: Normal > > When there is a digest mismatch during the initial read, a data read request > is sent to all replicas involved in the initial read. This can be more than > the initial blockFor if read repair was done and if speculative retry kicked > in. E.g. for RF 3 in two DCs, the number of reads could be 4: 2 for > LOCAL_QUORUM, 1 for read repair and 1 for speculative read if one replica was > slow. If there is then a digest mismatch, Cassandra will issue the data read > to all 4 and set blockFor=4. Now the read query is blocked on cross-DC > latency. The digest mismatch read blockFor should be capped at RF for the > local DC when using CL.LOCAL_*. > You can reproduce this behaviour by creating a keyspace with > NetworkTopologyStrategy, RF 3 per DC, dc_local_read_repair=1.0 and ALWAYS for > speculative read. If you force a digest mismatch (e.g. by deleting a replicas > SSTables and restarting) you can see in tracing that it is blocking for 4 > responses. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
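The fix the ticket asks for amounts to capping the digest-mismatch blockFor. The sketch below is illustrative (the function is invented, not the coordinator's actual read path), but it captures the arithmetic in the description: RF 3 per DC with LOCAL_QUORUM (2) plus read repair (1) plus a speculative read (1) contacts 4 replicas, and the cap keeps the blocking count within the local DC.

```python
# Illustrative sketch of the proposed blockFor cap; not Cassandra's coordinator code.
def digest_mismatch_block_for(contacted_replicas, local_dc_rf, consistency):
    """After a digest mismatch, don't block on more replicas than the local DC holds
    when a LOCAL_* consistency level is in use."""
    if consistency.startswith("LOCAL_"):
        return min(contacted_replicas, local_dc_rf)
    return contacted_replicas
```

With the cap, the scenario from the description blocks on at most 3 (the local RF) rather than 4, so the read no longer waits on cross-DC latency.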
[jira] [Updated] (CASSANDRA-16730) Describe audit log categories in documentation
[ https://issues.apache.org/jira/browse/CASSANDRA-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-16730: - Resolution: (was: Fixed) Status: Open (was: Resolved) > Describe audit log categories in documentation > -- > > Key: CASSANDRA-16730 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16730 > Project: Cassandra > Issue Type: Improvement > Components: Documentation/Website >Reporter: Jeremy Hanna >Priority: Normal > > With CASSANDRA-12151 we have a nice audit log functionality for the database > and it's [described in the > docs|https://cassandra.apache.org/doc/latest/operating/audit_logging.html] > with the associated options. One thing that's missing is a description of > the categories that can be enabled and disabled. The categories are found in > the code > [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/audit/AuditLogEntryCategory.java#L26]: > {{QUERY, DML, DDL, DCL, OTHER, AUTH, ERROR, PREPARE}} > So it would just be good to have those and a brief description in the docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16730) Describe audit log categories in documentation
[ https://issues.apache.org/jira/browse/CASSANDRA-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-16730: - Resolution: Not A Problem Status: Resolved (was: Open) > Describe audit log categories in documentation > -- > > Key: CASSANDRA-16730 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16730 > Project: Cassandra > Issue Type: Improvement > Components: Documentation/Website >Reporter: Jeremy Hanna >Priority: Normal > > With CASSANDRA-12151 we have a nice audit log functionality for the database > and it's [described in the > docs|https://cassandra.apache.org/doc/latest/operating/audit_logging.html] > with the associated options. One thing that's missing is a description of > the categories that can be enabled and disabled. The categories are found in > the code > [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/audit/AuditLogEntryCategory.java#L26]: > {{QUERY, DML, DDL, DCL, OTHER, AUTH, ERROR, PREPARE}} > So it would just be good to have those and a brief description in the docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16730) Describe audit log categories in documentation
[ https://issues.apache.org/jira/browse/CASSANDRA-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-16730: - Resolution: Fixed Status: Resolved (was: Open) Will be unified in the updated docs with the more comprehensive explanation from the What's New in C* 4 section. > Describe audit log categories in documentation > -- > > Key: CASSANDRA-16730 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16730 > Project: Cassandra > Issue Type: Improvement > Components: Documentation/Website >Reporter: Jeremy Hanna >Priority: Normal > > With CASSANDRA-12151 we have a nice audit log functionality for the database > and it's [described in the > docs|https://cassandra.apache.org/doc/latest/operating/audit_logging.html] > with the associated options. One thing that's missing is a description of > the categories that can be enabled and disabled. The categories are found in > the code > [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/audit/AuditLogEntryCategory.java#L26]: > {{QUERY, DML, DDL, DCL, OTHER, AUTH, ERROR, PREPARE}} > So it would just be good to have those and a brief description in the docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16730) Describe audit log categories in documentation
[ https://issues.apache.org/jira/browse/CASSANDRA-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362672#comment-17362672 ] Jeremy Hanna commented on CASSANDRA-16730: -- Ah - I didn't see the other section. I'm glad we're putting them together to have a more comprehensive page. Thanks Ekaterina! > Describe audit log categories in documentation > -- > > Key: CASSANDRA-16730 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16730 > Project: Cassandra > Issue Type: Improvement > Components: Documentation/Website >Reporter: Jeremy Hanna >Priority: Normal > > With CASSANDRA-12151 we have a nice audit log functionality for the database > and it's [described in the > docs|https://cassandra.apache.org/doc/latest/operating/audit_logging.html] > with the associated options. One thing that's missing is a description of > the categories that can be enabled and disabled. The categories are found in > the code > [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/audit/AuditLogEntryCategory.java#L26]: > {{QUERY, DML, DDL, DCL, OTHER, AUTH, ERROR, PREPARE}} > So it would just be good to have those and a brief description in the docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16730) Describe audit log categories in documentation
[ https://issues.apache.org/jira/browse/CASSANDRA-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-16730: - Description: With CASSANDRA-12151 we have a nice audit log functionality for the database and it's [described in the docs|https://cassandra.apache.org/doc/latest/operating/audit_logging.html] with the associated options. One thing that's missing is a description of the categories that can be enabled and disabled. The categories are found in the code [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/audit/AuditLogEntryCategory.java#L26]: {{QUERY, DML, DDL, DCL, OTHER, AUTH, ERROR, PREPARE}} So it would just be good to have those and a brief description in the docs. was: With CASSANDRA-12151 we have a nice audit log functionality for the database and it's described in the docs with the associated options. One thing that's missing is a description of the categories that can be enabled and disabled. The categories are found in the code [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/audit/AuditLogEntryCategory.java#L26]: {{QUERY, DML, DDL, DCL, OTHER, AUTH, ERROR, PREPARE}} So it would just be good to have those and a brief description in the docs. > Describe audit log categories in documentation > -- > > Key: CASSANDRA-16730 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16730 > Project: Cassandra > Issue Type: Improvement > Components: Documentation/Website >Reporter: Jeremy Hanna >Priority: Normal > > With CASSANDRA-12151 we have a nice audit log functionality for the database > and it's [described in the > docs|https://cassandra.apache.org/doc/latest/operating/audit_logging.html] > with the associated options. One thing that's missing is a description of > the categories that can be enabled and disabled. 
The categories are found in > the code > [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/audit/AuditLogEntryCategory.java#L26]: > {{QUERY, DML, DDL, DCL, OTHER, AUTH, ERROR, PREPARE}} > So it would just be good to have those and a brief description in the docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-16730) Describe audit log categories in documentation
[ https://issues.apache.org/jira/browse/CASSANDRA-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-16730: - Complexity: Low Hanging Fruit > Describe audit log categories in documentation > -- > > Key: CASSANDRA-16730 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16730 > Project: Cassandra > Issue Type: Improvement > Components: Documentation/Website >Reporter: Jeremy Hanna >Priority: Normal > > With CASSANDRA-12151 we have a nice audit log functionality for the database > and it's described in the docs with the associated options. One thing that's > missing is a description of the categories that can be enabled and disabled. > The categories are found in the code > [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/audit/AuditLogEntryCategory.java#L26]: > {{QUERY, DML, DDL, DCL, OTHER, AUTH, ERROR, PREPARE}} > So it would just be good to have those and a brief description in the docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-16730) Describe audit log categories in documentation
Jeremy Hanna created CASSANDRA-16730: Summary: Describe audit log categories in documentation Key: CASSANDRA-16730 URL: https://issues.apache.org/jira/browse/CASSANDRA-16730 Project: Cassandra Issue Type: Improvement Components: Documentation/Website Reporter: Jeremy Hanna With CASSANDRA-12151 we have a nice audit log functionality for the database and it's described in the docs with the associated options. One thing that's missing is a description of the categories that can be enabled and disabled. The categories are found in the code [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/audit/AuditLogEntryCategory.java#L26]: {{QUERY, DML, DDL, DCL, OTHER, AUTH, ERROR, PREPARE}} So it would just be good to have those and a brief description in the docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
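To give a feel for what the requested docs would describe, here is an illustrative mapping of statement kinds to the categories listed in {{AuditLogEntryCategory}}. The classification rules below are assumptions for the sketch (they are not taken from the Cassandra source; the real assignment lives in the audit-log code), but the category names themselves are the ones the ticket quotes.

```python
# Illustrative classifier; the verb-to-category rules are assumed, not authoritative.
CATEGORY_BY_VERB = {
    "SELECT": "QUERY",
    "INSERT": "DML", "UPDATE": "DML", "DELETE": "DML",
    "CREATE": "DDL", "ALTER": "DDL", "DROP": "DDL",
    "GRANT": "DCL", "REVOKE": "DCL",
    "LOGIN": "AUTH",
}

def audit_category(statement):
    """Map a CQL statement to one of QUERY, DML, DDL, DCL, OTHER, AUTH, ERROR, PREPARE."""
    verb = statement.split(None, 1)[0].upper()
    return CATEGORY_BY_VERB.get(verb, "OTHER")
```

A table of exactly this shape (category, example statements, one-line description) is what would make the audit logging page self-contained.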
[jira] [Updated] (CASSANDRA-16391) Migrate use of maven-ant-tasks to resolver-ant-tasks
[ https://issues.apache.org/jira/browse/CASSANDRA-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-16391: - Complexity: Low Hanging Fruit (was: Normal) > Migrate use of maven-ant-tasks to resolver-ant-tasks > > > Key: CASSANDRA-16391 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16391 > Project: Cassandra > Issue Type: Task > Components: Build, Dependencies >Reporter: Michael Semb Wever >Priority: High > Labels: gsoc2021, lhf, mentor > > Cassandra resolves dependencies and generates maven pom files through the use > of [maven-ant-tasks|http://maven.apache.org/ant-tasks/]. This is no longer a > supported project. > The recommended upgrade is to > [resolver-ant-tasks|http://maven.apache.org/resolver-ant-tasks/]. It follows > similar APIs so shouldn't be too impactful a change. > The existing maven-ant-tasks has caused [some headaches > already|https://issues.apache.org/jira/browse/CASSANDRA-16359] with internal > super poms referencing insecure http:// central maven repository URLs that > are no longer supported. > We should also take the opportunity to > - define the "test" scope (classpath) for those dependencies only used for > tests (currently we are packaging test dependencies into the release binary > artefact), > - remove the jar files stored in the git repo under the "lib/" folder. > These two above points have to happen in tandem, as the jar files under > {{lib/}} are those that get bundled into the {{build/dist/lib/}} and hence > the binary artefact. That is, all jar files under {{lib/}} are the project's > "compile" scope, and all other dependencies defined in build.xml are either > "provided" or "test" scope. These different scopes for dependencies are > currently configured in different maven-ant-tasks poms. 
See > https://github.com/apache/cassandra/commit/d43b9ce5092f8879a1a66afebab74d86e9e127fb#r45659668 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
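As a rough sketch of the shape the migration takes, assuming the resolve task documented by resolver-ant-tasks (the antlib namespace and task names follow its docs; the dependency coordinates below are placeholders, not Cassandra's real dependency list):

```xml
<!-- Illustrative build.xml fragment using resolver-ant-tasks in place
     of maven-ant-tasks. Coordinates are placeholders; the point is that
     scopes such as "compile" and "test" become separate classpaths. -->
<project xmlns:resolver="antlib:org.apache.maven.resolver.ant">
  <resolver:resolve>
    <resolver:dependencies>
      <dependency coords="org.example:example-lib:1.0.0" scope="test"/>
    </resolver:dependencies>
    <!-- One path per scope, so test-only jars stay out of the binary artefact -->
    <resolver:path refid="test.classpath" classpath="test"/>
  </resolver:resolve>
</project>
```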
[jira] [Comment Edited] (CASSANDRA-16429) cqlsh garbles column names with Japanese characters
[ https://issues.apache.org/jira/browse/CASSANDRA-16429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280703#comment-17280703 ] Jeremy Hanna edited comment on CASSANDRA-16429 at 2/8/21, 12:38 AM: Could this be related to the new code that exposes table schema directly to the drivers? CASSANDRA-14825 was (Author: jeromatron): Could this be related to the new code that exposes table schema directly to the drivers? > cqlsh garbles column names with Japanese characters > --- > > Key: CASSANDRA-16429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16429 > Project: Cassandra > Issue Type: Bug >Reporter: Yoshi Kimoto >Priority: Normal > Attachments: jptest.cql > > > Tables created with Japanese character name columns are working well in C* > 3.11.10 when doing a SELECT * in cqlsh but will show as garbled (shown as > "?") in 4.0-beta4. DESCRIBE shows the column names correctly in both cases. > Run the attached jptest.cql script in both envs with cqlsh -f. They will > yield different results. > My test env (MacOS 10.15.7): > C* 3.11.10 with > - OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_252-b09) > - Python 2.7.16 > C* 4.0-beta4 > - OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.9.1+1) > - Python 3.8.2 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16429) cqlsh garbles column names with Japanese characters
[ https://issues.apache.org/jira/browse/CASSANDRA-16429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280703#comment-17280703 ] Jeremy Hanna commented on CASSANDRA-16429: -- Could this be related to the new code that exposes table schema directly to the drivers? > cqlsh garbles column names with Japanese characters > --- > > Key: CASSANDRA-16429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16429 > Project: Cassandra > Issue Type: Bug >Reporter: Yoshi Kimoto >Priority: Normal > Attachments: jptest.cql > > > Tables created with Japanese character name columns are working well in C* > 3.11.10 when doing a SELECT * in cqlsh but will show as garbled (shown as > "?") in 4.0-beta4. DESCRIBE shows the column names correctly in both cases. > Run the attached jptest.cql script in both envs with cqlsh -f. They will > yield different results. > My test env (MacOS 10.15.7): > C* 3.11.10 with > - OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_252-b09) > - Python 2.7.16 > C* 4.0-beta4 > - OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.9.1+1) > - Python 3.8.2 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
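Whatever the root cause turns out to be, the symptom (column names rendered as "?") is characteristic of text being pushed through an encoding step that cannot represent the characters. A tiny illustration of that failure mode only, not a claim about where cqlsh actually does this:

```python
# Illustration of the symptom: encoding text through a codec that
# cannot represent it, with errors="replace", turns every such
# character into "?", matching the garbled output reported here.
column_name = "日本語"  # hypothetical column name with Japanese characters

garbled = column_name.encode("ascii", errors="replace")
print(garbled)  # b'???'
```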
[jira] [Updated] (CASSANDRA-16315) Remove bad advice on concurrent compactors from cassandra.yaml
[ https://issues.apache.org/jira/browse/CASSANDRA-16315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-16315: - Description: Since CASSANDRA-7551, we gave the following advice for setting {{concurrent_compactors}}: {code} # If your data directories are backed by SSD, you should increase this # to the number of cores. {code} However in practice there are a number of problems with this. While it's true that one can increase {{concurrent_compactors}} to improve efficiency of compactions on machines with more cpu cores, the context switching with random IO and GC associated with bringing compaction data into the heap will work against the additional parallelism. This has caused problems for those who have taken this advice literally. I propose that we adjust this language to give a limit on number of {{concurrent_compactors}} for this setting both in the 3.x line and in trunk so that new users do not stumble when reviewing whether to change defaults. See also CASSANDRA-7139 for a discussion on considerations. I see two short-term options to avoid new user pain: 1. Change the language to say something like this: {quote} When using SSD based storage, you can increase the number of {{concurrent_compactors}}. However be aware that using too many concurrent compactors can have a detrimental effect such as GC pressure, more context switching among compactors and realtime operations, and more random IO pulling data for different compactions. It's best to test and measure with your workload and hardware. {quote} 2. Do some significant testing of compaction efficient and read/write latency/throughput targets to see where the tipping point is - considering some constants around memory and heap size and configuration to keep it simple. was: Since CASSANDRA-7551, we gave the following advice for setting concurrent_compactors: {code} # If your data directories are backed by SSD, you should increase this # to the number of cores. 
{code} However in practice there are a number of problems with this. While it's true that one can increase {{concurrent_compactors}} to improve efficiency of compactions on machines with more cpu cores, the context switching with random IO and GC associated with bringing compaction data into the heap will work against the additional parallelism. This has caused problems for those who have taken this advice literally. I propose that we adjust this language to give a limit on number of {{concurrent_compactors}} for this setting both in the 3.x line and in trunk so that new users do not stumble when reviewing whether to change defaults. See also CASSANDRA-7139 for a discussion on considerations. I see two short-term options to avoid new user pain: 1. Change the language to say something like this: {quote} When using SSD based storage, you can increase the number of {{concurrent_compactors}}. However be aware that using too many concurrent compactors can have a detrimental effect such as GC pressure, more context switching among compactors and realtime operations, and more random IO pulling data for different compactions. It's best to test and measure with your workload and hardware. {quote} 2. Do some significant testing of compaction efficient and read/write latency/throughput targets to see where the tipping point is - considering some constants around memory and heap size and configuration to keep it simple. > Remove bad advice on concurrent compactors from cassandra.yaml > -- > > Key: CASSANDRA-16315 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16315 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Jeremy Hanna >Priority: Normal > > Since CASSANDRA-7551, we gave the following advice for setting > {{concurrent_compactors}}: > {code} > # If your data directories are backed by SSD, you should increase this > # to the number of cores. > {code} > However in practice there are a number of problems with this. 
While it's > true that one can increase {{concurrent_compactors}} to improve efficiency of > compactions on machines with more cpu cores, the context switching with > random IO and GC associated with bringing compaction data into the heap will > work against the additional parallelism. > This has caused problems for those who have taken this advice literally. > I propose that we adjust this language to give a limit on number of > {{concurrent_compactors}} for this setting both in the 3.x line and in trunk > so that new users do not stumble when reviewing whether to change defaults. > See also CASSANDRA-7139 for a discussion on considerations. > I see two short-term options to avoid new user pain: > 1. Change the language to say something like
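Option 1's "give a limit" idea can be sketched as a capped heuristic. The specific floor and cap below are illustrative numbers for discussion, not Cassandra's actual defaulting logic:

```python
def capped_concurrent_compactors(cores: int, data_dirs: int, cap: int = 8) -> int:
    # Start from the smaller of CPU cores and data directories, keep a
    # floor of 2 so compaction can still make progress, and cap the
    # result so extra parallelism doesn't just become GC pressure and
    # random IO. Floor and cap values here are illustrative only.
    return min(cap, max(2, min(cores, data_dirs)))

print(capped_concurrent_compactors(32, 1))   # 2
print(capped_concurrent_compactors(32, 12))  # 8
```

With a cap like this, a 32-core box with one data directory no longer gets 32 compactors just because the yaml comment said "increase to the number of cores".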
[jira] [Updated] (CASSANDRA-16315) Remove bad advice on concurrent compactors from cassandra.yaml
[ https://issues.apache.org/jira/browse/CASSANDRA-16315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-16315: - Description: Since CASSANDRA-7551, we gave the following advice for setting concurrent_compactors: {code} # If your data directories are backed by SSD, you should increase this # to the number of cores. {code} However in practice there are a number of problems with this. While it's true that one can increase {{concurrent_compactors}} to improve efficiency of compactions on machines with more cpu cores, the context switching with random IO and GC associated with bringing compaction data into the heap will work against the additional parallelism. This has caused problems for those who have taken this advice literally. I propose that we adjust this language to give a limit on number of {{concurrent_compactors}} for this setting both in the 3.x line and in trunk so that new users do not stumble when reviewing whether to change defaults. See also CASSANDRA-7139 for a discussion on considerations. I see two short-term options to avoid new user pain: 1. Change the language to say something like this: {quote} When using fast SSD, you can increase the number of {{concurrent_compactors}}. However be aware that using too many concurrent compactors can have a detrimental effect such as GC pressure, more context switching among compactors and realtime operations, and more random IO pulling data for different compactions. It's best to test and measure with your workload and hardware. {quote} 2. Do some significant testing of compaction efficient and read/write latency/throughput targets to see where the tipping point is - considering some constants around memory and heap size and configuration to keep it simple. was: Since CASSANDRA-7551, we gave the following advice for setting concurrent_compactors: {code} # If your data directories are backed by SSD, you should increase this # to the number of cores. 
{code} However in practice there are a number of problems with this. While it's true that one can increase concurrent_compactors to improve efficiency of compactions on machines with more cpu cores, the context switching with random IO and GC associated with bringing compaction data into the heap will work against the additional parallelism. This has caused problems for those who have taken this advice literally. I propose that we adjust this language to give a limit on number of concurrent_compactors for this setting both in the 3.x line and in trunk so that new users do not stumble when reviewing whether to change defaults. See also CASSANDRA-7139 for a discussion on considerations. I see two short-term options to avoid new user pain: 1. Change the language to say something like this: {quote} When using fast SSD, you can increase the number of {{concurrent_compactors}}. However be aware that using too many concurrent compactors can have a detrimental effect such as GC pressure, more context switching among compactors and realtime operations, and more random IO pulling data for different compactions. It's best to test and measure with your workload and hardware. {quote} 2. Do some significant testing of compaction efficient and read/write latency/throughput targets to see where the tipping point is - considering some constants around memory and heap size and configuration to keep it simple. > Remove bad advice on concurrent compactors from cassandra.yaml > -- > > Key: CASSANDRA-16315 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16315 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Jeremy Hanna >Priority: Normal > > Since CASSANDRA-7551, we gave the following advice for setting > concurrent_compactors: > {code} > # If your data directories are backed by SSD, you should increase this > # to the number of cores. > {code} > However in practice there are a number of problems with this. 
While it's > true that one can increase {{concurrent_compactors}} to improve efficiency of > compactions on machines with more cpu cores, the context switching with > random IO and GC associated with bringing compaction data into the heap will > work against the additional parallelism. > This has caused problems for those who have taken this advice literally. > I propose that we adjust this language to give a limit on number of > {{concurrent_compactors}} for this setting both in the 3.x line and in trunk > so that new users do not stumble when reviewing whether to change defaults. > See also CASSANDRA-7139 for a discussion on considerations. > I see two short-term options to avoid new user pain: > 1. Change the language to say something like this: > {quote} > When using fast SSD,
[jira] [Updated] (CASSANDRA-16315) Remove bad advice on concurrent compactors from cassandra.yaml
[ https://issues.apache.org/jira/browse/CASSANDRA-16315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-16315: - Description: Since CASSANDRA-7551, we gave the following advice for setting concurrent_compactors: {code} # If your data directories are backed by SSD, you should increase this # to the number of cores. {code} However in practice there are a number of problems with this. While it's true that one can increase {{concurrent_compactors}} to improve efficiency of compactions on machines with more cpu cores, the context switching with random IO and GC associated with bringing compaction data into the heap will work against the additional parallelism. This has caused problems for those who have taken this advice literally. I propose that we adjust this language to give a limit on number of {{concurrent_compactors}} for this setting both in the 3.x line and in trunk so that new users do not stumble when reviewing whether to change defaults. See also CASSANDRA-7139 for a discussion on considerations. I see two short-term options to avoid new user pain: 1. Change the language to say something like this: {quote} When using SSD based storage, you can increase the number of {{concurrent_compactors}}. However be aware that using too many concurrent compactors can have a detrimental effect such as GC pressure, more context switching among compactors and realtime operations, and more random IO pulling data for different compactions. It's best to test and measure with your workload and hardware. {quote} 2. Do some significant testing of compaction efficient and read/write latency/throughput targets to see where the tipping point is - considering some constants around memory and heap size and configuration to keep it simple. was: Since CASSANDRA-7551, we gave the following advice for setting concurrent_compactors: {code} # If your data directories are backed by SSD, you should increase this # to the number of cores. 
{code} However in practice there are a number of problems with this. While it's true that one can increase {{concurrent_compactors}} to improve efficiency of compactions on machines with more cpu cores, the context switching with random IO and GC associated with bringing compaction data into the heap will work against the additional parallelism. This has caused problems for those who have taken this advice literally. I propose that we adjust this language to give a limit on number of {{concurrent_compactors}} for this setting both in the 3.x line and in trunk so that new users do not stumble when reviewing whether to change defaults. See also CASSANDRA-7139 for a discussion on considerations. I see two short-term options to avoid new user pain: 1. Change the language to say something like this: {quote} When using fast SSD, you can increase the number of {{concurrent_compactors}}. However be aware that using too many concurrent compactors can have a detrimental effect such as GC pressure, more context switching among compactors and realtime operations, and more random IO pulling data for different compactions. It's best to test and measure with your workload and hardware. {quote} 2. Do some significant testing of compaction efficient and read/write latency/throughput targets to see where the tipping point is - considering some constants around memory and heap size and configuration to keep it simple. > Remove bad advice on concurrent compactors from cassandra.yaml > -- > > Key: CASSANDRA-16315 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16315 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Jeremy Hanna >Priority: Normal > > Since CASSANDRA-7551, we gave the following advice for setting > concurrent_compactors: > {code} > # If your data directories are backed by SSD, you should increase this > # to the number of cores. > {code} > However in practice there are a number of problems with this. 
While it's > true that one can increase {{concurrent_compactors}} to improve efficiency of > compactions on machines with more cpu cores, the context switching with > random IO and GC associated with bringing compaction data into the heap will > work against the additional parallelism. > This has caused problems for those who have taken this advice literally. > I propose that we adjust this language to give a limit on number of > {{concurrent_compactors}} for this setting both in the 3.x line and in trunk > so that new users do not stumble when reviewing whether to change defaults. > See also CASSANDRA-7139 for a discussion on considerations. > I see two short-term options to avoid new user pain: > 1. Change the language to say something like this: > {quote} >
[jira] [Created] (CASSANDRA-16315) Remove bad advice on concurrent compactors from cassandra.yaml
Jeremy Hanna created CASSANDRA-16315: Summary: Remove bad advice on concurrent compactors from cassandra.yaml Key: CASSANDRA-16315 URL: https://issues.apache.org/jira/browse/CASSANDRA-16315 Project: Cassandra Issue Type: Improvement Components: Local/Config Reporter: Jeremy Hanna Since CASSANDRA-7551, we gave the following advice for setting concurrent_compactors: {code} # If your data directories are backed by SSD, you should increase this # to the number of cores. {code} However in practice there are a number of problems with this. While it's true that one can increase concurrent_compactors to improve efficiency of compactions on machines with more cpu cores, the context switching with random IO and GC associated with bringing compaction data into the heap will work against the additional parallelism. This has caused problems for those who have taken this advice literally. I propose that we adjust this language to give a limit on the number of concurrent_compactors for this setting both in the 3.x line and in trunk so that new users do not stumble when reviewing whether to change defaults. See also CASSANDRA-7139 for a discussion on considerations. I see two short-term options to avoid new user pain: 1. Change the language to say something like this: {quote} When using fast SSD, you can increase the number of {{concurrent_compactors}}. However be aware that using too many concurrent compactors can have a detrimental effect such as GC pressure, more context switching among compactors and realtime operations, and more random IO pulling data for different compactions. It's best to test and measure with your workload and hardware. {quote} 2. Do some significant testing of compaction efficiency and read/write latency/throughput targets to see where the tipping point is - considering some constants around memory and heap size and configuration to keep it simple. 
[jira] [Commented] (CASSANDRA-16205) Offline token allocation strategy generator tool
[ https://issues.apache.org/jira/browse/CASSANDRA-16205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222696#comment-17222696 ] Jeremy Hanna commented on CASSANDRA-16205: -- I think we still want to run the algorithm in the dtests if possible, at least the ones that have to do with cluster membership and consistency like bootstrap, replace, decommission, and tests involving range movements in general. Could we run with the new algorithm at least for those tests? Is the thought to use the algorithm to do that and then for the other tests use this script to pre-allocate the tokens? > Offline token allocation strategy generator tool > > > Key: CASSANDRA-16205 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16205 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config, Local/Scripts >Reporter: Michael Semb Wever >Assignee: Michael Semb Wever >Priority: Normal > > A command line tool to generate tokens (using the > allocate_tokens_for_local_replication_factor algorithm) for pre-configuration > of {{initial_tokens}} in cassandra.yaml. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
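For comparison with what the tool computes: the naive offline approach for Murmur3Partitioner is simply to space tokens evenly across the token range. The allocate_tokens_for_local_replication_factor algorithm this ticket wraps is smarter (replication-aware), but the even-split version shows the idea of pre-computing values for {{initial_token}}:

```python
MIN_TOKEN = -2**63  # Murmur3Partitioner token range is [-2**63, 2**63 - 1]

def evenly_spaced_tokens(num_nodes: int) -> list:
    # One token per node, spaced evenly across the full Murmur3 range.
    # This is the naive even-split scheme, not the replication-aware
    # allocation algorithm the ticket's tool implements.
    step = 2**64 // num_nodes
    return [MIN_TOKEN + i * step for i in range(num_nodes)]

print(evenly_spaced_tokens(4))
```

For four nodes this yields tokens a quarter of the range apart, starting at -2**63.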
[jira] [Updated] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-13701: - Fix Version/s: (was: 4.0-triage) > Lower default num_tokens > > > Key: CASSANDRA-13701 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13701 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Chris Lohfink >Assignee: Alexander Dejanovski >Priority: Low > Fix For: 4.0-alpha > > > For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not > necessary. It is very expensive for operational processes and scanning. It's > come up a lot, and it's now standard practice in the community to reduce > num_tokens. We should just lower the defaults. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-16079) Improve dtest runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197300#comment-17197300 ] Jeremy Hanna edited comment on CASSANDRA-16079 at 9/17/20, 12:22 AM: - Is it possible to implement some sort of "reset" operation in CCM so that it drops all non-system keyspaces so that the clusters that don't explicitly test cluster membership operations can just be reused as has been said? We could disable snapshotting on them as well so they wouldn't build up state over time too. In other words, it sounds like if we made the time for starting single node clusters essentially instant, that's 171 * single node startup time that we've reduced for the overall dtests. was (Author: jeromatron): Is it possible to implement some sort of "reset" operation in CCM so that it drops all non-system keyspaces so that the clusters that don't explicitly test cluster membership operations can just be reused as has been said? We could disable snapshotting on them as well so they wouldn't build up state over time too. > Improve dtest runtime > - > > Key: CASSANDRA-16079 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16079 > Project: Cassandra > Issue Type: Improvement > Components: CI >Reporter: Adam Holmberg >Priority: Normal > Fix For: 4.0-beta > > > A recent ticket, CASSANDRA-13701, changed the way dtests run, resulting in a > [30% increase in run > time|https://www.mail-archive.com/dev@cassandra.apache.org/msg15606.html]. > While that change was accepted, we wanted to spin out a ticket to optimize > dtests in an attempt to gain back some of that runtime. > At this time we don't have concrete improvements in mind, so the first order > of this ticket will be to analyze the state of things currently, and try to > ascertain some valuable optimizations. Once the problems are understood, we > will break down subtasks to divide the work. 
> Some areas to consider: > * cluster reuse > * C* startup optimizations > * Tests that should be ported to in-JVM dtest or even unit tests -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
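The proposed CCM "reset" boils down to enumerating keyspaces (e.g. from system_schema.keyspaces) and dropping the non-system ones before handing the cluster to the next test. A sketch of the selection step; the system-keyspace list below is an assumption and varies by Cassandra version:

```python
SYSTEM_KEYSPACES = {
    # Assumed set; the exact system keyspaces differ between versions.
    "system", "system_schema", "system_auth",
    "system_distributed", "system_traces",
    "system_views", "system_virtual_schema",
}

def keyspaces_to_drop(all_keyspaces):
    # Given keyspace names (e.g. queried from system_schema.keyspaces),
    # return the non-system keyspaces a "reset" would drop before reuse.
    return [ks for ks in all_keyspaces if ks not in SYSTEM_KEYSPACES]

print(keyspaces_to_drop(["system", "ks1", "system_auth", "ks2"]))  # ['ks1', 'ks2']
```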
[jira] [Commented] (CASSANDRA-16079) Improve dtest runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197300#comment-17197300 ] Jeremy Hanna commented on CASSANDRA-16079: -- Is it possible to implement some sort of "reset" operation in CCM so that it drops all non-system keyspaces so that the clusters that don't explicitly test cluster membership operations can just be reused as has been said? We could disable snapshotting on them as well so they wouldn't build up state over time too. > Improve dtest runtime > - > > Key: CASSANDRA-16079 > URL: https://issues.apache.org/jira/browse/CASSANDRA-16079 > Project: Cassandra > Issue Type: Improvement > Components: CI >Reporter: Adam Holmberg >Priority: Normal > Fix For: 4.0-beta > > > A recent ticket, CASSANDRA-13701, changed the way dtests run, resulting in a > [30% increase in run > time|https://www.mail-archive.com/dev@cassandra.apache.org/msg15606.html]. > While that change was accepted, we wanted to spin out a ticket to optimize > dtests in an attempt to gain back some of that runtime. > At this time we don't have concrete improvements in mind, so the first order > of this ticket will be to analyze the state of things currently, and try to > ascertain some valuable optimizations. Once the problems are understood, we > will break down subtasks to divide the work. > Some areas to consider: > * cluster reuse > * C* startup optimizations > * Tests that should be ported to in-JVM dtest or even unit tests -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-8720) Provide tools for finding wide row/partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195146#comment-17195146 ] Jeremy Hanna edited comment on CASSANDRA-8720 at 9/14/20, 1:50 AM: --- We've had this in DataStax Enterprise's version of Cassandra for a couple of years now. Any chance we could just port that over to Cassandra at this point? It's an offline tool called sstablepartitions that gets a variety of information about partitions in an sstable or directory of sstables using the methodology discussed in this ticket. See https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/tools/toolsSStables/toolsSSTablepartitions.html. [~snazy] do you think it's a straightforward port at this point? was (Author: jeromatron): We've had this in DataStax Enterprise's version of Cassandra for a couple of years now. Any chance we could just port that over to Cassandra at this point? It's an offline tool called sstablepartitions that gets a variety of information about partitions in an sstable or directory sstables using the methodology discussed in this ticket. See https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/tools/toolsSStables/toolsSSTablepartitions.html. [~snazy] do you think it's a straightforward port at this point? > Provide tools for finding wide row/partition keys > - > > Key: CASSANDRA-8720 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8720 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Tools >Reporter: J.B. Langston >Priority: Normal > Fix For: 2.1.x, 2.2.x > > Attachments: 8720.txt > > > Multiple users have requested some sort of tool to help identify wide row > keys. They get into a situation where they know a wide row/partition has been > inserted and it's causing problems for them but they have no idea what the > row key is in order to remove it. > Maintaining the widest row key currently encountered and displaying it in > cfstats would be one possible approach. 
> Another would be an offline tool (possibly an enhancement to sstablekeys) to > show the number of columns/bytes per key in each sstable. If a tool to > aggregate the information at a CF-level could be provided that would be a > bonus, but it shouldn't be too hard to write a script wrapper to aggregate > them if not. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
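The CF-level aggregation mentioned as a "bonus" is straightforward once a per-sstable tool emits (key, size) pairs; a hedged sketch of that wrapper logic, with the input format assumed rather than taken from the actual sstablepartitions output:

```python
from collections import defaultdict

def widest_partitions(per_sstable_sizes, top_n=5):
    # per_sstable_sizes: iterable of (partition_key, bytes_for_key)
    # pairs, one per key per sstable (assumed input format). Aggregate
    # across sstables and return the largest partitions, i.e. the
    # CF-level rollup described above.
    totals = defaultdict(int)
    for key, size in per_sstable_sizes:
        totals[key] += size
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

print(widest_partitions([("a", 10), ("b", 5), ("a", 7)], top_n=1))  # [('a', 17)]
```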
[jira] [Commented] (CASSANDRA-8720) Provide tools for finding wide row/partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195146#comment-17195146 ] Jeremy Hanna commented on CASSANDRA-8720: - We've had this in DataStax Enterprise's version of Cassandra for a couple of years now. Any chance we could just port that over to Cassandra at this point? It's an offline tool called sstablepartitions that gets a variety of information about partitions in an sstable or directory sstables using the methodology discussed in this ticket. See https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/tools/toolsSStables/toolsSSTablepartitions.html. [~snazy] do you think it's a straightforward port at this point? > Provide tools for finding wide row/partition keys > - > > Key: CASSANDRA-8720 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8720 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Tools >Reporter: J.B. Langston >Priority: Normal > Fix For: 2.1.x, 2.2.x > > Attachments: 8720.txt > > > Multiple users have requested some sort of tool to help identify wide row > keys. They get into a situation where they know a wide row/partition has been > inserted and it's causing problems for them but they have no idea what the > row key is in order to remove it. > Maintaining the widest row key currently encountered and displaying it in > cfstats would be one possible approach. > Another would be an offline tool (possibly an enhancement to sstablekeys) to > show the number of columns/bytes per key in each sstable. If a tool to > aggregate the information at a CF-level could be provided that would be a > bonus, but it shouldn't be too hard to write a script wrapper to aggregate > them if not. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171556#comment-17171556 ] Jeremy Hanna edited comment on CASSANDRA-13701 at 8/5/20, 3:29 PM: --- I started down this road but don't think I can get through fixing all of the dtests. On this line in bootstrap_test.py I changed the time.sleep to 10 seconds and it appears to solve that problem - https://github.com/apache/cassandra-dtest/blob/master/bootstrap_test.py#L485 However there were many tests with replace_address that I'm not sure about. I don't know how or why replace address would be affected by the new token allocation algorithm. Dimitar said something about parallel bootstrap but I don't see that - sometimes no_wait or wait_other_notice is true or false so I thought it was that, but perhaps someone more familiar with ccm could see. I'm sorry - I really want this to get in for the release but I don't have the time to dedicate to learning dtest at a deeper level to fix all of these in time. was (Author: jeromatron): I started down this road but don't think I can get through fixing all of the dtests. On this line in bootstrap_test.py I changed the time.sleep to 10 seconds and it appears to solve that problem - https://github.com/apache/cassandra-dtest/blob/master/bootstrap_test.py#L485 However there were many tests with replace_address that I'm not sure about. I don't know how or why replace address would be affected by the new token allocation algorithm. Dmitri said something about parallel bootstrap but I don't see that - sometimes no_wait or wait_other_notice is true or false so I thought it was that, but perhaps someone more familiar with ccm could see. I'm sorry - I really want this to get in for the release but I don't have the time to dedicate to learning dtest at a deeper level to fix all of these in time. 
> Lower default num_tokens > > > Key: CASSANDRA-13701 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13701 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Chris Lohfink >Priority: Low > Fix For: 4.0-alpha > > > For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not > necessary. It is very expensive for operations processes and scanning. Its > come up a lot and its pretty standard and known now to always reduce the > num_tokens within the community. We should just lower the defaults. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171556#comment-17171556 ] Jeremy Hanna commented on CASSANDRA-13701: -- I started down this road but don't think I can get through fixing all of the dtests. On this line in bootstrap-test.py I changed the time.sleep to 10 seconds and it appears to solve that problem - https://github.com/apache/cassandra-dtest/blob/master/bootstrap_test.py#L485 However there were many tests with replace_address that I'm not sure about. I don't know how or why replace address would be affected by the new token allocation algorithm. Dmitri said something about parallel bootstrap but I don't see that - sometimes no_wait or wait_other_notice is true or false so I thought it was that, but perhaps someone more familiar with ccm could see. I'm sorry - I really want this to get in for the release but I don't have the time to dedicate to learning dtest at a deeper level to fix all of these in time. > Lower default num_tokens > > > Key: CASSANDRA-13701 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13701 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Chris Lohfink >Priority: Low > Fix For: 4.0-alpha > > > For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not > necessary. It is very expensive for operations processes and scanning. Its > come up a lot and its pretty standard and known now to always reduce the > num_tokens within the community. We should just lower the defaults. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
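Bumping a fixed {{time.sleep}} as described above trades one race for another; a sturdier pattern in tests is to poll for the awaited condition under an overall timeout. The sketch below is generic shell, not the actual dtest code, and the marker file is a hypothetical stand-in for "the node finished joining":

```shell
# Generic sketch: replace a fixed sleep with a bounded poll.
# The marker file simulates the event the test is really waiting on.
marker="/tmp/join_done.$$"
rm -f "$marker"
( sleep 1; touch "$marker" ) &        # background job: event arrives later

deadline=$(( $(date +%s) + 10 ))      # overall timeout: 10 seconds
until [ -e "$marker" ]; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
        echo "timed out waiting for join"
        exit 1
    fi
    sleep 1                           # poll interval, not a guess at total time
done
echo "join observed"
rm -f "$marker"
```

The timeout keeps a hung bootstrap from stalling the whole suite, while a fast join no longer pays the full fixed sleep.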
[jira] [Assigned] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna reassigned CASSANDRA-13701: Assignee: (was: Jeremy Hanna) > Lower default num_tokens > > > Key: CASSANDRA-13701 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13701 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Chris Lohfink >Priority: Low > Fix For: 4.0-alpha > > > For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not > necessary. It is very expensive for operations processes and scanning. Its > come up a lot and its pretty standard and known now to always reduce the > num_tokens within the community. We should just lower the defaults. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15961) Reference CASSANDRA-12607
[ https://issues.apache.org/jira/browse/CASSANDRA-15961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15961: - Resolution: Duplicate Status: Resolved (was: Triage Needed) > Reference CASSANDRA-12607 > - > > Key: CASSANDRA-15961 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15961 > Project: Cassandra > Issue Type: Bug >Reporter: Kapil Shewate >Assignee: Mattias W >Priority: Normal > > In cassandra 3.11.0 , the issue of commit logs being corrupted is still > observed. Will this be fixed in higher versions of Cassandra? > > 02 19:58:33,677 JVMStabilityInspector.java:82 - Exiting due to error while > processing commit log during > initialization.org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: > Mutation checksum failure at 191598541 in Next section at 191590263 in > CommitLog-6-1592895482005.log at > org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:344) > [apache-cassandra-3.11.0.jar:3.11.0] at > org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:201) > [apache-cassandra-3.11.0.jar:3.11.0] at > org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:84) > [apache-cassandra-3.11.0.jar:3.11.0] at > org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:140) > [apache-cassandra-3.11.0.jar:3.11.0] at > org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:177) > [apache-cassandra-3.11.0.jar:3.11.0] at > org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:158) > [apache-cassandra-3.11.0.jar:3.11.0] at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:325) > [apache-cassandra-3.11.0.jar:3.11.0] at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600) > [apache-cassandra-3.11.0.jar:3.11.0] at > 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689) > [apache-cassandra-3.11.0.jar:3.11.0][WARN] [main] 2020-07-02 20:31:30,334 > DatabaseDescriptor.java:540 - Only 2.339GiB free across all data volumes. > Consider adding more capacity to your cluster or removing obsolete > snapshots[WARN] [main] 2020-07-02 20:31:30,763 NativeLibrary.java:187 - > Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being > swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or > run Cassandra as root.[WARN] [main] 2020-07-02 20:31:30,764 > StartupChecks.java:127 - jemalloc shared library could not be preloaded to > speed up memory allocations[WARN] [main] 2020-07-02 20:31:30,764 > StartupChecks.java:201 - Non-Oracle JVM detected. Some features, such as > immediate unmap of compacted SSTables, may not work as intended[WARN] [main] > 2020-07-02 20:31:30,786 SigarLibrary.java:174 - Cassandra server running in > degraded mode. Is swap disabled? : false, Address space adequate? : true, > nofile limit adequate? : false, nproc limit adequate? : true [WARN] [main] > 2020-07-02 20:31:30,789 StartupChecks.java:265 - Maximum number of memory map > areas per process (vm.max_map_count) 65530 is too low, recommended value: > 1048575, you can change it with sysctl. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15947) nodetool gossipinfo doc does not document the output
[ https://issues.apache.org/jira/browse/CASSANDRA-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15947: - Complexity: Low Hanging Fruit > nodetool gossipinfo doc does not document the output > > > Key: CASSANDRA-15947 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15947 > Project: Cassandra > Issue Type: Improvement >Reporter: Jens Rantil >Priority: Low > > [https://cassandra.apache.org/doc/latest/tools/nodetool/gossipinfo.html] does > not contain any sample output, nor does it explain what the fields mean. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15947) nodetool gossipinfo doc does not document the output
[ https://issues.apache.org/jira/browse/CASSANDRA-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15947: - Priority: Normal (was: Low) > nodetool gossipinfo doc does not document the output > > > Key: CASSANDRA-15947 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15947 > Project: Cassandra > Issue Type: Improvement >Reporter: Jens Rantil >Priority: Normal > > [https://cassandra.apache.org/doc/latest/tools/nodetool/gossipinfo.html] does > not contain any sample output, nor does it explain what the fields mean. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15947) nodetool gossipinfo doc does not document the output
[ https://issues.apache.org/jira/browse/CASSANDRA-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15947: - Priority: Low (was: Normal) > nodetool gossipinfo doc does not document the output > > > Key: CASSANDRA-15947 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15947 > Project: Cassandra > Issue Type: Improvement >Reporter: Jens Rantil >Priority: Low > > [https://cassandra.apache.org/doc/latest/tools/nodetool/gossipinfo.html] does > not contain any sample output, nor does it explain what the fields mean. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168411#comment-17168411 ] Jeremy Hanna commented on CASSANDRA-13701: -- I'm finally getting things going with dtests on a server (after spending hours trying to run them on my laptop). One of the failures with the num_tokens update with bootstrap.py just needed to have a little more time.sleep - from 5 to 10 seconds. I'm going through all of them to see if I can fix them in some way and will then see if I can work with someone to get an updated set of dtests running on the jenkins server. > Lower default num_tokens > > > Key: CASSANDRA-13701 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13701 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Chris Lohfink >Assignee: Jeremy Hanna >Priority: Low > Fix For: 4.0-alpha > > > For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not > necessary. It is very expensive for operations processes and scanning. Its > come up a lot and its pretty standard and known now to always reduce the > num_tokens within the community. We should just lower the defaults. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155036#comment-17155036 ] Jeremy Hanna commented on CASSANDRA-13701: -- So if it's a matter of, as Dimitar says, making the bootstraps sequential, then that isn't strictly an error so much as an unfortunate side effect of the new algorithm with dtest parallelism. So there appear to be two paths forward: 1) Use the randomized algorithm both in tests and in the defaults with a higher num_tokens count 2) Change the dtests with bootstrapping/joining to be sequential with the new defaults Is it possible to start by trying option 2 and see where that gets us in terms of dtest runtimes and errors? I don't want to go down a rabbit hole but it would be nice to quantify the trade-offs. > Lower default num_tokens > > > Key: CASSANDRA-13701 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13701 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Chris Lohfink >Assignee: Jeremy Hanna >Priority: Low > > For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not > necessary. It is very expensive for operations processes and scanning. Its > come up a lot and its pretty standard and known now to always reduce the > num_tokens within the community. We should just lower the defaults. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14902) Update the default for compaction_throughput_mb_per_sec
[ https://issues.apache.org/jira/browse/CASSANDRA-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153426#comment-17153426 ] Jeremy Hanna commented on CASSANDRA-14902: -- Dev mailing list discussion on this and {{num_tokens}} update to the defaults: https://lists.apache.org/thread.html/r3cdf12db175c3f49a7ecda7632c821c5ef37fd0d95ffdc0e28e2d120%40%3Cdev.cassandra.apache.org%3E > Update the default for compaction_throughput_mb_per_sec > --- > > Key: CASSANDRA-14902 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14902 > Project: Cassandra > Issue Type: Task > Components: Local/Compaction, Local/Config >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Low > > compaction_throughput_mb_per_sec has been at 16 since probably 0.6 or 0.7 > back when a lot of people had to deploy on spinning disks. It seems like it > would make sense to update the default to something more reasonable - > assuming a reasonably decent SSD and competing IO. One idea that could be > bikeshedded to death could be to just default it to 64 - simply to avoid > people from having to always change that any time they download a new version > as well as avoid problems with new users thinking that the defaults are sane. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
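As a purely illustrative sketch of what the proposed default change amounts to, here is the one-line cassandra.yaml edit applied to a scratch copy; 64 is the value floated in the ticket description, not a settled default, and the path is a throwaway file:

```shell
# Sketch: bump compaction_throughput_mb_per_sec from 16 to 64 in a scratch
# copy of cassandra.yaml (value and path are illustrative only).
yaml=/tmp/cassandra-throughput.yaml
cat > "$yaml" <<'EOF'
compaction_throughput_mb_per_sec: 16
EOF
sed -i 's/^compaction_throughput_mb_per_sec:.*/compaction_throughput_mb_per_sec: 64/' "$yaml"
grep '^compaction_throughput_mb_per_sec' "$yaml"
```

On a running node the same setting can also be adjusted without a restart via `nodetool setcompactionthroughput`, which is handy for experimenting before committing a new default.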
[jira] [Updated] (CASSANDRA-15931) USING_G1 is incorrectly set in cassandra-env.sh if G1 is explicitly disabled
[ https://issues.apache.org/jira/browse/CASSANDRA-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15931: - Component/s: Local/Startup and Shutdown > USING_G1 is incorrectly set in cassandra-env.sh if G1 is explicitly disabled > > > Key: CASSANDRA-15931 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15931 > Project: Cassandra > Issue Type: Bug > Components: Local/Startup and Shutdown >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Normal > > {code} > echo $JVM_OPTS | grep -q UseG1GC > USING_G1=$? > {code} > This code will set {{USING_G1}} to {{0}} if G1 is explicitly enabled > ({{+UseG1GC}}) *or* explicitly disabled ({{-UseG1GC}}), as found on > CASSANDRA-15839. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15931) USING_G1 is incorrectly set in cassandra-env.sh if G1 is explicitly disabled
[ https://issues.apache.org/jira/browse/CASSANDRA-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15931: - Status: Triage Needed (was: Awaiting Feedback) > USING_G1 is incorrectly set in cassandra-env.sh if G1 is explicitly disabled > > > Key: CASSANDRA-15931 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15931 > Project: Cassandra > Issue Type: Bug >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Normal > > {code} > echo $JVM_OPTS | grep -q UseG1GC > USING_G1=$? > {code} > This code will set {{USING_G1}} to {{0}} if G1 is explicitly enabled > ({{+UseG1GC}}) *or* explicitly disabled ({{-UseG1GC}}), as found on > CASSANDRA-15839. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-15931) USING_G1 is incorrectly set in cassandra-env.sh if G1 is explicitly disabled
[ https://issues.apache.org/jira/browse/CASSANDRA-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna reassigned CASSANDRA-15931: Assignee: Jeremy Hanna > USING_G1 is incorrectly set in cassandra-env.sh if G1 is explicitly disabled > > > Key: CASSANDRA-15931 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15931 > Project: Cassandra > Issue Type: Bug >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Normal > > {code} > echo $JVM_OPTS | grep -q UseG1GC > USING_G1=$? > {code} > This code will set {{USING_G1}} to {{0}} if G1 is explicitly enabled > ({{+UseG1GC}}) *or* explicitly disabled ({{-UseG1GC}}), as found on > CASSANDRA-15839. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15931) USING_G1 is incorrectly set in cassandra-env.sh if G1 is explicitly disabled
[ https://issues.apache.org/jira/browse/CASSANDRA-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15931: - Status: Awaiting Feedback (was: Triage Needed) > USING_G1 is incorrectly set in cassandra-env.sh if G1 is explicitly disabled > > > Key: CASSANDRA-15931 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15931 > Project: Cassandra > Issue Type: Bug >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Normal > > {code} > echo $JVM_OPTS | grep -q UseG1GC > USING_G1=$? > {code} > This code will set {{USING_G1}} to {{0}} if G1 is explicitly enabled > ({{+UseG1GC}}) *or* explicitly disabled ({{-UseG1GC}}), as found on > CASSANDRA-15839. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15931) USING_G1 is incorrectly set in cassandra-env.sh if G1 is explicitly disabled
[ https://issues.apache.org/jira/browse/CASSANDRA-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153201#comment-17153201 ] Jeremy Hanna commented on CASSANDRA-15931: -- PR after testing that it matched {{+UseG1GC}} but not {{-UseG1GC}}: https://github.com/apache/cassandra/pull/667 > USING_G1 is incorrectly set in cassandra-env.sh if G1 is explicitly disabled > > > Key: CASSANDRA-15931 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15931 > Project: Cassandra > Issue Type: Bug > Components: Local/Startup and Shutdown >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Normal > Time Spent: 10m > Remaining Estimate: 0h > > {code} > echo $JVM_OPTS | grep -q UseG1GC > USING_G1=$? > {code} > This code will set {{USING_G1}} to {{0}} if G1 is explicitly enabled > ({{+UseG1GC}}) *or* explicitly disabled ({{-UseG1GC}}), as found on > CASSANDRA-15839. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15931) USING_G1 is incorrectly set in cassandra-env.sh if G1 is explicitly disabled
[ https://issues.apache.org/jira/browse/CASSANDRA-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15931: - Description: {code} echo $JVM_OPTS | grep -q UseG1GC USING_G1=$? {code} This code will set {{USING_G1}} to {{0}} if G1 is explicitly enabled ({{+UseG1GC}}) *or* explicitly disabled ({{-UseG1GC}}), as found on CASSANDRA-15839. was: {code} echo $JVM_OPTS | grep -q UseG1GC USING_G1=$? {code} This code will set {{USING_G1}} to {{0}} if G1 is explicitly enabled *or* explicitly disabled, as found on CASSANDRA-15839. > USING_G1 is incorrectly set in cassandra-env.sh if G1 is explicitly disabled > > > Key: CASSANDRA-15931 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15931 > Project: Cassandra > Issue Type: Bug >Reporter: Jeremy Hanna >Priority: Normal > > {code} > echo $JVM_OPTS | grep -q UseG1GC > USING_G1=$? > {code} > This code will set {{USING_G1}} to {{0}} if G1 is explicitly enabled > ({{+UseG1GC}}) *or* explicitly disabled ({{-UseG1GC}}), as found on > CASSANDRA-15839. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15839) Warn or fail to start server when G1 is used and Xmn is set
[ https://issues.apache.org/jira/browse/CASSANDRA-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153193#comment-17153193 ] Jeremy Hanna edited comment on CASSANDRA-15839 at 7/8/20, 2:49 AM: --- This is the code that *should* detect whether or not G1 is enabled. However you can either enable or disable it with {{+UseG1GC}} or {{-UseG1GC}} which are both matched by that {{grep}}. I have to admit I've never seen anyone in practice explicitly disable G1. {code:sh} echo $JVM_OPTS | grep -q UseG1GC USING_G1=$? {code} That also affects whether we calculate heap sizes automatically or whether we fail out when we don't set the heap and new size in pairs: {code:sh} # only calculate the size if it's not set manually if [ "x$MAX_HEAP_SIZE" = "x" ] && [ "x$HEAP_NEWSIZE" = "x" -o $USING_G1 -eq 0 ]; then calculate_heap_sizes elif [ "x$MAX_HEAP_SIZE" = "x" ] || [ "x$HEAP_NEWSIZE" = "x" -a $USING_G1 -ne 0 ]; then echo "please set or unset MAX_HEAP_SIZE and HEAP_NEWSIZE in pairs when using CMS GC (see cassandra-env.sh)" exit 1 fi {code} Created CASSANDRA-15931 to address this. was (Author: jeromatron): This is the code that *should* detect whether or not G1 is enabled. However since you can either enable or disable it with that string {{+UseG1GC}} or {{-UseG1GC}}. I have to admit I've never seen anyone in practice explicitly disable G1. {code:sh} echo $JVM_OPTS | grep -q UseG1GC USING_G1=$? {code} That also affects whether we calculate heap sizes automatically or whether we fail out when we don't set the heap and new size in pairs: {code:sh} # only calculate the size if it's not set manually if [ "x$MAX_HEAP_SIZE" = "x" ] && [ "x$HEAP_NEWSIZE" = "x" -o $USING_G1 -eq 0 ]; then calculate_heap_sizes elif [ "x$MAX_HEAP_SIZE" = "x" ] || [ "x$HEAP_NEWSIZE" = "x" -a $USING_G1 -ne 0 ]; then echo "please set or unset MAX_HEAP_SIZE and HEAP_NEWSIZE in pairs when using CMS GC (see cassandra-env.sh)" exit 1 fi {code} Created CASSANDRA-15931 to address this. 
> Warn or fail to start server when G1 is used and Xmn is set > --- > > Key: CASSANDRA-15839 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15839 > Project: Cassandra > Issue Type: Improvement > Components: Local/Startup and Shutdown >Reporter: Jeremy Hanna >Assignee: Anthony Grasso >Priority: Normal > Fix For: 4.0, 4.0-alpha5 > > > In jvm.options, we currently have a comment above where Xmn is set that says > that you shouldn't set Xmn with G1 GC. That isn't enough - we should either > warn in the logs or fail startup when they are set together. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15839) Warn or fail to start server when G1 is used and Xmn is set
[ https://issues.apache.org/jira/browse/CASSANDRA-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153193#comment-17153193 ] Jeremy Hanna commented on CASSANDRA-15839: -- This is the code that *should* detect whether or not G1 is enabled. However you can either enable or disable it with {{+UseG1GC}} or {{-UseG1GC}}, which are both matched by that {{grep}}. I have to admit I've never seen anyone in practice explicitly disable G1. {code:sh} echo $JVM_OPTS | grep -q UseG1GC USING_G1=$? {code} That also affects whether we calculate heap sizes automatically or whether we fail out when we don't set the heap and new size in pairs: {code:sh} # only calculate the size if it's not set manually if [ "x$MAX_HEAP_SIZE" = "x" ] && [ "x$HEAP_NEWSIZE" = "x" -o $USING_G1 -eq 0 ]; then calculate_heap_sizes elif [ "x$MAX_HEAP_SIZE" = "x" ] || [ "x$HEAP_NEWSIZE" = "x" -a $USING_G1 -ne 0 ]; then echo "please set or unset MAX_HEAP_SIZE and HEAP_NEWSIZE in pairs when using CMS GC (see cassandra-env.sh)" exit 1 fi {code} Created CASSANDRA-15931 to address this. > Warn or fail to start server when G1 is used and Xmn is set > --- > > Key: CASSANDRA-15839 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15839 > Project: Cassandra > Issue Type: Improvement > Components: Local/Startup and Shutdown >Reporter: Jeremy Hanna >Assignee: Anthony Grasso >Priority: Normal > Fix For: 4.0, 4.0-alpha5 > > > In jvm.options, we currently have a comment above where Xmn is set that says > that you shouldn't set Xmn with G1 GC. That isn't enough - we should either > warn in the logs or fail startup when they are set together. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
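A minimal sketch of a stricter detection, matching only the enabling flag; this illustrates the direction of the fix, not the actual patch attached to CASSANDRA-15931:

```shell
# Sketch: treat G1 as enabled only when "+UseG1GC" appears in the options,
# so an explicit "-XX:-UseG1GC" no longer trips the check. USING_G1=0 means
# "G1 in use", mirroring cassandra-env.sh's exit-status convention.
JVM_OPTS="-Xms4G -XX:-UseG1GC"          # G1 explicitly disabled here
if echo "$JVM_OPTS" | grep -qF '+UseG1GC'; then
    USING_G1=0
else
    USING_G1=1
fi
echo "USING_G1=$USING_G1"               # prints USING_G1=1
```

`grep -F` matches the `+` literally, sidestepping any regex-escaping surprises with `\+` in basic regular expressions.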
[jira] [Created] (CASSANDRA-15931) USING_G1 is incorrectly set in cassandra-env.sh if G1 is explicitly disabled
Jeremy Hanna created CASSANDRA-15931: Summary: USING_G1 is incorrectly set in cassandra-env.sh if G1 is explicitly disabled Key: CASSANDRA-15931 URL: https://issues.apache.org/jira/browse/CASSANDRA-15931 Project: Cassandra Issue Type: Bug Reporter: Jeremy Hanna {code} echo $JVM_OPTS | grep -q UseG1GC USING_G1=$? {code} This code will set {{USING_G1}} to {{0}} if G1 is explicitly enabled *or* explicitly disabled, as found on CASSANDRA-15839. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15803) Separate out allow filtering scanning through a partition versus scanning over the table
[ https://issues.apache.org/jira/browse/CASSANDRA-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15803: - Description: Currently allow filtering can mean two things in the spirit of "avoid operations that don't seek to a specific row or sequential rows of data." First, it can mean scanning across the entire table to meet the criteria of the query. That's almost always a bad thing and should be discouraged or disabled (see CASSANDRA-8303). Second, it can mean filtering within a specific partition. For example, in a query you could specify the full partition key and if you specify a criterion on a non-key field, it requires allow filtering. The second reason to require allow filtering is significantly less work to scan through a partition. It is still extra work over seeking to a specific row and getting N sequential rows though. So while an application developer and/or operator needs to be cautious about this second type, it's not necessarily a bad thing, depending on the table and the use case. I propose that we separate the way to specify allow filtering across an entire table from specifying allow filtering across a partition in a backwards compatible way. One idea that was brought up in Slack in the cassandra-dev room was to have allow filtering mean the superset - scanning across the table. Then if you want to specify that you *only* want to scan within a partition you would use something like {{ALLOW FILTERING [WITHIN PARTITION]}} So it will succeed if you specify non-key criteria within a single partition, but fail with a message to say it requires the full allow filtering. This would allow for a backwards compatible full allow filtering while allowing a user to specify that they want to just scan within a partition, but error out if trying to scan a full table. 
This is potentially also related to the capability limitation framework by which operators could more granularly specify what features are allowed or disallowed per user, discussed in CASSANDRA-8303. This way an operator could disallow the more general allow filtering while allowing the partition scan (or disallow them both at their discretion). was: Currently allow filtering can mean two things in the spirit of "avoid operations that don't seek to a specific row or sequential rows of data." First, it can mean scanning across the entire table to meet the criteria of the query. That's almost always a bad thing and should be discouraged or disabled (see CASSANDRA-8303). Second, it can mean filtering within a specific partition. For example, in a query you could specify the full partition key and if you specify a criterion on a non-key field, it requires allow filtering. The second reason to require allow filtering is significantly less work to scan through a partition. It is still extra work over seeking to a specific row and getting N sequential rows though. So while an application developer and/or operator needs to be cautious about this second type, it's not necessarily a bad thing, depending on the table and the use case. I propose that we separate the way to specify allow filtering across an entire table (involving a scatter gather) from specifying allow filtering across a partition in a backwards compatible way. One idea that was brought up in Slack in the cassandra-dev room was to have allow filtering mean the superset - scanning across the table. Then if you want to specify that you *only* want to scan within a partition you would use something like {{ALLOW FILTERING [WITHIN PARTITION]}} So it will succeed if you specify non-key criteria within a single partition, but fail with a message to say it requires the full allow filtering. 
This would allow for a backwards compatible full allow filtering while allowing a user to specify that they want to just scan within a partition, but error out if trying to scan a full table. This is potentially also related to the capability limitation framework by which operators could more granularly specify what features are allowed or disallowed per user, discussed in CASSANDRA-8303. This way an operator could disallow the more general allow filtering while allowing the partition scan (or disallow them both at their discretion). > Separate out allow filtering scanning through a partition versus scanning > over the table > > > Key: CASSANDRA-15803 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15803 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Syntax >Reporter: Jeremy Hanna >Priority: Normal > > Currently allow filtering can mean two things in the spirit of "avoid > operations that don't seek to a specific row or
[jira] [Comment Edited] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152460#comment-17152460 ] Jeremy Hanna edited comment on CASSANDRA-13701 at 7/7/20, 3:28 AM: --- Can we also standardize the tests to use the default values - that is, from 32 to the new defaults (16 {{num_tokens}} with {{allocate_tokens_for_local_replication_factor=3}} uncommented). was (Author: jeromatron): Can we also standardize the tests to use the default values - that is, from 32 to the new defaults (16 {{num_tokens}} with {{allocate_tokens_for_local_replication_factor=3}} uncommented. > Lower default num_tokens > > > Key: CASSANDRA-13701 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13701 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Chris Lohfink >Assignee: Jeremy Hanna >Priority: Low > > For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not > necessary. It is very expensive for operations processes and scanning. Its > come up a lot and its pretty standard and known now to always reduce the > num_tokens within the community. We should just lower the defaults. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152460#comment-17152460 ] Jeremy Hanna commented on CASSANDRA-13701: -- Can we also standardize the tests to use the default values - that is, from 32 to the new defaults (16 {{num_tokens}} with {{allocate_tokens_for_local_replication_factor=3}} uncommented). > Lower default num_tokens > > > Key: CASSANDRA-13701 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13701 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Chris Lohfink >Assignee: Jeremy Hanna >Priority: Low > > For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not > necessary. It is very expensive for operational processes and scanning. It's > come up a lot, and it's now standard and well known within the community to > always reduce num_tokens. We should just lower the defaults. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-13701: - Test and Documentation Plan: Associated documentation about num_tokens is in [https://cassandra.apache.org/doc/latest/getting_started/production.html#tokens] as part of CASSANDRA-15618, with upgrading information in NEWS.txt. Status: Patch Available (was: In Progress) Pull request: https://github.com/apache/cassandra/pull/663 > Lower default num_tokens > > > Key: CASSANDRA-13701 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13701 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Chris Lohfink >Assignee: Jeremy Hanna >Priority: Low > > For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not > necessary. It is very expensive for operational processes and scanning. It's > come up a lot, and it's now standard and well known within the community to > always reduce num_tokens. We should just lower the defaults. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
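For reference, the new defaults being standardized on in this thread would look roughly like the following cassandra.yaml fragment. This is a sketch only; the exact comment wording shipped in the yaml may differ:

```yaml
# Sketch of the proposed defaults discussed in this thread.
num_tokens: 16

# Uncommented by default so the improved token allocation algorithm is
# used, assuming the common replication factor of 3 in the local DC.
allocate_tokens_for_local_replication_factor: 3
```

Standardizing the tests on these same values means the defaults ship with the most test coverage behind them.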
[jira] [Commented] (CASSANDRA-14902) Update the default for compaction_throughput_mb_per_sec
[ https://issues.apache.org/jira/browse/CASSANDRA-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151826#comment-17151826 ] Jeremy Hanna commented on CASSANDRA-14902: -- Added a NEWS.txt entry in the upgrading section to the PR. > Update the default for compaction_throughput_mb_per_sec > --- > > Key: CASSANDRA-14902 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14902 > Project: Cassandra > Issue Type: Task > Components: Local/Compaction, Local/Config >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Low > > compaction_throughput_mb_per_sec has been at 16 since probably 0.6 or 0.7 > back when a lot of people had to deploy on spinning disks. It seems like it > would make sense to update the default to something more reasonable - > assuming a reasonably decent SSD and competing IO. One idea that could be > bikeshedded to death could be to just default it to 64 - simply to avoid > people from having to always change that any time they download a new version > as well as avoid problems with new users thinking that the defaults are sane. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14902) Update the default for compaction_throughput_mb_per_sec
[ https://issues.apache.org/jira/browse/CASSANDRA-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151696#comment-17151696 ] Jeremy Hanna commented on CASSANDRA-14902: -- I assumed that updating to 64 would be uncontroversial because that's what I know many change it to (including myself) as a first step/starting point. If we want to do more extensive comparison testing of different values, that's fine, but I think it would depend on the goal. IO is going to be different for every system and every workload/pattern is going to be somewhat unique. I thought 64 would at least make it not *required* to change it from the default as a first step. > Update the default for compaction_throughput_mb_per_sec > --- > > Key: CASSANDRA-14902 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14902 > Project: Cassandra > Issue Type: Task > Components: Local/Compaction, Local/Config >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Low > > compaction_throughput_mb_per_sec has been at 16 since probably 0.6 or 0.7 > back when a lot of people had to deploy on spinning disks. It seems like it > would make sense to update the default to something more reasonable - > assuming a reasonably decent SSD and competing IO. One idea that could be > bikeshedded to death could be to just default it to 64 - simply to avoid > people from having to always change that any time they download a new version > as well as avoid problems with new users thinking that the defaults are sane. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14902) Update the default for compaction_throughput_mb_per_sec
[ https://issues.apache.org/jira/browse/CASSANDRA-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-14902: - Test and Documentation Plan: Just updated the default value and comments so don't need much. Status: Patch Available (was: In Progress) The pull request: https://github.com/apache/cassandra/pull/662 > Update the default for compaction_throughput_mb_per_sec > --- > > Key: CASSANDRA-14902 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14902 > Project: Cassandra > Issue Type: Task > Components: Local/Compaction, Local/Config >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Low > > compaction_throughput_mb_per_sec has been at 16 since probably 0.6 or 0.7 > back when a lot of people had to deploy on spinning disks. It seems like it > would make sense to update the default to something more reasonable - > assuming a reasonably decent SSD and competing IO. One idea that could be > bikeshedded to death could be to just default it to 64 - simply to avoid > people from having to always change that any time they download a new version > as well as avoid problems with new users thinking that the defaults are sane. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
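For illustration, the change under discussion amounts to this cassandra.yaml fragment (a sketch; the shipped comment text may differ):

```yaml
# Proposed new default (previously 16), assuming a reasonably decent SSD
# with competing I/O; tune per hardware and workload.
compaction_throughput_mb_per_sec: 64
```

The value can also be changed at runtime on a live node with `nodetool setcompactionthroughput 64`, which is a convenient way to trial a higher cap before committing to a new default in the yaml.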
[jira] [Assigned] (CASSANDRA-14902) Update the default for compaction_throughput_mb_per_sec
[ https://issues.apache.org/jira/browse/CASSANDRA-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna reassigned CASSANDRA-14902: Assignee: Jeremy Hanna > Update the default for compaction_throughput_mb_per_sec > --- > > Key: CASSANDRA-14902 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14902 > Project: Cassandra > Issue Type: Task > Components: Local/Compaction, Local/Config >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Low > > compaction_throughput_mb_per_sec has been at 16 since probably 0.6 or 0.7 > back when a lot of people had to deploy on spinning disks. It seems like it > would make sense to update the default to something more reasonable - > assuming a reasonably decent SSD and competing IO. One idea that could be > bikeshedded to death could be to just default it to 64 - simply to avoid > people from having to always change that any time they download a new version > as well as avoid problems with new users thinking that the defaults are sane. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15860) Cannot change the number of tokens from 512 to 256 Fatal configuration error; unable to start server.
[ https://issues.apache.org/jira/browse/CASSANDRA-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15860: - Resolution: Not A Bug Status: Resolved (was: Triage Needed) As mentioned previously, it's simply not possible to change num_tokens after data is written to a data center. > Cannot change the number of tokens from 512 to 256 Fatal configuration error; > unable to start server. > - > > Key: CASSANDRA-15860 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15860 > Project: Cassandra > Issue Type: Bug >Reporter: Krishnakumar Jinka >Priority: Normal > > Hello, I was following this issue from jira: > https://issues.apache.org/jira/browse/CASSANDRA-11811?jql=text%20~%20%22CassandraDaemon.java%20Cannot%20change%22 > . We are using 3.11.2 and i see this error in the log while starting the > cassandra, and it fails. I read the jira and understood that mutation > happens, thereby doubling the number of tokens, and hence due to mismatch > INFO [main] [2020-05-28 11:05:14] OutboundTcpConnection.java:108 - > OutboundTcpConnection using coalescing strategy DISABLED > INFO [HANDSHAKE-/192.168.5.53] [2020-05-28 11:05:14] > OutboundTcpConnection.java:560 - Handshaking version with /192.168.5.53 > INFO [main] [2020-05-28 11:05:15] StorageService.java:707 - Loading persisted > ring state > INFO [main] [2020-05-28 11:05:15] StorageService.java:825 - Starting up > server gossip > INFO [main] [2020-05-28 11:05:15] TokenMetadata.java:479 - Updating topology > for /192.168.5.52 > INFO [main] [2020-05-28 11:05:15] TokenMetadata.java:479 - Updating topology > for /192.168.5.52 > Cannot change the number of tokens from 512 to 256 > Fatal configuration error; unable to start server. See log for stacktrace. 
> ERROR [main] [2020-05-28 11:05:15] CassandraDaemon.java:708 - Fatal > configuration error > org.apache.cassandra.exceptions.ConfigurationException: Cannot change the > number of tokens from 512 to 256 > at > org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:989) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:682) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:613) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:379) > [apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:602) > [apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:691) > [apache-cassandra-3.11.2.jar:3.11.2] > INFO [StorageServiceShutdownHook] [2020-05-28 11:05:15] HintsService.java:220 > - Paused hints dispatch > INFO [StorageServiceShutdownHook] [2020-05-28 11:05:15] Gossiper.java:1540 - > Announcing shutdown > INFO [StorageServiceShutdownHook] [2020-05-28 11:05:15] > StorageService.java:2292 - Node /192.168.5.52 state jump to shutdown > INFO [HANDSHAKE-/192.168.5.53] [2020-05-28 11:05:15] > OutboundTcpConnection.java:560 - Handshaking version with /192.168.5.53 > I would like to know > # what would be the root cause of this error > # How to recover from this error. Because everytime i start the Cassandra, > it is blocked due to this. > /etc/cassandra/conf/cassandra.yaml > contains num_tokens as 256 , auto_bootstrap is not provided, i guess by > default it will be true. 
> INFO [main] [2020-05-28 11:05:13] StorageService.java:618 - Cassandra > version: 3.11.2 > INFO [main] [2020-05-28 11:05:13] StorageService.java:619 - Thrift API > version: 20.1.0 > INFO [main] [2020-05-28 11:05:13] StorageService.java:620 - CQL supported > versions: 3.4.4 (default: 3.4.4) > INFO [main] [2020-05-28 11:05:13] StorageService.java:622 - Native protocol > supported versions: 3/v3, 4/v4, 5/v5-beta (default: 4/v4) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15860) Cannot change the number of tokens from 512 to 256 Fatal configuration error; unable to start server.
[ https://issues.apache.org/jira/browse/CASSANDRA-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136328#comment-17136328 ] Jeremy Hanna commented on CASSANDRA-15860: -- You cannot change num_tokens on a running cluster. You have to either: # add another logical datacenter with a different num_tokens value (the normal add-datacenter procedure: add the nodes without replication, add replication, and then run a nodetool rebuild on each new node) or # create a new cluster with a different num_tokens value and load the data from the original cluster with sstableloader I would go with option 1 if you can. The reason you can't change it after data has been written is that data is already stored on the node for the 512 token ranges it has already claimed. You can't change that without rewriting the data around the cluster. So it's simplest to add a new DC where that data can be written again. See this [blog post|https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html] for setting up token allocation optimally. > Cannot change the number of tokens from 512 to 256 Fatal configuration error; > unable to start server. > - > > Key: CASSANDRA-15860 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15860 > Project: Cassandra > Issue Type: Bug >Reporter: Krishnakumar Jinka >Priority: Normal > > Hello, I was following this issue from jira: > https://issues.apache.org/jira/browse/CASSANDRA-11811?jql=text%20~%20%22CassandraDaemon.java%20Cannot%20change%22 > . We are using 3.11.2 and i see this error in the log while starting the > cassandra, and it fails. 
I read the jira and understood that mutation > happens, thereby doubling the number of tokens, and hence due to mismatch > INFO [main] [2020-05-28 11:05:14] OutboundTcpConnection.java:108 - > OutboundTcpConnection using coalescing strategy DISABLED > INFO [HANDSHAKE-/192.168.5.53] [2020-05-28 11:05:14] > OutboundTcpConnection.java:560 - Handshaking version with /192.168.5.53 > INFO [main] [2020-05-28 11:05:15] StorageService.java:707 - Loading persisted > ring state > INFO [main] [2020-05-28 11:05:15] StorageService.java:825 - Starting up > server gossip > INFO [main] [2020-05-28 11:05:15] TokenMetadata.java:479 - Updating topology > for /192.168.5.52 > INFO [main] [2020-05-28 11:05:15] TokenMetadata.java:479 - Updating topology > for /192.168.5.52 > Cannot change the number of tokens from 512 to 256 > Fatal configuration error; unable to start server. See log for stacktrace. > ERROR [main] [2020-05-28 11:05:15] CassandraDaemon.java:708 - Fatal > configuration error > org.apache.cassandra.exceptions.ConfigurationException: Cannot change the > number of tokens from 512 to 256 > at > org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:989) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:682) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:613) > ~[apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:379) > [apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:602) > [apache-cassandra-3.11.2.jar:3.11.2] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:691) > [apache-cassandra-3.11.2.jar:3.11.2] > INFO [StorageServiceShutdownHook] [2020-05-28 11:05:15] HintsService.java:220 > - Paused hints dispatch > INFO [StorageServiceShutdownHook] 
[2020-05-28 11:05:15] Gossiper.java:1540 - > Announcing shutdown > INFO [StorageServiceShutdownHook] [2020-05-28 11:05:15] > StorageService.java:2292 - Node /192.168.5.52 state jump to shutdown > INFO [HANDSHAKE-/192.168.5.53] [2020-05-28 11:05:15] > OutboundTcpConnection.java:560 - Handshaking version with /192.168.5.53 > I would like to know > # what would be the root cause of this error > # How to recover from this error. Because everytime i start the Cassandra, > it is blocked due to this. > /etc/cassandra/conf/cassandra.yaml > contains num_tokens as 256 , auto_bootstrap is not provided, i guess by > default it will be true. > INFO [main] [2020-05-28 11:05:13] StorageService.java:618 - Cassandra > version: 3.11.2 > INFO [main] [2020-05-28 11:05:13] StorageService.java:619 - Thrift API > version: 20.1.0 > INFO [main] [2020-05-28 11:05:13] StorageService.java:620 - CQL supported > versions: 3.4.4 (default: 3.4.4) > INFO [main] [2020-05-28 11:05:13] StorageService.java:622 - Native protocol > supported versions: 3/v3, 4/v4, 5/v5-beta (default: 4/v4) > -- This message was sent by
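The add-a-datacenter option recommended in the comment above can be outlined as follows. This is a pseudocode-style operational sketch, not a runnable script: the datacenter and keyspace names are hypothetical, and every step assumes a healthy running cluster.

```shell
# Sketch of option 1: moving to a different num_tokens via a new logical DC.
# "dc_old", "dc_new", and "my_ks" are hypothetical names.

# 1. On each new node, set the desired num_tokens in cassandra.yaml
#    before first start, then start the node (new DC, no replication yet).

# 2. Once the new DC is up, add replication for it, e.g. in cqlsh:
#      ALTER KEYSPACE my_ks WITH replication =
#        {'class': 'NetworkTopologyStrategy', 'dc_old': 3, 'dc_new': 3};

# 3. Stream the existing data to each new node from the old DC:
#      nodetool rebuild -- dc_old

# 4. After clients are repointed at dc_new, remove dc_old from the
#    replication settings and decommission its nodes.
```

The rebuild step is what rewrites the data for the new token ranges, which is exactly what cannot happen in place on an existing node.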
[jira] [Updated] (CASSANDRA-15839) Warn or fail to start server when G1 is used and Xmn is set
[ https://issues.apache.org/jira/browse/CASSANDRA-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15839: - Reviewers: Jon Haddad > Warn or fail to start server when G1 is used and Xmn is set > --- > > Key: CASSANDRA-15839 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15839 > Project: Cassandra > Issue Type: Improvement > Components: Local/Startup and Shutdown >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Normal > > In jvm.options, we currently have a comment above where Xmn is set that says > that you shouldn't set Xmn with G1 GC. That isn't enough - we should either > warn in the logs or fail startup when they are set together. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15839) Warn or fail to start server when G1 is used and Xmn is set
[ https://issues.apache.org/jira/browse/CASSANDRA-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15839: - Test and Documentation Plan: I did some basic testing around the startup options and it works as expected. Status: Patch Available (was: Open) https://github.com/apache/cassandra/pull/607 > Warn or fail to start server when G1 is used and Xmn is set > --- > > Key: CASSANDRA-15839 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15839 > Project: Cassandra > Issue Type: Improvement > Components: Local/Startup and Shutdown >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Normal > > In jvm.options, we currently have a comment above where Xmn is set that says > that you shouldn't set Xmn with G1 GC. That isn't enough - we should either > warn in the logs or fail startup when they are set together. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
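The startup check proposed in this ticket could be sketched in shell as below. The function name and warning text are assumptions for illustration, not the actual patch wired into cassandra-env.sh:

```shell
# Hypothetical sketch of the proposed check (names and wording assumed).
check_g1_xmn() {
  jvm_opts="$1"
  case "$jvm_opts" in
    *UseG1GC*)
      case "$jvm_opts" in
        *-Xmn*)
          # G1 sizes its own young generation; an explicit -Xmn defeats
          # its pause-time heuristics, so warn (or fail, per the ticket).
          echo "WARN: -Xmn should not be set when G1 GC is enabled" >&2
          return 1
          ;;
      esac
      ;;
  esac
  return 0
}
```

In cassandra-env.sh this could be invoked on the assembled JVM_OPTS string, warning by default and exiting non-zero if fail-fast behavior is chosen.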
[jira] [Updated] (CASSANDRA-15839) Warn or fail to start server when G1 is used and Xmn is set
[ https://issues.apache.org/jira/browse/CASSANDRA-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15839: - Change Category: Operability Complexity: Low Hanging Fruit Component/s: Local/Startup and Shutdown Status: Open (was: Triage Needed) > Warn or fail to start server when G1 is used and Xmn is set > --- > > Key: CASSANDRA-15839 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15839 > Project: Cassandra > Issue Type: Improvement > Components: Local/Startup and Shutdown >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Normal > > In jvm.options, we currently have a comment above where Xmn is set that says > that you shouldn't set Xmn with G1 GC. That isn't enough - we should either > warn in the logs or fail startup when they are set together. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15839) Warn or fail to start server when G1 is used and Xmn is set
[ https://issues.apache.org/jira/browse/CASSANDRA-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15839: - Status: Triage Needed (was: Awaiting Feedback) > Warn or fail to start server when G1 is used and Xmn is set > --- > > Key: CASSANDRA-15839 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15839 > Project: Cassandra > Issue Type: Improvement >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Normal > > In jvm.options, we currently have a comment above where Xmn is set that says > that you shouldn't set Xmn with G1 GC. That isn't enough - we should either > warn in the logs or fail startup when they are set together. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15839) Warn or fail to start server when G1 is used and Xmn is set
[ https://issues.apache.org/jira/browse/CASSANDRA-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15839: - Status: Awaiting Feedback (was: Triage Needed) > Warn or fail to start server when G1 is used and Xmn is set > --- > > Key: CASSANDRA-15839 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15839 > Project: Cassandra > Issue Type: Improvement >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Normal > > In jvm.options, we currently have a comment above where Xmn is set that says > that you shouldn't set Xmn with G1 GC. That isn't enough - we should either > warn in the logs or fail startup when they are set together. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15521) Update default for num_tokens from 256 to something more reasonable
[ https://issues.apache.org/jira/browse/CASSANDRA-15521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15521: - Resolution: Duplicate Status: Resolved (was: Triage Needed) > Update default for num_tokens from 256 to something more reasonable > --- > > Key: CASSANDRA-15521 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15521 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Virtual Nodes >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Normal > > The default for num_tokens or the number of token ranges assigned to a node > using virtual nodes is way too high. 256 token ranges makes repair painful. > Since it's a default, someone new to Cassandra won't know better and if left > unchanged, they will have to live with it or perform a migration to a new > datacenter with a lower number. > At the same time, going too low with the default allocation algorithm can > hotspot nodes to have more tokens assigned than others. There is a new token > allocation algorithm introduced but it's not default. > The proposal of this ticket is to set the default to something more > reasonable to align with best practices without using the new token algorithm > or giving it specific token values as some do. 32 is a good compromise and > is what the project uses in a lot of the tests that are done. > So generally it would be good to move to a more sane value and to align with > testing so users are more confident that the defaults have a lot of testing > behind them. > As discussed on the dev mailing list, we want to make sure this change to the > default doesn't come as an unpleasant surprise to cluster operators. For > num_tokens specifically, if you were to upgrade to a version with the new > default and the user didn't change it to the existing value, the node would > not start, saying you can't change the num_tokens on an existing node. 
So we > will want to put a release note to indicate that when upgrading, make a note > of the num_tokens change when looking at the new configuration. > Along with not being able to start nodes, which is fail-fast, there is the > matter of adding new nodes to the cluster. You can certainly add a new node > to a cluster or datacenter with a different number of token ranges assigned. > It will give that node a different amount of data to be responsible for. For > example, if the nodes in a datacenter all have num_tokens=256 (current > default) and you add a node to that datacenter with num_tokens=32 (new > default), it will only claim 1/8th of the token ranges and data as the other > nodes in that datacenter. Fortunately, this is a property that is explicitly > defined rather than implicit like some of the table settings. Also most if > not all operators will upgrade the existing nodes to that new version before > trying to add a node with that new version. So if there is a different > number for num_tokens on the existing nodes, they'll be aware of it > immediately. > In any case, this is a long proposal for what will be a small change in the > cassandra.yaml and something in the release notes, that is, changing the > default num_tokens value from 256 to 32. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-15839) Warn or fail to start server when G1 is used and Xmn is set
[ https://issues.apache.org/jira/browse/CASSANDRA-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna reassigned CASSANDRA-15839: Assignee: Jeremy Hanna > Warn or fail to start server when G1 is used and Xmn is set > --- > > Key: CASSANDRA-15839 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15839 > Project: Cassandra > Issue Type: Improvement >Reporter: Jeremy Hanna >Assignee: Jeremy Hanna >Priority: Normal > > In jvm.options, we currently have a comment above where Xmn is set that says > that you shouldn't set Xmn with G1 GC. That isn't enough - we should either > warn in the logs or fail startup when they are set together. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15839) Warn or fail to start server when G1 is used and Xmn is set
Jeremy Hanna created CASSANDRA-15839: Summary: Warn or fail to start server when G1 is used and Xmn is set Key: CASSANDRA-15839 URL: https://issues.apache.org/jira/browse/CASSANDRA-15839 Project: Cassandra Issue Type: Improvement Reporter: Jeremy Hanna In jvm.options, we currently have a comment above where Xmn is set that says that you shouldn't set Xmn with G1 GC. That isn't enough - we should either warn in the logs or fail startup when they are set together. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15823) Support for networking via identity instead of IP
[ https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116489#comment-17116489 ] Jeremy Hanna edited comment on CASSANDRA-15823 at 5/26/20, 6:50 AM: Adding in related issues where host id (CASSANDRA-4120 for vnodes in 1.2) and previously token (CASSANDRA-1518 for 0.7) were put in place of IP address to identify nodes - for historical purposes. was (Author: jeromatron): Adding in related issues where host id (for vnodes in 1.2) and previously token (for 0.7) were put in place of IP address to identify nodes - for historical purposes. > Support for networking via identity instead of IP > - > > Key: CASSANDRA-15823 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15823 > Project: Cassandra > Issue Type: Improvement >Reporter: Christopher Bradford >Priority: Normal > Attachments: consul-mesh-gateways.png, > istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg > > > TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows > resolution to different IP addresses per DC that may then be forwarded to > nodes on remote networks without requiring node to node IP connectivity for > cross-dc links. > > This approach should not affect existing deployments as those could continue > to use IPs as the hostname and skip resolution. > > With orchestration platforms like Kubernetes and the usage of ephemeral > containers in environments today we should consider some changes to how we > handle the tracking of nodes and their network location. Currently we > maintain a mapping between host ids and IP addresses. > > With traditional infrastructure, if a node goes down it, usually, comes back > up with the same IP. In some environments this contract may be explicit with > virtual IPs that may move between hosts. In newer deployments, like on > Kubernetes, this contract is not possible. Pods (analogous to nodes) are > assigned an IP address at start time. 
Should the pod be restarted or > scheduled on a different host there is no guarantee we would have the same > IP. Cassandra is protected here as we already have logic in place to update > peers when we come up with the same host id, but a different IP address. > > There are ways to get Kubernetes to assign a specific IP per Pod. Most > recommendations involve the use of a service per pod. Communication with the > fixed service IP would automatically forward to the associated pod, > regardless of address. We _could_ use this approach, but it seems like this > would needlessly create a number of extra resources in our k8s cluster to get > around the problem. Which, to be fair, doesn't seem like much of a problem > with the aforementioned mitigations built into C*. > > So what is the _actual_ problem? *Cross-region, cross-cloud, > hybrid-deployment connectivity between pods is a pain.* This can be solved > with significant investment by those who want to deploy these types of > topologies. You can definitely configure connectivity between clouds over > dedicated connections, or VPN tunnels. With a big chunk of time insuring that > pod to pod connectivity just works even if those pods are managed by separate > control planes, but that again requires time and talent. There are a number > of edge cases to support between the ever so slight, but very important, > differences in cloud vendor networks. > > Recently there have been a number of innovations that aid in the deployment > and operation of these types of applications on Kubernetes. Service meshes > support distributed microservices running across multiple k8s cluster control > planes in disparate networks. Instead of directly connecting to IP addresses > of remote services instead they use a hostname. 
With this approach, hostname > traffic may then be routed to a proxy that sends traffic over the WAN > (sometimes with mTLS) to another proxy pod in the remote cluster which then > forwards the data along to the correct pod in that network. (See attached > diagrams) > > Which brings us to the point of this ticket. Instead of mapping host ids to > IPs, use hostnames (and update the underlying address periodically instead of > caching indefinitely). This allows resolution to different IP addresses per > DC (k8s cluster) that may then be forwarded to nodes (pods) on remote > networks (k8s clusters) without requiring node to node (pod to pod) IP > connectivity between them. Traditional deployments can still function like > they do today (even if operators opt to keep using IPs as identifiers instead > of hostnames). This proxy approach is then enabled like those we see in > service meshes. > > _Notes_ > C* already has the concept of broadcast
[jira] [Commented] (CASSANDRA-15823) Support for networking via identity instead of IP
[ https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116489#comment-17116489 ] Jeremy Hanna commented on CASSANDRA-15823: -- Adding in related issues where host id (for vnodes in 1.2) and previously token (for 0.7) were put in place of IP address to identify nodes - for historical purposes. > Support for networking via identity instead of IP > - > > Key: CASSANDRA-15823 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15823 > Project: Cassandra > Issue Type: Improvement >Reporter: Christopher Bradford >Priority: Normal > Attachments: consul-mesh-gateways.png, > istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg > > > TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows > resolution to different IP addresses per DC that may then be forwarded to > nodes on remote networks without requiring node to node IP connectivity for > cross-dc links. > > This approach should not affect existing deployments as those could continue > to use IPs as the hostname and skip resolution. > > With orchestration platforms like Kubernetes and the usage of ephemeral > containers in environments today we should consider some changes to how we > handle the tracking of nodes and their network location. Currently we > maintain a mapping between host ids and IP addresses. > > With traditional infrastructure, if a node goes down, it usually comes back > up with the same IP. In some environments this contract may be explicit with > virtual IPs that may move between hosts. In newer deployments, like on > Kubernetes, this contract is not possible. Pods (analogous to nodes) are > assigned an IP address at start time. Should the pod be restarted or > scheduled on a different host there is no guarantee we would have the same > IP. Cassandra is protected here as we already have logic in place to update > peers when we come up with the same host id, but a different IP address. 
> > There are ways to get Kubernetes to assign a specific IP per Pod. Most > recommendations involve the use of a service per pod. Communication with the > fixed service IP would automatically forward to the associated pod, > regardless of address. We _could_ use this approach, but it seems like this > would needlessly create a number of extra resources in our k8s cluster to get > around the problem. Which, to be fair, doesn't seem like much of a problem > with the aforementioned mitigations built into C*. > > So what is the _actual_ problem? *Cross-region, cross-cloud, > hybrid-deployment connectivity between pods is a pain.* This can be solved > with significant investment by those who want to deploy these types of > topologies. You can definitely configure connectivity between clouds over > dedicated connections, or VPN tunnels, with a big chunk of time ensuring that > pod to pod connectivity just works even if those pods are managed by separate > control planes, but that again requires time and talent. There are a number > of edge cases to support between the ever so slight, but very important, > differences in cloud vendor networks. > > Recently there have been a number of innovations that aid in the deployment > and operation of these types of applications on Kubernetes. Service meshes > support distributed microservices running across multiple k8s cluster control > planes in disparate networks. Instead of directly connecting to IP addresses > of remote services, they use a hostname. With this approach, hostname > traffic may then be routed to a proxy that sends traffic over the WAN > (sometimes with mTLS) to another proxy pod in the remote cluster which then > forwards the data along to the correct pod in that network. (See attached > diagrams) > > Which brings us to the point of this ticket. Instead of mapping host ids to > IPs, use hostnames (and update the underlying address periodically instead of > caching indefinitely). 
This allows resolution to different IP addresses per > DC (k8s cluster) that may then be forwarded to nodes (pods) on remote > networks (k8s clusters) without requiring node to node (pod to pod) IP > connectivity between them. Traditional deployments can still function like > they do today (even if operators opt to keep using IPs as identifiers instead > of hostnames). This proxy approach is then enabled like those we see in > service meshes. > > _Notes_ > C* already has the concept of broadcast addresses vs those which are bound on > the node. This approach _could_ be leveraged to provide the behavior we're > looking for, but then the broadcast values would need to be pre-computed > _*and match*_ across all k8s control planes. By using hostnames the > underlying IP
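The "update the underlying address periodically instead of caching indefinitely" idea in the ticket above can be sketched in a few lines. This is a hypothetical illustration, not Cassandra's actual internode code; the class and parameter names are made up for the example.

```python
import socket
import time

class ResolvingEndpoint:
    """Identify a peer by hostname and re-resolve it once a TTL expires,
    rather than caching the resolved IP forever. Hypothetical sketch of
    the ticket's proposal; names are illustrative, not Cassandra's."""

    def __init__(self, hostname, ttl_seconds=30.0):
        self.hostname = hostname
        self.ttl = ttl_seconds
        self._cached_ip = None
        self._resolved_at = 0.0

    def address(self):
        # Re-resolve when the cached entry is older than the TTL, so a
        # pod that restarted with a new IP is picked up automatically.
        now = time.monotonic()
        if self._cached_ip is None or now - self._resolved_at > self.ttl:
            self._cached_ip = socket.gethostbyname(self.hostname)
            self._resolved_at = now
        return self._cached_ip

peer = ResolvingEndpoint("localhost", ttl_seconds=30.0)
print(peer.address())  # e.g. 127.0.0.1
```

Within the TTL the cached address is reused, so normal traffic pays no resolution cost; only after the TTL does the next call hit DNS again.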
[jira] [Updated] (CASSANDRA-15803) Separate out allow filtering scanning through a partition versus scanning over the table
[ https://issues.apache.org/jira/browse/CASSANDRA-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15803: - Description: Currently allow filtering can mean two things in the spirit of "avoid operations that don't seek to a specific row or sequential rows of data." First, it can mean scanning across the entire table to meet the criteria of the query. That's almost always a bad thing and should be discouraged or disabled (see CASSANDRA-8303). Second, it can mean filtering within a specific partition. For example, in a query you could specify the full partition key and if you specify a criterion on a non-key field, it requires allow filtering. The second reason to require allow filtering is significantly less work to scan through a partition. It is still extra work over seeking to a specific row and getting N sequential rows though. So while an application developer and/or operator needs to be cautious about this second type, it's not necessarily a bad thing, depending on the table and the use case. I propose that we separate the way to specify allow filtering across an entire table (involving a scatter gather) from specifying allow filtering across a partition in a backwards compatible way. One idea that was brought up in Slack in the cassandra-dev room was to have allow filtering mean the superset - scanning across the table. Then if you want to specify that you *only* want to scan within a partition you would use something like {{ALLOW FILTERING [WITHIN PARTITION]}} So it will succeed if you specify non-key criteria within a single partition, but fail with a message to say it requires the full allow filtering. This would allow for a backwards compatible full allow filtering while allowing a user to specify that they want to just scan within a partition, but error out if trying to scan a full table. 
This is potentially also related to the capability limitation framework by which operators could more granularly specify what features are allowed or disallowed per user, discussed in CASSANDRA-8303. This way an operator could disallow the more general allow filtering while allowing the partition scan (or disallow them both at their discretion). was: Currently allow filtering can mean two things in the spirit of "avoid operations that don't seek to a specific row or sequential rows of data." First, it can mean scanning across the entire table to meet the criteria of the query. That's almost always a bad thing and should be discouraged or disabled (see CASSANDRA-8303). Second, it can mean filtering within a specific partition. For example, in a query you could specify the full partition key and if you specify a criterion on a non-key field, it requires allow filtering. The second reason to require allow filtering is significantly less work to scan through a partition. It is still extra work over seeking to a specific row and getting N sequential rows though. So while an application developer and/or operator needs to be cautious about this second type, it's not necessarily a bad thing, depending on the table and the use case. I propose that we separate the way to specify allow filtering across an entire table (involving a scatter gather) from specifying allow filtering across a partition in a backwards compatible way. One idea that was brought up in Slack in the cassandra-dev room was to have allow filtering mean the superset - scanning across the table. Then if you want to specify that you *only* want to scan within a partition you would use something like {{ALLOW FILTERING [WITHIN PARTITION]}} So it will succeed if you specify non-key criteria within a single partition, but fail with a message to say it requires the full allow filtering. 
This would allow for a backwards compatible full allow filtering while allowing a user to specify that they want to just scan within a partition, but error out if trying to scan a full table. This is potentially also related to the capability limitation framework by which operators could more granularly specify what features are allowed or disallowed per user, discussed in CASSANDRA-8303. This way an operator could disallow the more general allow filtering while allowing the partition scan (or disallow them both at their discretion). > Separate out allow filtering scanning through a partition versus scanning > over the table > > > Key: CASSANDRA-15803 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15803 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Syntax >Reporter: Jeremy Hanna >Priority: Normal > > Currently allow filtering can mean two things in the spirit of "avoid > operations that
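The two meanings of allow filtering described in the ticket above can be illustrated with a toy in-memory model (plain Python, not Cassandra code; the table layout and function are invented for the example): filtering within one partition only touches that partition's rows, while the same predicate without a partition key becomes a scan over every partition.

```python
# Toy model: a "table" maps partition keys to lists of clustered rows.
table = {
    "pk1": [{"ck": 1, "status": "open"}, {"ck": 2, "status": "closed"}],
    "pk2": [{"ck": 1, "status": "open"}],
}

def query(table, partition_key=None, predicate=None):
    """Within-partition filtering seeks to one partition's rows;
    omitting the partition key forces a scan of every partition."""
    if partition_key is not None:
        rows = table.get(partition_key, [])   # seek to one partition
    else:
        rows = [r for part in table.values() for r in part]  # full table scan
    if predicate is not None:
        rows = [r for r in rows if predicate(r)]  # the "filtering" part
    return rows

# Filtering inside one partition: bounded, per-partition work.
print(len(query(table, partition_key="pk1", predicate=lambda r: r["status"] == "open")))  # 1
# Same predicate over the whole table: scatter-gather across partitions.
print(len(query(table, predicate=lambda r: r["status"] == "open")))  # 2
```

Both paths do extra work beyond a direct row seek, which is why the ticket argues they deserve different spellings rather than one blanket ALLOW FILTERING.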
[jira] [Updated] (CASSANDRA-15803) Separate out allow filtering scanning through a partition versus scanning over the table
[ https://issues.apache.org/jira/browse/CASSANDRA-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15803: - Description: Currently allow filtering can mean two things in the spirit of "avoid operations that don't seek to a specific row or sequential rows of data." First, it can mean scanning across the entire table to meet the criteria of the query. That's almost always a bad thing and should be discouraged or disabled (see CASSANDRA-8303). Second, it can mean filtering within a specific partition. For example, in a query you could specify the full partition key and if you specify a criterion on a non-key field, it requires allow filtering. The second reason to require allow filtering is significantly less work to scan through a partition. It is still extra work over seeking to a specific row and getting N sequential rows though. So while an application developer and/or operator needs to be cautious about this second type, it's not necessarily a bad thing, depending on the table and the use case. I propose that we separate the way to specify allow filtering across an entire table (involving a scatter gather) from specifying allow filtering across a partition in a backwards compatible way. One idea that was brought up in Slack in the cassandra-dev room was to have allow filtering mean the superset - scanning across the table. Then if you want to specify that you *only* want to scan within a partition you would use something like {{ALLOW FILTERING [WITHIN PARTITION]}} So it will succeed if you specify non-key criteria within a single partition, but fail with a message to say it requires the full allow filtering. This would allow for a backwards compatible full allow filtering while allowing a user to specify that they want to just scan within a partition, but error out if trying to scan a full table. 
This is potentially also related to the capability limitation framework by which operators could more granularly specify what features are allowed or disallowed per user, discussed in CASSANDRA-8303. This way an operator could disallow the more general allow filtering while allowing the partition scan (or disallow them both at their discretion). was: Currently allow filtering can mean two things in the spirit of "avoid operations that don't seek to a specific row or sequential rows of data." First, it can mean scanning across the entire table to meet the criteria of the query. That's almost always a bad thing and should be discouraged or disabled (see CASSANDRA-8303). Second, it can mean filtering within a specific partition. For example, in a query you could specify the full partition key and if you specify a criterion on a non-key field, it requires allow filtering. The second reason to require allow filtering is significantly less work to scan through a partition. It is still extra work over seeking to a specific row and getting N sequential rows though. So while an application developer and/or operator needs to be cautious about this second type, it's not necessarily a bad thing, depending on the table and the use case. I propose that we separate the way to specify allow filtering across an entire table (involving a scatter gather) from specifying allow filtering across a partition in a backwards compatible way. One idea that was brought up in Slack in the cassandra-dev room was to have allow filtering mean the superset - scanning across the table. Then if you want to specify that you *only* want to scan within a partition. So it will succeed if you specify non-key criteria within a single partition, but fail with a message to say it requires the full allow filtering. 
One way would be to have it be {{ALLOW FILTERING [WITHIN PARTITION]}} This would allow for a backwards compatible full allow filtering while allowing a user to specify that they want to just scan within a partition, but error out if trying to scan a full table. This is potentially also related to the capability limitation framework by which operators could more granularly specify what features are allowed or disallowed per user, discussed in CASSANDRA-8303. This way an operator could disallow the more general allow filtering while allowing the partition scan (or disallow them both at their discretion). > Separate out allow filtering scanning through a partition versus scanning > over the table > > > Key: CASSANDRA-15803 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15803 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Syntax >Reporter: Jeremy Hanna >Priority: Normal > > Currently allow filtering can mean two things in the spirit of "avoid > operations that
[jira] [Commented] (CASSANDRA-15775) Configuration to disallow queries with "allow filtering"
[ https://issues.apache.org/jira/browse/CASSANDRA-15775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105070#comment-17105070 ] Jeremy Hanna commented on CASSANDRA-15775: -- See CASSANDRA-8303 which generalizes this. The problem though is that allow filtering has two purposes - first, it allows you to scan over multiple partitions which is almost always bad. Second it is needed if you are scanning through a partition. See CASSANDRA-15803 for a proposal to separate out the first from the second case. > Configuration to disallow queries with "allow filtering" > > > Key: CASSANDRA-15775 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15775 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Local Write-Read Paths >Reporter: Christian Fredriksson >Priority: Normal > > Problem: We have inexperienced developers not following guidelines or best > practices who do queries with "allow filtering" which have a negative impact on > performance for other queries and developers. > It would be beneficial to have a (server side) configuration to disallow > these queries altogether. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15803) Separate out allow filtering scanning through a partition versus scanning over the table
Jeremy Hanna created CASSANDRA-15803: Summary: Separate out allow filtering scanning through a partition versus scanning over the table Key: CASSANDRA-15803 URL: https://issues.apache.org/jira/browse/CASSANDRA-15803 Project: Cassandra Issue Type: Improvement Components: CQL/Syntax Reporter: Jeremy Hanna Currently allow filtering can mean two things in the spirit of "avoid operations that don't seek to a specific row or sequential rows of data." First, it can mean scanning across the entire table to meet the criteria of the query. That's almost always a bad thing and should be discouraged or disabled (see CASSANDRA-8303). Second, it can mean filtering within a specific partition. For example, in a query you could specify the full partition key and if you specify a criterion on a non-key field, it requires allow filtering. The second reason to require allow filtering is significantly less work to scan through a partition. It is still extra work over seeking to a specific row and getting N sequential rows though. So while an application developer and/or operator needs to be cautious about this second type, it's not necessarily a bad thing, depending on the table and the use case. I propose that we separate the way to specify allow filtering across an entire table (involving a scatter gather) from specifying allow filtering across a partition in a backwards compatible way. One idea that was brought up in Slack in the cassandra-dev room was to have allow filtering mean the superset - scanning across the table. Then if you want to specify that you *only* want to scan within a partition. So it will succeed if you specify non-key criteria within a single partition, but fail with a message to say it requires the full allow filtering. 
One way would be to have it be {{ALLOW FILTERING [WITHIN PARTITION]}} This would allow for a backwards compatible full allow filtering while allowing a user to specify that they want to just scan within a partition, but error out if trying to scan a full table. This is potentially also related to the capability limitation framework by which operators could more granularly specify what features are allowed or disallowed per user, discussed in CASSANDRA-8303. This way an operator could disallow the more general allow filtering while allowing the partition scan (or disallow them both at their discretion). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072398#comment-17072398 ] Jeremy Hanna commented on CASSANDRA-13701: -- [~mshuler] What would we need to do to update testing to 16 so that it coincides with the new defaults? > Lower default num_tokens > > > Key: CASSANDRA-13701 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13701 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Chris Lohfink >Assignee: Jeremy Hanna >Priority: Low > > For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not > necessary. It is very expensive for operations processes and scanning. It's > come up a lot and it's pretty standard and known now to always reduce the > num_tokens within the community. We should just lower the defaults. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072390#comment-17072390 ] Jeremy Hanna commented on CASSANDRA-13701: -- As discussed in [this thread|https://lists.apache.org/thread.html/r164d8a4143551b5ef774734afdce0ef31a0e461d71276f8446be%40%3Cdev.cassandra.apache.org%3E] the community decided on a default of 16 for now. I'll assign to myself and put in some release notes about it. At the same time I'll add some documentation for the topic. > Lower default num_tokens > > > Key: CASSANDRA-13701 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13701 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Chris Lohfink >Priority: Low > > For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not > necessary. It is very expensive for operations processes and scanning. It's > come up a lot and it's pretty standard and known now to always reduce the > num_tokens within the community. We should just lower the defaults. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-13701) Lower default num_tokens
[ https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna reassigned CASSANDRA-13701: Assignee: Jeremy Hanna > Lower default num_tokens > > > Key: CASSANDRA-13701 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13701 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Chris Lohfink >Assignee: Jeremy Hanna >Priority: Low > > For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not > necessary. It is very expensive for operations processes and scanning. It's > come up a lot and it's pretty standard and known now to always reduce the > num_tokens within the community. We should just lower the defaults. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13749) add documentation about upgrade process to docs
[ https://issues.apache.org/jira/browse/CASSANDRA-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071481#comment-17071481 ] Jeremy Hanna commented on CASSANDRA-13749: -- I'm going to take a stab at this as it would be good to get in place with the upcoming 4.0 upgrade testing that people will be doing. > add documentation about upgrade process to docs > --- > > Key: CASSANDRA-13749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13749 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Documentation and Website >Reporter: Jon Haddad >Assignee: Sumanth Pasupuleti >Priority: Normal > Labels: documentation > > The docs don't have any information on how to upgrade. This question gets > asked constantly on the mailing list. > Seems like it belongs under the "Operating Cassandra" section. > https://cassandra.apache.org/doc/latest/operating/index.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-13749) add documentation about upgrade process to docs
[ https://issues.apache.org/jira/browse/CASSANDRA-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna reassigned CASSANDRA-13749: Assignee: Jeremy Hanna (was: Sumanth Pasupuleti) > add documentation about upgrade process to docs > --- > > Key: CASSANDRA-13749 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13749 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Documentation and Website >Reporter: Jon Haddad >Assignee: Jeremy Hanna >Priority: Normal > Labels: documentation > > The docs don't have any information on how to upgrade. This question gets > asked constantly on the mailing list. > Seems like it belongs under the "Operating Cassandra" section. > https://cassandra.apache.org/doc/latest/operating/index.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15522) Update defaults for the validity timeframe of roles, permissions, and credentials for 4.0
[ https://issues.apache.org/jira/browse/CASSANDRA-15522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15522: - Summary: Update defaults for the validity timeframe of roles, permissions, and credentials for 4.0 (was: Update defaults for the validity timeframe of roles, permissions, and credentials) > Update defaults for the validity timeframe of roles, permissions, and > credentials for 4.0 > - > > Key: CASSANDRA-15522 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15522 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Authorization >Reporter: Jeremy Hanna >Priority: Normal > > It's been found that the defaults for {{roles_validity_in_ms}}, > {{permissions_validity_in_ms}}, and {{credentials_validity_in_ms}} have > been too low at 2000 ms or 2 seconds each. As [~alexott] put it in the dev > list discussion about defaults: > {quote}I have seen multiple times when authentication was failing under the > heavy load because queries to system tables were timing out - with these > defaults people may still have the possibility to get updates to > roles/credentials faster when specifying _update_interval_ variants of these > configurations. > {quote} > The suggestion is to set it to 60000 (1 minute) or 120000 (2 minutes). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15523) Update default snitch from SimpleSnitch to GossipingPropertyFileSnitch
[ https://issues.apache.org/jira/browse/CASSANDRA-15523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15523: - Description: Traditionally the project has had {{SimpleSnitch}} as the default primarily because it makes it easy for a user to download the software and start Cassandra without changing any defaults and it will just work. However, {{SimpleSnitch}} is not datacenter aware. {{GossipingPropertyFileSnitch}} could also be used as the default and would make the default snitch be datacenter aware. Out of the box it will work with the default configuration. This ticket would be to update the default from {{SimpleSnitch}} to {{GossipingPropertyFileSnitch}} to make the onboarding experience better for those who don't know to change this default if they intend to expand to multiple datacenters without too much hassle. was: Traditionally the project has had {{SimpleSnitch}} as the default primarily because it makes it easy for a user to download the software and start Cassandra without changing any defaults and it will just work. However, {{SimpleSnitch}} is not datacenter aware. {{GossipingPropertyFileSnitch}} could also be used as the default and would make the default snitch be datacenter aware. The user simply needs to update the local datacenter and rack names on each node and when it joins the cluster, it will propagate that topology information around the ring. This ticket would be to update the default from {{SimpleSnitch}} to {{GossipingPropertyFileSnitch}} to make the onboarding experience better for those who don't know to change this default if they intend to expand to multiple datacenters without too much hassle. 
> Update default snitch from SimpleSnitch to GossipingPropertyFileSnitch > -- > > Key: CASSANDRA-15523 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15523 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Jeremy Hanna >Priority: Normal > > Traditionally the project has had {{SimpleSnitch}} as the default primarily > because it makes it easy for a user to download the software and start > Cassandra without changing any defaults and it will just work. However, > {{SimpleSnitch}} is not datacenter aware. {{GossipingPropertyFileSnitch}} > could also be used as the default and would make the default snitch be > datacenter aware. Out of the box it will work with the default configuration. > This ticket would be to update the default from {{SimpleSnitch}} to > {{GossipingPropertyFileSnitch}} to make the onboarding experience better for > those who don't know to change this default if they intend to expand to > multiple datacenters without too much hassle. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
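For context on the ticket above: {{GossipingPropertyFileSnitch}} reads each node's own datacenter and rack from {{cassandra-rackdc.properties}} and lets gossip propagate that topology to the rest of the cluster. A typical file looks roughly like the following sketch (the {{dc1}}/{{rack1}} values are illustrative defaults):

```properties
# cassandra-rackdc.properties, read by GossipingPropertyFileSnitch.
# Each node declares only its own datacenter and rack; gossip
# propagates the topology to the other nodes in the cluster.
dc=dc1
rack=rack1
```

Because the file ships with working values, a single-node download-and-run experience still works unchanged, which is the "out of the box" point made in the updated description.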
[jira] [Created] (CASSANDRA-15523) Update default snitch from SimpleSnitch to GossipingPropertyFileSnitch
Jeremy Hanna created CASSANDRA-15523: Summary: Update default snitch from SimpleSnitch to GossipingPropertyFileSnitch Key: CASSANDRA-15523 URL: https://issues.apache.org/jira/browse/CASSANDRA-15523 Project: Cassandra Issue Type: Improvement Components: Local/Config Reporter: Jeremy Hanna Traditionally the project has had {{SimpleSnitch}} as the default primarily because it makes it easy for a user to download the software and start Cassandra without changing any defaults and it will just work. However, {{SimpleSnitch}} is not datacenter aware. {{GossipingPropertyFileSnitch}} could also be used as the default and would make the default snitch be datacenter aware. The user simply needs to update the local datacenter and rack names on each node and when it joins the cluster, it will propagate that topology information around the ring. This ticket would be to update the default from {{SimpleSnitch}} to {{GossipingPropertyFileSnitch}} to make the onboarding experience better for those who don't know to change this default if they intend to expand to multiple datacenters without too much hassle. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15522) Update defaults for the validity timeframe of roles, permissions, and credentials
[ https://issues.apache.org/jira/browse/CASSANDRA-15522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15522: - Description: It's been found that the defaults for {{roles_validity_in_ms}}, {{permissions_validity_in_ms}}, and {{credentials_validity_in_ms}} have been too low at 2000 ms or 2 seconds each. As [~alexott] put it in the dev list discussion about defaults: {quote}I have seen multiple times when authentication was failing under the heavy load because queries to system tables were timing out - with these defaults people may still have the possibility to get updates to roles/credentials faster when specifying _update_interval_ variants of these configurations. {quote} The suggestion is to set it to 60000 (1 minute) or 120000 (2 minutes). was: It's been found that the defaults for {roles_validity_in_ms}, {permissions_validity_in_ms}, and {credentials_validity_in_ms} have been too low at 2000 ms or 2 seconds each. As [~alexott] put it in the dev list discussion about defaults: {quote} I have seen multiple times when authentication was failing under the heavy load because queries to system tables were timing out - with these defaults people may still have the possibility to get updates to roles/credentials faster when specifying _update_interval_ variants of these configurations. {quote} The suggestion is to set it to 60000 (1 minute) or 120000 (2 minutes). > Update defaults for the validity timeframe of roles, permissions, and > credentials > - > > Key: CASSANDRA-15522 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15522 > Project: Cassandra > Issue Type: Improvement > Components: Feature/Authorization >Reporter: Jeremy Hanna >Priority: Normal > > It's been found that the defaults for {{roles_validity_in_ms}}, > {{permissions_validity_in_ms}}, and {{credentials_validity_in_ms}} have > been too low at 2000 ms or 2 seconds each. 
As [~alexott] put it in the dev > list discussion about defaults: > {quote}I have seen multiple times when authentication was failing under the > heavy load because queries to system tables were timing out - with these > defaults people may still have the possibility to get updates to > roles/credentials faster when specifying _update_interval_ variants of these > configurations. > {quote} > The suggestion is to set it to 60000 (1 minute) or 120000 (2 minutes). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
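For reference, the proposed change would look something like the fragment below in cassandra.yaml. This is only a sketch: it uses the pre-4.1 `_in_ms` option names referenced in the ticket, assumes the 1-minute value is chosen, and the update-interval values are purely illustrative.

```yaml
# Sketch of the proposed validity defaults (assumes the 1-minute option).
roles_validity_in_ms: 60000
permissions_validity_in_ms: 60000
credentials_validity_in_ms: 60000

# The _update_interval_ variants mentioned in the quoted comment let the
# cache refresh in the background before an entry expires; these values
# are illustrative, not part of the proposal.
roles_update_interval_in_ms: 30000
permissions_update_interval_in_ms: 30000
credentials_update_interval_in_ms: 30000
```

With a longer validity and a shorter update interval, auth decisions keep being served from cache even when a refresh query to the system tables times out under load.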
[jira] [Created] (CASSANDRA-15522) Update defaults for the validity timeframe of roles, permissions, and credentials
Jeremy Hanna created CASSANDRA-15522: Summary: Update defaults for the validity timeframe of roles, permissions, and credentials Key: CASSANDRA-15522 URL: https://issues.apache.org/jira/browse/CASSANDRA-15522 Project: Cassandra Issue Type: Improvement Components: Feature/Authorization Reporter: Jeremy Hanna It's been found that the defaults for {roles_validity_in_ms}, {permissions_validity_in_ms}, and {credentials_validity_in_ms} have been too low at 2000 ms or 2 seconds each. As [~alexott] put it in the dev list discussion about defaults: {quote} I have seen multiple times when authentication was failing under the heavy load because queries to system tables were timing out - with these defaults people may still have the possibility to get updates to roles/credentials faster when specifying _update_interval_ variants of these configurations. {quote} The suggestion is to set it to 60000 (1 minute) or 120000 (2 minutes).
[jira] [Updated] (CASSANDRA-15521) Update default for num_tokens from 256 to something more reasonable
[ https://issues.apache.org/jira/browse/CASSANDRA-15521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15521: - Description: The default for num_tokens or the number of token ranges assigned to a node using virtual nodes is way too high. 256 token ranges makes repair painful. Since it's a default, someone new to Cassandra won't know better and if left unchanged, they will have to live with it or perform a migration to a new datacenter with a lower number. At the same time, going too low with the default allocation algorithm can hotspot nodes to have more tokens assigned than others. There is a new token allocation algorithm introduced but it's not default. The proposal of this ticket is to set the default to something more reasonable to align with best practices without using the new token algorithm or giving it specific token values as some do. 32 is a good compromise and is what the project uses in a lot of the tests that are done. So generally it would be good to move to a more sane value and to align with testing so users are more confident that the defaults have a lot of testing behind them. As discussed on the dev mailing list, we want to make sure this change to the default doesn't come as an unpleasant surprise to cluster operators. For num_tokens specifically, if you were to upgrade to a version with the new default and the user didn't change it to the existing value, the node would not start, saying you can't change the num_tokens on an existing node. So we will want to put a release note to indicate that when upgrading, make a note of the num_tokens change when looking at the new configuration. Along with not being able to start nodes, which is fail-fast, there is the matter of adding new nodes to the cluster. You can certainly add a new node to a cluster or datacenter with a different number of token ranges assigned. It will give that node a different amount of data to be responsible for. 
For example, if the nodes in a datacenter all have num_tokens=256 (current default) and you add a node to that datacenter with num_tokens=32 (new default), it will only claim 1/8th as many token ranges, and as much data, as the other nodes in that datacenter. Fortunately, this is a property that is explicitly defined rather than implicit like some of the table settings. Also most if not all operators will upgrade the existing nodes to that new version before trying to add a node with that new version. So if there is a different number for num_tokens on the existing nodes, they'll be aware of it immediately. In any case, this is a long proposal for what will be a small change in the cassandra.yaml and something in the release notes, that is, changing the default num_tokens value from 256 to 32. was: The default for num_tokens or the number of token ranges assigned to a node using virtual nodes is way too high. 256 token ranges makes repair painful. Since it's a default, someone new to Cassandra won't know better and will have to live with it or perform a migration to a new datacenter with a lower number. At the same time, going too low with the default allocation algorithm can hotspot nodes to have more tokens assigned than others. There is a new token allocation algorithm introduced but it's not default. The proposal of this ticket is to set the default to something more reasonable to align with best practices without using the new token algorithm or giving it specific token values as some do. 32 is a good compromise and is what the project uses in a lot of the tests that are done. So generally it would be good to move to a more sane value and to align with testing so users are more confident that the defaults have a lot of testing behind them. As discussed on the dev mailing list, we want to make sure this change to the default doesn't come as an unpleasant surprise to cluster operators.
For num_tokens specifically, if you were to upgrade to a version with the new default and the user didn't change it to the existing value, the node would not start, saying you can't change the num_tokens on an existing node. So we will want to put a release note to indicate that when upgrading, make a note of the num_tokens change when looking at the new configuration. Along with not being able to start nodes, which is fail-fast, there is the matter of adding new nodes to the cluster. You can certainly add a new node to a cluster or datacenter with a different number of token ranges assigned. It will give that node a different amount of data to be responsible for. For example, if the nodes in a datacenter all have num_tokens=256 (current default) and you add a node to that datacenter with num_tokens=32 (new default), it will only claim 1/8th as many token ranges, and as much data, as the other nodes in that datacenter. Fortunately, this is a property that is explicitly defined rather than implicit like some of the table settings. Also most if not all operators will upgrade the existing nodes to that new version before trying to add a node with that new version. So if there is a different number for num_tokens on the existing nodes, they'll be aware of it immediately. In any case, this is a long proposal for what will be a small change in the cassandra.yaml and something in the release notes, that is, changing the default num_tokens value from 256 to 32.
[jira] [Created] (CASSANDRA-15521) Update default for num_tokens from 256 to something more reasonable
Jeremy Hanna created CASSANDRA-15521: Summary: Update default for num_tokens from 256 to something more reasonable Key: CASSANDRA-15521 URL: https://issues.apache.org/jira/browse/CASSANDRA-15521 Project: Cassandra Issue Type: Improvement Components: Feature/Virtual Nodes Reporter: Jeremy Hanna Assignee: Jeremy Hanna The default for num_tokens or the number of token ranges assigned to a node using virtual nodes is way too high. 256 token ranges makes repair painful. Since it's a default, someone new to Cassandra won't know better and will have to live with it or perform a migration to a new datacenter with a lower number. At the same time, going too low with the default allocation algorithm can hotspot nodes to have more tokens assigned than others. There is a new token allocation algorithm introduced but it's not default. The proposal of this ticket is to set the default to something more reasonable to align with best practices without using the new token algorithm or giving it specific token values as some do. 32 is a good compromise and is what the project uses in a lot of the tests that are done. So generally it would be good to move to a more sane value and to align with testing so users are more confident that the defaults have a lot of testing behind them. As discussed on the dev mailing list, we want to make sure this change to the default doesn't come as an unpleasant surprise to cluster operators. For num_tokens specifically, if you were to upgrade to a version with the new default and the user didn't change it to the existing value, the node would not start, saying you can't change the num_tokens on an existing node. So we will want to put a release note to indicate that when upgrading, make a note of the num_tokens change when looking at the new configuration. Along with not being able to start nodes, which is fail-fast, there is the matter of adding new nodes to the cluster. 
You can certainly add a new node to a cluster or datacenter with a different number of token ranges assigned. It will give that node a different amount of data to be responsible for. For example, if the nodes in a datacenter all have num_tokens=256 (current default) and you add a node to that datacenter with num_tokens=32 (new default), it will only claim 1/8th as many token ranges, and as much data, as the other nodes in that datacenter. Fortunately, this is a property that is explicitly defined rather than implicit like some of the table settings. Also most if not all operators will upgrade the existing nodes to that new version before trying to add a node with that new version. So if there is a different number for num_tokens on the existing nodes, they'll be aware of it immediately. In any case, this is a long proposal for what will be a small change in the cassandra.yaml and something in the release notes, that is, changing the default num_tokens value from 256 to 32.
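The 1/8th figure above can be sanity-checked with a quick back-of-envelope calculation. The helper below is hypothetical (not Cassandra code) and assumes ring ownership is roughly proportional to a node's vnode count, which is the simplification the ticket's example relies on:

```python
def expected_share(node_tokens, other_nodes_tokens):
    """Approximate fraction of the ring a node owns, assuming ownership
    is proportional to its vnode count (a simplification)."""
    total = node_tokens + sum(other_nodes_tokens)
    return node_tokens / total

# A datacenter of three existing nodes at the old default (256 vnodes each),
# plus one new node at the proposed default (32 vnodes).
new_node = expected_share(32, [256, 256, 256])   # 32/800
old_node = expected_share(256, [256, 256, 32])   # 256/800
print(round(old_node / new_node))  # 8: the new node claims 1/8th as much
```

This is exactly why the mismatch is easy to spot but still worth a release note: the new node joins successfully, it just owns far less of the ring than its peers.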
[jira] [Comment Edited] (CASSANDRA-13019) Improve clearsnapshot to delete the snapshot files slowly
[ https://issues.apache.org/jira/browse/CASSANDRA-13019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006592#comment-17006592 ] Jeremy Hanna edited comment on CASSANDRA-13019 at 1/2/20 5:18 AM: -- I like the idea to reduce the effect on the regular server operations when performing the snapshot, especially if there is a coordinated snapshot across the cluster. Because it may affect the time it takes for operations that call {{snapshot}} indirectly, should we make a note of this in the NEWS.txt - both the availability of the throttle and that it may affect the time to run things like {{truncate}} and {{drop}}? was (Author: jeromatron): I like the idea to reduce the effect on the regular server operations when performing the snapshot, especially if there is a coordinated snapshot across the cluster. Because it may affect the time it takes for operations that call `snapshot` indirectly, should we make a note of this in the NEWS.txt - both the availability of the throttle and that it may affect the time to run things like `truncate` and `drop`? > Improve clearsnapshot to delete the snapshot files slowly > -- > > Key: CASSANDRA-13019 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13019 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Dikang Gu >Assignee: Jeff Jirsa >Priority: Normal > Labels: pull-request-available > Fix For: 4.x > > Time Spent: 2h 10m > Remaining Estimate: 0h > > In our environment, we are creating snapshots for backup, after we finish the > backup, we are running {{clearsnapshot}} to delete the snapshot files. At > that time we may have thousands of files to delete, and it's causing sudden > disk usage spike. As a result, we are experiencing a spike of drop messages > from Cassandra. > I think we should implement something like {{slowrm}} to delete the snapshot > files slowly, avoid the sudden disk usage spike.
[jira] [Commented] (CASSANDRA-13019) Improve clearsnapshot to delete the snapshot files slowly
[ https://issues.apache.org/jira/browse/CASSANDRA-13019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006592#comment-17006592 ] Jeremy Hanna commented on CASSANDRA-13019: -- I like the idea to reduce the effect on the regular server operations when performing the snapshot, especially if there is a coordinated snapshot across the cluster. Because it may affect the time it takes for operations that call `snapshot` indirectly, should we make a note of this in the NEWS.txt - both the availability of the throttle and that it may affect the time to run things like `truncate` and `drop`? > Improve clearsnapshot to delete the snapshot files slowly > -- > > Key: CASSANDRA-13019 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13019 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Dikang Gu >Assignee: Jeff Jirsa >Priority: Normal > Labels: pull-request-available > Fix For: 4.x > > Time Spent: 2h 10m > Remaining Estimate: 0h > > In our environment, we are creating snapshots for backup, after we finish the > backup, we are running {{clearsnapshot}} to delete the snapshot files. At > that time we may have thousands of files to delete, and it's causing sudden > disk usage spike. As a result, we are experiencing a spike of drop messages > from Cassandra. > I think we should implement something like {{slowrm}} to delete the snapshot > files slowly, avoid the sudden disk usage spike.
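The {{slowrm}} idea from the ticket could be sketched as follows. This is a hypothetical illustration of a batch-and-pause throttle, not the actual patch; the function name and parameters are invented for the example:

```python
import os
import time

def slow_rm(paths, files_per_batch=100, pause_seconds=0.5):
    """Delete files in small batches with a pause between batches,
    spreading the disk I/O out instead of issuing one large burst."""
    deleted = 0
    for path in paths:
        try:
            os.remove(path)
            deleted += 1
        except FileNotFoundError:
            pass  # already gone; concurrent cleanup can race with us
        if deleted and deleted % files_per_batch == 0:
            time.sleep(pause_seconds)
    return deleted
```

A rate-based throttle (files or hard links unlinked per second) would serve the same purpose and is easier to reason about for operators; the batch-and-pause form is just the simplest way to show the shape of the idea.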
[jira] [Updated] (CASSANDRA-15336) LegacyLayout RangeTombstoneList throws IndexOutOfBoundsException When Running Range Queries
[ https://issues.apache.org/jira/browse/CASSANDRA-15336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15336: - Description: Hi All, This bug is similar to CASSANDRA-15172 but relates specifically to range queries running over range tombstones. *+Steps to Reproduce: +* CREATE KEYSPACE ks1 WITH replication = \{'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true; +*TABLE:*+ CREATE TABLE ks1.table1 ( col1 text, col2 text, col3 text, col4 text, col5 text, col6 timestamp, data text, PRIMARY KEY ((col1, col2, col3), col4, col5, col6) ); Inserted ~4 million rows and created range tombstones by deleting ~1 million rows. +*Create Data*+ _insert into ks1.table1 (col1, col2 , col3 , col4 , col5 , col6 , data ) VALUES ( '1', '11', '21', '1', 'a', 1231231230, 'data');_ _insert into ks1.table1 (col1, col2 , col3 , col4 , col5 , col6 , data ) VALUES ( '1', '11', '21', '2', 'a', 1231231230, 'data');_ _insert into ks1.table1 (col1, col2 , col3 , col4 , col5 , col6 , data ) VALUES ( '1', '11', '21', '3', 'a', 1231231230, 'data');_ _insert into ks1.table1 (col1, col2 , col3 , col4 , col5 , col6 , data ) VALUES ( '1', '11', '21', '4', 'a', 1231231230, 'data');_ _insert into ks1.table1 (col1, col2 , col3 , col4 , col5 , col6 , data ) VALUES ( '1', '11', '21', '5', 'a', 1231231230, 'data');_ +*Create Range Tombstones*+ delete from ks1.table1 where col1='1' and col2='11' and col3='21' and col4='1'; +*Query Live Rows (no tombstones)*+ _select * from ks1.table1 where col1='1' and col2='201' and col3='21' and col4='1' and col5='a' and *col6>1231231230*;_ No issues found, everything is running properly. +*Query Range Tombstones*+ _select * from ks1.table1 where col1='1' and col2='11' and col3='21' and col4='1' and col5='a' and *col6=1231231230*;_ No issues found, everything is running properly. 
+BUT when running range queries:+ _select * from ks1.table1 where col1='1' and col2='11' and col3='21' and col4='1' and col5='a' and *col6>1231231220;*_
WARN [ReadStage-1] 2019-09-23 14:17:10,281 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-1,5,main]: {}
java.lang.ArrayIndexOutOfBoundsException: 2
	at org.apache.cassandra.db.AbstractBufferClusteringPrefix.get(AbstractBufferClusteringPrefix.java:55)
	at org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSizeCompound(LegacyLayout.java:2545)
	at org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSize(LegacyLayout.java:2522)
	at org.apache.cassandra.db.LegacyLayout.serializedSizeAsLegacyPartition(LegacyLayout.java:565)
	at org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:446)
	at org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:352)
	at org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:171)
	at org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:77)
	at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:802)
	at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:953)
	at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:929)
	at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:62)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:114)
	at java.lang.Thread.run(Thread.java:745)
This WARN is
constantly generated until I stop the range queries script. Hope this helps.. Thanks! was: Hi All, This bug is similar to https://issues.apache.org/jira/browse/CASSANDRA-15172 but relates specifically to range queries running over range tombstones. *+Steps to Reproduce: +* CREATE KEYSPACE ks1 WITH replication = \{'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true; +*TABLE:*+ CREATE TABLE ks1.table1 ( col1 text, col2 text, col3 text, col4 text, col5 text, col6 timestamp, data text, PRIMARY KEY ((col1, col2, col3), col4, col5, col6) ); Inserted ~4 million rows and created range tombstones by deleting ~1 million rows. +*Create Data*+ _insert into ks1.table1 (col1, col2 , col3 , col4 , col5 , col6 , data ) VALUES ( '1', '11', '21', '1', 'a', 1231231230, 'data');_ _insert into ks1.table1 (col1, col2 , col3 , col4 , col5 , col6 , data ) VALUES ( '1', '11',
[jira] [Updated] (CASSANDRA-15322) Partition size virtual table
[ https://issues.apache.org/jira/browse/CASSANDRA-15322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-15322: - Labels: virtual-tables (was: ) > Partition size virtual table > > > Key: CASSANDRA-15322 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15322 > Project: Cassandra > Issue Type: New Feature >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Normal > Labels: virtual-tables > > Virtual table to provide on disk size (local) of a given partition. Useful > for checking for or verifying issues with wide partitions. This is dependent > on the lazy virtual table ticket.