TimeWindowCompactionStrategy Operational Concerns
Hi,

I want to use TWCS on a Cassandra table. The documentation explains two concerns about it:

1) Mixing old and new data "in the traditional write path". => How can I create another write path to ensure that my old data doesn't end up in the same SSTable as new data?
2) Querying old data can trigger a (read) repair. => How can I check this and/or ensure a query doesn't trigger a repair?

Thank you all for your help.

Have a nice day
Re: TTL and disk space releasing
Hi,

That's my point: since we have only one way to insert data, it seems best to set the TTL in code, so it can be changed via configuration if needed.

I used another way to clean my cluster: I take a node down, delete the bad SSTables manually, bring the node back up and repair it, and apply that node by node. Now the nodes don't consume more data than expected!

Thank you all for your help.

On 06/10/2021 at 17:59, Jeff Jirsa wrote:
> I think this is a bit extreme. If you know that 100% of all queries that write to the table include a TTL, not having a TTL on the table is just fine. You just need to ensure that you always write correctly.
>
> On Wed, Oct 6, 2021 at 8:57 AM Bowen Song wrote:
>> TWCS without a table TTL is unlikely to work correctly, and adding the table TTL retrospectively alone is also unlikely to fix the existing issue. You may need to add the table default TTL and update all existing data to reflect the TTL change, and then trigger a major compaction to update the SSTable files' metadata (specifically, the maximum timestamp in the SSTable and the TTL max, which can be used to calculate the safe time for deleting the entire SSTable file). After all the above is done, you will need to wait for the table default TTL amount of time before everything is back to normal. The reason for the waiting time is that the major compaction will result in a single SSTable file expiring in the TTL time, and that SSTable will remain on disk until that amount of time has passed. So you will need enough disk space for about twice the amount of data you are expecting to have in that table.
>>
>> On 06/10/2021 16:34, Michel Barret wrote:
>>> Hi, it's not set before. I set it to ensure all data have a TTL.
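The recovery procedure Bowen describes above could be sketched roughly as follows (keyspace and table names are placeholders, and the 2678400-second value is the 31-day TTL mentioned elsewhere in this thread — treat this as an untested sketch, not the exact commands used):

```sql
-- 1) Add a table default TTL (31 days = 2678400 seconds):
ALTER TABLE my_keyspace.my_table WITH default_time_to_live = 2678400;

-- 2) Rewrite the existing rows from the application so they carry the TTL,
--    then trigger a major compaction to refresh the SSTable metadata
--    (run from the shell on each node):
--      nodetool compact my_keyspace my_table

-- 3) Wait one full TTL period for the compacted SSTable to expire, keeping
--    roughly twice the expected data size free on disk in the meantime.
```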
Re: TTL and disk space releasing
Hi, it's not set before. I set it to ensure all data have a TTL.

Thanks for your help.

On 06/10/2021 at 13:47, Bowen Song wrote:
> What is the table's default TTL? (Note: it may be different from the TTL of the data in the table)
Re: TTL and disk space releasing
Thank you for your pointers.

sstablemetadata seems to show that we have data without a TTL (= 0). I don't know how that could have appeared in our system.

- I replaced our per-query TTL with a default table TTL.
- I reduced gc_grace_seconds to one day.
- I enabled unchecked_tombstone_compaction (out of 31 days of data we have very few queries on data more than 7 days old).

I will manually clean up the old SSTables that will never be dropped. We will see if that works for us.

On 06/10/2021 at 11:58, Paul Chandler wrote:
> Hi Michel,
>
> I have had similar problems in the past, and found this Last Pickle post very useful: https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html This should help you pinpoint what is stopping the SSTables from being deleted.
>
> Assuming you are never manually deleting records from the table, there is no need to have a large gc_grace_seconds, as a large one is there to ensure tombstones are replicated correctly, and you won't have any tombstones to worry about. If you are doing manual deletes, then that could be the cause of the issue; I wrote a post here about why that would be an issue: http://www.redshots.com/cassandra-twcs-must-have-ttls/
>
> After reading these, if you are still having problems please let us know.
>
> Thanks
>
> Paul
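The adjustments listed in the reply above could look roughly like this in CQL (a sketch; the keyspace/table name and the exact option values other than those stated in the thread are placeholders):

```sql
-- Sketch of the changes described above (names are placeholders):
ALTER TABLE my_keyspace.my_table
WITH gc_grace_seconds = 86400          -- reduced to one day
AND default_time_to_live = 2678400     -- 31 days, replacing per-query TTLs
AND compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_size': '24',
    'compaction_window_unit': 'HOURS',
    -- allow single-SSTable tombstone compactions even when overlap
    -- checks would normally block them:
    'unchecked_tombstone_compaction': 'true'
};
```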
TTL and disk space releasing
Hello,

I'm trying to use Cassandra (3.11.5) with 8 nodes (in a single datacenter). I use one simple table; all data is inserted with a 31-day TTL (the data is never updated).

I use the TWCS strategy with:
- 'compaction_window_size': '24'
- 'compaction_window_unit': 'HOURS'
- 'max_threshold': '32'
- 'min_threshold': '4'

Each node runs 'nodetool repair' once a week, and our gc_grace_seconds is set to 10 days.

I track the storage on the nodes, and the partition used for Cassandra data (used only for this) reaches ~40% usage after one month.

But Cassandra continuously consumes more space; if I read the SSTables with sstabledump I find very old tombstones like this:

"liveness_info" : { "tstamp" : "2021-07-26T08:15:00.092897Z", "ttl" : 2678400, "expires_at" : "2021-08-26T08:15:00Z", "expired" : true }

I don't understand why this tombstone isn't erased. I believe I have applied everything I found on the internet, without improvement.

Does anybody have a clue to fix my problem?

Have a nice day
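For reference, the setup described above might look like this in CQL. The schema is a hypothetical sketch: only the compaction options, the 10-day gc_grace_seconds, and the 31-day per-query TTL come from the message; the table name and columns are invented for illustration.

```sql
-- Hypothetical table matching the settings described in the message:
CREATE TABLE sensor_data (
    source_id int,
    ts timestamp,
    value text,
    PRIMARY KEY (source_id, ts)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_size': '24',
    'compaction_window_unit': 'HOURS',
    'max_threshold': '32',
    'min_threshold': '4'
} AND gc_grace_seconds = 864000;  -- 10 days

-- All writes carry a 31-day TTL set per query (no table default TTL):
INSERT INTO sensor_data (source_id, ts, value)
VALUES (1, '2021-07-26 08:15:00', 'x')
USING TTL 2678400;  -- 31 days
```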
Multiple clustering column and ordering issue.
Hello,

We have just started exploring Cassandra for our project and are learning the basics of designing tables to match our queries. We will have a table with the following fields: user_id, list_id, fname, lname, email, and so on. Each user has multiple lists (CSV data), so we created the table as below:

CREATE TABLE IF NOT EXISTS mykeyspace.table3 (
    user_id int,
    list_id bigint,
    id timeuuid,
    fname text,
    lname text,
    PRIMARY KEY (user_id, list_id, id)
);

With the above table we can execute the queries below:
1) fetch all data of one user_id with "WHERE user_id = ?";
2) fetch all data of one specific list_id of one user with "WHERE user_id = ? AND list_id = ?".

Also, using a SASI index on fname, lname, email and other fields, we can perform searches:
3) search across all records of one user_id (regardless of list_id);
4) search across all records of one list_id for one user_id.

Now the only issue with this table is the ordering of the data. With this table we cannot order data by the id field as below:

WITH CLUSTERING ORDER BY (id DESC);

Records are ordered by list_id by default, which is not useful as a default ordering. Can anyone help here or suggest if we are missing something? Thanks!
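For context on why the clause above is rejected: in CQL, CLUSTERING ORDER BY must list the clustering columns in the same order they appear in the PRIMARY KEY, so ordering by id alone is not allowed. A sketch (the second table and its name are a hypothetical variant, not from the question):

```sql
-- Accepted: clustering columns listed in declared order.
-- Rows are still grouped by list_id first, then by id within each list.
CREATE TABLE IF NOT EXISTS mykeyspace.table3 (
    user_id int,
    list_id bigint,
    id timeuuid,
    fname text,
    lname text,
    PRIMARY KEY (user_id, list_id, id)
) WITH CLUSTERING ORDER BY (list_id ASC, id DESC);

-- Ordering by id across all lists of a user would require a separate
-- table where id is the first clustering column (hypothetical sketch):
CREATE TABLE IF NOT EXISTS mykeyspace.table3_by_id (
    user_id int,
    id timeuuid,
    list_id bigint,
    fname text,
    lname text,
    PRIMARY KEY (user_id, id)
) WITH CLUSTERING ORDER BY (id DESC);
```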
JSON Cassandra 2.2 - insert syntax
Hi all,

I'm trying to test the new JSON functionality in C* 2.2. I'm using this example: https://issues.apache.org/jira/browse/CASSANDRA-7970

I believe there is a typo in the CREATE TABLE statement there, which requires frozen:

CREATE TABLE users (id int PRIMARY KEY, name text, addresses map<text, frozen<address>>);

but my real problem is the insert syntax. I've found the CQL 2.2 documentation, and my best guess is this:

INSERT INTO users JSON {'id': 123,'name': 'jbellis','address': {'home': {'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones': [2101234567]}}};

but I get the error:

SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query] message=line 1:23 mismatched input '{'id': 123,'name': 'jbellis','address': {'home': {'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones': [2101234567]}}}' expecting ')' (INSERT INTO users JSON [{'id': 123,'name': 'jbellis','address': {'home': {'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones': [2101234567]}}]};)

Any idea?

Thanks,
Michael
Re: JSON Cassandra 2.2 - insert syntax
Thanks Zach, tried that but I get the same error:

SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query] message=line 1:24 mismatched input '{id: 123,name: jbellis,address: {home: {street: 123 Cassandra Dr,city: Austin,zip_code: 78747,phones: [2101234567]}}}' expecting ')' (INSERT INTO users JSON ['{id: 123,name: jbellis,address: {home: {street: 123 Cassandra Dr,city: Austin,zip_code: 78747,phones: [2101234567]}}]}';)

On Mon, Jun 1, 2015 at 6:12 PM, Zach Kurey wrote:
> Looks like you have your use of single vs. double quotes inverted. What you want is:
>
> INSERT INTO users JSON '{"id": 123, "name": "jbellis", "address": {"home": {"street": "123 Cassandra Dr", "city": "Austin", "zip_code": 78747, "phones": [2101234567]}}}';
>
> HTH
Re: JSON Cassandra 2.2 - insert syntax
Zach, this is embarrassing... you were right, I was running 2.1. Shame on me!

But now I'm getting the error:

InvalidRequest: code=2200 [Invalid query] message=JSON values map contains unrecognized column: address

Any idea? This is the sequence of commands that I'm running:

CREATE KEYSPACE json WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE json;
CREATE TYPE address (street text, city text, zip_code int, phones set<text>);
CREATE TABLE users (id int PRIMARY KEY, name text, addresses map<text, frozen<address>>);
INSERT INTO users JSON '{"id": 123, "name": "jbellis", "address": {"home": {"street": "123 Cassandra Dr", "city": "Austin", "zip_code": 78747, "phones": [2101234567]}}}';

Consider that I'm running a just-downloaded C* 2.2 instance (I'm on a Mac).

Thanks, and sorry for the waste of time before!

On Mon, Jun 1, 2015 at 7:10 PM, Zach Kurey wrote:
> Hi Michel,
>
> My only other guess is that you actually are running Cassandra 2.1, since that's the exact error I get if I try to execute a JSON statement against a version earlier than 2.2.
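One observation on the last error in this thread: "unrecognized column: address" means the JSON key does not match any column, and the table's map column is named addresses, not address. A hedged sketch of an insert matching the schema above (not confirmed in the thread; note the set<text> values are written as JSON strings):

```sql
-- Sketch: JSON keys must match column names exactly ("addresses"),
-- and phones is a set<text>, so its values should be JSON strings.
INSERT INTO users JSON '{"id": 123, "name": "jbellis", "addresses": {"home": {"street": "123 Cassandra Dr", "city": "Austin", "zip_code": 78747, "phones": ["2101234567"]}}}';
```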