TimeWindowCompactionStrategy Operational Concerns

2022-09-15 Thread Michel Barret

Hi,

I want to use TWCS on a Cassandra table. The documentation describes two 
concerns about it:


In case we mix old and new data "in the traditional write path".

=> How can I set up a separate write path to ensure that my old data 
doesn't end up in the same SSTable as new data?



In case querying old data generates a read repair.

=> How can I check for this and/or ensure that queries don't trigger a 
read repair?
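
A minimal sketch of table options sometimes used alongside TWCS to avoid background read repair mixing time windows (assuming Cassandra 3.x; the read_repair_chance options were removed in 4.0, and the keyspace/table names here are placeholders). Note that blocking read repair triggered by digest mismatches at consistency levels above ONE can still occur:

```sql
-- Hypothetical time-series table using TWCS (Cassandra 3.x syntax).
CREATE TABLE ks.events (
    sensor_id int,
    ts timestamp,
    value double,
    PRIMARY KEY (sensor_id, ts)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': '24'
  }
  AND default_time_to_live = 2678400        -- 31 days, so whole SSTables can expire
  AND read_repair_chance = 0.0              -- no random cross-DC background read repair
  AND dclocal_read_repair_chance = 0.0;     -- no random local-DC background read repair
```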


Thank you all for your help

Have a nice day



Re: TTL and disk space releasing

2021-10-07 Thread Michel Barret
Hi, that's my point: since we have only one way to insert data, it seems 
best to set the TTL in code, so it can be changed via configuration if needed.


I used another way to clean my cluster: I take a node down, delete the bad 
SSTables manually, bring the node back up, and repair it. I apply this node by node.


Now the nodes no longer consume more disk space than expected!

Thank you all for your help

Le 06/10/2021 à 17:59, Jeff Jirsa a écrit :
I think this is a bit extreme. If you know that 100% of all queries that 
write to the table include a TTL, not having a TTL on the table is just 
fine. You just need to ensure that you always write correctly.
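
Jeff's condition, "100% of all queries that write to the table include a TTL", corresponds to a per-statement TTL in CQL, e.g. (hypothetical table and values):

```sql
-- Every application write carries an explicit per-statement TTL (31 days).
INSERT INTO ks.events (sensor_id, ts, value)
VALUES (42, '2021-10-06 08:00:00+0000', 1.5)
USING TTL 2678400;
```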


On Wed, Oct 6, 2021 at 8:57 AM Bowen Song <bo...@bso.ng> wrote:


TWCS without a table TTL is unlikely to work correctly, and adding the
table TTL retrospectively alone is also unlikely to fix the existing
issue. You may need to add the table default TTL and update all existing
data to reflect the TTL change, and then trigger a major compaction to
update the SSTable files' metadata (specifically, the maximum timestamp in
the SSTable and the TTL max, which can be used to calculate the safe time
for deleting the entire SSTable file). After all the above is done, you
will need to wait for the table default TTL amount of time before
everything is back to normal. The reason for the waiting time is that
the major compaction will result in a single SSTable file expiring in
the TTL time, and that SSTable will remain on disk until that amount of
time has passed. So you will need enough disk space for about twice the
amount of data you expect to have in that table.
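
The procedure above could look roughly like this (a sketch with placeholder keyspace/table names; the major compaction itself is run with nodetool, not CQL):

```sql
-- 1. Give the table a default TTL of 31 days (2678400 seconds).
ALTER TABLE ks.events WITH default_time_to_live = 2678400;

-- 2. Existing rows keep their old (or missing) TTL, so they must be
--    rewritten, e.g. by re-inserting them with an explicit TTL:
--    INSERT INTO ks.events (...) VALUES (...) USING TTL 2678400;

-- 3. Then trigger a major compaction from the shell so the SSTable
--    metadata (max timestamp, max TTL) is rewritten:
--    $ nodetool compact ks events
```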




Re: TTL and disk space releasing

2021-10-06 Thread Michel Barret

Hi, it was not set before. I have set it now to ensure all data has a TTL.

Thanks for your help.

Le 06/10/2021 à 13:47, Bowen Song a écrit :
What is the table's default TTL? (Note: it may be different from the 
TTL of the data in the table)




Re: TTL and disk space releasing

2021-10-06 Thread Michel Barret
Thank you for your pointers. sstablemetadata seems to show that we have 
data without a TTL (= 0); I don't know how that could appear in our system.


- I replaced the per-query TTL with the table default TTL.
- I reduced gc_grace_seconds to one day.
- I enabled unchecked_tombstone_compaction (out of 31 days of data, we have 
very few queries on data more than 7 days old).
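
The second and third changes above might look like this in CQL (a sketch; the keyspace/table names are placeholders, and the existing TWCS options must be repeated because ALTER TABLE replaces the compaction map as a whole):

```sql
ALTER TABLE ks.events
  WITH gc_grace_seconds = 86400   -- one day
  AND compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': '24',
    'min_threshold': '4',
    'max_threshold': '32',
    'unchecked_tombstone_compaction': 'true'
  };
```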


I will manually clean up the old SSTables that would never be dropped otherwise.

We will see if that works for us.

Le 06/10/2021 à 11:58, Paul Chandler a écrit :

Hi Michael,

I have had similar problems in the past, and found this Last Pickle post very 
useful: https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

This should help you pinpoint what is stopping the SSTables being deleted.

Assuming you are never manually deleting records from the table, there is 
no need to have a large gc_grace_seconds: a large one is there to ensure 
tombstones are replicated correctly, and you won't have any tombstones to worry 
about.

If you are doing manual deletes, then that could be the cause of the issue; 
I wrote a post here about why that would be an issue: 
http://www.redshots.com/cassandra-twcs-must-have-ttls/

After reading these, if you are still having problems, please let us know.

Thanks

Paul

  






TTL and disk space releasing

2021-10-06 Thread Michel Barret

Hello,

I am using Cassandra 3.11.5 with 8 nodes (in a single datacenter). I 
use one simple table; all data is inserted with a 31-day TTL (the data 
is never updated).


I use the TWCS strategy with:
- 'compaction_window_size': '24'
- 'compaction_window_unit': 'HOURS'
- 'max_threshold': '32'
- 'min_threshold': '4'

Each node runs 'nodetool repair' once a week, and our 
gc_grace_seconds is set to 10 days.


I track the nodes' storage, and the disk partition used for Cassandra data 
(used only for this) is ~40% full after one month.


But Cassandra continuously consumes more space. If I read the SSTables 
with sstabledump, I find very old tombstones like this:


"liveness_info" : { "tstamp" : "2021-07-26T08:15:00.092897Z", "ttl" : 
2678400, "expires_at" : "2021-08-26T08:15:00Z", "expired" : true }


I don't understand why this tombstone isn't erased. I believe I have 
applied everything I found on the internet, without improvement.


Does anybody have a clue how to fix my problem?

Have a nice day


Multiple clustering columns and ordering issue

2016-04-18 Thread Michel
Hello,

We have just started exploring Cassandra for our project and are learning
the basics of designing tables to meet our queries.

We will have a table with following fields :
user_id
list_id
fname
lname
email
and so on.

Each user has multiple lists (CSV data), so we have created the table
below:

CREATE TABLE IF NOT EXISTS mykeyspace.table3 (
    user_id int,
    list_id bigint,
    id timeuuid,
    fname text,
    lname text,
    PRIMARY KEY (user_id, list_id, id)
);

With the above table we can execute the queries below:
1) fetch all data of one user_id with "WHERE user_id = ?"
2) fetch all data of one specific list_id of one user with "WHERE user_id =
? AND list_id = ?"

Also, using a SASI index on fname, lname, email and other fields we can
perform searches:
3) search across all records of one user_id (regardless of list_id)
4) search across all records of one list_id for one user_id

Now the only issue with this table is the ordering of data. With this table
we cannot order data by the id field using:
WITH CLUSTERING ORDER BY (id DESC);

Records are ordered by list_id first by default, which is not useful as a
default ordering.
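
Clustering columns sort hierarchically, and CLUSTERING ORDER BY must name the clustering columns in their declared order, so rows can only be ordered by id within a given list_id here. A common workaround (sketched with hypothetical names) is to maintain a second, denormalized query table whose only clustering column is id:

```sql
-- Hypothetical companion table serving "all rows of a user, newest first".
CREATE TABLE IF NOT EXISTS mykeyspace.table3_by_time (
    user_id int,
    id timeuuid,
    list_id bigint,
    fname text,
    lname text,
    PRIMARY KEY (user_id, id)
) WITH CLUSTERING ORDER BY (id DESC);

-- The application writes to both tables; this one answers:
-- SELECT * FROM mykeyspace.table3_by_time WHERE user_id = ?;
```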

Can anyone help here or suggest if we are missing something?

Thanks!


JSON Cassandra 2.2 - insert syntax

2015-06-01 Thread Michel Blase
Hi all,

I'm trying to test the new JSON functionality in C* 2.2.

I'm using this example:

https://issues.apache.org/jira/browse/CASSANDRA-7970

I believe there is a typo in the CREATE TABLE statement that requires
frozen:

CREATE TABLE users (id int PRIMARY KEY, name text, addresses map<text,
frozen<address>>);

but my real problem is in the insert syntax. I've found the CQL-2.2
documentation and my best guess is this:

INSERT INTO users JSON {'id': 123,'name': 'jbellis','address': {'home':
{'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones':
[2101234567]}}};

but I get the error:

SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query]
message=line 1:23 mismatched input '{'id': 123,'name':
'jbellis','address': {'home': {'street': '123 Cassandra Dr','city':
'Austin','zip_code': 78747,'phones': [2101234567]}}}' expecting ')' (INSERT
INTO users JSON [{'id': 123,'name': 'jbellis','address': {'home':
{'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones':
[2101234567]}}]};)


Any idea?


Thanks,

Michael


Re: JSON Cassandra 2.2 - insert syntax

2015-06-01 Thread Michel Blase
Thanks Zach,

tried that but I get the same error:

SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query]
message=line 1:24 mismatched input '{"id": 123, "name":
"jbellis", "address": {"home": {"street": "123 Cassandra Dr", "city":
"Austin", "zip_code": 78747, "phones": ["2101234567"]}}}' expecting ')' (INSERT
INTO users JSON '{"id": 123, "name": "jbellis", "address": {"home":
{"street": "123 Cassandra Dr", "city": "Austin", "zip_code": 78747, "phones":
["2101234567"]}}}';)

On Mon, Jun 1, 2015 at 6:12 PM, Zach Kurey zach.ku...@datastax.com wrote:

 Looks like you have your use of single vs. double quotes inverted.  What
 you want is:

 INSERT INTO users JSON '{"id": 123, "name": "jbellis", "address": {"home":
 {"street": "123 Cassandra Dr", "city": "Austin", "zip_code": 78747, "phones":
 ["2101234567"]}}}';

 HTH



Re: JSON Cassandra 2.2 - insert syntax

2015-06-01 Thread Michel Blase
Zach,

this is embarrassing... you were right, I was running 2.1.

shame on me! But now I'm getting the error:

InvalidRequest: code=2200 [Invalid query] message=JSON values map
contains unrecognized column: address

Any idea? This is the sequence of commands that I'm running:

CREATE KEYSPACE json WITH REPLICATION = { 'class' :'SimpleStrategy',
'replication_factor' : 1 };

USE json;

CREATE TYPE address (street text, city text, zip_code int, phones set<text>);

CREATE TABLE users (id int PRIMARY KEY, name text, addresses map<text,
frozen<address>>);

INSERT INTO users JSON '{"id": 123, "name": "jbellis", "address": {"home": {
"street": "123 Cassandra Dr", "city": "Austin", "zip_code": 78747, "phones": [
"2101234567"]}}}';


Note that I'm running a freshly downloaded Cassandra 2.2 instance (I'm on a Mac).

Thanks, and sorry for wasting your time earlier!
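
The error message itself suggests the likely fix: the table column is named addresses (plural), while the JSON key is address. A sketch of the insert with the key renamed to match the schema above:

```sql
INSERT INTO users JSON '{
  "id": 123,
  "name": "jbellis",
  "addresses": {"home": {"street": "123 Cassandra Dr",
                         "city": "Austin",
                         "zip_code": 78747,
                         "phones": ["2101234567"]}}
}';
```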






On Mon, Jun 1, 2015 at 7:10 PM, Zach Kurey zach.ku...@datastax.com wrote:

 Hi Michel,

 My only other guess is that you actually are running Cassandra 2.1, since
 that's the exact error I get if I try to execute a JSON statement against a
 version earlier than 2.2.


