Re: TWCS repair and compact help

2021-06-29 Thread Kane Wilson
>
> Oh.  So our data is all messed up now because of the “nodetool compact” I
> ran.
>
>
>
> Hi Erick.  Thanks for the quick reply.
>
>
>
> I just want to be sure about compact.  I saw Cassandra will do compaction
> by itself even when I do not run “nodetool compact” manually (nodetool
> compactionstats always has some compaction running).  So this automatic
> compact by Cassandra will clean up the tombstoned data files?
>

The expired SSTables won't be "compacted"; rather, each file is simply
deleted once all the data in it has passed its expiration time.


> Another question I have is, is there a way to un-mess my messed up data
> now?
>

Not really. The easiest way would be to re-insert all your data. If you're
not having any read performance issues, you might be better off just waiting
the 7 days until the large SSTable is dropped.
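To make the timing above concrete, here is a rough Python sketch of the arithmetic (the function name `droppable_at` is an illustrative assumption, not Cassandra internals): a whole SSTable can only be unlinked once the *newest* write in it has passed its TTL, so merging many small files into one big file makes everything wait for the latest data.

```python
# Assumed helper for illustration: an SSTable is fully expired -- and thus
# droppable as a whole file -- only once its newest write has passed its TTL.

TTL = 7 * 24 * 3600  # default_time_to_live = 604800 seconds


def droppable_at(max_write_time, ttl=TTL):
    """Earliest time the whole file can be deleted."""
    return max_write_time + ttl


# Eight small 4-hour-window files from one day expire one window at a time:
day0 = 0
small_files = [day0 + w * 4 * 3600 for w in range(8)]  # max write time per file
drops = [droppable_at(t) for t in small_files]

# After `nodetool compact` merges them, the single big file can only be
# dropped once its newest data expires:
merged_drop = droppable_at(max(small_files))
assert merged_drop == max(drops)  # the merged file inherits the latest drop time
```

The effect compounds each time the big file is re-compacted together with newer data: its maximum write time keeps advancing, so the file never fully expires.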

-- 
raft.so - Cassandra consulting, support, and managed services


RE: TWCS repair and compact help

2021-06-29 Thread Eric Wong
Oh.  So our data is all messed up now because of the "nodetool compact" I ran.

Hi Erick.  Thanks for the quick reply.

I just want to be sure about compaction.  I saw that Cassandra will do
compaction by itself even when I do not run "nodetool compact" manually
(nodetool compactionstats always has some compaction running).  So will this
automatic compaction clean up the tombstoned data files?

Another question I have is: is there a way to un-mess my messed-up data now?

Thanks,
Eric


From: Erick Ramirez 
Sent: Tuesday, June 29, 2021 6:34 PM
To: user@cassandra.apache.org
Subject: Re: TWCS repair and compact help

You definitely shouldn't perform manual compactions -- you should let the 
normal compaction tasks take care of it. It is unnecessary to manually run 
compactions since it creates more problems than it solves as I've explained in 
this post -- 
https://community.datastax.com/questions/6396/. Cheers!


Re: TWCS repair and compact help

2021-06-29 Thread Gábor Auth
Hi,

On Tue, Jun 29, 2021 at 12:34 PM Erick Ramirez 
wrote:

> You definitely shouldn't perform manual compactions -- you should let the
> normal compaction tasks take care of it. It is unnecessary to manually run
> compactions since it creates more problems than it solves as I've explained
> in this post -- https://community.datastax.com/questions/6396/. Cheers!
>

Same issue here... I want to switch from SizeTieredCompactionStrategy to
TimeWindowCompactionStrategy, but I cannot manage to split the existing
SSTables into daily SSTables. Any ideas? :)

-- 
Bye,
Auth Gábor (https://iotguru.cloud)


Re: TWCS repair and compact help

2021-06-29 Thread Erick Ramirez
You definitely shouldn't perform manual compactions -- you should let the
normal compaction tasks take care of it. It is unnecessary to manually run
compactions since it creates more problems than it solves as I've explained
in this post -- https://community.datastax.com/questions/6396/. Cheers!


TWCS repair and compact help

2021-06-29 Thread Eric Wong
Hi:

We need some help with Cassandra repair and compaction for a table that uses TWCS.
We are running Cassandra 4.0-rc1.  We have a database called test_db whose biggest
table, "minute_rate", stores time-series data.  It has the following configuration:

CREATE TABLE test_db.minute_rate (
market smallint,
sin bigint,
field smallint,
slot timestamp,
close frozen,
high frozen,
low frozen,
open frozen,
PRIMARY KEY ((market, sin, field), slot)
) WITH CLUSTERING ORDER BY (slot ASC)
AND additional_write_policy = '99p'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = false
AND comment = ''
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
'compaction_window_size': '4', 'compaction_window_unit': 'HOURS', 
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 604800
AND extensions = {}
AND gc_grace_seconds = 86400
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99p';


minute_rate is configured to use TWCS.  We can see that it creates about 8 data
files per day, each relatively small at around 2 GB.  So far so good.

However, when "nodetool compact test_db minute_rate" ran over the weekend, it
consolidated all the small data files into one big file.  After running nodetool
compact for 2 or 3 weeks, we ended up with bigger and bigger data files, eating
up all the disk space on the machine.
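The behaviour described above follows from how TWCS buckets data into fixed time windows. Here is an illustrative Python sketch (the `window_start` helper is an assumption for illustration, not Cassandra's actual code) showing why the table gets roughly one SSTable per window, and why a manual major compaction breaks that:

```python
# Illustrative sketch of TWCS window bucketing, under this table's settings:
# compaction_window_size = 4, compaction_window_unit = HOURS.

WINDOW = 4 * 3600  # window length in seconds


def window_start(ts, window=WINDOW):
    """Floor a write timestamp to the start of its compaction window."""
    return (ts // window) * window


# One day of writes lands in 24 h / 4 h = 6 windows, so TWCS settles toward
# one SSTable per window and can drop each file independently as it expires:
writes = range(0, 24 * 3600, 600)  # a write every 10 minutes for one day
windows = sorted({window_start(t) for t in writes})
assert len(windows) == 6

# A manual major compaction ignores the window boundaries and merges all of
# these into a single SSTable spanning every window -- so the data can no
# longer be dropped window by window.
```

Six final windows per day is consistent with the roughly 8 files observed, since a window can temporarily hold a few extra SSTables before its final compaction.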

From what I understand about TWCS, Cassandra will simply drop a data file once
all the records inside it are older than the default_time_to_live of 604800
(7 days).  But somehow this is not what we are seeing.  When I run
sstablemetadata on the oldest data file (sample below), I can see that the
tombstone drop times have all been updated, so the data file never gets removed.

This led me to think I am configuring things the wrong way.  So I want to know:
when using TWCS, do we need to repair and compact?

I saw that cassandra-reaper (we use Reaper for repair) is configured to skip
TWCS tables.

Should I stop running "nodetool compact test_db minute_rate"?  Without
"nodetool compact", will Cassandra clean up the tombstoned data files?

Thanks,
Eric



# sstablemetadata na-6681303-big-Data.db
SSTable: 
/var/lib/cassandra/data/test_db/minute_rate-d7955270f31d11ea88fabb8dcc37b800/na-6681303-big
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.01
Minimum timestamp: 1622528285761740 (06/01/2021 06:18:05)
Maximum timestamp: 1624699757821614 (06/26/2021 09:29:17)
SSTable min local deletion time: 1624767191 (06/27/2021 04:13:11)
SSTable max local deletion time: 2147483647 (no tombstones)
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 0.2753232939720765
TTL min: 0
TTL max: 604800 (7 days)
First token: -9223371506914883753 (3870:16432974:7)
Last token: 9223358359954725918 (505:23187788:6)
minClusteringValues: [2021-04-12T16:54:00.000Z]
maxClusteringValues: [2021-06-26T09:28:00.000Z]
Estimated droppable tombstones: 0.3683520383378445
SSTable Level: 0
Repaired at: 0
Pending repair: --
Replay positions covered: {}
totalColumnsSet: 17179881008
totalRows: 4413102532
Estimated tombstone drop times:
   Drop Time| Count  (%)  Histogram
   1624768080 (06/27/2021 04:28:00) | 2 (  0)
   1624771440 (06/27/2021 05:24:00) |98 (  0)
   1624777020 (06/27/2021 06:57:00) |85 (  0)
   1624781640 (06/27/2021 08:14:00) |74 (  0)
   1624786080 (06/27/2021 09:28:00) |74 (  0)
   1624790280 (06/27/2021 10:38:00) |66 (  0)
   1624794900 (06/27/2021 11:55:00) |87 (  0)
   1624800060 (06/27/2021 13:21:00) |83 (  0)
   1624804680 (06/27/2021 14:38:00) |108064 (  0)
   1624809180 (06/27/2021 15:53:00) |304148 (  0)
   1624812540 (06/27/2021 16:49:00) |133188 (  0)
   1624819440 (06/27/2021 18:44:00) |88 (  0)
   1624824060 (06/27/2021 20:01:00) |73 (  0)
   1624828080 (06/27/2021 21:08:00) |66 (  0)
   1624832520 (06/27/2021 22:22:00) | 1 (  0)
   1624835880 (06/27/2021 23:18:00) |  23578916 (  0) o
   1624839720 (06/28/2021 00:22:00) |  21783899 (  0) o
   1624843740 (06/28/2021 01:29:00) |  22758204 (  0) o
   1624848120 (06/28/2021 02:42:00) |  25237306 (  0) o
   1624853520 (06/28/2021 04:12:00) |  44003185 (  0) O.
   1624858080 (06/28/2021 05:28:00) | 145977595 (  0) O
   1624862460 (06/28/2021 06:41:00) | 331875915 (  1) OOOo
   1624866060 (06/28/2021 07:41:00) | 463284230 (  2) .
   1624869540 (06/28/2021 08:39:00) | 455732185 (  2) 
   1624873140 (06/28/2021 09:39:00) |