RE: TWCS: Repair creates new buckets with old data

2018-10-18 Thread wxn...@zjqunshuo.com
> Isn't repair necessary to get data files removed from the filesystem?
The answer is no. IMO, Cassandra will remove sstable files automatically once 
it can make sure the sstable files are 100% tombstones and safe to delete. 
If you use TWCS and you have only insertions and no updates, you don't need 
to run repair manually.
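
A quick way to verify this on disk is with the bundled sstable tools; the 
sketch below is illustrative only (it assumes the tools are on the PATH, that 
you run them from the table's data directory, and it reuses the stat.spa 
table and sstable names quoted in full later in this thread):

# Inspect one sstable's timestamps, droppable estimate, and repair state:
sstablemetadata mc-22922-big-Data.db | grep -E "Maximum time|Estimated droppable|Repaired at"
# List sstables that block fully expired ones from being deleted:
sstableexpiredblockers stat spa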

-Simon
 
[ANNOUNCE] Stratio's Lucene plugin fork

2018-10-18 Thread kurt greaves
Hi all,

We've had confirmation from Stratio that they are no longer maintaining
their Lucene plugin for Apache Cassandra. We've thus decided to fork the
plugin to continue maintaining it. At this stage we won't be making any
additions to the plugin in the short term unless absolutely necessary, and
as 4.0 nears we'll begin making it compatible with the new major release.
We plan on taking the existing PRs and issues from the Stratio repository
and getting them merged/resolved; however, this likely won't happen until
early next year. Having said that, we welcome all contributions and will
dedicate time to reviewing bugs in the current versions if people lodge
them and can help.

I'll note that this is new ground for us; we don't have much existing
knowledge of the plugin, but we are determined to learn. If anyone out there
has established knowledge about the plugin we'd be grateful for any
assistance!

You can find our fork here:
https://github.com/instaclustr/cassandra-lucene-index
At the moment, the only difference is that there is a 3.11.3 branch which
just has some minor changes to dependencies to better support 3.11.3.

Cheers,
Kurt


TWCS: Repair creates new buckets with old data

2018-10-18 Thread Sri Rathan Rangisetti
Hi Maik,

Yes, when you have old and new data mixed together, the old SSTable will not
be dropped until the new SSTable has fully expired.

There are a couple of ways for you to reclaim the storage:

1.) If this is a one-time thing, you can manually run commands that rewrite
sstables, such as nodetool compact, scrub, or garbagecollect.

2.) If you think this will be recurring, you should set
unchecked_tombstone_compaction to true (the default is false); see the
command sketch after the quoted description below.

unchecked_tombstone_compaction (default: false)
Single-sstable compaction has quite strict checks for whether it should be
started; this option disables those checks, which some use cases need. Note
that this does not change anything for the actual compaction: tombstones are
only dropped if it is safe to do so, so it might just rewrite an sstable
without being able to drop any tombstones.
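
A minimal command sketch of both options, assuming the stat.spa table from
this thread (note the ALTER repeats the table's existing compaction options,
since ALTER replaces the whole compaction map):

# Option 1, one-off: rewrite sstables so deletable data can be purged.
# (nodetool garbagecollect needs Cassandra 3.10+; on 3.0.x use compact or scrub.)
nodetool garbagecollect stat spa
# nodetool compact stat spa   # heavier: merges everything into one big sstable

# Option 2, recurring: allow single-sstable tombstone compactions
# without the strict overlap checks.
cqlsh -e "ALTER TABLE stat.spa WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
    'max_threshold': '100', 'min_threshold': '4',
    'tombstone_compaction_interval': '86460',
    'unchecked_tombstone_compaction': 'true'};"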

In both cases this will trigger numerous compactions, so make sure you have
enough I/O headroom or throttle your compaction threads.

If you are going to make the DDL change (option 2), I would recommend reading
up on how TWCS works first:

http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html


Thanks
Sri Rathan

> Hello,
>
> we work with Cassandra version 3.0.9 and have a problem with a table that
> uses TWCS. The command “nodetool repair” always creates new files with old
> data, which prevents the old data from being deleted.

RE: TWCS: Repair creates new buckets with old data

2018-10-18 Thread Caesar, Maik
Hello Simon,
Isn't repair necessary to get data files removed from the filesystem? My 
assumption was that only repaired data will be removed after the TTL is reached.

Regards
Maik

From: wxn...@zjqunshuo.com [mailto:wxn...@zjqunshuo.com]
Sent: Wednesday, 17 October 2018 09:02
To: user 
Subject: Re: TWCS: Repair creates new buckets with old data

Hi Maik,
IMO, when using TWCS, you had better not run repair. During repair, TWCS 
behaves the same as STCS when merging sstables, and the result is sstables 
spanning multiple time buckets, but maybe I'm wrong. In my use case, I 
don't run repair on tables that use TWCS.

-Simon

From: Caesar, Maik
Date: 2018-10-16 17:46
To: user@cassandra.apache.org
Subject: TWCS: Repair creates new buckets with old data
Hello,
we work with Cassandra version 3.0.9 and have a problem with a table that 
uses TWCS. The command “nodetool repair” always creates new files containing 
old data, which prevents the old data from being deleted.
The layout of the table is as follows:
cqlsh> desc stat.spa

CREATE TABLE stat.spa (
    region int,
    id int,
    date text,
    hour int,
    zippedjsonstring blob,
    PRIMARY KEY ((region, id), date, hour)
) WITH CLUSTERING ORDER BY (date ASC, hour ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
        'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
        'max_threshold': '100', 'min_threshold': '4',
        'tombstone_compaction_interval': '86460'}
    AND compression = {'chunk_length_in_kb': '64',
        'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
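
Since default_time_to_live = 0 here, row expiry presumably comes from
per-insert TTLs. A hypothetical write, with made-up values, would look like:

# Illustrative only; real region/id/payload values come from the application.
cqlsh -e "INSERT INTO stat.spa (region, id, date, hour, zippedjsonstring)
          VALUES (1, 42, '2017-04-15', 12, 0xCAFE) USING TTL 2592000;"   # TTL = 30 days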

Currently the oldest data are from 2017/04/15 and are not being removed:

$ for f in *Data.db; do
    # Summarize each sstable: max/min timestamps, droppable-tombstone
    # estimate, repair state, size, and file name.
    meta=$(sudo sstablemetadata $f)
    echo -e "Max:" $(date --date=@$(echo "$meta" | grep "Maximum time" | cut -d" " -f3 | cut -c 1-10) '+%Y/%m/%d %H:%M') \
         "Min:" $(date --date=@$(echo "$meta" | grep "Minimum time" | cut -d" " -f3 | cut -c 1-10) '+%Y/%m/%d %H:%M') \
         $(echo "$meta" | grep droppable) $(echo "$meta" | grep "Repaired at") ' \t ' \
         $(ls -lh $f | awk '{print $5" "$6" "$7" "$8" "$9}')
  done | sort
Max: 2017/04/15 12:08 Min: 2017/03/31 13:09 Estimated droppable tombstones: 
1.7731048805815162 Repaired at: 1525685601400 42K May 7 19:56 
mc-22922-big-Data.db
Max: 2017/04/17 13:49 Min: 2017/03/31 13:09 Estimated droppable tombstones: 
1.9600207684319835 Repaired at: 1525685601400 116M May 7 13:31 
mc-15096-big-Data.db
Max: 2017/04/21 13:43 Min: 2017/04/15 13:34 Estimated droppable tombstones: 
1.9090909090909092 Repaired at: 1525685601400 11K May 7 19:56 
mc-22921-big-Data.db
Max: 2017/05/23 21:45 Min: 2017/04/21 14:00 Estimated droppable tombstones: 
1.8360655737704918 Repaired at: 1525685601400 21M May 7 19:56 
mc-22919-big-Data.db
Max: 2017/06/12 15:19 Min: 2017/04/25 14:45 Estimated droppable tombstones: 
1.8091397849462365 Repaired at: 1525685601400 19M May 7 14:36 
mc-17095-big-Data.db
Max: 2017/06/15 15:26 Min: 2017/05/10 14:37 Estimated droppable tombstones: 
1.76536312849162 Repaired at: 1529612605539   9.3M Jun 21 22:31 
mc-25372-big-Data.db
…

After a “nodetool repair” run, a new big data file is created that includes 
old data from 2017/07/31.

Max: 2018/07/27 18:10 Min: 2017/03/31 13:13 Estimated droppable tombstones: 0.08392555471691247 Repaired at: 0  11G Sep 11 22:02 mc-39281-big-Data.db
…
Max: 2018/08/16 18:18 Min: 2018/08/06 12:19 Estimated droppable tombstones: 0.0 Repaired at: 1534525730510  123M Aug 17 23:46 mc-36847-big-Data.db
Max: 2018/08/17 19:20 Min: 2017/07/31 12:04 Estimated droppable tombstones: 0.03385963490004347 Repaired at: 0  11G Sep 11 21:43 mc-39265-big-Data.db
Max: 2018/08/17 19:20 Min: 2018/07/24 12:33 Estimated droppable tombstones: 0.0 Repaired at: 1534525730510  135M Sep 11 21:44 mc-39270-big-Data.db
…
Max: 2018/09/06 17:30 Min: 2018/08/28 12:17 Estimated droppable tombstones: 0.0 Repaired at: 1536690786879  129M Sep 11 21:10 mc-39238-big-Data.db
Max: 2018/09/07 18:22 Min: 2017/04/23 12:48 Estimated droppable tombstones: 0.1548442441468401 Repaired at: 0  8.0G Sep 11 21:33 mc-39258-big-Data.db
Max: 2018/09/07 18:22 Min: 2018/09/07 12:15 Estimated droppable tombstones: 0.0 Repaired at: 1536690786879  72M Sep 11 21:34 mc-39262-big-Data.db
Max: 2018/09/08 18:20 Min: 2018/08/22 12:17 Estimated droppable tombstones: 0.0 Repaired at: 0  2.8G Sep 11 21:47 mc-39272-big-Data.db

The tool sstableexpiredblockers shows that the 

Re: Upgrade to version 3

2018-10-18 Thread Alain RODRIGUEZ
Hello,

You might want to have a look at
https://issues.apache.org/jira/browse/CASSANDRA-14823

It seems that you could face *data loss* while upgrading to Cassandra
3.11.3. Apparently, it is still somewhat unsafe to upgrade Cassandra to
C*3, even if you use the latest C*3.0.17/3.11.3. According to Blake, who
reported and worked on the fix:

[...], which will lead to duplicate start bounds being emitted, and
> incorrect dropping of rows in some cases
>

It's a bug that was recently fixed and that should be released soon, I hope.

I imagine that a lot of people have done this upgrade already, and it might
be just fine for you as well. Still, you might want to look into this issue
and consider waiting for the patch to be released (or applying it yourself)
to reduce the risk.
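
For reference, the per-node portion of a rolling upgrade usually looks
roughly like the sketch below (an outline only, assuming a package-based
install; adapt the service commands to your environment, take snapshots
first, and upgrade one node at a time):

nodetool drain                 # flush memtables; the node stops accepting writes
sudo service cassandra stop
# ... upgrade the Cassandra package to the target version ...
sudo service cassandra start
nodetool upgradesstables       # rewrite sstables into the new on-disk format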

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Thu, 18 Oct 2018 at 12:31, Anup Shirolkar wrote:

> Hi,
>
> Yes, you can upgrade from 2.2 to 3.11.3.
>
> The steps for the upgrade are described on lots of blogs and sites.
>
> You can follow:
>
> https://myopsblog.wordpress.com/2017/12/04/upgrade-cassandra-cluster-from-2-x-to-3-x/
>
> You should read NEWS.txt for release-specific information on each version
> while planning the upgrade.
> https://github.com/apache/cassandra/blob/trunk/NEWS.txt
>
> Please see the mail archive below for your case of 2.2 to 3.x:
> https://www.mail-archive.com/user@cassandra.apache.org/msg45381.html
>
> Regards,
>
> Anup Shirolkar
>
>
>
>
> On Thu, 18 Oct 2018 at 09:30, Mun Dega  wrote:
>
>> Hello,
>>
>> If we are upgrading from version 2.2 to 3.x, should we go directly to the
>> latest version, 3.11.3?
>>
>> Anything we need to look out for? If anyone can point to an upgrade
>> process, that would be great!
>>
>>
>>