Hey Meg, a couple thoughts.

>   Set a table-level TTL with TWCS, and stop setting it with
inserts/updates (an insert TTL overrides the table-level TTL), so that your
entire sstable expires at the same time instead of each insert expiring at
its own pace. That way, for tombstone cleanup, the system can just drop the
entire sstable at once.

Setting the TTL on the table or on the query gives you exactly the same
result; the table-level setting is just there for convenience.  If it's not
set at the query level, the table's default value is used.
See org.apache.cassandra.cql3.Attributes#getTimeToLive.  Generally speaking
I'd also rather set it at the table level, but that's to avoid weird
application bugs, not as an optimization.
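
For reference, moving it to the table level is just a schema change plus
dropping the TTL from your writes, roughly like this (10368000 is ~4 months
in seconds, and the insert values are made up):

ALTER TABLE stat.spa WITH default_time_to_live = 10368000;

-- application writes then simply stop passing USING TTL:
INSERT INTO stat.spa (region, id, date, hour, zippedjsonstring)
VALUES (1, 42, '2017/04/15', 12, 0xCAFE);

Either way each cell ends up with the same TTL, it's just one less thing
for the application to get wrong.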

>   I’d suggest removing the -pr. Running incremental repair with TWCS is
better.

If incremental repair worked correctly I would agree with you, but
unfortunately it doesn't.  Incremental repair has some subtle bugs that can
result in massive overstreaming, because it will sometimes fail to
correctly mark data as repaired.  My coworker Alex wrote up a good summary
of the changes to incremental repair going into 4.0 that fix these issues;
it's worth a read:
http://thelastpickle.com/blog/2018/09/10/incremental-repair-improvements-in-cassandra-4.html
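
On 3.0 that also means explicitly asking for full repairs, since a plain
nodetool repair defaults to incremental there.  Something along these lines
(keyspace/table from your schema; run it on every node, since -pr only
covers primary ranges):

nodetool repair --full -pr stat spa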

Reaper (OSS, maintained by us @ TLP, see http://cassandra-reaper.io/) can
schedule subrange repairs on one or more tables, or on all tables except
those in a blacklist.  Doing frequent subrange repairs limits the amount of
data that gets streamed in and should help keep things pretty consistent
unless you're dropping a lot of mutations.  It's not perfect, but it should
cause fewer headaches than incremental repair will.
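
Under the hood a subrange repair is just a full repair restricted to a
slice of the token ring, e.g. (the tokens here are made-up Murmur3 values,
Reaper calculates the real slices for you):

nodetool repair --full -st -9223372036854775808 -et -9100000000000000000 stat spa

Reaper runs a lot of these small slices back to back, so no single session
has much to validate or stream.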

Hope this helps.
Jon



On Thu, Oct 25, 2018 at 4:21 AM Meg Mara <mm...@digitalriver.com> wrote:

> Hi Maik,
>
>
>
> I have a similar Cassandra env, with similar table requirements. So these
> would be my suggestions:
>
>
>
> ·       Set a table-level TTL with TWCS, and stop setting it with
> inserts/updates (an insert TTL overrides the table-level TTL), so that your
> entire sstable expires at the same time instead of each insert expiring at
> its own pace. That way, for tombstone cleanup, the system can just drop the
> entire sstable at once.
>
> ·       Since you’re on v3.0.9, the nodetool repair command runs incremental
> repair by default, and with incremental repair the -pr option is not
> recommended. (ref. link below)
>
> ·       I’d suggest removing the -pr. Running incremental repair with
> TWCS is better.
>
> ·       Here’s why I think so: full repair, and full repair with the -pr
> option, would include all the sstables in the repair process, which means
> the chance of your oldest and newest data mixing is very high.
>
> ·       Whereas, if you run incremental repair every 5 days for example,
> only the last five days of data would be included in that repair operation.
> So the maximum ‘damage’ it could do is mixing 5-day-old data into a new
> sstable.
>
> ·       Your table-level TTL would then tombstone this data at the 4-month
> + 5-day mark instead of at the 4-month mark, which shouldn’t be a big
> concern. At least in our case it isn’t!
>
> ·       I wouldn’t stop running repairs on our TWCS tables, because we
> are too concerned with data consistency.
>
>
>
>
>
> Please read the note here:
>
> https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsRepair.html
>
>
>
>
>
> Thank you,
>
> *Meg*
>
>
>
>
>
> *From:* Caesar, Maik [mailto:maik.cae...@dxc.com]
> *Sent:* Wednesday, October 24, 2018 2:17 AM
> *To:* user@cassandra.apache.org
> *Subject:* RE: TWCS: Repair create new buckets with old data
>
>
>
> Hi Meg,
>
> The TTL (4 months) is set at insert time via the insert statement by the
> application.
>
> The repair is started each day on one of ten hosts with the command: nodetool
> --host hostname_# repair -pr
>
>
>
> Regards
>
> Maik
>
>
>
> *From:* Meg Mara [mailto:mm...@digitalriver.com <mm...@digitalriver.com>]
> *Sent:* Tuesday, October 23, 2018 17:05
> *To:* user@cassandra.apache.org
> *Subject:* RE: TWCS: Repair create new buckets with old data
>
>
>
> Hi Maik,
>
>
>
> I noticed in your table description that your default_time_to_live = 0,
> so where is your TTL property set? At what point do your sstables get
> tombstoned?
>
>
>
> Also, could you please mention what kind of repair you performed on this
> table? (Incremental, Full, Full repair with -pr option)
>
>
>
> Thank you,
>
> *Meg*
>
>
>
>
>
> *From:* Caesar, Maik [mailto:maik.cae...@dxc.com <maik.cae...@dxc.com>]
> *Sent:* Monday, October 22, 2018 10:17 AM
> *To:* user@cassandra.apache.org
> *Subject:* RE: TWCS: Repair create new buckets with old data
>
>
>
> Ok, thanks.
>
> My conclusion:
>
> 1.       I will set unchecked_tombstone_compaction to true so that old
> data with tombstones gets removed (see the ALTER TABLE sketch below)
>
> 2.       I will exclude TWCS tables from repair
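>
> For item 1, that would roughly be an ALTER TABLE of the compaction options
> (a sketch; it reuses the options from the table definition below and only
> adds the new flag):
>
> ALTER TABLE stat.spa WITH compaction = {'class':
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
> 'max_threshold': '100', 'min_threshold': '4',
> 'tombstone_compaction_interval': '86460',
> 'unchecked_tombstone_compaction': 'true'};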
>
>
>
> Regarding excluding tables from repair, is there any easy way to do this?
> Nodetool repair does not support excludes.
>
>
>
> Regards
>
> Maik
>
>
>
> *From:* wxn...@zjqunshuo.com [mailto:wxn...@zjqunshuo.com
> <wxn...@zjqunshuo.com>]
> *Sent:* Friday, October 19, 2018 03:58
> *To:* user <user@cassandra.apache.org>
> *Subject:* RE: TWCS: Repair create new buckets with old data
>
>
>
> > Is repair not necessary to get the data files removed from the filesystem?
>
> The answer is no. IMO, Cassandra will remove sstable files automatically
> when it can make sure the sstable files are 100% tombstones and safe to
> delete. If you use TWCS and you have only insertions and no updates, you
> don't need to run repair manually.
>
>
>
> -Simon
>
>
>
> *From:* Caesar, Maik <maik.cae...@dxc.com>
>
> *Date:* 2018-10-18 20:30
>
> *To:* user@cassandra.apache.org
>
> *Subject:* RE: TWCS: Repair create new buckets with old data
>
> Hello Simon,
>
> Is repair not necessary to get the data files removed from the filesystem? My
> assumption was that only repaired data would be removed after the TTL is
> reached.
>
>
>
> Regards
>
> Maik
>
>
>
> *From:* wxn...@zjqunshuo.com [mailto:wxn...@zjqunshuo.com
> <wxn...@zjqunshuo.com>]
> *Sent:* Wednesday, October 17, 2018 09:02
> *To:* user <user@cassandra.apache.org>
> *Subject:* Re: TWCS: Repair create new buckets with old data
>
>
>
> Hi Maik,
>
> IMO, when using TWCS, you had better not run repair. During repair, TWCS
> behaves the same as STCS when merging sstables, and the result is sstables
> spanning multiple time buckets, but maybe I'm wrong. In my use case, I don't
> run repair on tables using TWCS.
>
>
>
> -Simon
>
>
>
> *From:* Caesar, Maik <maik.cae...@dxc.com>
>
> *Date:* 2018-10-16 17:46
>
> *To:* user@cassandra.apache.org
>
> *Subject:* TWCS: Repair create new buckets with old data
>
> Hello,
>
> we work with Cassandra version 3.0.9 and have a problem with a table using
> TWCS. The command “nodetool repair” always creates new files containing old
> data, which prevents the old data from being deleted.
>
> The layout of the Table is following:
>
> cqlsh> desc stat.spa
>
> CREATE TABLE stat.spa (
>     region int,
>     id int,
>     date text,
>     hour int,
>     zippedjsonstring blob,
>     PRIMARY KEY ((region, id), date, hour)
> ) WITH CLUSTERING ORDER BY (date ASC, hour ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
>         'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
>         'max_threshold': '100', 'min_threshold': '4',
>         'tombstone_compaction_interval': '86460'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.0
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
>
>
>
> Currently the oldest data is from 2017/04/15 and will not be removed:
>
>
>
> $ for f in *Data.db; do meta=$(sudo sstablemetadata $f); echo -e "Max:" $(date --date=@$(echo "$meta" | grep Maximum\ time | cut -d" "  -f3| cut -c 1-10) '+%Y/%m/%d %H:%M') "Min:" $(date --date=@$(echo "$meta" | grep Minimum\ time | cut -d" "  -f3| cut -c 1-10) '+%Y/%m/%d %H:%M') $(echo "$meta" | grep droppable) $(echo "$meta" | grep "Repaired at") ' \t ' $(ls -lh $f | awk '{print $5" "$6" "$7" "$8" "$9}'); done | sort
>
> Max: 2017/04/15 12:08 Min: 2017/03/31 13:09 Estimated droppable tombstones: 1.7731048805815162 Repaired at: 1525685601400    42K May 7 19:56 mc-22922-big-Data.db
> Max: 2017/04/17 13:49 Min: 2017/03/31 13:09 Estimated droppable tombstones: 1.9600207684319835 Repaired at: 1525685601400    116M May 7 13:31 mc-15096-big-Data.db
> Max: 2017/04/21 13:43 Min: 2017/04/15 13:34 Estimated droppable tombstones: 1.9090909090909092 Repaired at: 1525685601400    11K May 7 19:56 mc-22921-big-Data.db
> Max: 2017/05/23 21:45 Min: 2017/04/21 14:00 Estimated droppable tombstones: 1.8360655737704918 Repaired at: 1525685601400    21M May 7 19:56 mc-22919-big-Data.db
> Max: 2017/06/12 15:19 Min: 2017/04/25 14:45 Estimated droppable tombstones: 1.8091397849462365 Repaired at: 1525685601400    19M May 7 14:36 mc-17095-big-Data.db
> Max: 2017/06/15 15:26 Min: 2017/05/10 14:37 Estimated droppable tombstones: 1.76536312849162 Repaired at: 1529612605539      9.3M Jun 21 22:31 mc-25372-big-Data.db
> …
>
>
>
> After a “nodetool repair” run, a new big data file is created that includes
> old data from 2017/07/31.
>
>
>
> Max: 2018/07/27 18:10 Min: 2017/03/31 13:13 Estimated droppable tombstones: 0.08392555471691247 Repaired at: 0               11G Sep 11 22:02 mc-39281-big-Data.db
> …
> Max: 2018/08/16 18:18 Min: 2018/08/06 12:19 Estimated droppable tombstones: 0.0 Repaired at: 1534525730510                   123M Aug 17 23:46 mc-36847-big-Data.db
> Max: 2018/08/17 19:20 Min: 2017/07/31 12:04 Estimated droppable tombstones: 0.03385963490004347 Repaired at: 0               11G Sep 11 21:43 mc-39265-big-Data.db
> Max: 2018/08/17 19:20 Min: 2018/07/24 12:33 Estimated droppable tombstones: 0.0 Repaired at: 1534525730510                   135M Sep 11 21:44 mc-39270-big-Data.db
> …
> Max: 2018/09/06 17:30 Min: 2018/08/28 12:17 Estimated droppable tombstones: 0.0 Repaired at: 1536690786879                   129M Sep 11 21:10 mc-39238-big-Data.db
> Max: 2018/09/07 18:22 Min: 2017/04/23 12:48 Estimated droppable tombstones: 0.1548442441468401 Repaired at: 0                8.0G Sep 11 21:33 mc-39258-big-Data.db
> Max: 2018/09/07 18:22 Min: 2018/09/07 12:15 Estimated droppable tombstones: 0.0 Repaired at: 1536690786879                   72M Sep 11 21:34 mc-39262-big-Data.db
> Max: 2018/09/08 18:20 Min: 2018/08/22 12:17 Estimated droppable tombstones: 0.0 Repaired at: 0                               2.8G Sep 11 21:47 mc-39272-big-Data.db
>
>
>
> The tool sstableexpiredblockers shows that the file mc-39281-big-Data.db
> blocks 95 expired files from getting dropped, for example the oldest file
> mc-22922-big-Data.db.
>
>
>
> [BigTableReader(path='.../stat/spa-.../mc-39281-big-Data.db') (minTS = 1490958782530000, maxTS = 1532707837676719, maxLDT = 1557154990)
>
>   blocks 95 expired sstables from getting dropped:
>
>  [BigTableReader(path='.../stat/spa-.../mc-36936-big-Data.db') (minTS = 1500027128958000, maxTS = 1503666765807229, maxLDT = 1535202765)
> [BigTableReader(path='.../stat/spa-.../mc-22921-big-Data.db') (minTS = 1492256093314000, maxTS = 1492775013454001, maxLDT = 1524311013)
> [BigTableReader(path='.../stat/spa-.../mc-36947-big-Data.db') (minTS = 1492255708403000, maxTS = 1501937182477001, maxLDT = 1533473182)
> [BigTableReader(path='.../stat/spa-.../mc-32582-big-Data.db') (minTS = 1493028031639000, maxTS = 1499175057476001, maxLDT = 1530711057)
> [BigTableReader(path='.../stat/spa-.../mc-32560-big-Data.db') (minTS = 1500210297826000, maxTS = 1501416691390001, maxLDT = 1532952691)
> [BigTableReader(path='.../stat/spa-.../mc-32528-big-Data.db') (minTS = 1490958761762000, maxTS = 1504358072394248, maxLDT = 1535894072)
> [BigTableReader(path='.../stat/spa-.../mc-32572-big-Data.db') (minTS = 1500027103795000, maxTS = 1500297137808001, maxLDT = 1531833137)
> [BigTableReader(path='.../stat/spa-.../mc-36935-big-Data.db') (minTS = 1500038582669000, maxTS = 1503839159485824, maxLDT = 1535375159)
> [BigTableReader(path='.../stat/spa-.../mc-22922-big-Data.db') (minTS = 1490958570018000, maxTS = 1492250905633001, maxLDT = 1523786905)
> [BigTableReader(path='.../stat/spa-.../mc-33470-big-Data.db') (minTS = 1499940836241000, maxTS = 1500040376685000, maxLDT = 1531576376)
>
>
>
> Why does repair create such turbulence in new data files, and how can we
> remove the old data?
>
>
>
> Kind Regards
>
> Maik Cäsar
>
>
>
>
>
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
