[ https://issues.apache.org/jira/browse/CASSANDRA-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
C. Scott Andreas updated CASSANDRA-14941:
-----------------------------------------
    Reproduced In: 3.0.17

> Expired secondary index sstables are not promptly discarded under TWCS
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-14941
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14941
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/2i Index
>            Reporter: Samuel Klock
>            Priority: Major
>
> We have a table in a cluster running 3.0.17 storing roughly time-series data
> using TWCS with a secondary index. We've noticed that while expired sstables
> for the table are discarded mostly when we expect them to be, the expired
> sstables for the secondary index would linger for weeks longer than expected
> – essentially indefinitely. Eventually the sstables would fill disks, which
> would require manual steps (deleting ancient index sstables) to address. We
> verified with {{sstableexpiredblockers}} that there wasn't anything on disk
> blocking the expired sstables from being dropped, so this looks like a bug.
> Through some debugging, we traced the problem to the index's memtables, which
> were consistently (except _just_ after node restarts) reporting a minimum
> timestamp from September 2015 – much older than any of our live data – which
> causes {{CompactionController.getFullyExpiredSSTables()}} to consistently
> return an empty set. The reason that the index sstables report this minimum
> timestamp is because of how index updates are created, using
> {{PartitionUpdate.singleRowUpdate()}}:
> {code:java}
>     public static PartitionUpdate singleRowUpdate(CFMetaData metadata, DecoratedKey key, Row row, Row staticRow)
>     {
>         MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
>         Holder holder = new Holder(
>             new PartitionColumns(
>                 staticRow == null ? Columns.NONE : Columns.from(staticRow.columns()),
>                 row == null ? Columns.NONE : Columns.from(row.columns())
>             ),
>             row == null ? BTree.empty() : BTree.singleton(row),
>             deletionInfo,
>             staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow,
>             EncodingStats.NO_STATS
>         );
>         return new PartitionUpdate(metadata, key, holder, deletionInfo, false);
>     }
> {code}
> The use of {{EncodingStats.NO_STATS}} makes it appear as though the earliest
> timestamp in the resulting {{PartitionUpdate}} is from September 2015. That
> timestamp becomes the minimum for the memtable.
> Modifying this version of {{PartitionUpdate.singleRowUpdate()}} to:
> {code:java}
>     public static PartitionUpdate singleRowUpdate(CFMetaData metadata, DecoratedKey key, Row row, Row staticRow)
>     {
>         MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
>         staticRow = (staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow);
>         EncodingStats stats = EncodingStats.Collector.collect(staticRow,
>                                                               (row == null ?
>                                                                Collections.emptyIterator() :
>                                                                Iterators.singletonIterator(row)),
>                                                               deletionInfo);
>         Holder holder = new Holder(
>             new PartitionColumns(
>                 staticRow == Rows.EMPTY_STATIC_ROW ? Columns.NONE : Columns.from(staticRow.columns()),
>                 row == null ? Columns.NONE : Columns.from(row.columns())
>             ),
>             row == null ? BTree.empty() : BTree.singleton(row),
>             deletionInfo,
>             staticRow,
>             stats
>         );
>         return new PartitionUpdate(metadata, key, holder, deletionInfo, false);
>     }
> {code}
> (i.e., computing an {{EncodingStats}} from the contents of the update) seems
> to fix the problem. However, we're not certain whether A) there's a
> functional reason the method was using {{EncodingStats.NO_STATS}} previously
> or B) whether the {{EncodingStats}} the revised version creates is correct
> (in particular, the use of {{deletionInfo}} feels a little suspect). We're
> also not sure whether there's a more appropriate fix (e.g., changing how the
> memtables compute the minimum timestamp, particularly in the {{NO_STATS}}
> case).
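
The mechanism described in the report can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Cassandra code: the class and method names (ExpiredSketch, Memtable, SSTable, collectMinTimestamp, fullyExpired) are hypothetical, the exact day used for the September 2015 sentinel is assumed (the report gives only the month), and the expiry check is reduced to a single comparison against the memtable's minimum timestamp. The real {{CompactionController.getFullyExpiredSSTables()}} takes more into account than this.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the reported behaviour; every name here is hypothetical.
final class ExpiredSketch {
    /** Microseconds since the Unix epoch for midnight UTC on the given date. */
    static long micros(int year, int month, int day) {
        return LocalDate.of(year, month, day)
                        .atStartOfDay(ZoneOffset.UTC)
                        .toEpochSecond() * 1_000_000L;
    }

    // ASSUMPTION: stand-in for the minimum timestamp implied by NO_STATS.
    // The report only says "September 2015"; the day is picked for illustration.
    static final long NO_STATS_MIN_TIMESTAMP = micros(2015, 9, 22);

    /** Tracks the smallest timestamp among the updates applied to it. */
    static final class Memtable {
        private long minTimestamp = Long.MAX_VALUE;

        void apply(long updateMinTimestamp) {
            minTimestamp = Math.min(minTimestamp, updateMinTimestamp);
        }

        long minTimestamp() {
            return minTimestamp;
        }
    }

    /** An sstable reduced to its name and its newest write timestamp. */
    static final class SSTable {
        final String name;
        final long maxTimestamp;

        SSTable(String name, long maxTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Minimum timestamp derived from the update's own cells (the proposed fix). */
    static long collectMinTimestamp(long... cellTimestamps) {
        long min = Long.MAX_VALUE;
        for (long ts : cellTimestamps) {
            min = Math.min(min, ts);
        }
        return min;
    }

    /** Simplified check: droppable only if older than everything in the memtable. */
    static List<String> fullyExpired(List<SSTable> candidates, Memtable memtable) {
        List<String> expired = new ArrayList<>();
        for (SSTable candidate : candidates) {
            if (candidate.maxTimestamp < memtable.minTimestamp()) {
                expired.add(candidate.name);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        List<SSTable> sstables = Arrays.asList(new SSTable("index-old", micros(2018, 1, 1)));
        long liveWrite = micros(2018, 12, 1);

        // Index update built with the sentinel: the memtable minimum drops to 2015.
        Memtable withSentinel = new Memtable();
        withSentinel.apply(NO_STATS_MIN_TIMESTAMP);
        System.out.println("sentinel stats:  " + fullyExpired(sstables, withSentinel));

        // Index update whose stats come from its own cells.
        Memtable withCollected = new Memtable();
        withCollected.apply(collectMinTimestamp(liveWrite));
        System.out.println("collected stats: " + fullyExpired(sstables, withCollected));
    }
}
```

In this model the sentinel case reports no expired sstables, because the memtable appears to hold data older than "index-old", while the collected-stats case reports "index-old" as droppable. That matches the symptom in the report, but it does not settle the reporter's open questions about whether {{NO_STATS}} was used deliberately or whether the collected stats are correct.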