[jira] [Commented] (CASSANDRA-14941) Expired secondary index sstables are not promptly discarded under TWCS

2019-08-15 Thread Marcus Eriksson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908395#comment-16908395
 ] 

Marcus Eriksson commented on CASSANDRA-14941:
-

I'll have a closer look tomorrow or early next week, but I think the problem is 
that we use {{EncodingStats#minTimestamp}} without checking for the epoch here: 
https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/Memtable.java#L256

> Expired secondary index sstables are not promptly discarded under TWCS
> --
>
> Key: CASSANDRA-14941
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14941
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/2i Index
>Reporter: Samuel Klock
>Priority: Normal
>
> We have a table in a cluster running 3.0.17 storing roughly time-series data 
> using TWCS with a secondary index. We've noticed that while expired sstables 
> for the table are discarded mostly when we expect them to be, the expired 
> sstables for the secondary index would linger for weeks longer than expected 
> – essentially indefinitely. Eventually the sstables would fill disks, which 
> would require manual steps (deleting ancient index sstables) to address. We 
> verified with {{sstableexpiredblockers}} that there wasn't anything on disk 
> blocking the expired sstables from being dropped, so this looks like a bug.
> Through some debugging, we traced the problem to the index's memtables, which 
> were consistently (except _just_ after node restarts) reporting a minimum 
> timestamp from September 2015 – much older than any of our live data – which 
> causes {{CompactionController.getFullyExpiredSSTables()}} to consistently 
> return an empty set. The reason that the index sstables report this minimum 
> timestamp is because of how index updates are created, using 
> {{PartitionUpdate.singleRowUpdate()}}:
> {code:java}
> public static PartitionUpdate singleRowUpdate(CFMetaData metadata, 
> DecoratedKey key, Row row, Row staticRow)
> {
> MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
> Holder holder = new Holder(
> new PartitionColumns(
> staticRow == null ? Columns.NONE : 
> Columns.from(staticRow.columns()),
> row == null ? Columns.NONE : Columns.from(row.columns())
> ),
> row == null ? BTree.empty() : BTree.singleton(row),
> deletionInfo,
> staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow,
> EncodingStats.NO_STATS
> );
> return new PartitionUpdate(metadata, key, holder, deletionInfo, 
> false);
> }
> {code}
> The use of {{EncodingStats.NO_STATS}} makes it appear as though the earliest 
> timestamp in the resulting {{PartitionUpdate}} is from September 2015. That 
> timestamp becomes the minimum for the memtable.
> Modifying this version of {{PartitionUpdate.singleRowUpdate()}} to:
> {code:java}
> public static PartitionUpdate singleRowUpdate(CFMetaData metadata, 
> DecoratedKey key, Row row, Row staticRow)
> {
> MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
> staticRow = (staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow);
> EncodingStats stats = EncodingStats.Collector.collect(staticRow,
>   (row == null ?
>
> Collections.emptyIterator() :
>
> Iterators.singletonIterator(row)),
>   deletionInfo);
> Holder holder = new Holder(
> new PartitionColumns(
> staticRow == Rows.EMPTY_STATIC_ROW ? Columns.NONE : 
> Columns.from(staticRow.columns()),
> row == null ? Columns.NONE : Columns.from(row.columns())
> ),
> row == null ? BTree.empty() : BTree.singleton(row),
> deletionInfo,
> staticRow,
> stats
> );
> return new PartitionUpdate(metadata, key, holder, deletionInfo, 
> false);
> }
> {code}
> (i.e., computing an {{EncodingStats}} from the contents of the update) seems 
> to fix the problem. However, we're not certain whether A) there's a 
> functional reason the method was using {{EncodingStats.NO_STATS}} previously 
> or B) whether the {{EncodingStats}} the revised version creates is correct 
> (in particular, the use of {{deletionInfo}} feels a little suspect). We're 
> also not sure whether there's a more appropriate fix (e.g., changing how the 
> memtables compute the minimum timestamp, particularly in the {{NO_STATS}} 
> case).



--
This message was sent by 

[jira] [Commented] (CASSANDRA-14941) Expired secondary index sstables are not promptly discarded under TWCS

2019-08-15 Thread Samuel Klock (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908200#comment-16908200
 ] 

Samuel Klock commented on CASSANDRA-14941:
--

Pinging this issue.  Any chance someone knowledgeable about this part of the 
code could take a look?  We've had positive experience testing this fix in a 
cluster exclusively handling time-series data, but we'd like confirmation that 
it's safe in general before deploying it in other clusters.  It'd also be great 
of course to have this fixed in 3.0.x (or at least 3.11.x/4.0) as well.

> Expired secondary index sstables are not promptly discarded under TWCS
> --
>
> Key: CASSANDRA-14941
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14941
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/2i Index
>Reporter: Samuel Klock
>Priority: Normal
>
> We have a table in a cluster running 3.0.17 storing roughly time-series data 
> using TWCS with a secondary index. We've noticed that while expired sstables 
> for the table are discarded mostly when we expect them to be, the expired 
> sstables for the secondary index would linger for weeks longer than expected 
> – essentially indefinitely. Eventually the sstables would fill disks, which 
> would require manual steps (deleting ancient index sstables) to address. We 
> verified with {{sstableexpiredblockers}} that there wasn't anything on disk 
> blocking the expired sstables from being dropped, so this looks like a bug.
> Through some debugging, we traced the problem to the index's memtables, which 
> were consistently (except _just_ after node restarts) reporting a minimum 
> timestamp from September 2015 – much older than any of our live data – which 
> causes {{CompactionController.getFullyExpiredSSTables()}} to consistently 
> return an empty set. The reason that the index sstables report this minimum 
> timestamp is because of how index updates are created, using 
> {{PartitionUpdate.singleRowUpdate()}}:
> {code:java}
> public static PartitionUpdate singleRowUpdate(CFMetaData metadata, 
> DecoratedKey key, Row row, Row staticRow)
> {
> MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
> Holder holder = new Holder(
> new PartitionColumns(
> staticRow == null ? Columns.NONE : 
> Columns.from(staticRow.columns()),
> row == null ? Columns.NONE : Columns.from(row.columns())
> ),
> row == null ? BTree.empty() : BTree.singleton(row),
> deletionInfo,
> staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow,
> EncodingStats.NO_STATS
> );
> return new PartitionUpdate(metadata, key, holder, deletionInfo, 
> false);
> }
> {code}
> The use of {{EncodingStats.NO_STATS}} makes it appear as though the earliest 
> timestamp in the resulting {{PartitionUpdate}} is from September 2015. That 
> timestamp becomes the minimum for the memtable.
> Modifying this version of {{PartitionUpdate.singleRowUpdate()}} to:
> {code:java}
> public static PartitionUpdate singleRowUpdate(CFMetaData metadata, 
> DecoratedKey key, Row row, Row staticRow)
> {
> MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
> staticRow = (staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow);
> EncodingStats stats = EncodingStats.Collector.collect(staticRow,
>   (row == null ?
>
> Collections.emptyIterator() :
>
> Iterators.singletonIterator(row)),
>   deletionInfo);
> Holder holder = new Holder(
> new PartitionColumns(
> staticRow == Rows.EMPTY_STATIC_ROW ? Columns.NONE : 
> Columns.from(staticRow.columns()),
> row == null ? Columns.NONE : Columns.from(row.columns())
> ),
> row == null ? BTree.empty() : BTree.singleton(row),
> deletionInfo,
> staticRow,
> stats
> );
> return new PartitionUpdate(metadata, key, holder, deletionInfo, 
> false);
> }
> {code}
> (i.e., computing an {{EncodingStats}} from the contents of the update) seems 
> to fix the problem. However, we're not certain whether A) there's a 
> functional reason the method was using {{EncodingStats.NO_STATS}} previously 
> or B) whether the {{EncodingStats}} the revised version creates is correct 
> (in particular, the use of {{deletionInfo}} feels a little suspect). We're 
> also not sure whether there's a more appropriate fix (e.g., changing how the 
>