[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902783#comment-15902783 ]

Marcus Eriksson commented on CASSANDRA-13153:

oops, sorry for the delay, +1

> Reappeared Data when Mixing Incremental and Full Repairs
>
>                 Key: CASSANDRA-13153
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13153
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction, Tools
>         Environment: Apache Cassandra 2.2
>            Reporter: Amanda Debrot
>            Assignee: Stefan Podkowinski
>              Labels: Cassandra
>         Attachments: log-Reappeared-Data.txt, Step-by-Step-Simulate-Reappeared-Data.txt
>
> This happens for both LeveledCompactionStrategy and
> SizeTieredCompactionStrategy. I've only tested it on Cassandra version 2.2,
> but it most likely also affects all Cassandra versions after 2.2, if they
> have anticompaction with full repair.
>
> When mixing incremental and full repairs, there are a few scenarios where the
> Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as
> repaired. Then if it is past gc_grace, and the tombstone and data have been
> compacted out on other replicas, the next incremental repair will push the
> Data to other replicas without the tombstone.
>
> Simplified scenario:
> 3 node cluster with RF=3
> Initial config:
>     Node 1 has data and tombstone in separate SSTables.
>     Node 2 has data and no tombstone.
>     Node 3 has data and tombstone in separate SSTables.
>
> Incremental repair (nodetool repair -pr) is run every day so now we have
> tombstone on each node.
> Some minor compactions have happened since so data and tombstone get merged
> to 1 SSTable on Nodes 1 and 3.
>     Node 1 had a minor compaction that merged data with tombstone. 1 SSTable with tombstone.
>     Node 2 has data and tombstone in separate SSTables.
>     Node 3 had a minor compaction that merged data with tombstone. 1 SSTable with tombstone.
>
> Incremental repairs keep running every day.
> Full repairs run weekly (nodetool repair -full -pr).
> Now there are 2 scenarios where the Data SSTable will get marked as
> "Unrepaired" while Tombstone SSTable will get marked as "Repaired".
>
> Scenario 1:
>     Since the Data and Tombstone SSTable have been marked as "Repaired"
>     and anticompacted, they have had minor compactions with other SSTables
>     containing keys from other ranges. During full repair, if the last node to
>     run it doesn't own this particular key in its partitioner range, the Data
>     and Tombstone SSTable will get anticompacted and marked as "Unrepaired". Now
>     in the next incremental repair, if the Data SSTable is involved in a minor
>     compaction during the repair but the Tombstone SSTable is not, the resulting
>     compacted SSTable will be marked "Unrepaired" and Tombstone SSTable is marked
>     "Repaired".
>
> Scenario 2:
>     Only the Data SSTable had minor compaction with other SSTables
>     containing keys from other ranges after being marked as "Repaired". The
>     Tombstone SSTable was never involved in a minor compaction so therefore all
>     keys in that SSTable belong to 1 particular partitioner range. During full
>     repair, if the last node to run it doesn't own this particular key in its
>     partitioner range, the Data SSTable will get anticompacted and marked as
>     "Unrepaired". The Tombstone SSTable stays marked as Repaired.
>
> Then it’s past gc_grace. Since Nodes #1 and #3 only have 1 SSTable for that
> key, the tombstone will get compacted out.
>     Node 1 has nothing.
>     Node 2 has data (in unrepaired SSTable) and tombstone (in repaired SSTable) in separate SSTables.
>     Node 3 has nothing.
>
> Now when the next incremental repair runs, it will only use the Data SSTable
> to build the merkle tree since the tombstone SSTable is flagged as repaired
> and data SSTable is marked as unrepaired. And the data will get repaired
> against the other two nodes.
>     Node 1 has data.
>     Node 2 has data and tombstone in separate SSTables.
>     Node 3 has data.
>
> If a read request hits Node 1 and 3, it will return data. If it hits 1 and
> 2, or 2 and 3, however, it would return no data.
>
> Tested this with single range tokens for simplicity.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
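
The sequence in the report can be replayed with a toy model. The sketch below is not Cassandra code: the `SSTable` and `Node` classes, the `incremental_repair` and `quorum_read` helpers, and the cell names are all hypothetical, and repair is reduced to "compare only unrepaired SSTables and stream whatever a replica lacks". It starts from the state after gc_grace, where only Node 2 still holds the data (unrepaired) and the tombstone (repaired).

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    cells: set        # toy content, e.g. {"data"} or {"tombstone"}
    repaired: bool    # toy stand-in for the repaired/unrepaired flag


@dataclass
class Node:
    name: str
    sstables: list = field(default_factory=list)

    def visible_cells(self, unrepaired_only=False):
        """Union of cells over this node's SSTables, optionally skipping repaired ones."""
        out = set()
        for table in self.sstables:
            if unrepaired_only and table.repaired:
                continue
            out |= table.cells
        return out


def resolve(cells):
    """A tombstone shadows the data; no data at all also reads as nothing."""
    if "tombstone" in cells or "data" not in cells:
        return None
    return "data"


def incremental_repair(nodes):
    """Toy incremental repair: only unrepaired SSTables are compared, and
    every replica receives the cells it lacks in that unrepaired view."""
    views = {n.name: n.visible_cells(unrepaired_only=True) for n in nodes}
    union = set().union(*views.values())
    for n in nodes:
        missing = union - views[n.name]
        if missing:
            n.sstables.append(SSTable(cells=missing, repaired=False))


def quorum_read(a, b):
    """Toy read at two replicas: merge both views, then resolve."""
    return resolve(a.visible_cells() | b.visible_cells())


# State after gc_grace, as described in the report.
node1 = Node("node1")
node2 = Node("node2", [SSTable({"data"}, repaired=False),
                       SSTable({"tombstone"}, repaired=True)])
node3 = Node("node3")

incremental_repair([node1, node2, node3])

print("1+3:", quorum_read(node1, node3))
print("1+2:", quorum_read(node1, node2))
print("2+3:", quorum_read(node2, node3))
```

Because the tombstone sits in a repaired SSTable, the toy repair never sees it and streams only the data to Nodes 1 and 3, which reproduces the read behaviour described at the end of the report.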
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902779#comment-15902779 ]

Stefan Podkowinski commented on CASSANDRA-13153:

[~krummas], any feedback on the latest, simplified patch version?
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892486#comment-15892486 ]

Stefan Podkowinski commented on CASSANDRA-13153:

I've changed my branches to the bare minimum of what needs to be done for filtering already repaired sstables and re-run tests. See above for links.
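
One way to picture "filtering already repaired sstables" is to drop them from the candidate set before anticompaction, so that a later full repair cannot move them back to unrepaired. The sketch below is only an illustration of that idea, not the patch itself: `SSTable`, `anticompaction_candidates` and the sample values are hypothetical, and it assumes that a `repaired_at` of 0 stands for "never repaired".

```python
from typing import NamedTuple

UNREPAIRED = 0  # assumption: repaired_at == 0 means "never repaired"


class SSTable(NamedTuple):
    name: str
    repaired_at: int  # toy timestamp; any non-zero value counts as repaired


def anticompaction_candidates(sstables):
    """Keep only SSTables that are still unrepaired; repaired ones are left alone."""
    return [t for t in sstables if t.repaired_at == UNREPAIRED]


# Arbitrary sample input: one unrepaired and one already repaired SSTable.
tables = [SSTable("data-1", UNREPAIRED), SSTable("tombstone-1", 1000)]
candidates = anticompaction_candidates(tables)
print([t.name for t in candidates])
```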
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889664#comment-15889664 ]

Marcus Eriksson commented on CASSANDRA-13153:

Makes sense, but let's add this if/when we do that change?
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888552#comment-15888552 ]

Stefan Podkowinski commented on CASSANDRA-13153:

Yes, that could be an option as well. But as we already discussed the possibility of actually doing anti-compaction on repaired sstables for the sake of tracking repairedAt more accurately, I was hoping someone someday would be able to make use of the method as is by providing a reasonable repairedAt value for both anti-compaction outputs. But I'm open to adding an assert instead, if you think I'm a bit too optimistic here.
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888348#comment-15888348 ]

Marcus Eriksson commented on CASSANDRA-13153:

My thinking was that the only time anyone could expect the sstable with the non-repaired ranges to be something other than UNREPAIRED would be if they passed in repaired sstables, so having the assert shows that this is not expected
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888289#comment-15888289 ]

Stefan Podkowinski commented on CASSANDRA-13153:

I'm not sure I really understand what the additional {{repairedAtNotContainedInRange}} parameter has to do with adding an assert for making sure "all sstables are unrepaired". Even if all sstables are, we still need to apply a repairedAt value for those ranges not successfully repaired.
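
The point about needing a repairedAt value for both outputs can be sketched as follows. This is an assumed reading of the comment, not the patch: the `anticompact` function, its signature, and the integer tokens are hypothetical, the range is treated as a single non-wrapping `(left, right]` interval, and `repaired_at_not_contained` only borrows its name from the `repairedAtNotContainedInRange` parameter mentioned in the thread.

```python
UNREPAIRED = 0  # assumption: repaired_at == 0 means "never repaired"


def anticompact(tokens, repaired_range, repaired_at,
                repaired_at_not_contained=UNREPAIRED):
    """Split one SSTable's tokens into two outputs and label each with its own
    repaired_at: tokens inside the repaired range, and everything else."""
    left, right = repaired_range
    inside = [t for t in tokens if left < t <= right]
    outside = [t for t in tokens if not (left < t <= right)]
    return (inside, repaired_at), (outside, repaired_at_not_contained)


# Arbitrary sample: tokens 5, 15, 25 with range (10, 20] repaired at time 1000.
contained, not_contained = anticompact([5, 15, 25], (10, 20), repaired_at=1000)
print(contained)
print(not_contained)
```

With the default, the out-of-range output is always labelled unrepaired, which is the case both commenters agree on; passing another value corresponds to the more optimistic use Stefan has in mind for already repaired inputs.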
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888245#comment-15888245 ]

Marcus Eriksson commented on CASSANDRA-13153:

I'm not sure I agree that adding the parameter helps. Could we instead just add an assert in {{anticompactGroup}} that all sstables are unrepaired?

> Reappeared Data when Mixing Incremental and Full Repairs
>
> Key: CASSANDRA-13153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13153
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction, Tools
> Environment: Apache Cassandra 2.2
> Reporter: Amanda Debrot
> Assignee: Stefan Podkowinski
> Labels: Cassandra
> Attachments: log-Reappeared-Data.txt, Step-by-Step-Simulate-Reappeared-Data.txt
>
> This happens for both LeveledCompactionStrategy and SizeTieredCompactionStrategy. I've only tested it on Cassandra version 2.2, but it most likely also affects all Cassandra versions after 2.2 if they have anticompaction with full repair.
>
> When mixing incremental and full repairs, there are a few scenarios where the Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as repaired. Then, if it is past gc_grace and the tombstone and data have been compacted out on other replicas, the next incremental repair will push the Data to other replicas without the tombstone.
>
> Simplified scenario:
> 3 node cluster with RF=3
>
> Initial config:
>     Node 1 has data and tombstone in separate SSTables.
>     Node 2 has data and no tombstone.
>     Node 3 has data and tombstone in separate SSTables.
>
> Incremental repair (nodetool repair -pr) is run every day, so now we have the tombstone on each node.
> Some minor compactions have happened since, so data and tombstone get merged into 1 SSTable on Nodes 1 and 3.
>     Node 1 had a minor compaction that merged data with tombstone. 1 SSTable with tombstone.
>     Node 2 has data and tombstone in separate SSTables.
>     Node 3 had a minor compaction that merged data with tombstone. 1 SSTable with tombstone.
>
> Incremental repairs keep running every day.
> Full repairs run weekly (nodetool repair -full -pr).
> Now there are 2 scenarios where the Data SSTable will get marked as "Unrepaired" while the Tombstone SSTable will get marked as "Repaired".
>
> Scenario 1:
>     Since the Data and Tombstone SSTables have been marked as "Repaired" and anticompacted, they have had minor compactions with other SSTables containing keys from other ranges. During full repair, if the last node to run it doesn't own this particular key in its partitioner range, the Data and Tombstone SSTables will get anticompacted and marked as "Unrepaired". Now, in the next incremental repair, if the Data SSTable is involved in a minor compaction during the repair but the Tombstone SSTable is not, the resulting compacted SSTable will be marked "Unrepaired" and the Tombstone SSTable is marked "Repaired".
>
> Scenario 2:
>     Only the Data SSTable had a minor compaction with other SSTables containing keys from other ranges after being marked as "Repaired". The Tombstone SSTable was never involved in a minor compaction, so all keys in that SSTable belong to 1 particular partitioner range. During full repair, if the last node to run it doesn't own this particular key in its partitioner range, the Data SSTable will get anticompacted and marked as "Unrepaired". The Tombstone SSTable stays marked as "Repaired".
>
> Then it’s past gc_grace. Since Nodes 1 and 3 only have 1 SSTable for that key, the tombstone will get compacted out.
>     Node 1 has nothing.
>     Node 2 has data (in unrepaired SSTable) and tombstone (in repaired SSTable) in separate SSTables.
>     Node 3 has nothing.
>
> Now when the next incremental repair runs, it will only use the Data SSTable to build the merkle tree, since the Tombstone SSTable is flagged as repaired and the Data SSTable is marked as unrepaired. And the data will get repaired against the other two nodes.
>     Node 1 has data.
>     Node 2 has data and tombstone in separate SSTables.
>     Node 3 has data.
>
> If a read request hits Nodes 1 and 3, it will return data. If it hits 1 and 2, or 2 and 3, however, it will return no data.
> Tested this with single range tokens for simplicity.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
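The end state of the quoted scenario can be reproduced with a toy model. This is only a sketch under stated assumptions, not Cassandra code: the `SSTable` class, `incremental_repair_view` and `read` are hypothetical names, timestamps are ignored, and the tombstone is assumed to be newer than the data. It illustrates why a repair that looks only at unrepaired sstables would stream the data without its tombstone.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SSTable:
    cells: tuple      # what the sstable holds for the key: "data" and/or "tombstone"
    repaired: bool    # repaired/unrepaired flag of the whole sstable

def incremental_repair_view(sstables):
    """Cells an incremental repair would validate: unrepaired sstables only."""
    seen = set()
    for s in sstables:
        if not s.repaired:
            seen.update(s.cells)
    return seen

def read(sstables):
    """A local read merges all sstables; the tombstone shadows the data."""
    cells = set()
    for s in sstables:
        cells.update(s.cells)
    if "tombstone" in cells:
        return None
    return "data" if "data" in cells else None

# State after gc_grace, as described in the ticket.
node1 = []
node2 = [SSTable(("data",), repaired=False), SSTable(("tombstone",), repaired=True)]
node3 = []

# Node 2 only offers the content of its unrepaired sstables to the repair.
streamed = incremental_repair_view(node2)
for node in (node1, node3):
    node.append(SSTable(tuple(sorted(streamed)), repaired=True))

print(read(node1), read(node2), read(node3))  # prints: data None data
```

In this model a read served by Nodes 1 and 3 returns the deleted data, while any read involving Node 2 still sees the tombstone, which matches the behaviour reported above.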
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878122#comment-15878122 ]

Stefan Podkowinski commented on CASSANDRA-13153:

The patch is now finished, based on my last suggestion to simply skip already repaired sstables during anti-compaction. I've also made the repairedAt timestamps for both the part contained in the repaired ranges and the part not contained in them explicit parameters. This should help avoid overlooking the fact that we have to deal with repairedAt for the non-contained part as well. The PR for the corresponding dtest can be found [here|https://github.com/riptano/cassandra-dtest/pull/1447].

Test results (just started a new dtest run with the PR branch):
||2.2||3.0||3.11||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-13153-2.2]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-13153-3.0]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-13153-3.11]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-13153-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-13153-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-13153-3.11-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-13153-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-13153-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-13153-3.11-testall/]|
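The idea behind the patch described above (skip already repaired sstables, and pass repairedAt explicitly for both parts) might be modelled roughly as follows. This is a simplified sketch, not the patch itself: token ranges are reduced to sets of integers, `anticompact` and its parameter names are hypothetical, and `UNREPAIRED = 0` is an assumed sentinel for "never repaired".

```python
from dataclasses import dataclass

UNREPAIRED = 0  # assumption: sentinel repairedAt value meaning "not repaired"

@dataclass(frozen=True)
class SSTable:
    name: str
    tokens: frozenset
    repaired_at: int = UNREPAIRED

def anticompact(sstables, repaired_range, repaired_at_in, repaired_at_out):
    """Split unrepaired sstables by range; leave repaired sstables untouched."""
    result = []
    for s in sstables:
        if s.repaired_at != UNREPAIRED:
            result.append(s)          # already repaired: skip anti-compaction
            continue
        inside = s.tokens & repaired_range
        outside = s.tokens - repaired_range
        if inside:
            result.append(SSTable(s.name + "-in", frozenset(inside), repaired_at_in))
        if outside:
            result.append(SSTable(s.name + "-out", frozenset(outside), repaired_at_out))
    return result

tables = [
    SSTable("tombstone", frozenset({10, 20}), repaired_at=1000),
    SSTable("new", frozenset({10, 30})),
]
out = anticompact(tables, frozenset({10}), repaired_at_in=2000, repaired_at_out=UNREPAIRED)
for s in out:
    print(s.name, sorted(s.tokens), s.repaired_at)
```

Making both repairedAt values explicit arguments forces the caller to decide what happens to the part outside the repaired range, which is the point raised in the comment.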
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871984#comment-15871984 ]

Marcus Eriksson commented on CASSANDRA-13153:

[~spo...@gmail.com] the patch LGTM, could you run CI on it?
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869836#comment-15869836 ]

Marcus Eriksson commented on CASSANDRA-13153:

bq. Can we really throw them into the same level locally, just because they have been at level X on other nodes?

No, we check whether it would create overlap before adding it to the manifest: https://github.com/apache/cassandra/blob/98d74ed998706e9e047dc0f7886a1e9b18df3ce9/src/java/org/apache/cassandra/db/compaction/LeveledManifest.java#L149
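The kind of overlap check mentioned in the reply above can be sketched like this. It is an illustration only and not the code in LeveledManifest: sstables are reduced to `(first_token, last_token)` tuples, `add_to_manifest` is a hypothetical name, and falling back to level 0 on overlap is an assumption made for the example.

```python
def overlaps(a, b):
    """True if two closed token intervals (first, last) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def add_to_manifest(levels, level, sstable):
    """Add sstable to the requested level unless that would create overlap.

    Assumption for this sketch: an sstable that would overlap goes to level 0,
    where overlapping sstables are allowed.
    """
    target = level
    if level > 0 and any(overlaps(sstable, other) for other in levels.get(level, [])):
        target = 0
    levels.setdefault(target, []).append(sstable)
    return target

levels = {2: [(0, 10)]}
print(add_to_manifest(levels, 2, (5, 15)))   # overlaps (0, 10): placed in level 0
print(add_to_manifest(levels, 2, (11, 20)))  # no overlap: stays in level 2
```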
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869826#comment-15869826 ]

Stefan Podkowinski commented on CASSANDRA-13153:

bq. If operators stop using incremental repairs, there's no harm in doing an anticompaction after a full repair.

Even if there's no harm when it comes to consistency, it still causes fragmentation of existing sstables. Every repaired range causes all replicas to go through all local, intersecting sstables and rewrite them, segregated by affected and unaffected token ranges. This causes unnecessary load and is probably pretty bad for LCS or STCS, as we constantly break up bigger sstables by doing so.

One option to avoid this would be to simply never run anti-compaction on repaired sstables. See [here|https://github.com/spodkowinski/cassandra/commit/684d1c72cda58fecea15b46f928a451df38d87cb] for a simple approach. I don't think anti-compaction was ever meant to work on already repaired sstables, so that's probably the least intrusive fix to avoid most of the known issues around incremental repairs discussed here.

Btw, I'm also a bit confused by [createWriterForAntiCompaction|https://github.com/apache/cassandra/blob/98d74ed998706e9e047dc0f7886a1e9b18df3ce9/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L1285]. Each sstable's level will be streamed as well, won't it? Can we really throw them into the same level locally, just because they have been at level X on other nodes? Won't this potentially break the "non-overlapping sstables" guarantee by dropping them blindly to level X?
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868143#comment-15868143 ]

Blake Eggleston commented on CASSANDRA-13153:

bq. shouldn't we get rid of anti-compactions for full repairs in 2.2+ as well?

I think we're OK leaving them as is. Using pre-4.0 incremental repairs is the root cause of this. If operators stop using incremental repairs, there's no harm in doing an anticompaction after a full repair.

The only scenario in which it would cause problems is when using incremental repair for the first time after upgrading to 4.0, when the repaired datasets are very likely inconsistent. This could be addressed by just running a final full repair on the upgraded cluster. As part of CASSANDRA-9143, full repairs no longer perform anticompaction, and streamed sstables include the repairedAt time, which would bring the repaired and unrepaired datasets in sync.

So having said all that, it seems like we should recommend that users who delete data:
1. Stop using incremental repair (pre-4.0)
2. Run a full repair after upgrading to 4.0 before using incremental repair again

We should also recommend that even if users don't delete data, they take a look at the amount of streaming their incremental repair is doing and decide whether it might be less expensive to just do full repairs instead.

Thoughts?
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867821#comment-15867821 ]

Marcus Eriksson commented on CASSANDRA-13153:

OK, so the actual problem is that if we run a -full repair and some of the ranges fail, we might anticompact an sstable to unrepaired.

The fix would be to anticompact to the previous value of repairedAt. Or, as suggested, to not anticompact on full repairs at all.
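The first of the two fixes proposed above, keeping the previous repairedAt for whatever a partially failed full repair did not cover, comes down to a small decision. The sketch below is purely illustrative: `remainder_repaired_at` is a hypothetical helper, `UNREPAIRED = 0` is an assumed sentinel, and the timestamp is made up for the example.

```python
UNREPAIRED = 0  # assumption: sentinel repairedAt value meaning "not repaired"

def remainder_repaired_at(previous_repaired_at, keep_previous):
    """repairedAt for the part of an sstable not covered by the successful ranges."""
    return previous_repaired_at if keep_previous else UNREPAIRED

previous = 1487000000000  # made-up repairedAt of an sstable that was already repaired

# Behaviour described as the problem: the remainder drops back to unrepaired.
print(remainder_repaired_at(previous, keep_previous=False))  # prints: 0

# Proposed behaviour: the remainder keeps its earlier repaired status.
print(remainder_repaired_at(previous, keep_previous=True))   # prints: 1487000000000
```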
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867566#comment-15867566 ]

Marcus Eriksson commented on CASSANDRA-13153:
---------------------------------------------

[~spo...@gmail.com] are you saying that sstables marked as repaired are getting moved to unrepaired? I don't see how that could happen: with -full repairs, if a repaired sstable gets compacted away, it will (should?) stay in repaired, not get moved to unrepaired.

> Reappeared Data when Mixing Incremental and Full Repairs
> --------------------------------------------------------
>
>                 Key: CASSANDRA-13153
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13153
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction, Tools
>         Environment: Apache Cassandra 2.2
>            Reporter: Amanda Debrot
>              Labels: Cassandra
>         Attachments: log-Reappeared-Data.txt, Step-by-Step-Simulate-Reappeared-Data.txt
>
>
> This happens for both LeveledCompactionStrategy and SizeTieredCompactionStrategy. I've only tested it on Cassandra version 2.2 but it most likely also affects all Cassandra versions after 2.2, if they have anticompaction with full repair.
>
> When mixing incremental and full repairs, there are a few scenarios where the Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as repaired. Then if it is past gc_grace, and the tombstone and data have been compacted out on other replicas, the next incremental repair will push the Data to other replicas without the tombstone.
>
> Simplified scenario:
> 3 node cluster with RF=3
>
> Initial config:
>     Node 1 has data and tombstone in separate SSTables.
>     Node 2 has data and no tombstone.
>     Node 3 has data and tombstone in separate SSTables.
>
> Incremental repair (nodetool repair -pr) is run every day so now we have tombstone on each node.
> Some minor compactions have happened since so data and tombstone get merged to 1 SSTable on Nodes 1 and 3.
>     Node 1 had a minor compaction that merged data with tombstone. 1 SSTable with tombstone.
>     Node 2 has data and tombstone in separate SSTables.
>     Node 3 had a minor compaction that merged data with tombstone. 1 SSTable with tombstone.
>
> Incremental repairs keep running every day.
> Full repairs run weekly (nodetool repair -full -pr).
> Now there are 2 scenarios where the Data SSTable will get marked as "Unrepaired" while Tombstone SSTable will get marked as "Repaired".
>
> Scenario 1:
>     Since the Data and Tombstone SSTable have been marked as "Repaired" and anticompacted, they have had minor compactions with other SSTables containing keys from other ranges. During full repair, if the last node to run it doesn't own this particular key in its partitioner range, the Data and Tombstone SSTable will get anticompacted and marked as "Unrepaired". Now in the next incremental repair, if the Data SSTable is involved in a minor compaction during the repair but the Tombstone SSTable is not, the resulting compacted SSTable will be marked "Unrepaired" and Tombstone SSTable is marked "Repaired".
>
> Scenario 2:
>     Only the Data SSTable had minor compaction with other SSTables containing keys from other ranges after being marked as "Repaired". The Tombstone SSTable was never involved in a minor compaction so therefore all keys in that SSTable belong to 1 particular partitioner range. During full repair, if the last node to run it doesn't own this particular key in its partitioner range, the Data SSTable will get anticompacted and marked as "Unrepaired". The Tombstone SSTable stays marked as Repaired.
>
> Then it's past gc_grace. Since Nodes #1 and #3 only have 1 SSTable for that key, the tombstone will get compacted out.
>     Node 1 has nothing.
>     Node 2 has data (in unrepaired SSTable) and tombstone (in repaired SSTable) in separate SSTables.
>     Node 3 has nothing.
>
> Now when the next incremental repair runs, it will only use the Data SSTable to build the merkle tree since the tombstone SSTable is flagged as repaired and data SSTable is marked as unrepaired. And the data will get repaired against the other two nodes.
>     Node 1 has data.
>     Node 2 has data and tombstone in separate SSTables.
>     Node 3 has data.
>
> If a read request hits Node 1 and 3, it will return data. If it hits 1 and 2, or 2 and 3, however, it would return no data.
>
> Tested this with single range tokens for simplicity.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
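The final steps of the quoted scenario can be replayed with a small toy model. This is only a sketch under stated assumptions, not Cassandra code: `Cell`, `SSTable`, `validation_view`, `incremental_repair` and `read` are hypothetical names, the merkle-tree comparison is reduced to comparing the newest cell per key, and the only behaviour taken from the ticket is that an incremental repair validates unrepaired SSTables only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    key: str
    timestamp: int
    is_tombstone: bool = False


@dataclass
class SSTable:
    cells: List[Cell]
    repaired: bool


def newest(cells: List[Cell]) -> Dict[str, Cell]:
    """Reconcile cells per key: the highest timestamp wins."""
    out: Dict[str, Cell] = {}
    for c in cells:
        if c.key not in out or c.timestamp > out[c.key].timestamp:
            out[c.key] = c
    return out


def validation_view(sstables: List[SSTable], incremental: bool) -> Dict[str, Cell]:
    """What a repair validates on one node.

    Assumption from the ticket: an incremental repair skips SSTables
    that are already flagged as repaired.
    """
    cells = [c for t in sstables if not (incremental and t.repaired) for c in t.cells]
    return newest(cells)


def incremental_repair(nodes: Dict[str, List[SSTable]]) -> None:
    """Toy stand-in for merkle-tree comparison plus streaming."""
    views = {name: validation_view(ts, True) for name, ts in nodes.items()}
    merged = newest([c for v in views.values() for c in v.values()])
    for name, sstables in nodes.items():
        missing = [c for k, c in merged.items() if views[name].get(k) != c]
        if missing:
            # Streamed cells land in a new SSTable; its flag does not
            # influence the reads below.
            sstables.append(SSTable(missing, repaired=True))


def read(nodes: Dict[str, List[SSTable]], replicas: List[str], key: str) -> Optional[Cell]:
    """Read from the given replicas; a winning tombstone hides the data."""
    cells = [c for n in replicas for t in nodes[n] for c in t.cells if c.key == key]
    winner = newest(cells).get(key)
    return None if winner is None or winner.is_tombstone else winner


data = Cell("k", timestamp=1)
tombstone = Cell("k", timestamp=2, is_tombstone=True)

# State after gc_grace: nodes 1 and 3 purged everything, node 2 still has
# the data (unrepaired) and the tombstone (repaired) in separate SSTables.
nodes = {
    "node1": [],
    "node2": [SSTable([data], repaired=False), SSTable([tombstone], repaired=True)],
    "node3": [],
}

incremental_repair(nodes)

print(read(nodes, ["node1", "node3"], "k"))  # the deleted data is visible again
print(read(nodes, ["node1", "node2"], "k"))  # None: node 2's tombstone wins
print(read(nodes, ["node2", "node3"], "k"))  # None: node 2's tombstone wins
```

In this model node 2 never contributes its tombstone to validation, so the repair treats the data cell as missing on nodes 1 and 3 and copies it there, which matches the read results described in the report.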
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867542#comment-15867542 ]

Stefan Podkowinski commented on CASSANDRA-13153:
------------------------------------------------

Taking a closer look at CASSANDRA-9143 again, I'm certain that your work there would indeed fix the issue described in this ticket, as we no longer do anti-compaction for full repairs. So that brings me back to the question: shouldn't we get rid of anti-compactions for full repairs in 2.2+ as well?
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866811#comment-15866811 ]

Blake Eggleston commented on CASSANDRA-13153:
---------------------------------------------

bq. CASSANDRA-13153 is not just about redundant re-streaming. It's about streaming only partial data for partitions or cells

Right, agreed. My point was that not using incremental repair should fix [~Amanda.Debrot]'s problem. The part about redundant streaming just meant that as a workaround, it might not actually be as bad as it sounds.

bq. With CASSANDRA-9143 it's not that bad, since you start on unrepaired, recent data and the next incremental run will indeed fix the data that has been left in unrepaired before, given it's run within gc_grace. But with CASSANDRA-13153 you might leak arbitrary old data into unrepaired, which should never happen.

I'm not sure what you mean here. The goal of CASSANDRA-9143 was to prevent repaired data from ever leaking back into unrepaired, for both correctness and performance reasons. Do you mean that leaking data is still possible after CASSANDRA-9143, or that the point of this ticket is different?
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866518#comment-15866518 ]

Stefan Podkowinski commented on CASSANDRA-13153:
------------------------------------------------

CASSANDRA-13153 is not just about redundant re-streaming. It's about streaming only _partial_ data for partitions or cells, depending on whether an individual sstable has been affected or not. If it has, you may end up leaking data that is covered by a tombstone back to unrepaired, while the tombstone in the unaffected sstable stays in repaired, and have the data streamed from there to all other nodes (which may already have compacted the data and tombstone away). Or am I missing something here?

With CASSANDRA-9143 it's not _that_ bad, since you start on unrepaired, recent data and the next incremental run will indeed fix the data that has been left in unrepaired before, given it's run within gc_grace. But with CASSANDRA-13153 you might leak arbitrary old data into unrepaired, which should never happen.
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866448#comment-15866448 ]

Blake Eggleston commented on CASSANDRA-13153:
---------------------------------------------

I think this can also happen just by running incremental repair only, because of the way it leaks data into the unrepaired sstable bucket. This has been fixed in CASSANDRA-9143, but that was only committed to trunk, since it's not a trivial change.

Unfortunately, the only way to avoid this in pre-4.0 clusters is to just not run incremental repair. This may not be as bad as it sounds though, since what pre-CASSANDRA-9143 incremental repair gained in validation time, it likely lost in redundant re-streaming of otherwise repaired data. If you had a large sstable compacted during a repair, the entire thing would have to be streamed to every other replica on the next incremental repair.
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865792#comment-15865792 ]

Stefan Podkowinski commented on CASSANDRA-13153:
------------------------------------------------

Getting back to this ticket and giving it some thought again, I'm pretty sure that it's not enough to disable anti-compaction for full PR repairs. This will only prevent the described issue on the repair initiator node, but not on the other replicas involved. I'm afraid there's no way around disabling anti-compaction for full repairs completely to prevent this issue from happening.
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839768#comment-15839768 ]

Amanda Debrot commented on CASSANDRA-13153:
-------------------------------------------

Hi Stefan,

Yes true, it should just affect Cassandra 2.2+ versions. I forgot about that point with 2.1. I'll update the "since version".

Thanks!
[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839622#comment-15839622 ]

Stefan Podkowinski commented on CASSANDRA-13153:
------------------------------------------------

Thanks for reporting this, [~Amanda.Debrot]! Let me try to wrap up again what's happening here.

I think the assumption was that anti-compaction will isolate repaired ranges into the repaired set of sstables, while parts of sstables not covered by the repair will stay in the unrepaired set. As described by Amanda, trouble starts when anti-compaction is taking place exclusively on already repaired sstables. Once we've finished repairing a certain range using full repair, anti-compaction will move unaffected ranges in overlapping sstables from the repaired into the unrepaired set again, even if those ranges have actually already been repaired before. As the overlap between ranges and sstables is non-deterministic, we could see either regular cells, tombstones or both being moved to unrepaired, based on whether the sstable happens to overlap or not.

Unfortunately this is not the only way that this could happen. As described in CASSANDRA-9143, compactions during the repairs can prevent anti-compaction for individual sstables, and tombstones and data could end up in different sets in this case as well.

bq. I've only tested it on Cassandra version 2.2 but it most likely also affects all Cassandra versions with incremental repair - like 2.1 and 3.0.

I think 2.1 should not be affected, as we started doing anti-compactions for full repairs in 2.2.
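The anti-compaction behaviour described in this comment can be illustrated with a toy model. It is a sketch under stated assumptions rather than the real implementation: `SSTable`, `anticompact`, the integer tokens and the `(lo, hi]` range are all hypothetical, and the model assumes that an sstable not intersecting the repaired range is left untouched, one fully inside it is only re-flagged, and one partially overlapping it is split into a repaired and an unrepaired part.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SSTable:
    name: str
    tokens: List[int]  # tokens of the partitions stored in this sstable
    repaired: bool


def anticompact(sstable: SSTable, lo: int, hi: int) -> List[SSTable]:
    """Toy anti-compaction after repairing the token range (lo, hi]."""
    inside = [t for t in sstable.tokens if lo < t <= hi]
    outside = [t for t in sstable.tokens if not (lo < t <= hi)]
    if not inside:
        # No overlap with the repaired range: untouched, keeps its flag.
        return [sstable]
    if not outside:
        # Fully covered by the repaired range: simply flagged as repaired.
        return [SSTable(sstable.name, inside, repaired=True)]
    # Partial overlap: the covered part becomes repaired and the rest is
    # written to unrepaired, even if the input was already repaired.
    return [
        SSTable(sstable.name + "-in", inside, repaired=True),
        SSTable(sstable.name + "-out", outside, repaired=False),
    ]


# The partition of interest has token 10. Its data was compacted together
# with a partition from another range (token 150); its tombstone was not.
data = SSTable("data", tokens=[10, 150], repaired=True)
tombstone = SSTable("tombstone", tokens=[10], repaired=True)

# A full repair covering only (100, 200] finishes and anti-compacts.
results = anticompact(data, 100, 200) + anticompact(tombstone, 100, 200)
for t in results:
    print(t.name, t.tokens, "repaired" if t.repaired else "unrepaired")
```

In this model the data for token 10 ends up in the unrepaired `data-out` sstable while its tombstone stays in the repaired set, which is the split the ticket identifies as the precondition for the reappearing data.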