[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-03-09 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902783#comment-15902783
 ] 

Marcus Eriksson commented on CASSANDRA-13153:

oops, sorry for the delay, +1

> Reappeared Data when Mixing Incremental and Full Repairs
> 
>
> Key: CASSANDRA-13153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13153
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Tools
> Environment: Apache Cassandra 2.2
>Reporter: Amanda Debrot
>Assignee: Stefan Podkowinski
>  Labels: Cassandra
> Attachments: log-Reappeared-Data.txt, 
> Step-by-Step-Simulate-Reappeared-Data.txt
>
>
> This happens for both LeveledCompactionStrategy and 
> SizeTieredCompactionStrategy.  I've only tested it on Cassandra version 2.2 
> but it most likely also affects all Cassandra versions after 2.2, if they 
> have anticompaction with full repair.
> When mixing incremental and full repairs, there are a few scenarios where the 
> Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as 
> repaired.  Then if it is past gc_grace, and the tombstone and data have been
> compacted out on other replicas, the next incremental repair will push the 
> Data to other replicas without the tombstone.
> Simplified scenario:
> 3 node cluster with RF=3
> Initial config:
>   Node 1 has data and tombstone in separate SSTables.
>   Node 2 has data and no tombstone.
>   Node 3 has data and tombstone in separate SSTables.
> Incremental repair (nodetool repair -pr) is run every day, so now we have the
> tombstone on each node.
> Some minor compactions have happened since so data and tombstone get merged 
> to 1 SSTable on Nodes 1 and 3.
>   Node 1 had a minor compaction that merged data with tombstone. 1 
> SSTable with tombstone.
>   Node 2 has data and tombstone in separate SSTables.
>   Node 3 had a minor compaction that merged data with tombstone. 1 
> SSTable with tombstone.
> Incremental repairs keep running every day.
> Full repairs run weekly (nodetool repair -full -pr). 
> Now there are 2 scenarios where the Data SSTable will get marked as 
> "Unrepaired" while Tombstone SSTable will get marked as "Repaired".
> Scenario 1:
> Since the Data and Tombstone SSTable have been marked as "Repaired" 
> and anticompacted, they have had minor compactions with other SSTables 
> containing keys from other ranges.  During full repair, if the last node to 
> run it doesn't own this particular key in its partitioner range, the Data
> and Tombstone SSTable will get anticompacted and marked as "Unrepaired".  Now 
> in the next incremental repair, if the Data SSTable is involved in a minor 
> compaction during the repair but the Tombstone SSTable is not, the resulting 
> compacted SSTable will be marked "Unrepaired" and the Tombstone SSTable is marked
> "Repaired".
> Scenario 2:
> Only the Data SSTable had minor compaction with other SSTables 
> containing keys from other ranges after being marked as "Repaired".  The 
> Tombstone SSTable was never involved in a minor compaction so therefore all 
> keys in that SSTable belong to 1 particular partitioner range. During full 
> repair, if the last node to run it doesn't own this particular key in its
> partitioner range, the Data SSTable will get anticompacted and marked as 
> "Unrepaired".   The Tombstone SSTable stays marked as Repaired.
> Then it’s past gc_grace.  Since Nodes 1 and 3 only have 1 SSTable for that
> key, the tombstone will get compacted out.
>   Node 1 has nothing.
>   Node 2 has data (in unrepaired SSTable) and tombstone (in repaired 
> SSTable) in separate SSTables.
>   Node 3 has nothing.
> Now when the next incremental repair runs, it will only use the Data SSTable 
> to build the merkle tree since the tombstone SSTable is flagged as repaired 
> and the data SSTable is marked as unrepaired.  The data will then get repaired
> against the other two nodes.
>   Node 1 has data.
>   Node 2 has data and tombstone in separate SSTables.
>   Node 3 has data.
> If a read request hits Nodes 1 and 3, it will return data.  If it hits 1 and
> 2, or 2 and 3, however, it will return no data.
> Tested this with single range tokens for simplicity.
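
A minimal sketch of the failure mode, with hypothetical stand-in names (not Cassandra's actual classes): incremental repair builds its merkle trees only from unrepaired sstables, so in the end state above Node 2's repaired tombstone sstable is skipped while its unrepaired data sstable is validated and streamed back.

{code:java}
import java.util.List;
import java.util.stream.Collectors;

class ValidationSketch
{
    static final long UNREPAIRED = 0L;

    // Hypothetical minimal sstable stand-in: repairedAt == 0 means unrepaired.
    static class SSTable
    {
        final boolean tombstone;
        final long repairedAt;
        SSTable(boolean tombstone, long repairedAt) { this.tombstone = tombstone; this.repairedAt = repairedAt; }
    }

    // Incremental repair feeds only unrepaired sstables into the merkle tree.
    static List<SSTable> forValidation(List<SSTable> sstables)
    {
        return sstables.stream()
                       .filter(s -> s.repairedAt == UNREPAIRED)
                       .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        // Node 2 in the end state above: data unrepaired, tombstone repaired.
        List<SSTable> node2 = List.of(new SSTable(false, UNREPAIRED),      // data
                                      new SSTable(true, 1487000000000L)); // tombstone
        // Only the data sstable is validated, so the bare data gets streamed
        // back to Nodes 1 and 3 without the shadowing tombstone.
        System.out.println(forValidation(node2).size()); // prints 1
    }
}
{code}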





[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-03-09 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902779#comment-15902779
 ] 

Stefan Podkowinski commented on CASSANDRA-13153:


[~krummas], any feedback on the latest, simplified patch version?



[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-03-02 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892486#comment-15892486
 ] 

Stefan Podkowinski commented on CASSANDRA-13153:


I've reduced my branches to the bare minimum of what needs to be done for
filtering already repaired sstables, and re-ran the tests. See above for links.



[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-28 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889664#comment-15889664
 ] 

Marcus Eriksson commented on CASSANDRA-13153:

Makes sense, but let's add this if/when we do that change?



[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-28 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888552#comment-15888552
 ] 

Stefan Podkowinski commented on CASSANDRA-13153:


Yes, that could be an option as well. But since we already discussed the
possibility of actually doing anti-compaction on repaired sstables for the sake
of tracking repairedAt more accurately, I was hoping someone someday would be
able to make use of the method as is, by providing a reasonable repairedAt value
for both anti-compaction outputs. But I'm open to adding an assert instead, if
you think I'm a bit too optimistic here.



[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-28 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888348#comment-15888348
 ] 

Marcus Eriksson commented on CASSANDRA-13153:

My thinking was that the only time anyone could expect the sstable with the
non-repaired ranges to be something other than UNREPAIRED would be if they
passed in repaired sstables, so having the assert shows that this is not
expected.



[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-28 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888289#comment-15888289
 ] 

Stefan Podkowinski commented on CASSANDRA-13153:


I'm not sure I really understand what the additional
{{repairedAtNotContainedInRange}} parameter has to do with adding an assert to
make sure "all sstables are unrepaired". Even if all sstables are unrepaired, we
still need to apply a repairedAt value for the ranges that were not successfully
repaired.
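
Roughly the shape of the method in question, as a sketch (the parameter names follow this thread; the surrounding types are simplified stand-ins, not the actual Cassandra classes):

{code:java}
import java.util.Collection;

// Simplified stand-ins, for illustration only.
interface Token { }
interface Range<T> { }
interface SSTableReader { }

class AntiCompactionSketch
{
    // Anticompaction splits each sstable into two outputs and must stamp a
    // repairedAt value on both: one for keys inside the successfully repaired
    // ranges, one for keys outside them. Making the second value an explicit
    // parameter keeps that decision visible to callers instead of silently
    // defaulting it.
    void antiCompactGroup(Collection<SSTableReader> group,
                          Collection<Range<Token>> successfullyRepairedRanges,
                          long repairedAt,
                          long repairedAtNotContainedInRange)
    {
        // ... rewrite each sstable into two outputs, stamping each output
        // with the matching repairedAt value ...
    }
}
{code}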



[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-28 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888245#comment-15888245
 ] 

Marcus Eriksson commented on CASSANDRA-13153:

Not sure I agree that adding the parameter helps; could we just add an assert
in {{anticompactGroup}} that all sstables are unrepaired instead?
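
Roughly what such an assert could look like, as a sketch with hypothetical stand-in types (not the actual patch):

{code:java}
import java.util.Collection;

class AnticompactGroupSketch
{
    static final long UNREPAIRED = 0L;

    // Hypothetical minimal sstable stand-in: repairedAt == 0 means unrepaired.
    static class SSTable { long repairedAt; }

    static void anticompactGroup(Collection<SSTable> group)
    {
        // Fail fast: handing an already repaired sstable to anticompaction
        // would be a bug in the caller, not a case to support here.
        assert group.stream().allMatch(s -> s.repairedAt == UNREPAIRED)
                : "anticompaction should only ever operate on unrepaired sstables";
        // ... proceed to split the group by repaired ranges ...
    }
}
{code}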



[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-22 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878122#comment-15878122
 ] 

Stefan Podkowinski commented on CASSANDRA-13153:


The patch is now finished, based on my last suggestion to simply skip already
repaired sstables during anti-compaction. I've also made the repairedAt
timestamps for both the contained and not-contained ranges explicit parameters.
This should help avoid overlooking the fact that we also have to deal with
repairedAt for the not-contained part.
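
The gist of the skip, as a sketch with hypothetical stand-in names (the linked branches contain the actual change):

{code:java}
import java.util.Collection;
import java.util.stream.Collectors;

class SkipRepairedSketch
{
    static final long UNREPAIRED = 0L;

    // Hypothetical minimal sstable stand-in: repairedAt == 0 means unrepaired.
    static class SSTable { long repairedAt; }

    // Already repaired sstables are excluded up front, so a full repair can
    // no longer flip them back to unrepaired through anticompaction.
    static Collection<SSTable> candidatesForAntiCompaction(Collection<SSTable> sstables)
    {
        return sstables.stream()
                       .filter(s -> s.repairedAt == UNREPAIRED)
                       .collect(Collectors.toList());
    }
}
{code}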

PR for corresponding dtest can be found 
[here|https://github.com/riptano/cassandra-dtest/pull/1447].

Test results (just started new dtest run with PR branch):

||2.2||3.0||3.11||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-13153-2.2]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-13153-3.0]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-13153-3.11]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-13153-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-13153-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-13153-3.11-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-13153-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-13153-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-13153-3.11-testall/]|






[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-17 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871984#comment-15871984
 ] 

Marcus Eriksson commented on CASSANDRA-13153:

[~spo...@gmail.com] the patch LGTM, could you run CI on it?



[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-16 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869836#comment-15869836
 ] 

Marcus Eriksson commented on CASSANDRA-13153:

bq. Can we really throw them into the same level locally, just because they 
have been at level X on other nodes?
No, we check whether it would create overlap before adding it to the manifest:
https://github.com/apache/cassandra/blob/98d74ed998706e9e047dc0f7886a1e9b18df3ce9/src/java/org/apache/cassandra/db/compaction/LeveledManifest.java#L149
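
In outline, the safeguard behind that link (a hypothetical simplification, not the actual {{LeveledManifest}} code): an incoming sstable keeps its streamed level only if it would not overlap anything already in that level; otherwise it is sent back to L0.

{code:java}
import java.util.ArrayList;
import java.util.List;

class LeveledManifestSketch
{
    // Hypothetical minimal sstable stand-in with an inclusive token span.
    static class SSTable
    {
        final long first, last;
        final int level;
        SSTable(long first, long last, int level) { this.first = first; this.last = last; this.level = level; }
    }

    private final List<List<SSTable>> generations = new ArrayList<>();

    LeveledManifestSketch(int maxLevel)
    {
        for (int i = 0; i <= maxLevel; i++)
            generations.add(new ArrayList<>());
    }

    private static boolean overlaps(SSTable candidate, List<SSTable> level)
    {
        for (SSTable s : level)
            if (candidate.first <= s.last && s.first <= candidate.last)
                return true;
        return false;
    }

    void add(SSTable sstable)
    {
        // Only keep the streamed level if it preserves the "non-overlapping
        // sstables per level" invariant; otherwise demote the sstable to L0.
        if (sstable.level > 0 && overlaps(sstable, generations.get(sstable.level)))
            generations.get(0).add(sstable);
        else
            generations.get(sstable.level).add(sstable);
    }
}
{code}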



[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-16 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869826#comment-15869826
 ] 

Stefan Podkowinski commented on CASSANDRA-13153:


bq.  If operators stop using incremental repairs, there's no harm in doing an 
anticompaction after a full repair. 

Even if there's no harm when it comes to consistency, it still causes
fragmentation of existing sstables. Every repaired range makes all replicas go
through all local, intersecting sstables and rewrite them, segregated by
affected and unaffected token ranges. This causes unnecessary load and is
probably pretty bad for LCS or STCS, as we constantly break up bigger sstables
by doing so.

One option to avoid this would be to simply never run anti-compaction on
repaired sstables. See
[here|https://github.com/spodkowinski/cassandra/commit/684d1c72cda58fecea15b46f928a451df38d87cb]
 for a simple approach. I don't think anti-compaction was ever meant to work on
already repaired sstables, so that's probably the least intrusive fix for most
of the known issues around incremental repairs discussed here.

Btw, I'm also a bit confused by
[createWriterForAntiCompaction|https://github.com/apache/cassandra/blob/98d74ed998706e9e047dc0f7886a1e9b18df3ce9/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L1285].
 Each sstable's level will be streamed as well, won't it? Can we really throw
them into the same level locally, just because they have been at level X on
other nodes? Won't this potentially break the "non-overlapping sstables"
guarantee by dropping them blindly into level X?



[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-15 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868143#comment-15868143
 ] 

Blake Eggleston commented on CASSANDRA-13153:

bq. shouldn't we get rid of anti-compactions for full repairs in 2.2+ as well?

I think we're ok leaving them as is. Using pre-4.0 incremental repairs is the
root cause of this. If operators stop using incremental repairs, there's no harm
in doing an anticompaction after a full repair. The only scenario where it would
cause problems is when using incremental repair for the first time after
upgrading to 4.0, when the repaired datasets are very likely inconsistent. This
could be addressed by just running a final full repair on the upgraded cluster.
As part of CASSANDRA-9143, full repairs no longer perform anticompaction, and
streamed sstables include the repairedAt time, which would bring the repaired
and unrepaired datasets in sync.

So having said all that, it seems like we should recommend that users who 
delete data:
1. Stop using incremental repair (pre-4.0)
2. Run a full repair after upgrading to 4.0 before using incremental repair 
again

We should also recommend that even users who don't delete data take a look at
the amount of streaming their incremental repairs are doing, and decide whether
it might be less expensive to just do full repairs instead.

Thoughts?


[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-15 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867821#comment-15867821
 ] 

Marcus Eriksson commented on CASSANDRA-13153:
-

Ok, so the problem is actually that if we run a -full repair and some of the 
ranges fail, we might anticompact an sstable to unrepaired. The fix would be 
to anticompact to the previous value of repairedAt.

Or, as suggested, that we don't anticompact on full repairs at all.
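
A minimal sketch of the first option, with hypothetical names (this is just 
the idea, not a patch):

{code:java}
// Sketch only: what the out-of-range output of an anticompaction gets marked
// as. Current behaviour resets it to unrepaired, which can demote
// already-repaired data; the proposed fix keeps the previous repairedAt.
class AnticompactionFixSketch {
    static final long UNREPAIRED = 0L;

    // Current behaviour: the part outside the repaired range always becomes
    // unrepaired, regardless of its history.
    static long currentOutOfRange(long previousRepairedAt) {
        return UNREPAIRED;
    }

    // Proposed behaviour: already-repaired data stays repaired at its
    // previous repairedAt value.
    static long proposedOutOfRange(long previousRepairedAt) {
        return previousRepairedAt;
    }
}
{code}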



[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-15 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867566#comment-15867566
 ] 

Marcus Eriksson commented on CASSANDRA-13153:
-

[~spo...@gmail.com] are you saying that sstables marked as repaired are 
getting moved to unrepaired? I don't see how that could happen; with -full 
repairs, if a repaired sstable gets compacted away, it will (should?) stay in 
repaired, not get moved to unrepaired.



[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-15 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867542#comment-15867542
 ] 

Stefan Podkowinski commented on CASSANDRA-13153:


Taking a closer look at CASSANDRA-9143 again, I'm certain that your work there 
would indeed fix the issue described in this ticket, as we no longer do 
anti-compaction for full repairs. So that brings me back to the question: 
shouldn't we get rid of anti-compactions for full repairs in 2.2+ as well?



[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-14 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866811#comment-15866811
 ] 

Blake Eggleston commented on CASSANDRA-13153:
-

bq. CASSANDRA-13153 is not just about redundant re-streaming. It's about 
streaming only partial data for partitions or cells.

Right, agreed. My point was that not using incremental repair should fix 
[~Amanda.Debrot]'s problem. The part about redundant streaming just meant that 
as a workaround, it might not actually be as bad as it sounds.

bq. With CASSANDRA-9143 it's not that bad, since you start on unrepaired, 
recent data and the next incremental run will indeed fix the data that has been 
left in unrepaired before, given it's run within gc_grace. But with 
CASSANDRA-13153 you might leak arbitrary old data into unrepaired, which should 
never happen.

I'm not sure what you mean here. The goal of CASSANDRA-9143 was to prevent 
repaired data from ever leaking back into unrepaired, for both correctness and 
performance reasons. Do you mean that leaking data is still possible after 
CASSANDRA-9143, or that the point of this ticket is different?


[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-14 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866518#comment-15866518
 ] 

Stefan Podkowinski commented on CASSANDRA-13153:


CASSANDRA-13153 is not just about redundant re-streaming. It's about streaming 
only _partial_ data for partitions or cells, based on whether an individual 
sstable has been affected or not. If it has, you may end up leaking data that 
is covered by a tombstone back to unrepaired, while the tombstone in the 
unaffected sstable stays in repaired, and have the data streamed from there to 
all other nodes (which may have already compacted the data and tombstone 
away). Or am I missing something here?

With CASSANDRA-9143 it's not _that_ bad, since you start on unrepaired, recent 
data and the next incremental run will indeed fix the data that has been left 
in unrepaired before, given it's run within gc_grace. But with CASSANDRA-13153 
you might leak arbitrarily old data into unrepaired, which should never happen.
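
To make the leak concrete, a small sketch with hypothetical names (assuming, 
as described in this ticket, that incremental repair only validates 
unrepaired sstables):

{code:java}
// Sketch: once the data sstable has leaked to unrepaired while its covering
// tombstone stays in a repaired sstable, incremental repair validates only
// the unrepaired set, sees live data with no tombstone, and streams it out.
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PartialLeakSketch {
    static class SSTable {
        final long repairedAt;      // 0 = unrepaired
        final boolean hasTombstone;
        SSTable(long repairedAt, boolean hasTombstone) {
            this.repairedAt = repairedAt;
            this.hasTombstone = hasTombstone;
        }
    }

    public static void main(String[] args) {
        SSTable data      = new SSTable(0L, false);         // leaked to unrepaired
        SSTable tombstone = new SSTable(1487066400L, true); // stayed repaired

        // Only unrepaired sstables feed the merkle tree:
        List<SSTable> validated = Arrays.asList(data, tombstone).stream()
                .filter(s -> s.repairedAt == 0L)
                .collect(Collectors.toList());

        boolean tombstoneVisible = validated.stream().anyMatch(s -> s.hasTombstone);
        System.out.println("tombstone visible to repair: " + tombstoneVisible); // false
    }
}
{code}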


[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-14 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866448#comment-15866448
 ] 

Blake Eggleston commented on CASSANDRA-13153:
-

I think this can also happen just by running incremental repair, because of 
the way it leaks data into the unrepaired sstable bucket. This has been fixed 
in CASSANDRA-9143… but that was only committed to trunk, since it’s not a 
trivial change. Unfortunately, the only way to avoid this in pre-4.0 clusters 
is to just not run incremental repair.

This may not be as bad as it sounds though, since what pre-CASSANDRA-9143 
incremental repair gained in validation time, it likely lost in redundant 
re-streaming of otherwise repaired data. If you had a large sstable compacted 
during a repair, the entire thing would have to be streamed to every other 
replica on the next incremental repair.
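
For context, a sketch of why one compaction during a repair has that effect, 
with hypothetical names (the rule that a compaction output takes the minimum 
repairedAt of its inputs is my assumption here):

{code:java}
// Sketch: if a compaction output takes the minimum repairedAt of its inputs,
// mixing one unrepaired sstable into a compaction drags the whole (possibly
// large, mostly repaired) output back to unrepaired, and the next
// incremental repair has to validate and potentially re-stream all of it.
import java.util.Arrays;
import java.util.List;

class CompactionRepairedAtSketch {
    static final long UNREPAIRED = 0L;

    static long outputRepairedAt(List<Long> inputRepairedAts) {
        // A single unrepaired input (0) makes the entire output unrepaired.
        return inputRepairedAts.stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(UNREPAIRED);
    }

    public static void main(String[] args) {
        // Two repaired inputs plus one unrepaired input -> unrepaired output.
        System.out.println(outputRepairedAt(
                Arrays.asList(1487066400L, 1487066400L, UNREPAIRED))); // prints 0
    }
}
{code}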


[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-02-14 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865792#comment-15865792
 ] 

Stefan Podkowinski commented on CASSANDRA-13153:


Getting back to this ticket and giving it some thought again, I'm pretty sure 
that it's not enough to disable anti-compaction for full primary-range (-pr) 
repairs. This will only prevent the described issue for the repair initiator 
node, but not for the other replicas involved. I'm afraid there's no way 
around disabling anti-compaction for full repairs completely to prevent this 
issue from happening.



[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-01-26 Thread Amanda Debrot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839768#comment-15839768
 ] 

Amanda Debrot commented on CASSANDRA-13153:
---

Hi Stefan,

Yes, true. It should just affect Cassandra 2.2+ versions. I forgot about that 
point with 2.1. I'll update the "since version". Thanks!



[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-01-26 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839622#comment-15839622
 ] 

Stefan Podkowinski commented on CASSANDRA-13153:


Thanks for reporting this, [~Amanda.Debrot]! Let me try to wrap up again 
what's happening here.

I think the assumption was that anti-compaction will isolate repaired ranges 
into the repaired set of sstables, while parts of sstables not covered by the 
repair will stay in the unrepaired set. As described by Amanda, trouble starts 
when anti-compaction is taking place exclusively on already repaired sstables. 
Once we've finished repairing a certain range using full repair, 
anti-compaction will move unaffected ranges in overlapping sstables from the 
repaired into the unrepaired set again, even if those ranges have actually 
already been repaired before. As the overlap between ranges and sstables is 
non-deterministic, we could see regular cells, tombstones, or both being 
moved to unrepaired, based on whether the sstable happens to overlap or not.
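
To make the split concrete, a minimal sketch with hypothetical names (not the 
real compaction code):

{code:java}
// Sketch: after a full repair of one token range, anticompaction rewrites an
// overlapping sstable into two outputs. Rows inside the range get the new
// repairedAt; rows outside are written unrepaired, regardless of whether the
// sstable was already repaired before.
import java.util.ArrayList;
import java.util.List;

class AnticompactionSplitSketch {
    static final long UNREPAIRED = 0L;

    static class Row { final long token; Row(long token) { this.token = token; } }

    static class Output {
        final List<Row> rows = new ArrayList<>();
        final long repairedAt;
        Output(long repairedAt) { this.repairedAt = repairedAt; }
    }

    // rangeLeft/rangeRight delimit the just-repaired token range.
    static List<Output> split(List<Row> sstable, long rangeLeft, long rangeRight,
                              long newRepairedAt) {
        Output repaired   = new Output(newRepairedAt);
        Output unrepaired = new Output(UNREPAIRED); // previous repairedAt is lost
        for (Row row : sstable) {
            boolean inRange = row.token > rangeLeft && row.token <= rangeRight;
            (inRange ? repaired : unrepaired).rows.add(row);
        }
        List<Output> outputs = new ArrayList<>();
        outputs.add(repaired);
        outputs.add(unrepaired);
        return outputs;
    }
}
{code}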

Unfortunately this is not the only way this could happen. As described in 
CASSANDRA-9143, compactions during repairs can prevent anti-compaction for 
individual sstables, and tombstones and data could end up in different sets 
in this case as well.

bq.  I've only tested it on Cassandra version 2.2 but it most likely also 
affects all Cassandra versions with incremental repair - like 2.1 and 3.0.

I think 2.1 should not be affected, as we started doing anti-compactions for 
full repairs in 2.2.
