[jira] [Updated] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-01-26 Thread Amanda Debrot (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amanda Debrot updated CASSANDRA-13153:
--
Since Version: 2.2.0 beta 1  (was: 2.1 beta1)

> Reappeared Data when Mixing Incremental and Full Repairs
> 
>
> Key: CASSANDRA-13153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13153
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Tools
> Environment: Apache Cassandra 2.2
>Reporter: Amanda Debrot
>  Labels: Cassandra
> Attachments: log-Reappeared-Data.txt, 
> Step-by-Step-Simulate-Reappeared-Data.txt
>
>
> This happens for both LeveledCompactionStrategy and 
> SizeTieredCompactionStrategy.  I've only tested it on Cassandra version 2.2 
> but it most likely also affects all Cassandra versions with incremental 
> repair - like 2.1 and 3.0.
> When mixing incremental and full repairs, there are a few scenarios where the 
> Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as 
> repaired.  Then if it is past gc_grace, and the tombstone and data have been 
> compacted out on other replicas, the next incremental repair will push the 
> Data to other replicas without the tombstone.
> Simplified scenario:
> 3 node cluster with RF=3
> Initial config:
>   Node 1 has data and tombstone in separate SSTables.
>   Node 2 has data and no tombstone.
>   Node 3 has data and tombstone in separate SSTables.
> Incremental repair (nodetool repair -pr) is run every day so now we have 
> tombstone on each node.
> Some minor compactions have happened since so data and tombstone get merged 
> to 1 SSTable on Nodes 1 and 3.
>   Node 1 had a minor compaction that merged data with tombstone. 1 
> SSTable with tombstone.
>   Node 2 has data and tombstone in separate SSTables.
>   Node 3 had a minor compaction that merged data with tombstone. 1 
> SSTable with tombstone.
> Incremental repairs keep running every day.
> Full repairs run weekly (nodetool repair -full -pr). 
> Now there are 2 scenarios where the Data SSTable will get marked as 
> "Unrepaired" while the Tombstone SSTable will get marked as "Repaired".
> Scenario 1:
> Since the Data and Tombstone SSTable have been marked as "Repaired" 
> and anticompacted, they have had minor compactions with other SSTables 
> containing keys from other ranges.  During full repair, if the last node to 
> run it doesn't own this particular key in its partitioner range, the Data 
> and Tombstone SSTable will get anticompacted and marked as "Unrepaired".  Now 
> in the next incremental repair, if the Data SSTable is involved in a minor 
> compaction during the repair but the Tombstone SSTable is not, the resulting 
> compacted SSTable will be marked "Unrepaired" while the Tombstone SSTable stays marked 
> "Repaired".
> Scenario 2:
> Only the Data SSTable had minor compaction with other SSTables 
> containing keys from other ranges after being marked as "Repaired".  The 
> Tombstone SSTable was never involved in a minor compaction so therefore all 
> keys in that SSTable belong to 1 particular partitioner range. During full 
> repair, if the last node to run it doesn't own this particular key in its 
> partitioner range, the Data SSTable will get anticompacted and marked as 
> "Unrepaired".   The Tombstone SSTable stays marked as Repaired.
> Then it is past gc_grace.  Since Nodes 1 and 3 only have 1 SSTable for that 
> key, the tombstone will get compacted out.
>   Node 1 has nothing.
>   Node 2 has data (in unrepaired SSTable) and tombstone (in repaired 
> SSTable) in separate SSTables.
>   Node 3 has nothing.
> Now when the next incremental repair runs, it will only use the Data SSTable 
> to build the Merkle tree, since the Tombstone SSTable is flagged as repaired 
> and the Data SSTable is marked as unrepaired.  The data will then get repaired 
> against the other two nodes.
>   Node 1 has data.
>   Node 2 has data and tombstone in separate SSTables.
>   Node 3 has data.
> If a read request hits Nodes 1 and 3, it will return data.  If it hits Nodes 1 
> and 2, or 2 and 3, however, it will return no data.
> Tested this with single token ranges for simplicity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-01-26 Thread Amanda Debrot (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amanda Debrot updated CASSANDRA-13153:
--
Description: 
This happens for both LeveledCompactionStrategy and 
SizeTieredCompactionStrategy.  I've only tested it on Cassandra version 2.2, but 
it most likely also affects all Cassandra versions after 2.2 that run 
anticompaction as part of full repair.

When mixing incremental and full repairs, there are a few scenarios where the 
Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as 
repaired.  Then if it is past gc_grace, and the tombstone and data have been 
compacted out on other replicas, the next incremental repair will push the Data 
to other replicas without the tombstone.
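
One way to see the repaired/unrepaired split on a replica is to run 
sstablemetadata over the table's SSTables and group them by their repairedAt 
value.  A minimal sketch (the data directory path is hypothetical, and the 
exact sstablemetadata output format can vary slightly between versions):

{code}
# Group a table's SSTables by repaired status using sstablemetadata.
# Assumes sstablemetadata is on PATH and prints a "Repaired at:" line.
import glob
import re
import subprocess

def repaired_status(data_dir_glob):
    status = {}
    for path in glob.glob(data_dir_glob + "/*-Data.db"):
        out = subprocess.run(["sstablemetadata", path],
                             capture_output=True, text=True).stdout
        match = re.search(r"Repaired at:\s*(\d+)", out)
        repaired_at = int(match.group(1)) if match else 0
        status[path] = "repaired" if repaired_at > 0 else "unrepaired"
    return status

# Hypothetical keyspace/table directory:
for sstable, state in repaired_status(
        "/var/lib/cassandra/data/myks/mytable-*").items():
    print(state, sstable)
{code}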

Simplified scenario:
3 node cluster with RF=3
Initial config:
Node 1 has data and tombstone in separate SSTables.
Node 2 has data and no tombstone.
Node 3 has data and tombstone in separate SSTables.

Incremental repair (nodetool repair -pr) is run every day so now we have 
tombstone on each node.
Some minor compactions have happened since so data and tombstone get merged to 
1 SSTable on Nodes 1 and 3.
Node 1 had a minor compaction that merged data with tombstone. 1 
SSTable with tombstone.
Node 2 has data and tombstone in separate SSTables.
Node 3 had a minor compaction that merged data with tombstone. 1 
SSTable with tombstone.
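
To make the states easier to follow, here is a toy model of that single key's 
SSTables on each node at this point (illustrative Python only; the class and 
field names are made up, this is not Cassandra code):

{code}
# Toy model of the per-node SSTable state for the one key at this point.
from dataclasses import dataclass

@dataclass
class SSTable:
    contents: set   # e.g. {"data"} or {"data", "tombstone"}
    repaired: bool  # True once incremental repair/anticompaction marked it

cluster = {
    # Nodes 1 and 3: minor compaction merged data and tombstone into one
    # SSTable, which sits in the repaired set.
    "node1": [SSTable({"data", "tombstone"}, repaired=True)],
    # Node 2: data and tombstone still sit in two separate repaired SSTables.
    "node2": [SSTable({"data"}, repaired=True),
              SSTable({"tombstone"}, repaired=True)],
    "node3": [SSTable({"data", "tombstone"}, repaired=True)],
}
{code}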

Incremental repairs keep running every day.
Full repairs run weekly (nodetool repair -full -pr). 
Now there are 2 scenarios where the Data SSTable will get marked as 
"Unrepaired" while the Tombstone SSTable will get marked as "Repaired".

Scenario 1:
Since the Data and Tombstone SSTable have been marked as "Repaired" and 
anticompacted, they have had minor compactions with other SSTables containing 
keys from other ranges.  During full repair, if the last node to run it doesn't 
own this particular key in its partitioner range, the Data and Tombstone 
SSTable will get anticompacted and marked as "Unrepaired".  Now in the next 
incremental repair, if the Data SSTable is involved in a minor compaction 
during the repair but the Tombstone SSTable is not, the resulting compacted 
SSTable will be marked "Unrepaired" while the Tombstone SSTable stays marked "Repaired".

Scenario 2:
Only the Data SSTable had minor compaction with other SSTables 
containing keys from other ranges after being marked as "Repaired".  The 
Tombstone SSTable was never involved in a minor compaction so therefore all 
keys in that SSTable belong to 1 particular partitioner range. During full 
repair, if the last node to run it doesn't own this particular key in its 
partitioner range, the Data SSTable will get anticompacted and marked as 
"Unrepaired".   The Tombstone SSTable stays marked as Repaired.

Then it is past gc_grace.  Since Nodes 1 and 3 only have 1 SSTable for that 
key, the tombstone will get compacted out.
Node 1 has nothing.
Node 2 has data (in unrepaired SSTable) and tombstone (in repaired 
SSTable) in separate SSTables.
Node 3 has nothing.

Now when the next incremental repair runs, it will only use the Data SSTable to 
build the Merkle tree, since the Tombstone SSTable is flagged as repaired and 
the Data SSTable is marked as unrepaired.  The data will then get repaired 
against the other two nodes.
Node 1 has data.
Node 2 has data and tombstone in separate SSTables.
Node 3 has data.
If a read request hits Nodes 1 and 3, it will return data.  If it hits Nodes 1 
and 2, or 2 and 3, however, it will return no data.
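
Putting the whole chain together, a self-contained sketch of the failure mode 
(illustrative Python with made-up helper names, not Cassandra internals):

{code}
# Each SSTable is modelled as a (cells, repaired) pair.

def purge_past_gc_grace(sstables):
    # Past gc_grace a tombstone and the data it shadows can be dropped, but
    # only when they meet in the same compaction - here: the same SSTable.
    return [s for s in sstables if s[0] != frozenset({"data", "tombstone"})]

def incremental_repair(cluster):
    # Incremental repair builds Merkle trees from unrepaired SSTables only,
    # then streams whatever a replica's unrepaired set is missing.
    unrepaired = {node: set().union(*[cells for cells, rep in ssts if not rep])
                  for node, ssts in cluster.items()}
    merged = set().union(*unrepaired.values())
    for node, ssts in cluster.items():
        missing = merged - unrepaired[node]
        if missing:
            ssts.append((frozenset(missing), False))

# State right after the repaired/unrepaired mismatch described above:
cluster = {
    "node1": [(frozenset({"data", "tombstone"}), True)],
    "node2": [(frozenset({"data"}), False),        # unrepaired
              (frozenset({"tombstone"}), True)],   # repaired
    "node3": [(frozenset({"data", "tombstone"}), True)],
}

# gc_grace passes: Nodes 1 and 3 compact the row away, Node 2 cannot.
for node in ("node1", "node3"):
    cluster[node] = purge_past_gc_grace(cluster[node])

# The next incremental repair streams the bare data back out:
incremental_repair(cluster)
for node, ssts in sorted(cluster.items()):
    print(node, [sorted(cells) for cells, _ in ssts])
# node1 [['data']]                 <- the deleted row has reappeared
# node2 [['data'], ['tombstone']]
# node3 [['data']]
{code}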

Tested this with single token ranges for simplicity.


  was:
This happens for both LeveledCompactionStrategy and 
SizeTieredCompactionStrategy.  I've only tested it on Cassandra version 2.2 but 
it most likely also affects all Cassandra versions with incremental repair - 
like 2.1 and 3.0.

When mixing incremental and full repairs, there are a few scenarios where the 
Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as 
repaired.  Then if it is past gc_grace, and the tombstone and data have been 
compacted out on other replicas, the next incremental repair will push the Data 
to other replicas without the tombstone.

Simplified scenario:
3 node cluster with RF=3
Initial config:
Node 1 has data and tombstone in separate SSTables.
Node 2 has data and no tombstone.
Node 3 has data and tombstone in separate SSTables.

Incremental repair (nodetool repair -pr) is run every day so now we have 
tombstone on each node.
Some minor compactions have happened since so data and tombstone get merged to 
1 SSTable on Nodes 1 and 3.
Node 1 had a minor compaction that merged data with tombstone. 1 
SSTable with tombstone.
Node 2 has data and tombstone in separate SSTables.
Node 3 had a minor compaction that merged data with 

[jira] [Commented] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-01-26 Thread Amanda Debrot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839768#comment-15839768
 ] 

Amanda Debrot commented on CASSANDRA-13153:
---

Hi Stefan,

Yes, true, it should only affect Cassandra 2.2+ versions.  I forgot about that 
point with 2.1.  I'll update the "since version".  Thanks!

> Reappeared Data when Mixing Incremental and Full Repairs
> 
>
> Key: CASSANDRA-13153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13153
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction, Tools
> Environment: Apache Cassandra 2.2
>Reporter: Amanda Debrot
>  Labels: Cassandra
> Attachments: log-Reappeared-Data.txt, 
> Step-by-Step-Simulate-Reappeared-Data.txt
>
>
> This happens for both LeveledCompactionStrategy and 
> SizeTieredCompactionStrategy.  I've only tested it on Cassandra version 2.2 
> but it most likely also affects all Cassandra versions with incremental 
> repair - like 2.1 and 3.0.
> When mixing incremental and full repairs, there are a few scenarios where the 
> Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as 
> repaired.  Then if it is past gc_grace, and the tombstone and data have been 
> compacted out on other replicas, the next incremental repair will push the 
> Data to other replicas without the tombstone.
> Simplified scenario:
> 3 node cluster with RF=3
> Initial config:
>   Node 1 has data and tombstone in separate SSTables.
>   Node 2 has data and no tombstone.
>   Node 3 has data and tombstone in separate SSTables.
> Incremental repair (nodetool repair -pr) is run every day so now we have 
> tombstone on each node.
> Some minor compactions have happened since so data and tombstone get merged 
> to 1 SSTable on Nodes 1 and 3.
>   Node 1 had a minor compaction that merged data with tombstone. 1 
> SSTable with tombstone.
>   Node 2 has data and tombstone in separate SSTables.
>   Node 3 had a minor compaction that merged data with tombstone. 1 
> SSTable with tombstone.
> Incremental repairs keep running every day.
> Full repairs run weekly (nodetool repair -full -pr). 
> Now there are 2 scenarios where the Data SSTable will get marked as 
> "Unrepaired" while the Tombstone SSTable will get marked as "Repaired".
> Scenario 1:
> Since the Data and Tombstone SSTable have been marked as "Repaired" 
> and anticompacted, they have had minor compactions with other SSTables 
> containing keys from other ranges.  During full repair, if the last node to 
> run it doesn't own this particular key in its partitioner range, the Data 
> and Tombstone SSTable will get anticompacted and marked as "Unrepaired".  Now 
> in the next incremental repair, if the Data SSTable is involved in a minor 
> compaction during the repair but the Tombstone SSTable is not, the resulting 
> compacted SSTable will be marked "Unrepaired" while the Tombstone SSTable stays marked 
> "Repaired".
> Scenario 2:
> Only the Data SSTable had minor compaction with other SSTables 
> containing keys from other ranges after being marked as "Repaired".  The 
> Tombstone SSTable was never involved in a minor compaction so therefore all 
> keys in that SSTable belong to 1 particular partitioner range. During full 
> repair, if the last node to run it doesn't own this particular key in its 
> partitioner range, the Data SSTable will get anticompacted and marked as 
> "Unrepaired".   The Tombstone SSTable stays marked as Repaired.
> Then it is past gc_grace.  Since Nodes 1 and 3 only have 1 SSTable for that 
> key, the tombstone will get compacted out.
>   Node 1 has nothing.
>   Node 2 has data (in unrepaired SSTable) and tombstone (in repaired 
> SSTable) in separate SSTables.
>   Node 3 has nothing.
> Now when the next incremental repair runs, it will only use the Data SSTable 
> to build the Merkle tree, since the Tombstone SSTable is flagged as repaired 
> and the Data SSTable is marked as unrepaired.  The data will then get repaired 
> against the other two nodes.
>   Node 1 has data.
>   Node 2 has data and tombstone in separate SSTables.
>   Node 3 has data.
> If a read request hits Nodes 1 and 3, it will return data.  If it hits Nodes 1 
> and 2, or 2 and 3, however, it will return no data.
> Tested this with single token ranges for simplicity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-01-25 Thread Amanda Debrot (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amanda Debrot updated CASSANDRA-13153:
--
Description: 
This happens for both LeveledCompactionStrategy and 
SizeTieredCompactionStrategy.  I've only tested it on Cassandra version 2.2 but 
it most likely also affects all Cassandra versions with incremental repair - 
like 2.1 and 3.0.

When mixing incremental and full repairs, there are a few scenarios where the 
Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as 
repaired.  Then if it is past gc_grace, and the tombstone and data have been 
compacted out on other replicas, the next incremental repair will push the Data 
to other replicas without the tombstone.

Simplified scenario:
3 node cluster with RF=3
Initial config:
Node 1 has data and tombstone in separate SSTables.
Node 2 has data and no tombstone.
Node 3 has data and tombstone in separate SSTables.

Incremental repair (nodetool repair -pr) is run every day so now we have 
tombstone on each node.
Some minor compactions have happened since so data and tombstone get merged to 
1 SSTable on Nodes 1 and 3.
Node 1 had a minor compaction that merged data with tombstone. 1 
SSTable with tombstone.
Node 2 has data and tombstone in separate SSTables.
Node 3 had a minor compaction that merged data with tombstone. 1 
SSTable with tombstone.

Incremental repairs keep running every day.
Full repairs run weekly (nodetool repair -full -pr). 
Now there are 2 scenarios where the Data SSTable will get marked as 
"Unrepaired" while the Tombstone SSTable will get marked as "Repaired".

Scenario 1:
Since the Data and Tombstone SSTable have been marked as "Repaired" and 
anticompacted, they have had minor compactions with other SSTables containing 
keys from other ranges.  During full repair, if the last node to run it doesn't 
own this particular key in its partitioner range, the Data and Tombstone 
SSTable will get anticompacted and marked as "Unrepaired".  Now in the next 
incremental repair, if the Data SSTable is involved in a minor compaction 
during the repair but the Tombstone SSTable is not, the resulting compacted 
SSTable will be marked "Unrepaired" while the Tombstone SSTable stays marked "Repaired".

Scenario 2:
Only the Data SSTable had minor compaction with other SSTables 
containing keys from other ranges after being marked as "Repaired".  The 
Tombstone SSTable was never involved in a minor compaction so therefore all 
keys in that SSTable belong to 1 particular partitioner range. During full 
repair, if the last node to run it doesn't own this particular key in its 
partitioner range, the Data SSTable will get anticompacted and marked as 
"Unrepaired".   The Tombstone SSTable stays marked as Repaired.

Then it is past gc_grace.  Since Nodes 1 and 3 only have 1 SSTable for that 
key, the tombstone will get compacted out.
Node 1 has nothing.
Node 2 has data (in unrepaired SSTable) and tombstone (in repaired 
SSTable) in separate SSTables.
Node 3 has nothing.

Now when the next incremental repair runs, it will only use the Data SSTable to 
build the Merkle tree, since the Tombstone SSTable is flagged as repaired and 
the Data SSTable is marked as unrepaired.  The data will then get repaired 
against the other two nodes.
Node 1 has data.
Node 2 has data and tombstone in separate SSTables and levels.
Node 3 has data.
If a read request hits Nodes 1 and 3, it will return data.  If it hits Nodes 1 
and 2, or 2 and 3, however, it will return no data.

Tested this with single token ranges for simplicity.


  was:
This happens for both LeveledCompactionStrategy and 
SizeTieredCompactionStrategy.  I've only tested it on Cassandra version 2.2 but 
it most likely also affects all Cassandra versions with incremental repair - 
like 2.1 and 3.0.

When mixing incremental and full repairs, there are a few scenarios where the 
Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as 
repaired.  Then if it is past gc_grace, and the tombstone and data have been 
compacted out on other replicas, the next incremental repair will push the Data 
to other replicas without the tombstone.

Simplified scenario:
3 node cluster with RF=3
Initial config:
Node 1 has data and tombstone in separate SSTables.
Node 2 has data and no tombstone.
Node 3 has data and tombstone in separate SSTables.

Incremental repair (nodetool repair -pr) is run every day so now we have 
tombstone on each node.
Some minor compactions have happened since so data and tombstone get merged to 
1 SSTable on Nodes 1 and 3.
Node 1 had a minor compaction that merged data with tombstone. 1 
SSTable with tombstone.
Node 2 has data and tombstone in separate SSTables.
Node 3 had a minor compaction that merged data with 

[jira] [Updated] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-01-25 Thread Amanda Debrot (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amanda Debrot updated CASSANDRA-13153:
--
Description: 
This happens for both LeveledCompactionStrategy and 
SizeTieredCompactionStrategy.  I've only tested it on Cassandra version 2.2 but 
it most likely also affects all Cassandra versions with incremental repair - 
like 2.1 and 3.0.

When mixing incremental and full repairs, there are a few scenarios where the 
Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as 
repaired.  Then if it is past gc_grace, and the tombstone and data have been 
compacted out on other replicas, the next incremental repair will push the Data 
to other replicas without the tombstone.

Simplified scenario:
3 node cluster with RF=3
Initial config:
Node 1 has data and tombstone in separate SSTables.
Node 2 has data and no tombstone.
Node 3 has data and tombstone in separate SSTables.

Incremental repair (nodetool repair -pr) is run every day so now we have 
tombstone on each node.
Some minor compactions have happened since so data and tombstone get merged to 
1 SSTable on Nodes 1 and 3.
Node 1 had a minor compaction that merged data with tombstone. 1 
SSTable with tombstone.
Node 2 has data and tombstone in separate SSTables.
Node 3 had a minor compaction that merged data with tombstone. 1 
SSTable with tombstone.

Incremental repairs keep running every day.
Full repairs run weekly (nodetool repair -full -pr). 
Now there are 2 scenarios where the Data SSTable will get marked as 
"Unrepaired" while the Tombstone SSTable will get marked as "Repaired".

Scenario 1:
Since the Data and Tombstone SSTable have been marked as "Repaired" and 
anticompacted, they have had minor compactions with other SSTables containing 
keys from other ranges.  During full repair, if the last node to run it doesn't 
own this particular key in its partitioner range, the Data and Tombstone 
SSTable will get anticompacted and marked as "Unrepaired".  Now in the next 
incremental repair, if the Data SSTable is involved in a minor compaction 
during the repair but the Tombstone SSTable is not, the resulting compacted 
SSTable will be marked "Unrepaired" while the Tombstone SSTable stays marked "Repaired".

Scenario 2:
Only the Data SSTable had minor compaction with other SSTables 
containing keys from other ranges after being marked as "Repaired".  The 
Tombstone SSTable was never involved in a minor compaction so therefore all 
keys in that SSTable belong to 1 particular partitioner range. During full 
repair, if the last node to run it doesn't own this particular key in its 
partitioner range, the Data SSTable will get anticompacted and marked as 
"Unrepaired".   The Tombstone SSTable stays marked as Repaired.

Then it is past gc_grace.  Since Nodes 1 and 3 only have 1 SSTable for that 
key, the tombstone will get compacted out.
Node 1 has nothing.
Node 2 has data (in unrepaired SSTable) and tombstone (in repaired 
SSTable) in separate SSTables.
Node 3 has nothing.

Now when the next incremental repair runs, it will only use the Data SSTable to 
build the Merkle tree, since the Tombstone SSTable is flagged as repaired and 
the Data SSTable is marked as unrepaired.  The data will then get repaired 
against the other two nodes.
Node 1 has data.
Node 2 has data and tombstone in separate SSTables.
Node 3 has data.
If a read request hits Nodes 1 and 3, it will return data.  If it hits Nodes 1 
and 2, or 2 and 3, however, it will return no data.

Tested this with single token ranges for simplicity.


  was:
This happens for both LeveledCompactionStrategy and 
SizeTieredCompactionStrategy.  I've only tested it on Cassandra version 2.2 but 
it most likely also affects all Cassandra versions with incremental repair - 
like 2.1 and 3.0.

When mixing incremental and full repairs, there are a few scenarios where the 
Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as 
repaired.  Then if it is past gc_grace, and the tombstone and data have been 
compacted out on other replicas, the next incremental repair will push the Data 
to other replicas without the tombstone.

Simplified scenario:
3 node cluster with RF=3
Initial config:
Node 1 has data and tombstone in separate SSTables.
Node 2 has data and no tombstone.
Node 3 has data and tombstone in separate SSTables.

Incremental repair (nodetool repair -pr) is run every day so now we have 
tombstone on each node.
Some minor compactions have happened since so data and tombstone get merged to 
1 SSTable on Nodes 1 and 3.
Node 1 had a minor compaction that merged data with tombstone. 1 
SSTable with tombstone.
Node 2 has data and tombstone in separate SSTables.
Node 3 had a minor compaction that merged data with tombstone. 1 

[jira] [Created] (CASSANDRA-13153) Reappeared Data when Mixing Incremental and Full Repairs

2017-01-25 Thread Amanda Debrot (JIRA)
Amanda Debrot created CASSANDRA-13153:
-

 Summary: Reappeared Data when Mixing Incremental and Full Repairs
 Key: CASSANDRA-13153
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13153
 Project: Cassandra
  Issue Type: Bug
  Components: Compaction, Tools
 Environment: Apache Cassandra 2.2
Reporter: Amanda Debrot
 Attachments: log-Reappeared-Data.txt, 
Step-by-Step-Simulate-Reappeared-Data.txt

This happens for both LeveledCompactionStrategy and 
SizeTieredCompactionStrategy.  I've only tested it on Cassandra version 2.2 but 
it most likely also affects all Cassandra versions with incremental repair - 
like 2.1 and 3.0.

When mixing incremental and full repairs, there are a few scenarios where the 
Data SSTable is marked as unrepaired and the Tombstone SSTable is marked as 
repaired.  Then if it is past gc_grace, and the tombstone and data have been 
compacted out on other replicas, the next incremental repair will push the Data 
to other replicas without the tombstone.

Simplified scenario:
3 node cluster with RF=3
Initial config:
Node 1 has data and tombstone in separate SSTables.
Node 2 has data and no tombstone.
Node 3 has data and tombstone in separate SSTables.

Incremental repair (nodetool repair -pr) is run every day so now we have 
tombstone on each node.
Some minor compactions have happened since so data and tombstone get merged to 
1 SSTable on Nodes 1 and 3.
Node 1 had a minor compaction that merged data with tombstone. 1 
SSTable with tombstone.
Node 2 has data and tombstone in separate SSTables.
Node 3 had a minor compaction that merged data with tombstone. 1 
SSTable with tombstone.

Incremental repairs keep running every day.
Full repairs run weekly (nodetool repair -full -pr). 
Now there are 2 scenarios where the Data SSTable will get marked as 
"Unrepaired" while the Tombstone SSTable will get marked as "Repaired".

Scenario 1:
Since the Data and Tombstone SSTable have been marked as "Repaired" and 
anticompacted, they have had minor compactions with other SSTables containing 
keys from other ranges.  During full repair, if the last node to run it doesn't 
own this particular key in its partitioner range, the Data and Tombstone 
SSTable will get anticompacted and marked as "Unrepaired".  Now in the next 
incremental repair, if the Data SSTable is involved in a minor compaction 
during the repair but the Tombstone SSTable is not, the resulting compacted 
SSTable will be marked "Unrepaired" while the Tombstone SSTable stays marked "Repaired".

Scenario 2:
Only the Data SSTable had minor compaction with other SSTables 
containing keys from other ranges after being marked as "Repaired".  The 
Tombstone SSTable was never involved in a minor compaction so therefore all 
keys in that SSTable belong to 1 particular partitioner range. During full 
repair, if the last node to run it doesn't own this particular key in its 
partitioner range, the Data SSTable will get anticompacted and marked as 
"Unrepaired".   The Tombstone SSTable stays marked as Repaired.

Then it is past gc_grace.  Since Nodes 1 and 3 only have 1 SSTable for that 
key, the tombstone will get compacted out.
Node 1 has nothing.
Node 2 has data (in unrepaired SSTable) and tombstone (in repaired 
SSTable) in separate SSTables and levels.
Node 3 has nothing.

Now when the next incremental repair runs, it will only use the Data SSTable to 
build the Merkle tree, since the Tombstone SSTable is flagged as repaired and 
the Data SSTable is marked as unrepaired.  The data will then get repaired 
against the other two nodes.
Node 1 has data.
Node 2 has data and tombstone in separate SSTables and levels.
Node 3 has data.
If a read request hits Nodes 1 and 3, it will return data.  If it hits Nodes 1 
and 2, or 2 and 3, however, it will return no data.

Tested this with single token ranges for simplicity.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)