[ https://issues.apache.org/jira/browse/CASSANDRA-18656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741215#comment-17741215 ]
Caleb Rackliffe edited comment on CASSANDRA-18656 at 7/14/23 1:32 AM:
--
One way we might address this is by making sure streamed SSTables and the
indexes attached to them both fall within the scope of the {{STREAM}}
transaction.
The current SSTable streaming process is roughly:
1.) stream an SSTable
2.) commit the streaming transaction w/ the SSTable
3.) add the SSTable to the column family
4.) via listener notification as part of 3, index the new SSTable in a blocking
fashion
Steps 2 and 3 are in this order because, if 3 came before 2, the new SSTable
could participate in reads, the node could die before the transaction
committed, and the SSTable would be gone after restart. The problem is that if
the node dies while 4 is in progress, the node will come back up thinking that
the streaming operation was wholly successful, and allow startup to complete.
The index in question will be rebuilt, but that rebuild will not block startup,
and the index will be unusable until it finishes.
I propose that we move 4 between 1 and 2. This way, either the SSTable and its
related indexes are ready to query when we commit the transaction, or the
transaction is simply considered failed on restart (i.e., on restart, it would
be as if the streaming had never occurred). Doing this should also make the
system of marking the index unbuilt and then built again irrelevant across
restarts, although I'm not entirely sure that would roll back any of the
complexity of CASSANDRA-10130 and CASSANDRA-13725. {{SecondaryIndexManager}}
currently handles {{SSTableAddedNotification}} for more than just streaming, and
we would have to take care to leave those cases intact (SSTable import, etc.),
although they may suffer from similar problems.
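To make the ordering argument concrete, here is a toy model of what a node
would find on restart if it died partway through each ordering. This is only a
sketch, not Cassandra code: the {{Step}} and {{Outcome}} names and the
{{restartAfterCrashDuring}} helper are hypothetical, and it assumes that
anything produced inside an uncommitted {{STREAM}} transaction (the SSTable and
any index components built for it) is discarded on restart.

```java
import java.util.List;

public class StreamOrderingSketch {
    // Hypothetical step names mirroring steps 1-4 of the streaming process.
    enum Step { STREAM_SSTABLE, COMMIT_TXN, ADD_TO_TABLE, BUILD_INDEX }

    // What the node finds on restart after dying mid-operation.
    enum Outcome { ROLLED_BACK, SSTABLE_WITHOUT_INDEX, SSTABLE_WITH_INDEX }

    // Current order: 1, 2, 3, 4.
    static final List<Step> CURRENT_ORDER = List.of(
        Step.STREAM_SSTABLE, Step.COMMIT_TXN, Step.ADD_TO_TABLE, Step.BUILD_INDEX);

    // Proposed order: 4 moves between 1 and 2.
    static final List<Step> PROPOSED_ORDER = List.of(
        Step.STREAM_SSTABLE, Step.BUILD_INDEX, Step.COMMIT_TXN, Step.ADD_TO_TABLE);

    static Outcome restartAfterCrashDuring(List<Step> order, Step crashedStep) {
        boolean committed = false;
        boolean indexBuilt = false;
        for (Step step : order) {
            if (step == crashedStep) break; // the node dies before this step completes
            if (step == Step.COMMIT_TXN) committed = true;
            if (step == Step.BUILD_INDEX) indexBuilt = true;
        }
        // Assumption: an uncommitted transaction leaves nothing behind on restart.
        if (!committed) return Outcome.ROLLED_BACK;
        return indexBuilt ? Outcome.SSTABLE_WITH_INDEX : Outcome.SSTABLE_WITHOUT_INDEX;
    }

    public static void main(String[] args) {
        // Dying while the index build is in progress, under each ordering.
        System.out.println(restartAfterCrashDuring(CURRENT_ORDER, Step.BUILD_INDEX));
        System.out.println(restartAfterCrashDuring(PROPOSED_ORDER, Step.BUILD_INDEX));
    }
}
```

Under these assumptions, the current ordering can leave a committed SSTable
with no usable index, while the proposed ordering can only end in a full
rollback or a committed SSTable whose index is already built.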
EDIT: This might not be a viable solution for legacy 2i...see below...
was (Author: maedhroz):
One way we can address this is by making sure streamed SSTables and the indexes
attached to them both fall within the scope of the {{STREAM}} transaction.
The current SSTable streaming process is roughly:
1.) stream an SSTable
2.) commit the streaming transaction w/ the SSTable
3.) add the SSTable to the column family
4.) via listener notification as part of 3, index the new SSTable in a blocking
fashion
Steps 2 and 3 are in this order because, if 3 came before 2, the new SSTable
could participate in reads, the node could die before the transaction
committed, and the SSTable would be gone after restart. The problem is that if
the node dies while 4 is in progress, the node will come back up thinking that
the streaming operation was wholly successful, and allow startup to complete.
The index in question will be rebuilt, but that rebuild will not block startup,
and the index will be unusable until it finishes.
I propose that we move 4 between 1 and 2. This way, either the SSTable and its
related indexes are ready to query when we commit the transaction, or the
transaction is simply considered failed on restart (i.e., on restart, it would
be as if the streaming had never occurred). Doing this should also make the
system of marking the index unbuilt and then built again irrelevant across
restarts, although I'm not entirely sure that would roll back any of the
complexity of CASSANDRA-10130 and CASSANDRA-13725. {{SecondaryIndexManager}}
currently handles {{SSTableAddedNotification}} for more than just streaming, and
we would have to take care to leave those cases intact (SSTable import, etc.),
although they may suffer from similar problems.
> Ensure SSTable streaming transactions do not commit before building attached
> secondary indexes
> --
>
> Key: CASSANDRA-18656
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18656
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Streaming, Feature/2i Index, Feature/SAI,
> Local/Startup and Shutdown
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.x
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Back in 2015, we identified in CASSANDRA-10130 a case where failures in 2i
> builds after SSTable streaming could leave indexes in a partially built
> state, even after a restart, requiring manual operator intervention. There,
> and in CASSANDRA-13725, we made an attempt to remedy this situation, ensuring
> that indexes would at least be rebuilt on restart after this kind of failure.
> However, there are some difficulties the solution there does not address.
> Let's look at a simple example...
> Suppose an SSTable has been streamed to a