maedhroz commented on code in PR #2460:
URL: https://github.com/apache/cassandra/pull/2460#discussion_r1253762041


##########
src/java/org/apache/cassandra/index/sai/StorageAttachedIndexGroup.java:
##########
@@ -250,15 +250,15 @@ public void handleNotification(INotification 
notification, Object sender)
             SSTableAddedNotification notice = (SSTableAddedNotification) 
notification;
 
             // Avoid validation for index files just written following 
Memtable flush.
-            boolean validate = !notice.memtable().isPresent();
+            IndexValidation validate = notice.memtable().isPresent() ? 
IndexValidation.NONE : IndexValidation.CHECKSUM;

Review Comment:
   One of the general things I looked at while reviewing here was whether the 
notification order of the different `INotificationConsumer` things matters. 
`StorageAttachedIndexGroup` and `SecondaryIndexManager` in particular are the 
interesting ones. Basically...do we checksum the things we should and not the 
things we shouldn't, given we happen to have SIM registered before the index 
group? Here's what I've come up with:
   
   1.) When we stream entire SSTables w/ their attached index components (or 
locally import new SSTables w/ index components already built), SIM handles the 
notification, but the index build it triggers should be a no-op. Even if it 
built something it wouldn't register as a full rebuild and no checksumming 
would be done. Then when `StorageAttachedIndexGroup` handles the notification 
here, it does the checksum, effectively protecting us from corruption 
introduced by a streaming source. This all seems reasonable.
   
   2.) When we don't stream an entire SSTable during repair or partial SSTable 
bootstrapping (or locally import new SSTables w/ index components not already 
built), SIM handles the notification and blocks to build SSTable-attached index 
bits. Then (again, this is an accident of our ordering) 
`StorageAttachedIndexGroup` handles the notification. It sees that there isn't 
a Memtable and validates the checksum. _This isn't the end of the world, but it 
would seem unnecessary, given we just build the indexes from the streamed 
SSTables._ If we knew that all of the added SSTables didn't come with SAI 
components, we'd bypass checksumming, right?
   
   I'm not sure how exactly we want to address this, or whether we would want 
to address it in this patch (although that might be nice, since we have it 
loaded into our working memories), but perhaps we could agree that it's an 
issue (or not, and that I'm missing something) and go from there?
   
   Two things that come to mind to "fix" this are a.) having more information 
in the notification or b.) having the index group handle notifications *before* 
SIM does. (SIM only handles `SSTableAddedNotification` FWIW).
   
   CC @mike-tr-adamson @adelapena 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to