adelapena commented on pull request #570:
URL: https://github.com/apache/cassandra/pull/570#issuecomment-631559153


   I have left a few suggested changes 
[here](https://github.com/adelapena/cassandra/commit/dc0030ea4e961a2f17ebf8cc486963eec0c1e037).
 Changes for the dtest are 
[here](https://github.com/adelapena/cassandra-dtest/commit/758508b7759dfe13af724df68ee80834662ac013).
   
   As mentioned in my previous comment, the sequence of operations in 
`SIM#rebuildIndexesBlocking` is modified to:
   1. Set indexes as writable to not miss incoming writes.
   2. Flush the base table if some index wasn't writable before, so the 
potentially unindexed memtable contents are moved to the set of sstables to be 
indexed.
   3. Acquire sstable refs and rebuild.
   
   I have also done some modifications to the way we determine whether we 
should use `Index#getBuildTaskSupport` or `Index#getRecoveryTaskSupport`. 
`SIM#buildIndexesBlocking` is only called by either 
`SIM#rebuildIndexesBlocking` or `SIM#handleNotification`. I understand that if 
the caller is `rebuildIndexesBlocking` we are always doing a full rebuild, and 
probably the only reason to do that is that we are attempting to recover a 
damaged index. Conversely, if the caller is `handleNotification` we are doing a 
partial build of new data that is unable to recover previous errors. This is 
why I think that the `isFullRebuild` argument is enough to determine whether we 
are recovering or indexing new data. This makes full rebuild independent of the 
status of the index, so any call to `nodetool rebuild_index` will involve 
`getRecoveryTaskSupport`, which is intended to do any initialization work that 
might be needed prior to a full index build. Do you think this assumption is 
correct, or are we interested in treating separately full rebuilds without a 
previous failure?
   
   Unit tests are also slightly modified to reflect the changes. 
`SecondaryIndexManagerTest` is extended with queryability/writability checks 
over a write-only-on-failure index.
   
   Summarizing, I think that at this point the patch achieves the following:
   * Full index rebuilds give indexes the opportunity to run specific rebuild 
logic, including initialization. This logic can be different from the logic 
that is used when regularly indexing new sstables.
   * If any full rebuild or a partial build task fails, the index can start to 
reject queries and ignore writes until a full rebuild is successfully run. 
Whether it rejects queries or ignores writes is defined by the index 
implementation.
   * Rebuilt indexes don't miss data after having been ignoring writes.
   * Existing indexes preserve their original behaviour. They always accept 
writes and only reject reads if the initial build has failed.
   * The index API is backwards compatible.
   
   Do you think it makes any sense?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org

Reply via email to