[GitHub] [cassandra] adelapena commented on a change in pull request #570: CASSANDRA-13606 Improve handling of 2i initialization failures
adelapena commented on a change in pull request #570: URL: https://github.com/apache/cassandra/pull/570#discussion_r427459670 ## File path: src/java/org/apache/cassandra/index/SecondaryIndexManager.java ## @@ -655,6 +695,9 @@ private synchronized void markIndexFailed(Index index) SystemKeyspace.setIndexRemoved(baseCfs.keyspace.getName(), indexName); needsFullRebuild.add(indexName); + +if (!index.getSupportedLoadTypeOnFailure().supportsWrites() && writableIndexes.remove(indexName) != null) +logger.info("Index [" + indexName + "] became not-writable."); Review comment: I think that the flush should happed right after we have ensured that the indexes are marked as writable. Otherwise, we might have new untracked writes populating the new memtable while the indexes are not writable. In the same vein, the sstable selection should happen after we have marked the indexes as writable. I have almost ready a commit with this and a few minor suggestions, I'll push it tomorrow. We're almost there! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For additional commands, e-mail: pr-h...@cassandra.apache.org
[GitHub] [cassandra] adelapena commented on a change in pull request #570: CASSANDRA-13606 Improve handling of 2i initialization failures
adelapena commented on a change in pull request #570: URL: https://github.com/apache/cassandra/pull/570#discussion_r426791652 ## File path: src/java/org/apache/cassandra/index/SecondaryIndexManager.java ## @@ -655,6 +695,9 @@ private synchronized void markIndexFailed(Index index) SystemKeyspace.setIndexRemoved(baseCfs.keyspace.getName(), indexName); needsFullRebuild.add(indexName); + +if (!index.getSupportedLoadTypeOnFailure().supportsWrites() && writableIndexes.remove(indexName) != null) +logger.info("Index [" + indexName + "] became not-writable."); Review comment: Hi @bereng. I understand that the original behaviour was: ```java return isInitialBuild ? LoadType.WRITE : LoadType.ALL; ``` That's because the indexes were always writable independently of any kind of build failures, there wasn't even a notion of writability. And, once they were queryable once, they were queryable forever. Just took a quick look at the `SIMTest` failures, I think they are happening because `TestingIndex.shouldFailCreate` is wrongly missed from `TestingIndex.clear()`. Once it's fixed to reset it, and we change a couple of assertions about writability in `initializingIndexNotQueryableButWritableAfterPartialRebuild`, all the tests seem to pass, at least locally. I'll take a closer look tomorrow. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For additional commands, e-mail: pr-h...@cassandra.apache.org
[GitHub] [cassandra] adelapena commented on a change in pull request #570: CASSANDRA-13606 Improve handling of 2i initialization failures
adelapena commented on a change in pull request #570: URL: https://github.com/apache/cassandra/pull/570#discussion_r426566283 ## File path: src/java/org/apache/cassandra/index/SecondaryIndexManager.java ## @@ -655,6 +695,9 @@ private synchronized void markIndexFailed(Index index) SystemKeyspace.setIndexRemoved(baseCfs.keyspace.getName(), indexName); needsFullRebuild.add(indexName); + +if (!index.getSupportedLoadTypeOnFailure().supportsWrites() && writableIndexes.remove(indexName) != null) +logger.info("Index [" + indexName + "] became not-writable."); Review comment: > Is it reasonable a 'not ready' index should be serving reads or writes? For initial build it should't serve reads, but for rebuilds it's more complicated because the index could still be in good condition for most reads. IMO if we were creating the index API from scratch or adding a new implementation probably the best option would be to set not properly (re)built indexes as unable to serve reads and writes. But, given that we have had the cf-based implementation around for a while, I think we should by now preserve their original behaviour and provide the API mechanisms to change that in new implementations, and perhaps also in the standard indexes in the future. That and the recovery task seem a pair of nice improvements to ship with this ticket. Accepting reads and writes when the rebuild has failed makes some sense in the particular case of cf-based regular indexes because they don't have an initialization task and they are based on idempotent operations on the underlaying column family. I'd say they focus in availability over consistency because even failed rebuilds always leave the index is same or better condition than before. In contrast, it's easy to imagine other implementations where a failed rebuild leaves the index completely corrupt and unusable. I think there could easily be use cases out there relying on this behaviour, and I'm afraid of arbitrarily changing that behaviour without coming back. If we still want to move to the new approach, either in this ticket or in a dedicated one, we should probably open a discussion on the mail list to see if there are deployments relaying on the classic behaviour. Another tempting option to ease the migration of cf-based indexes to the new consistency-over-availability approach would be making the enum returned by `getSupportedLoadTypeOnFailure` configurable through index options, for example: ``` CREATE INDEX my_idx ON t WITH OPTIONS = {'supported_load_on_failed_init': 'WRITE', 'supported_load_on_failed_rebuild': 'ALL'} ``` This is something that perhaps we could do in the followup ticket to change the default behaviour of regular cf-based indexes, while keeping this ticket focused on the API changes and the recovery task. WDYT? Maybe I'm wrong and there isn't anyone out there relying on the old 2i behaviour and we should go straight to the the new failed-rebuild-means-broken behaviour. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For additional commands, e-mail: pr-h...@cassandra.apache.org
[GitHub] [cassandra] adelapena commented on a change in pull request #570: CASSANDRA-13606 Improve handling of 2i initialization failures
adelapena commented on a change in pull request #570: URL: https://github.com/apache/cassandra/pull/570#discussion_r426566283 ## File path: src/java/org/apache/cassandra/index/SecondaryIndexManager.java ## @@ -655,6 +695,9 @@ private synchronized void markIndexFailed(Index index) SystemKeyspace.setIndexRemoved(baseCfs.keyspace.getName(), indexName); needsFullRebuild.add(indexName); + +if (!index.getSupportedLoadTypeOnFailure().supportsWrites() && writableIndexes.remove(indexName) != null) +logger.info("Index [" + indexName + "] became not-writable."); Review comment: > Is it reasonable a 'not ready' index should be serving reads or writes? For initial build it should't serve reads, but for rebuilds it's more complicated because the index could still be in good condition. IMO if we were creating the index API from scratch or adding a new implementation probably the best option would be to set not properly (re)built index as unable to serve reads and writes. But, given that we have had the cf-based implementation around for a while, I think we should by now preserve their original behaviour and provide the API mechanisms to change that in new implementations and perhaps in the standard indexes in the future. That and the recovery task seems a pair of nice improvements to ship with this ticket. Accepting reads and writes when failed makes some sense in the particular case of cf-based regular indexes because they don't have an initialization task and they are based on idempotent operations on the underlaying column family. I'd say they focus in availability over consistency because even failed rebuilds always leave the index is same or better condition than before. In contrast, it's easy to imagine other implementations where a failed rebuild leaves the index completely corrupt and unusable. I think there could easily be use cases out there relying on this behaviour, and I'm afraid of arbitrarily changing that behaviour without coming back. If we still want to move to the new approach, either in this ticket or in a dedicated one, we should probably open a discussion in the mail list to see if there are deployments relaying on the classic behaviour. Another tempting option to ease the migration of cf-based indexes to the new consistency-over-availability approach would be making the enum returned by `getSupportedLoadTypeOnFailure` configurable through index options, for example: ``` CREATE INDEX my_idx ON t WITH OPTIONS = {'supported_load_on_failed_init': 'WRITE', 'supported_load_on_failed_rebuild': 'ALL'} ``` This is something that perhaps we could do in the ticket to change the default behaviour of regular cf-based indexes, while keeping this ticket focused on the API changes and the recovery task. WDYT? Maybe I'm wrong and there isn't anyone out there relying on the old 2i behaviour and we should go straight to the the new failed-rebuild-means-broken behaviour. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For additional commands, e-mail: pr-h...@cassandra.apache.org
[GitHub] [cassandra] adelapena commented on a change in pull request #570: CASSANDRA-13606 Improve handling of 2i initialization failures
adelapena commented on a change in pull request #570: URL: https://github.com/apache/cassandra/pull/570#discussion_r425925073 ## File path: src/java/org/apache/cassandra/index/SecondaryIndexManager.java ## @@ -655,6 +695,9 @@ private synchronized void markIndexFailed(Index index) SystemKeyspace.setIndexRemoved(baseCfs.keyspace.getName(), indexName); needsFullRebuild.add(indexName); + +if (!index.getSupportedLoadTypeOnFailure().supportsWrites() && writableIndexes.remove(indexName) != null) +logger.info("Index [" + indexName + "] became not-writable."); Review comment: Perhaps the problem was assigning `LoadType.NOOP` as the default behaviour. Indeed, removing that line leaves `LoadType#supportsReads()` without usages. I think that, to preserve the behaviour of the default indexes, and to have more flexibility, we should keep that line and make a distinction between failures in the initial build task and failures in rebuilds. As I mention on the dtest review, the classic behaviour would be: ```java /** * Returns the type of operations supported by the index in case its building has failed and it's needing recovery. * * @param isInitialBuild {@code true} if the failure is for the initial build task on index creation, {@code false} * if the failure is for a rebuild or recovery. */ default LoadType getSupportedLoadTypeOnFailure(boolean isInitialBuild) { return isInitialBuild ? LoadType.WRITE : LoadType.ALL; } ``` I think that each behaviour has advantages and disadvantages depending on the use case and, while I'm quite happy with the changes in the index API, I'm a bit afraid of changing the behaviour of the standard index implementation. WDYT? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For additional commands, e-mail: pr-h...@cassandra.apache.org
[GitHub] [cassandra] adelapena commented on a change in pull request #570: CASSANDRA-13606 Improve handling of 2i initialization failures
adelapena commented on a change in pull request #570: URL: https://github.com/apache/cassandra/pull/570#discussion_r425925073 ## File path: src/java/org/apache/cassandra/index/SecondaryIndexManager.java ## @@ -655,6 +695,9 @@ private synchronized void markIndexFailed(Index index) SystemKeyspace.setIndexRemoved(baseCfs.keyspace.getName(), indexName); needsFullRebuild.add(indexName); + +if (!index.getSupportedLoadTypeOnFailure().supportsWrites() && writableIndexes.remove(indexName) != null) +logger.info("Index [" + indexName + "] became not-writable."); Review comment: Perhaps the problem was assigning `LoadType.NOOP` as the default behaviour. Indeed, removing that line leaves `LoadType#supportsReads()` without usages. I think that, to preserve the behaviour of the default indexes, and to have more flexibility, we should keep that line and make a distinction between failures in the initial build task and failures in rebuilds. As I mention on the dtest review, the classic behaviour would be: ```java /** * Returns the type of operations supported by the index in case its building has failed and it's needing recovery. * * @param isInitialBuild {@code true} if the failure is for the initial build task on index creation, {@code false} * if the failure is for a rebuild or recovery. */ default LoadType getSupportedLoadTypeOnFailure(boolean isInitialBuild) { return isInitialBuild ? LoadType.WRITE : LoadType.ALL; } ``` I think that all the approaches have advantages and disadvantages depending on the use case and, while I'm quite happy with the changes in the index API, I'm a bit afraid of changing the behaviour of the standard index implementation. WDYT? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For additional commands, e-mail: pr-h...@cassandra.apache.org
[GitHub] [cassandra] adelapena commented on a change in pull request #570: CASSANDRA-13606 Improve handling of 2i initialization failures
adelapena commented on a change in pull request #570: URL: https://github.com/apache/cassandra/pull/570#discussion_r425925073 ## File path: src/java/org/apache/cassandra/index/SecondaryIndexManager.java ## @@ -655,6 +695,9 @@ private synchronized void markIndexFailed(Index index) SystemKeyspace.setIndexRemoved(baseCfs.keyspace.getName(), indexName); needsFullRebuild.add(indexName); + +if (!index.getSupportedLoadTypeOnFailure().supportsWrites() && writableIndexes.remove(indexName) != null) +logger.info("Index [" + indexName + "] became not-writable."); Review comment: I'm not sure this is inline with the `LoadType.NOOP` behaviour of the indexes. Indeed, it leaves `LoadType#supportsReads()` without usages. I think that, to preserve the behaviour of the default indexes, we should make a distinction between failures in the initial build task and failures in rebuilds. As I mention on the dtest review, the classic behaviour would be: ```java /** * Returns the type of operations supported by the index in case its building has failed and it's needing recovery. * * @param isInitialBuild {@code true} if the failure is for the initial build task on index creation, {@code false} * if the failure is for a rebuild or recovery. */ default LoadType getSupportedLoadTypeOnFailure(boolean isInitialBuild) { return isInitialBuild ? LoadType.WRITE : LoadType.ALL; } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For additional commands, e-mail: pr-h...@cassandra.apache.org
[GitHub] [cassandra] adelapena commented on a change in pull request #570: CASSANDRA-13606 Improve handling of 2i initialization failures
adelapena commented on a change in pull request #570: URL: https://github.com/apache/cassandra/pull/570#discussion_r425925073 ## File path: src/java/org/apache/cassandra/index/SecondaryIndexManager.java ## @@ -655,6 +695,9 @@ private synchronized void markIndexFailed(Index index) SystemKeyspace.setIndexRemoved(baseCfs.keyspace.getName(), indexName); needsFullRebuild.add(indexName); + +if (!index.getSupportedLoadTypeOnFailure().supportsWrites() && writableIndexes.remove(indexName) != null) +logger.info("Index [" + indexName + "] became not-writable."); Review comment: I'm not sure this is inline with the `LoadType.NOOP` behaviour of the indexes. Indeed, it leaves `LoadType#supportsReads()` without usages. I think that, to preserve the behaviour of the default indexes, we should make a distinction between failures in the initial build task and failures in rebuilds. As I mention on the dtest review, the current behaviour would be: ```java /** * Returns the type of operations supported by the index in case its building has failed and it's needing recovery. * * @param isInitialBuild {@code true} if the failure is for the initial build task on index creation, {@code false} * if the failure is for a rebuild or recovery. */ default LoadType getSupportedLoadTypeOnFailure(boolean isInitialBuild) { return isInitialBuild ? LoadType.WRITE : LoadType.ALL; } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For additional commands, e-mail: pr-h...@cassandra.apache.org
[GitHub] [cassandra] adelapena commented on a change in pull request #570: CASSANDRA-13606 Improve handling of 2i initialization failures
adelapena commented on a change in pull request #570: URL: https://github.com/apache/cassandra/pull/570#discussion_r421413678 ## File path: src/java/org/apache/cassandra/index/Index.java ## @@ -136,6 +135,7 @@ */ public interface Index { +public static enum LoadType {READ, WRITE, ALL, NONE}; Review comment: +1 to the new enum names. It would be nice to add a very brief comment about what it is, for the sake of the unaware reader. The static and the semicolon are not needed: ```suggestion public enum LoadType {READ, WRITE, ALL, NONE} ``` ## File path: src/java/org/apache/cassandra/index/internal/CassandraIndex.java ## @@ -146,6 +146,22 @@ protected abstract ByteBuffer getIndexedValue(ByteBuffer partitionKey, CellPath path, ByteBuffer cellValue); + +public boolean supportsLoad(LoadType load) +{ +switch (load) +{ +case ALL: +return supportedLoads == LoadType.ALL; +case READ: +return supportedLoads == LoadType.ALL || supportedLoads == LoadType.READ; +case WRITE: +return supportedLoads == LoadType.ALL || supportedLoads == LoadType.WRITE; +default: +return false; +} Review comment: This block of code is repeated in a couple of test index implementations, and it's to expect that other implementations would need to repeat it. I think that defining that `ALL` includes `READ` and `WRITE`, etc. seems more a characteristic of `LoadType` than the index itself, so perhaps we could move this logic to the enum itself, for example: ```java public enum LoadType { READ, WRITE, ALL, NONE; public boolean accepts(LoadType load) { switch (this) { case ALL: return true; case READ: return load == LoadType.READ || load == LoadType.NONE; case WRITE: return load == LoadType.WRITE || load == LoadType.NONE; default: return false; } } } ``` That way we would simplify usage later: ```java @Override public boolean supportsLoad(LoadType load) { return supportedLoadType.accepts(load); } ``` ## File path: src/java/org/apache/cassandra/index/SecondaryIndexManager.java ## @@ -338,20 +355,20 @@ public void markAllIndexesRemoved() public void rebuildIndexesBlocking(Set indexNames) { try (ColumnFamilyStore.RefViewFragment viewFragment = baseCfs.selectAndReference(View.selectFunction(SSTableSet.CANONICAL)); - Refs allSSTables = viewFragment.refs) -{ -Set toRebuild = indexes.values().stream() - .filter(index -> indexNames.contains(index.getIndexMetadata().name)) - .filter(Index::shouldBuildBlocking) - .collect(Collectors.toSet()); -if (toRebuild.isEmpty()) -{ -logger.info("No defined indexes with the supplied names: {}", Joiner.on(',').join(indexNames)); -return; -} - -buildIndexesBlocking(allSSTables, toRebuild, true); -} +Refs allSSTables = viewFragment.refs) + { + Set toRebuild = indexes.values().stream() + .filter(index -> indexNames.contains(index.getIndexMetadata().name)) + .filter(Index::shouldBuildBlocking) + .collect(Collectors.toSet()); + if (toRebuild.isEmpty()) + { + logger.info("No defined indexes with the supplied names: {}", Joiner.on(',').join(indexNames)); + return; + } + + buildIndexesBlocking(allSSTables, toRebuild, true); + } Review comment: Nit: this seems to have missed alignment with an extra tab ## File path: src/java/org/apache/cassandra/index/SecondaryIndexManager.java ## @@ -619,8 +642,16 @@ private synchronized void markIndexesBuilding(Set indexes, boolean isFull private synchronized void markIndexBuilt(Index index, boolean isFullRebuild) { String indexName = index.getIndexMetadata().name; -if (isFullRebuild) +if (isFullRebuild && index.supportsLoad(Index.LoadType.READ)) +{ queryableIndexes.add(indexName); +logger.info("Index [" + indexName + "] became queryable."); +} +if (isFullRebuild && index.supportsLoad(Index.LoadType.WRITE)) +{ +writableIndexes.add(indexName); +
[GitHub] [cassandra] adelapena commented on a change in pull request #570: CASSANDRA-13606 Improve handling of 2i initialization failures
adelapena commented on a change in pull request #570: URL: https://github.com/apache/cassandra/pull/570#discussion_r420736376 ## File path: test/unit/org/apache/cassandra/index/internal/CustomCassandraIndex.java ## @@ -94,6 +95,21 @@ protected boolean supportsOperator(ColumnMetadata indexedColumn, Operator operat { return operator.equals(Operator.EQ); } + +public boolean supportsLoad(Loads load) +{ +switch (load) +{ +case ALL: +return supportedLoads.equals(Loads.ALL); Review comment: Same as before, we can use `==` ## File path: src/java/org/apache/cassandra/index/internal/CassandraIndex.java ## @@ -146,6 +146,22 @@ protected abstract ByteBuffer getIndexedValue(ByteBuffer partitionKey, CellPath path, ByteBuffer cellValue); + +public boolean supportsLoad(Loads load) +{ +switch (load) +{ +case ALL: +return supportedLoads.equals(Loads.ALL); Review comment: Being an enum, we can use `==` instead of `equals` ## File path: src/java/org/apache/cassandra/index/Index.java ## @@ -136,6 +135,7 @@ */ public interface Index { +public static enum Loads {READS, WRITES, ALL, NONE}; Review comment: Using a plural name for a enum feels a little bit odd to me, what about something like `LoadType` or the more operation `OperationType`? Also I can't think of other future values for the enum given it's current usage, so I'm wondering if we could replace it by two booleans, implicit through a pair of `supportsWrites`/`supportsReads` methods. WDYT? ## File path: src/java/org/apache/cassandra/index/SecondaryIndexManager.java ## @@ -1126,17 +1165,26 @@ public UpdateTransaction newUpdateTransaction(PartitionUpdate update, WriteConte { if (!hasIndexes()) return UpdateTransaction.NO_OP; - -Index.Indexer[] indexers = indexes.values().stream() - .map(i -> i.indexerFor(update.partitionKey(), - update.columns(), - nowInSec, - ctx, - IndexTransaction.Type.UPDATE)) - .filter(Objects::nonNull) - .toArray(Index.Indexer[]::new); - -return indexers.length == 0 ? UpdateTransaction.NO_OP : new WriteTimeTransaction(indexers); + +ArrayList idxrs = new ArrayList<>(); Review comment: Ignorable nit: perhaps we could name this variable `indexers`, and use `idxrs` for the array below. Also, I wonder if it would make sense to use an iterable instead of an array in `WriteTimeTransaction`, so we can directly pass this `ArrayList` without the `toArray` conversion. Would it hurt performance? ## File path: src/java/org/apache/cassandra/index/SecondaryIndexManager.java ## @@ -424,16 +427,47 @@ public static String getIndexName(String cfName) assert isIndexColumnFamily(cfName); return StringUtils.substringAfter(cfName, Directories.SECONDARY_INDEX_NAME_SEPARATOR); } + +/** + * Does a blocking full rebuild/reovery of the specifed indexes from all the sstables in the base table. + * Note also that this method of (re)building/recovering indexes: + * a) takes a set of index *names* rather than Indexers + * b) marks existing indexes removed prior to rebuilding + * c) fails if such marking operation conflicts with any ongoing index builds, as full rebuilds cannot be run + * concurrently + * + * @param indexNames the list of indexes to be rebuilt + * @param isRecovery True if want to run recovery rather than rebuilding + */ +private void rebuildIndexesBlocking(Set indexNames, boolean isRecovery) +{ +try (ColumnFamilyStore.RefViewFragment viewFragment = baseCfs.selectAndReference(View.selectFunction(SSTableSet.CANONICAL)); + Refs allSSTables = viewFragment.refs) +{ +Set toRebuild = indexes.values().stream() + .filter(index -> indexNames.contains(index.getIndexMetadata().name)) + .filter(Index::shouldBuildBlocking) + .collect(Collectors.toSet()); +if (toRebuild.isEmpty()) +{ +logger.info("No defined indexes with the supplied names: {}", Joiner.on(',').join(indexNames)); +return; +} + +buildIndexesBlocking(allSSTables, toRebuild, true,