michaeljmarshall commented on code in PR #4353:
URL: https://github.com/apache/cassandra/pull/4353#discussion_r2683540237
##########
src/java/org/apache/cassandra/index/sai/disk/format/IndexDescriptor.java:
##########
@@ -126,7 +127,10 @@ public PrimaryKeyMap.Factory
newPrimaryKeyMapFactory(SSTableReader sstable)
public SSTableIndex newSSTableIndex(SSTableContext sstableContext,
StorageAttachedIndex index)
{
- return version.onDiskFormat().newSSTableIndex(sstableContext, index);
+ if (isIndexEmpty(index.termType(), index.identifier()))
+ return index.termType().isVector() ? new
EmptyIndex(sstableContext, index) : null;
Review Comment:
From one perspective, returning `null` here maintains the current behavior
for non-vector indexes. It's the returning of the `EmptyIndex` in the vector
case that changes the behavior. This method is called when we update the
sstables in the `IndexViewManager`. We use the result from this method to get a
reference to each of the index's sstables when building the `View`. Returning
`null` lets the view manager skip the sstable reference. If we return `null`
here for vector, there are tests that fail when we have an sstable with no
indexed vectors that contains a tombstone for the row. See
`rangeRestrictedTestWithDuplicateVectorsAndAddNullVector`.
The test fails because in the vector query path, we use the `View` to pick
the memtables and sstables we'll search when materializing rows so that we can
validly call `isIndexDataValid` which relies on stable sstable id and memtable
references for validation. If we exclude the empty index, we materialize the
row incorrectly and produce invalid results. In the non-vector case, we
actually acquire a new view of the data for every time we go to materialize the
row from storage, avoiding this case entirely.
We could consider using the non-vector query views in the same way we use
them for vector queries, but then we need to build logic to merge views in the
case of more than one query predicate. We could also move the skip logic to a
higher level to make the design more obvious within the `IndexViewManager`. I
don't have an answer yet. I'll continue to think on it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]