Hello everyone,

We've observed the following two issues with empty indexes, which we
believe to be bugs. I hope this list is a suitable channel for
reporting them, because I don't have access to the bug tracker.

1. Group queries on an empty index fail with an HTTP 500 status and
the following error:
java.lang.IllegalArgumentException: numHits must be > 0; please use
TotalHitCountCollectorManager if you just need the total hit count

2. Under certain conditions replication is not triggered, leaving an
empty index on the follower although the leader's index is not empty.
This happens when the index on the leader is emptied and then restored
to the previously replicated version.

I've successfully reproduced these issues with Solr 9.6.1 and 9.7.0.
I've also got scripts for an automated reproduction, which I'll gladly
share.


The following is a more detailed description of the issues and the
steps to reproduce them:

--- Regarding 1.:

It's enough to run a Group Query against an empty Solr index, e.g.
/solr/gettingstarted/select?group.query=id%3A(1)&group=true

This will result in an error and the following log message:

ERROR (qtp738677855-25-null-1) [c: s: r: x:gettingstarted t:null-1]
o.a.s.h.RequestHandlerBase Server exception =>
java.lang.IllegalArgumentException: numHits must be > 0; please use
TotalHitCountCollectorManager if you just need the total hit count
        at org.apache.lucene.search.TopScoreDocCollectorManager.<init>(TopScoreDocCollectorManager.java:67)
java.lang.IllegalArgumentException: numHits must be > 0; please use
TotalHitCountCollectorManager if you just need the total hit count
        at org.apache.lucene.search.TopScoreDocCollectorManager.<init>(TopScoreDocCollectorManager.java:67) ~[?:?]
        at org.apache.lucene.search.TopScoreDocCollector.create(TopScoreDocCollector.java:212) ~[?:?]
        at org.apache.solr.search.Grouping$CommandQuery.createFirstPassCollector(Grouping.java:909) ~[?:?]
        at org.apache.solr.search.Grouping.execute(Grouping.java:335) ~[?:?]
        at org.apache.solr.handler.component.QueryComponent.doProcessGroupedSearch(QueryComponent.java:1658) ~[?:?]
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:428) ~[?:?]
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:465) ~[?:?] [...]
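
For anyone who wants to try this without my scripts, here is a minimal
sketch in Python of the request from 1. It assumes a local Solr at
http://localhost:8983 and a core named gettingstarted. The helper names
group_query_url and run_group_query are only illustrative and not part
of any Solr client.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen


def group_query_url(base_url, core, group_query):
    """Build the /select URL for a group query with the parameters used above."""
    params = urlencode({"group.query": group_query, "group": "true"})
    return f"{base_url}/solr/{core}/select?{params}"


def run_group_query(url):
    """Send the request and return the HTTP status code."""
    try:
        with urlopen(url) as response:
            return response.status
    except HTTPError as err:
        # Error responses (such as the 500 described above) end up here.
        return err.code


# Assumed host, port and core name; adjust to your setup.
url = group_query_url("http://localhost:8983", "gettingstarted", "id:(1)")
```

Against an empty index, run_group_query(url) should return 500 as
described above.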


--- Regarding 2.:

I'm assuming a setup consisting of 1. a leader, 2. a repeater (leader
+ follower) and 3. a follower. The repeater is configured such that
replication is not started automatically but requires an explicit
fetchindex. Other setups may also be possible.

a) Initially the leader should have a non-empty index. The repeater
and follower should have the same non-empty index via replication.

b) Now, delete the index on the repeater (and only there). This
results in an empty index on the repeater with the following version &
generation: Version: 0 Generation: 1

c) Eventually the follower will pick this up and it will empty its
index as indicated by the log:

  INFO  (indexFetcher-25-thread-1) [c: s: r: x: t:]
o.a.s.h.IndexFetcher Leader's generation: 1
  INFO  (indexFetcher-25-thread-1) [c: s: r: x: t:]
o.a.s.h.IndexFetcher Leader's version: 0
  INFO  (indexFetcher-25-thread-1) [c: s: r: x: t:]
o.a.s.h.IndexFetcher Follower's generation: 2
  INFO  (indexFetcher-25-thread-1) [c: s: r: x: t:]
o.a.s.h.IndexFetcher Follower's version: 1729523375294
  INFO  (indexFetcher-25-thread-1) [c: s: r: x: t:]
o.a.s.h.IndexFetcher New index in Leader. Deleting mine...

  This results in an empty index on the follower but with an unchanged
version & generation: Version: 1729523375294 Generation: 2

  That's already somewhat unexpected. Instead, I would have expected
the same version & generation as on the repeater, i.e. Version: 0
Generation: 1. But so far that's not yet an issue. It's just
unexpected.

d) Next, I'll restore the index on the repeater by explicitly
replicating the previous index from the leader using fetchindex. On
the repeater replication eventually completes and the version &
generation are updated as expected: Version: 1729523375294 Generation:
2

e) Finally, I'd expect the follower to also replicate the restored
index from the repeater. But that does not happen. That's the actual
issue.
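
The replication requests involved in steps c) to e) can be sketched as
follows. This only builds the URLs for the replication handler's
fetchindex and details commands. The host names, the core name and the
helper replication_url are assumptions for illustration, and the
requests in my actual setup may carry additional parameters.

```python
from urllib.parse import urlencode


def replication_url(base_url, core, command, **params):
    """Build a replication handler URL, e.g. .../replication?command=details."""
    query = urlencode({"command": command, "wt": "json", **params})
    return f"{base_url}/solr/{core}/replication?{query}"


# Assumed hosts and core name, for illustration only.
REPEATER = "http://repeater:8983"
FOLLOWER = "http://follower:8983"
CORE = "gettingstarted"

# Step d): explicitly trigger replication on the repeater.
fetch = replication_url(REPEATER, CORE, "fetchindex")

# Steps c) and e): request the replication details of both nodes
# in order to compare their version & generation.
checks = [replication_url(host, CORE, "details") for host in (REPEATER, FOLLOWER)]
```

Comparing the version & generation reported for both nodes after step
e) should show identical values even though only the repeater contains
documents.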

I believe the root cause is that after "New index in Leader. Deleting
mine..." the version & generation of the follower should have been
reset to: Version: 0 Generation: 1. But because that did not happen,
the follower eventually believes its index is still up-to-date,
because the version & generation match the repeater. So the index of
the follower remains empty, unlike the indexes of the repeater and
leader.
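
To illustrate the suspected root cause, here is a small Python model of
the behaviour I observed. It is a deliberately simplified sketch and
not the actual IndexFetcher code. The class Index, the function poll,
the flag reset_on_delete and the document count of 10 are all made up
for this example.

```python
from dataclasses import dataclass


@dataclass
class Index:
    version: int
    generation: int
    docs: int


def poll(leader: Index, follower: Index, reset_on_delete: bool) -> str:
    """One simplified poll cycle of a follower against its leader."""
    if leader.version == 0:
        if follower.docs > 0:
            # Corresponds to "New index in Leader. Deleting mine..."
            follower.docs = 0
            if reset_on_delete:
                # Hypothetical fix: also reset version & generation.
                follower.version, follower.generation = 0, 1
            return "deleted"
        return "in sync"
    if (leader.version, leader.generation) == (follower.version, follower.generation):
        # Matching version & generation: the follower skips the fetch.
        return "in sync"
    follower.version = leader.version
    follower.generation = leader.generation
    follower.docs = leader.docs
    return "fetched"


def scenario(reset_on_delete: bool) -> Index:
    """Replay steps b) to e) from the follower's point of view."""
    follower = Index(1729523375294, 2, docs=10)
    # Steps b) and c): the repeater is empty (Version: 0 Generation: 1).
    poll(Index(0, 1, docs=0), follower, reset_on_delete)
    # Steps d) and e): the repeater has been restored from the leader.
    poll(Index(1729523375294, 2, docs=10), follower, reset_on_delete)
    return follower
```

In this model scenario(False) leaves the follower with 0 documents,
which matches what I'm seeing, whereas scenario(True) ends with the
restored documents on the follower.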

It's possible to recover from this condition by changing the index on
the leader, which results in an altogether new version / generation,
which is then also replicated by the follower.


Thanks for your help
Andreas Born
