On 2/3/2020 12:06 PM, Rahul Goswami wrote:
> I am working with Solr 7.2.1 and had a question regarding the performance
> of wildcard searches.
> q=*:*
> vs
> q=id:*
> vs
> q=id:[* TO *]
> Can someone please rank them in the order of performance with the
> underlying reason?
Only the middle one of those is an actual wildcard search. The other
two are special syntax. The first one means "all documents." The third
one is a range query with open-ended bounds, meaning "any value to any
value".
If "id" is your uniqueKey field, which seems likely, then all three of
those queries will produce identical results, and the likely speed
ranking, fastest first, will be:
*:*
id:[* TO *]
id:*
The first two are going to complete pretty quickly, and the third will
be a LOT slower.
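If you want to check that ranking against your own index, something
like the rough sketch below would do it. It builds the three requests
and reads the QTime that Solr reports in the response header. The base
URL and the collection name "mycollection" are placeholders I made up,
so substitute your own. Keep in mind that a repeated query may be
answered from Solr's caches, so the first run is the one that tells you
the most.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# The three queries from the question, in the order I expect them to rank.
QUERIES = ["*:*", "id:[* TO *]", "id:*"]


def build_select_url(base_url, collection, q):
    """Build a /select URL that asks only for the hit count (rows=0)."""
    params = urlencode({"q": q, "rows": 0, "wt": "json"})
    return f"{base_url}/{collection}/select?{params}"


def fetch_qtime(url):
    """Run the request and return QTime (milliseconds) from the response header."""
    with urlopen(url) as resp:
        body = json.load(resp)
    return body["responseHeader"]["QTime"]


# Example usage (needs a running Solr; "mycollection" is a hypothetical name):
#   for q in QUERIES:
#       url = build_select_url("http://localhost:8983/solr", "mycollection", q)
#       print(q, fetch_qtime(url), "ms")
```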
For a wildcard query, Solr must first look at the index to determine
which terms match the wildcard string. It then constructs a Lucene
query internally that quite literally includes every single one of
those terms. So if the id field contains 10 million unique values, the
constructed query for id:* will contain 10 million terms, and each and
every one of them will be matched individually against the index.
Getting the list of matching terms in the first place will probably be
pretty slow, and then the individual matches against the index will add
up quickly.
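To make the difference concrete, here is a toy model in Python. It is
not Lucene code, just a dictionary standing in for the term index, and
the function names are my own invention. It ignores deleted documents
and scoring. The point is only that the match-all path never looks at
the terms, while the wildcard path has to walk the term list and then
visit the postings for every term that matched.

```python
from fnmatch import fnmatchcase


def build_index(ids):
    """Toy inverted index: each term maps to the doc numbers containing it."""
    index = {}
    for doc_num, term in enumerate(ids):
        index.setdefault(term, []).append(doc_num)
    return index


def match_all(max_doc):
    """Model of *:* -- every doc number matches, no term lookup at all."""
    return set(range(max_doc))


def wildcard(index, pattern):
    """Model of a wildcard query on one field.

    Step 1: scan the term dictionary for terms matching the pattern.
    Step 2: visit the postings of each matched term, one term at a time.
    Returns the matching doc numbers and how many terms were expanded.
    """
    terms = [t for t in index if fnmatchcase(t, pattern)]
    hits = set()
    for t in terms:
        hits.update(index[t])
    return hits, len(terms)


ids = [f"doc{i}" for i in range(1000)]  # unique ids, like a uniqueKey field
index = build_index(ids)

hits, expanded = wildcard(index, "*")
print(expanded)                  # one expanded term per unique id
print(hits == match_all(1000))   # same result set as the match-all path
```

With a uniqueKey field the number of expanded terms equals the number
of documents, which is why the cost grows with the size of the index.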
Thanks,
Shawn