On 2/3/2020 12:06 PM, Rahul Goswami wrote:
I am working with Solr 7.2.1 and had a question regarding the performance
of wildcard searches.

q=*:*
vs
q=id:*
vs
q=id:[* TO *]

Can someone please rank them in the order of performance with the
underlying reason?

The only one of those that is an actual wildcard search is the middle one. The others are special syntax. The first one is special syntax that means "all documents." The third one is a range query with special syntax that means "any value to any value".

If "id" is your uniqueKey field, which seems likely, then all three of those queries will produce identical results, and the likely speed ranking will be:

*:*
id:[* TO *]
id:*

The first two are going to complete pretty quickly, and the third will be a LOT slower.
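If you want to check this ranking on your own index, the sketch below may help. It is a minimal example that assumes a Solr instance reachable at a hypothetical URL (http://localhost:8983/solr/mycore/select; adjust host, port and core name). It requests rows=0 so only the hit count comes back, and it reads the QTime value Solr reports in the responseHeader. The helper names build_url and query_time_ms are illustrative only.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Solr location (assumption): adjust host, port and core name.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

# The three queries from the question, in the expected speed order.
QUERIES = ["*:*", "id:[* TO *]", "id:*"]


def build_url(q: str, base: str = SOLR_SELECT) -> str:
    """Build a select URL that asks only for the hit count (rows=0)."""
    params = {"q": q, "rows": 0, "wt": "json"}
    return base + "?" + urllib.parse.urlencode(params)


def query_time_ms(q: str, base: str = SOLR_SELECT) -> int:
    """Run one query against a live Solr and return its reported QTime (ms)."""
    with urllib.request.urlopen(build_url(q, base)) as resp:
        body = json.load(resp)
    return body["responseHeader"]["QTime"]
```

Usage: call query_time_ms(q) for each entry in QUERIES. Repeated runs of the same query may be answered from Solr's query result cache, so the first execution of each query after a restart or commit is the more telling number.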

What Solr must do for a wildcard query is first look at the index to determine which terms in the index match the wildcard string. Then it constructs a Lucene query internally that covers every single one of those terms. That means if the id field contains 10 million unique values, the query constructed for id:* will cover ten million terms, and each of them will be matched individually against the index. Getting the list of matching terms in the first place will probably be pretty slow, and then the individual matches against the index will add up quickly.
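A toy model can make the difference in work concrete. The sketch below is not how Lucene is actually implemented (Lucene matches wildcards against its term dictionary with an automaton and has several query rewrite strategies); it is a simplified in-memory inverted index, with hypothetical helper names, that counts how many per-term lookups each approach needs. "All documents" needs none, while a wildcard first expands to every matching term and then visits each term's document list.

```python
import fnmatch
from typing import Dict, List, Set, Tuple


def build_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each unique id value (a term) to the documents containing it."""
    index: Dict[str, List[int]] = {}
    for doc_id, value in docs.items():
        index.setdefault(value, []).append(doc_id)
    return index


def match_all(docs: Dict[int, str]) -> Tuple[Set[int], int]:
    """Model of *:*: every document matches, with zero term lookups."""
    return set(docs), 0


def wildcard(index: Dict[str, List[int]], pattern: str) -> Tuple[Set[int], int]:
    """Model of a wildcard query: expand the pattern, then visit each term."""
    # Step 1: scan the term dictionary for terms matching the pattern.
    matching_terms = [t for t in index if fnmatch.fnmatchcase(t, pattern)]
    # Step 2: one document-list lookup per expanded term.
    hits: Set[int] = set()
    for term in matching_terms:
        hits.update(index[term])
    return hits, len(matching_terms)
```

With a unique value per document, wildcard(index, "*") returns the same document set as match_all(docs), but the number of term lookups it reports equals the number of documents, which is the cost described above.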

Thanks,
Shawn
