Re: Performance comparison for wildcard searches

Shawn Heisey Mon, 03 Feb 2020 12:58:32 -0800

On 2/3/2020 12:06 PM, Rahul Goswami wrote:

I am working with Solr 7.2.1 and had a question regarding the performance
of wildcard searches.


q=*:*
vs
q=id:*
vs
q=id:[* TO *]

Can someone please rank them in the order of performance with the
underlying reason?

The only one of those that is an actual wildcard search is the middle one. The others are special syntax. The first one is special syntax that means "all documents." The third one is a range query with special syntax that means "any value to any value".

If "id" is your uniqueKey field, which seems likely, then all three of those queries will produce identical results, and the likely speed ranking will be:


*:*
id:[* TO *]
id:*

The first two are going to complete pretty quickly, and the third will be a LOT slower.
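
If you want to check that ranking on your own index, a minimal sketch like the following builds the three request URLs. The host, port, and core name ("mycore") are assumptions, so substitute your own. Solr reports the query time in milliseconds as QTime in the responseHeader of each response, and rows=0 keeps document retrieval out of the measurement. Repeated runs of the same query may be answered from Solr's caches, so the first run of each is the one to compare.

```python
from urllib.parse import urlencode

# Assumed values: adjust to your own host and core/collection name.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "mycore"  # hypothetical core name, not from the original thread

# The three queries from the question, in the expected fastest-to-slowest order.
QUERIES = ["*:*", "id:[* TO *]", "id:*"]

def select_url(q, base=SOLR_BASE, core=CORE):
    # rows=0 so the request measures matching, not fetching stored fields.
    params = {"q": q, "rows": 0, "wt": "json"}
    return f"{base}/{core}/select?{urlencode(params)}"

urls = [select_url(q) for q in QUERIES]
for q, url in zip(QUERIES, urls):
    print(q, "->", url)
```

Fetch each URL with any HTTP client and compare the QTime values.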

What Solr must do for a wildcard query is first look at the index to determine what terms in the index match the wildcard string. And then it will construct a Lucene query internally that quite literally includes every single one of those terms. Which means that if the id field contains 10 million unique values, the constructed query for id:* will contain ten million values. And each and every one of them will be matched individually against the index. Getting the list of matching terms in the first place will probably be pretty slow, and then the individual matches against the index will add up quickly.
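
To make the two steps concrete, here is a toy model in Python. It is only a sketch of the idea described above, not Lucene's actual implementation: the dictionary-based index and the helper names (build_index, match_all, wildcard) are invented for illustration.

```python
from fnmatch import fnmatchcase

def build_index(ids):
    # Toy inverted index: term -> list of document numbers containing it.
    index = {}
    for doc_num, value in enumerate(ids):
        index.setdefault(value, []).append(doc_num)
    return index

def match_all(num_docs):
    # Analogue of *:* : no term lookups, just every document number.
    return set(range(num_docs))

def wildcard(index, pattern):
    # Step 1: scan the term dictionary for terms matching the pattern.
    matching_terms = [t for t in index if fnmatchcase(t, pattern)]
    # Step 2: look up every matching term individually and merge the hits.
    hits = set()
    for term in matching_terms:
        hits.update(index[term])
    return hits, len(matching_terms)

ids = [f"doc-{n}" for n in range(1000)]  # unique ids, like a uniqueKey field
index = build_index(ids)

all_docs = match_all(len(ids))
wild_docs, expanded_terms = wildcard(index, "*")

# Same result set, but the wildcard path touched one term per unique id.
print(len(all_docs), len(wild_docs), expanded_terms)  # 1000 1000 1000
```

Both paths return the same documents, but the work done by the wildcard path grows with the number of unique terms in the field, which is why a uniqueKey field is the worst case.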


Thanks,
Shawn
