>>>> We do auto-complete through prefix searches on shingles.
>>>
>>> Just to confirm, do you mean using EdgeNgram filter to produce letter
>>> ngrams of the tokens in the chosen field?
>>
>> No, I'm talking about prefix search on tokens produced by a ShingleFilter.
>
> I did not know about the Prefix query parser in Solr. Thanks a lot for
> pointing out the same.
>
> I find relatively little online material about the Solr/Lucene prefix query
> parser. Kindly point me to any useful resource that I might be missing.

I looked into the Solr/Lucene classes and found the required information. I am summarizing it here for the benefit of anyone who refers to this thread in the future.
The change I had to make was very simple: call getPrefixQuery instead of getWildcardQuery in my custom-modified Solr dismax query parser class. Small as it is, the change should make a fairly significant difference in efficiency.

The key difference between the Lucene WildcardQuery and PrefixQuery lies in their respective term enumerators, specifically in the term comparators. The termCompare method for PrefixQuery is more lightweight than that of WildcardQuery. It is essentially an optimization, given that a prefix query is just a special case of a wildcard query. This is also why the Lucene query parser automatically creates a PrefixQuery, rather than a WildcardQuery, for query terms of the form 'foo*'.

A big thank you to Shalin for providing valuable guidance and insight.

One final request for comment from Shalin on this topic: I am guessing you ensured there were no duplicate terms in the field(s) used for autocompletion. In our system, duplicate suggestions originate only from different document IDs, and we do want the list of document IDs matched. So for our first version, I am thinking of eliminating the duplicates outside of the results handler that gives suggestions. Is there a better or different way of doing this?

Regards,
Prasanna.
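
P.S. For anyone who finds this thread later, here is a rough sketch of why a prefix check can be cheaper than a general wildcard match. It is plain Java written purely for illustration and is not the Lucene code. The class TermMatchSketch, its methods prefixMatch, wildcardMatch and suggest, and the sample shingles are all made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in only: this is NOT Lucene's term enumerator code.
public class TermMatchSketch {

    // Prefix check: a single left-to-right comparison, no backtracking.
    static boolean prefixMatch(String term, String prefix) {
        return term.startsWith(prefix);
    }

    // General wildcard check: '*' matches any run of characters,
    // '?' matches exactly one. It has to remember the last '*' and
    // backtrack to it on a mismatch, which is the extra work a
    // pure prefix comparison avoids.
    static boolean wildcardMatch(String term, String pattern) {
        int t = 0, p = 0;
        int starP = -1, starT = -1; // position of last '*' and the term index it was tried at
        while (t < term.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p++;
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == term.charAt(t))) {
                t++;
                p++;
            } else if (starP >= 0) {
                // Mismatch: let the last '*' absorb one more character and retry.
                p = starP + 1;
                t = ++starT;
            } else {
                return false;
            }
        }
        // Only trailing '*' characters may remain in the pattern.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    // Hypothetical autocomplete helper: keep the shingle terms that start with the typed prefix.
    static List<String> suggest(List<String> shingles, String typed) {
        List<String> out = new ArrayList<>();
        for (String s : shingles) {
            if (prefixMatch(s, typed)) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> shingles = Arrays.asList("new york", "new york city", "newark", "old town");
        System.out.println(suggest(shingles, "new y"));
        System.out.println(wildcardMatch("new york city", "new y*"));
    }
}
```

For a pattern of the form 'foo*', both methods accept the same terms, so the prefix version gives the same answers with less work per term. That is the intuition behind preferring getPrefixQuery for autocompletion.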