First off, thanks for taking a look, Erick! I see you helping lots of folks out here and I've learned a lot from your answers. Much appreciated!
> How regular are your patterns? Are they arbitrary?

Good question. :) That's data I should have included in the initial post, but both the values in the `tag` field and the search query itself are totally arbitrary (*i.e. user-entered values*). I see where you're going if the set of either part was limited.

> What’s the field type anyway? Is this field tokenized?

<field name="tag" type="text_kwt_fd_lc" indexed="true" stored="true" multiValued="true"/>

<fieldType name="text_kwt_fd_lc" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2" maxFractionAsterisk="0"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

On Mon, Jun 29, 2020 at 10:33 AM Erick Erickson <erickerick...@gmail.com> wrote:

> How regular are your patterns? Are they arbitrary?
> What I’m wondering is if you could shift your work to the
> indexing end, perhaps even in an auxiliary field. Could you,
> say, just index “paid”, “ms-reply-unpaid” etc? Then there
> are no wildcards at all. This is akin to “concept search”.
>
> Otherwise ngramming is your best bet.
>
> What’s the field type anyway? Is this field tokenized?
>
> There are lots of options, but soooo much depends on whether
> you can process the data such that you won’t need wildcards.
>
> Best,
> Erick
>
> > On Jun 29, 2020, at 11:16 AM, Chris Dempsey <cdal...@gmail.com> wrote:
> >
> > Hello, all! I'm relatively new to Solr and Lucene (*using Solr 7.7.1*) but
> > I'm looking into options for optimizing something like this:
> >
> >> fq=(tag:* -tag:*paid*) OR (tag:* -tag:*ms-reply-unpaid*) OR tag:*ms-reply-paid*
> >
> > It's probably not a surprise that we're seeing performance issues with
> > something like this. My understanding is that using the wildcard on both
> > ends forces a full-text index search. Something like the above can't take
> > advantage of something like the ReversedWildcardFilterFactory either. I believe
> > constructing `n-grams` is an option (*at the expense of index size*) but is
> > there anything I'm overlooking as a possible avenue to look into?
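
For anyone following the thread, here is a rough sketch of what the n-gram option mentioned above might look like next to the existing `tag` field. It is only an illustration, not something tested against 7.7.1: the names `tag_ngram` and `text_ngram_lc` are made up, and the gram sizes (3 to 15) are assumptions picked so that a value as long as `ms-reply-unpaid` still fits in a single gram.

```xml
<!-- Hypothetical auxiliary field; "tag_ngram" and "text_ngram_lc" are illustrative names. -->
<field name="tag_ngram" type="text_ngram_lc" indexed="true" stored="false" multiValued="true"/>

<!-- Copy every tag value into the auxiliary field at index time. -->
<copyField source="tag" dest="tag_ngram"/>

<fieldType name="text_ngram_lc" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit every substring of 3 to 15 characters of each token (sizes are assumptions). -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gramming at query time: the search term is looked up as a single indexed gram. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With something like this in place, a clause such as `tag:*ms-reply-paid*` could in principle be written as the plain term query `tag_ngram:ms-reply-paid`, with no wildcards involved. The trade-offs are the ones already noted in the thread: the index grows, and search terms shorter than `minGramSize` or longer than `maxGramSize` would not match, which matters when the values are arbitrary user input.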