Re: Prefix + Suffix Wildcards in Searches

Chris Dempsey Mon, 29 Jun 2020 08:47:03 -0700

First off, thanks for taking a look, Erick! I see you helping lots of folks
out here and I've learned a lot from your answers. Much appreciated!


> How regular are your patterns? Are they arbitrary?

Good question. :) That's information I should have included in the initial
post: both the values in the `tag` field and the search query itself are
totally arbitrary (*i.e. user-entered values*). I see where you're going if
the set of either part were limited.

> What’s the field type anyway? Is this field tokenized?

<field name="tag" type="text_kwt_fd_lc" indexed="true" stored="true"
       multiValued="true"/>

<fieldType name="text_kwt_fd_lc" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"
                preserveOriginal="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ReversedWildcardFilterFactory"
                withOriginal="true" maxPosAsterisk="2" maxPosQuestion="1"
                minTrailing="2" maxFractionAsterisk="0"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
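
For comparison, here is a minimal sketch of what the n-gram route might look
like in the schema. The `tag_ngram` field and `text_ngram_lc` type names are
hypothetical, and the gram sizes are assumptions that would need tuning
against the real data and the acceptable index size:

```xml
<!-- Hypothetical auxiliary field; names and gram sizes are illustrative only -->
<field name="tag_ngram" type="text_ngram_lc" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="tag" dest="tag_ngram"/>

<fieldType name="text_ngram_lc" class="solr.TextField"
           positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Indexes every substring of each token that is between
             minGramSize and maxGramSize characters long -->
        <filter class="solr.NGramFilterFactory"
                minGramSize="3" maxGramSize="20"/>
    </analyzer>
    <analyzer type="query">
        <!-- No n-gramming at query time: the search string is looked up
             as a single lowercased term -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

With something like this, a clause such as tag_ngram:ms-reply-paid would be a
plain term lookup rather than a double-ended wildcard, but only for search
strings between minGramSize and maxGramSize characters long, and the index
grows accordingly.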

On Mon, Jun 29, 2020 at 10:33 AM Erick Erickson <erickerick...@gmail.com>
wrote:

> How regular are your patterns? Are they arbitrary?
> What I’m wondering is if you could shift your work to the
> indexing end, perhaps even in an auxiliary field. Could you,
> say, just index “paid”, “ms-reply-unpaid” etc? Then there
> are no wildcards at all. This is akin to “concept search”.
>
> Otherwise ngramming is your best bet.
>
> What’s the field type anyway? Is this field tokenized?
>
> There are lots of options, but soooo much depends on whether
> you can process the data such that you won’t need wildcards.
>
> Best,
> Erick
>
> > On Jun 29, 2020, at 11:16 AM, Chris Dempsey <cdal...@gmail.com> wrote:
> >
> > Hello, all! I'm relatively new to Solr and Lucene (*using Solr 7.7.1*)
> > but I'm looking into options for optimizing something like this:
> >
> >> fq=(tag:* -tag:*paid*) OR (tag:* -tag:*ms-reply-unpaid*) OR tag:*ms-reply-paid*
> >
> > It's probably not a surprise that we're seeing performance issues with
> > something like this. My understanding is that using the wildcard on both
> > ends forces a full-text index search. Something like the above can't take
> > advantage of something like the ReversedWildcardFilterFactory either. I
> > believe constructing `n-grams` is an option (*at the expense of index
> > size*) but is there anything I'm overlooking as a possible avenue to look
> > into?
>
>
