Hello, Audrey.

Can you create a regexp capturing all-caps words for the Pattern Replace Filter
(https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#pattern-replace-filter)?
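A minimal sketch of what such a regexp might look like. It assumes the pattern ^\p{Lu}{2,}$, meaning the whole token is two or more uppercase letters, so single capitals like "A" or "I" are left alone. The pattern and the class and method names are illustrative only and do not come from the Ref Guide. PatternReplaceFilter compiles its pattern as a Java regular expression, so the snippet calls java.util.regex directly. It shows what happens to one token when such matches are replaced with an empty string, and why that step has to run before lowercasing:

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class AllCapsPatternSketch {
    // Illustrative pattern: the entire token is two or more uppercase letters.
    // \p{Lu} is Java's Unicode "uppercase letter" class.
    static final Pattern ALL_CAPS = Pattern.compile("^\\p{Lu}{2,}$");

    // Per-token effect of a pattern-replace step with an empty replacement:
    // all-caps tokens become "", everything else passes through unchanged.
    static String blankIfAllCaps(String token) {
        return ALL_CAPS.matcher(token).replaceAll("");
    }

    public static void main(String[] args) {
        for (String token : List.of("ram", "Ram", "RAM")) {
            // Order A: all-caps check first, then lowercase.
            String capsFirst = blankIfAllCaps(token).toLowerCase(Locale.ROOT);
            // Order B: lowercase first, so the all-caps check can never match.
            String lowerFirst = blankIfAllCaps(token.toLowerCase(Locale.ROOT));
            System.out.printf("%s: caps check first -> \"%s\", lowercase first -> \"%s\"%n",
                    token, capsFirst, lowerFirst);
        }
    }
}
```

For "RAM" the first order yields an empty string while the second yields "ram"; "ram" and "Ram" both come out as "ram" either way. In a Solr fieldType the same pattern would presumably go in the pattern attribute of solr.PatternReplaceFilterFactory with replacement="" (written with a single backslash, since XML needs no Java string escaping), placed before solr.LowerCaseFilterFactory. The filter leaves an empty token in the stream rather than removing it. Following it with solr.LengthFilterFactory (min="1" plus a suitable max) is one plausible way to drop those tokens.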

On Wed, Nov 6, 2019 at 6:36 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com
<audrey.lorberf...@ibm.com> wrote:

> I would also love to know what filter to use to ignore capitalized
> acronyms... which one can do this OOTB?
>
> --
> Audrey Lorberfeld
> Data Scientist, w3 Search
> IBM
> audrey.lorberf...@ibm.com
>
>
> On 11/6/19, 3:54 AM, "Paras Lehana" <paras.leh...@indiamart.com> wrote:
>
>     Hi Community,
>
>     In Ref Guide 8.3's *Understanding Analyzers, Tokenizers, and Filters*
>     <https://lucene.apache.org/solr/guide/8_3/understanding-analyzers-tokenizers-and-filters.html>
>     section, the text talks about precision and recall depending on how you
>     use analyzers at query and index time:
>
>     > For indexing, you often want to simplify, or normalize, words. For
>     > example, setting all letters to lowercase, eliminating punctuation and
>     > accents, mapping words to their stems, and so on. Doing so can *increase
>     > recall* because, for example, "ram", "Ram" and "RAM" would all match a
>     > query for "ram". To *increase query-time precision*, a filter could be
>     > employed to narrow the matches by, for example, *ignoring all-cap
>     > acronyms* if you’re interested in male sheep, but not Random Access
>     > Memory.
>
>
>     In the first case (about recall), is it assumed that "ram" should match
>     all three? *[Q1]* To increase recall, we have to decrease false negatives
>     (relevant documents that are not retrieved). In the other case (if not
>     all three are meant to match the query), precision actually decreases
>     here, because false positives increase.
>
>     This makes sense for the second case, where precision should increase
>     as we decrease false positives (irrelevant documents that are retrieved).
>
>     However, the text talks about "employing a filter that ignores all-cap
>     acronyms". How are we supposed to do that at query time? *[Q2]* Wouldn't
>     we first have to remove the LowerCaseFilter (LCF) at index time?
>
>
>     --
>     Regards,
>
>     *Paras Lehana* [65871]
>     Development Engineer, Auto-Suggest,
>     IndiaMART Intermesh Ltd.
>
>
>
>
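On Q1, a toy example makes the trade-off concrete. The sketch assumes a hypothetical three-document index containing only "ram", "Ram" and "RAM", and assumes the user wants the male sheep, so only the first two documents are relevant. For simplicity it applies the same analysis chain to documents and query, which real Solr setups need not do. It computes precision = relevant retrieved / retrieved and recall = relevant retrieved / all relevant. All names here are illustrative:

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

public class RamPrecisionRecall {
    // Hypothetical toy index: document id = list position (0, 1, 2).
    static final List<String> DOCS = List.of("ram", "Ram", "RAM");
    // Assumed intent: male sheep, so only ids 0 and 1 are relevant.
    static final Set<Integer> RELEVANT = Set.of(0, 1);
    // Illustrative all-caps pattern: two or more uppercase letters.
    static final Pattern ALL_CAPS = Pattern.compile("^\\p{Lu}{2,}$");

    // Returns ids of documents whose analyzed term equals the analyzed query.
    // Terms blanked by the analyzer (empty strings) never match.
    static Set<Integer> search(String query, UnaryOperator<String> analyzer) {
        String q = analyzer.apply(query);
        Set<Integer> hits = new TreeSet<>();
        for (int id = 0; id < DOCS.size(); id++) {
            String term = analyzer.apply(DOCS.get(id));
            if (!term.isEmpty() && term.equals(q)) {
                hits.add(id);
            }
        }
        return hits;
    }

    // Precision = relevant retrieved / retrieved (0 when nothing is retrieved).
    static double precision(Set<Integer> hits) {
        if (hits.isEmpty()) return 0.0;
        long relevantHits = hits.stream().filter(RELEVANT::contains).count();
        return (double) relevantHits / hits.size();
    }

    // Recall = relevant retrieved / all relevant.
    static double recall(Set<Integer> hits) {
        long relevantHits = hits.stream().filter(RELEVANT::contains).count();
        return (double) relevantHits / RELEVANT.size();
    }

    static void report(String name, Set<Integer> hits) {
        System.out.printf(Locale.ROOT, "%-30s hits=%s precision=%.2f recall=%.2f%n",
                name, hits, precision(hits), recall(hits));
    }

    public static void main(String[] args) {
        UnaryOperator<String> exact = s -> s;
        UnaryOperator<String> lower = s -> s.toLowerCase(Locale.ROOT);
        // The all-caps check runs before lowercasing; otherwise it could never match.
        UnaryOperator<String> dropCapsThenLower =
                s -> ALL_CAPS.matcher(s).matches() ? "" : s.toLowerCase(Locale.ROOT);

        report("exact match", search("ram", exact));
        report("lowercase", search("ram", lower));
        report("drop all-caps, then lowercase", search("ram", dropCapsThenLower));
    }
}
```

With this toy data, exact matching gives precision 1.00 and recall 0.50. Lowercasing gives precision 0.67 and recall 1.00, so recall goes up and precision goes down, as in Q1. Dropping all-caps tokens before lowercasing gives 1.00 for both. That last chain would also blank a query typed as "RAM". It only works because case survives until the all-caps check, which is the concern behind Q2.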

-- 
Sincerely yours
Mikhail Khludnev
