Hello, Audrey. Could you write a regexp that matches all-caps tokens and use it with the Pattern Replace Filter (https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#pattern-replace-filter)?
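Here is a minimal sketch of that idea. It assumes that "all-caps acronym" means a token made entirely of two or more uppercase letters (`^\p{Lu}{2,}$`). The Java below applies a per-token `java.util.regex` replacement with an empty replacement string, which is roughly what the filter's `pattern` and `replacement` attributes describe. The class and method names are hypothetical. Dropping the resulting empty tokens is an extra assumption: the Pattern Replace Filter itself presumably leaves an empty token behind, so a Solr chain would need a separate step to remove it. The sketch also shows why such a step has to run before any lowercasing. Once tokens are lowercased, there is no case left to match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class AllCapsPatternDemo {
    // Assumed definition of an all-caps acronym: the whole token is two or
    // more uppercase letters. \p{Lu} matches any Unicode uppercase letter.
    static final Pattern ALL_CAPS = Pattern.compile("^\\p{Lu}{2,}$");

    // Replaces a matching token with "" (like an empty replacement string),
    // then drops tokens that became empty. The drop step is an assumption.
    static List<String> dropAllCaps(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String replaced = ALL_CAPS.matcher(t).replaceAll("");
            if (!replaced.isEmpty()) {
                out.add(replaced);
            }
        }
        return out;
    }

    // Simulates a lowercase step; after it, no token can match ALL_CAPS.
    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("ram", "Ram", "RAM");
        System.out.println(dropAllCaps(tokens));            // [ram, Ram]
        System.out.println(dropAllCaps(lowercase(tokens))); // [ram, ram, ram]
    }
}
```

The second line of output matters for the index-time question quoted below. If the index-time analyzer lowercases everything, "RAM" and "ram" are indistinguishable afterwards. Any case-sensitive filtering would then need access to tokens that have not yet passed through a lowercase filter.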
On Wed, Nov 6, 2019 at 6:36 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:

> I would also love to know what filter to use to ignore capitalized
> acronyms... which one can do this OOTB?
>
> --
> Audrey Lorberfeld
> Data Scientist, w3 Search
> IBM
> audrey.lorberf...@ibm.com
>
>
> On 11/6/19, 3:54 AM, "Paras Lehana" <paras.leh...@indiamart.com> wrote:
>
> Hi Community,
>
> In Ref Guide 8.3's *Understanding Analyzers, Tokenizers, and Filters*
> <https://lucene.apache.org/solr/guide/8_3/understanding-analyzers-tokenizers-and-filters.html>
> section, the text talks about precision and recall depending on how you
> use analyzers at query and index time:
>
> > For indexing, you often want to simplify, or normalize, words. For example,
> > setting all letters to lowercase, eliminating punctuation and accents,
> > mapping words to their stems, and so on. Doing so can *increase recall*
> > because, for example, "ram", "Ram" and "RAM" would all match a query for
> > "ram". To *increase query-time precision*, a filter could be employed to
> > narrow the matches by, for example, *ignoring all-cap acronyms* if you’re
> > interested in male sheep, but not Random Access Memory.
>
> In the first case (about recall), is it assumed that "ram" should match all
> three? *[Q1]* Because, to increase recall, we have to decrease false
> negatives (relevant documents that are not retrieved). In the other case
> (if the three are not intended to match the query), precision actually
> decreases here (false positives increase).
>
> This makes sense for the second case, where precision should increase as we
> are decreasing false positives (documents wrongly marked as relevant).
>
> However, the text talks about "employing a filter that ignores all-cap
> acronyms". How are we supposed to do that at query time? *[Q2]* Wouldn't
> we have to remove the lowercase filter (LCF) at index time?
>
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
>
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
>
> Mob.: +91-9560911996
> Work: 01203916600 | Extn: *8173*

--
Sincerely yours
Mikhail Khludnev