You may also want to look at something like: https://docs.querqy.org/index.html

ApacheCon had (or is still having) a presentation on it that seemed
quite relevant to your needs. The videos should be live in a week or so.

Regards,
   Alex.

On Tue, 29 Sep 2020 at 22:56, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>
> I am not sure why you think stop words are your first choice. Maybe I
> misunderstand the question. I read it as: you need to completely
> exclude a set of documents that include specific keywords when the
> search is called from a specific module.
>
> If I wanted to differentiate the searches from a specific module, I
> would give that module a different end-point (request handler)
> instead of /select. So, /nocigs or whatever.
>
> Then, in that end-point, you could do all sorts of extra things, such
> as setting appends or even invariants parameters, which would include
> a filter query to exclude any documents matching specific keywords. I
> assume it is ok to return documents that match for other
> reasons.
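
As a rough illustration of the filter-query idea above, here is a minimal Python sketch that builds a negative fq string of the kind one might place under appends or invariants, or send per request. The field name `text`, the helper names, and the sample phrase list are assumptions made for illustration; none of them comes from a real schema, and the handler configuration itself is not shown.

```python
from urllib.parse import urlencode


def build_exclusion_fq(phrases, field="text"):
    """Build a negative filter query that excludes documents matching any phrase.

    `field` is a hypothetical field name; substitute the field you search on.
    """
    quoted = []
    for phrase in phrases:
        # Escape backslashes and double quotes so each phrase stays one quoted term.
        escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append(f'"{escaped}"')
    return f"-{field}:({' OR '.join(quoted)})"


def build_query_string(user_query, phrases, field="text"):
    """URL-encode the user's query together with the exclusion filter."""
    params = [("q", user_query), ("fq", build_exclusion_fq(phrases, field))]
    return urlencode(params)


# Hypothetical block list, modelled on the examples in the original question.
blocked = ["cigarette", "electronic cigar", "e-cigarette case"]
fq = build_exclusion_fq(blocked)
query_string = build_query_string("lighter", blocked)
print(fq)
print(query_string)
```

Note that this only filters out matching documents. It does not by itself return zero results for a cigarette-related query if other documents match for unrelated reasons.
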
>
> Ideally, you would mark the cigs documents during indexing with a
> binary or enumeration flag, and then during search you would just need
> to check against that flag. In that case, you could copyField your text
> and run it through something like
> https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#keep-word-filter
> combined with Shingles for multi-word phrases. Or similar. And just make
> that field index-only so that the result is basically a yes/no flag.
> A similar thing could be done with an UpdateRequestProcessor pipeline if
> you want to end up with a true boolean flag. The idea is the same:
> just have an index-only flag that you lock in for any
> request from the specific module.
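
To make the keep-word plus shingles idea more concrete, here is a small Python sketch of the logic only. It is not Solr's analysis chain or API: the tokenizer, the function names, and the phrase list are simplified assumptions. It splits text into lowercase alphanumeric tokens, generates shingles (runs of adjacent tokens) up to the longest listed phrase, and reports a yes/no flag if any shingle is on the keep list.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumerics, so 'e-cigarette' gives ['e', 'cigarette']."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(tokens, max_size):
    """Yield every run of 1..max_size adjacent tokens, joined by a single space."""
    for size in range(1, max_size + 1):
        for start in range(len(tokens) - size + 1):
            yield " ".join(tokens[start:start + size])


def build_keep_set(phrases):
    """Normalize listed phrases with the same tokenizer used on document text."""
    return {" ".join(tokenize(phrase)) for phrase in phrases}


def is_flagged(text, keep_set):
    """Return True if any shingle of the text matches a listed phrase."""
    max_size = max((len(phrase.split()) for phrase in keep_set), default=0)
    return any(s in keep_set for s in shingles(tokenize(text), max_size))


# Hypothetical list, modelled on the examples in the original question.
keep = build_keep_set(["cigarette", "Electronic cigar", "e-cigarette case"])
print(is_flagged("Leather E-Cigarette case, black", keep))
print(is_flagged("Electronic guitar", keep))
```

At search time the restricted end-point would then only need a filter on the resulting flag, rather than on the keyword list itself.
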
>
> Or even with something like ElevationSearchComponent. Same idea.
>
> Hope this helps.
>
> Regards,
>    Alex.
>
> On Tue, 29 Sep 2020 at 22:28, Derek Poh <d...@globalsources.com> wrote:
> >
> > Hi
> >
> > I have read on the mailing list that we should try to avoid using stop
> > words.
> >
> > I have a use case where I would like to know if there are
> > alternative solutions besides using stop words.
> >
> > There is a business requirement to return zero results when the search
> > is for cigarette-related words and the search is coming from a particular
> > module on our site. It does not apply to all searches from our site.
> > There is a list of these cigarette-related words. This list contains
> > single words, multiple words (Electronic cigar), and multiple words with
> > punctuation (e-cigarette case).
> > I am planning to copy a different set of search fields, which will
> > include the stopword filter at the index and query stages, for this
> > module to use.
> >
> > For this use case, other than using stop words to handle it, is there
> > any alternative solution?
> >
> > Derek
