Thanks Erick, that seems to work! Should I leave it in qf also? For example the query "blue dog" may be represented as separate tokens in the keyword index.
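
For anyone following the thread, here is a minimal sketch of the explicit-clause approach in Python. It is only an illustration under assumptions: the helper names (`escape_phrase`, `build_params`) and the `keep_keyword_in_qf` flag are hypothetical, `title` and `description` stand in for the other text fields, and the code only builds request parameters without talking to a Solr instance. How the extra clause combines with the other terms still depends on your q.op / mm settings.

```python
from urllib.parse import urlencode


def escape_phrase(text: str) -> str:
    """Escape backslashes and double quotes so the text is safe inside a quoted phrase."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_params(user_query: str,
                 text_fields=("title", "description"),
                 keyword_field="keywords",
                 keep_keyword_in_qf=True) -> dict:
    """Build edismax params with an explicit phrase clause on the keyword field.

    Hypothetical helper: field names and the flag are assumptions for illustration.
    """
    query = user_query.strip()
    qf = list(text_fields)
    if keep_keyword_in_qf:
        # With the field still in qf, the split tokens ("blue", "dog")
        # can match single-word keywords as well.
        qf.append(keyword_field)
    return {
        "defType": "edismax",
        # The user's terms as typed, plus the whole query as one quoted
        # phrase aimed at the untokenized keyword field.
        "q": f'{query} {keyword_field}:"{escape_phrase(query)}"',
        "qf": " ".join(qf),
    }


if __name__ == "__main__":
    params = build_params("ice cream")
    print(params["q"])
    print(urlencode(params))
```

Note that only the phrase clause is escaped here; the user's raw terms are passed through as typed, so any query operators in them are still interpreted by the parser.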
On Mon, Sep 30, 2019 at 9:32 PM Erick Erickson <erickerick...@gmail.com> wrote:
> Have you tried taking your keyword field out of the "qf" param and adding
> it explicitly? As keyword:"ice cream"
>
> Best,
> Erick
>
> > On Sep 30, 2019, at 5:27 AM, Ashwin Ramesh <ash...@canva.com> wrote:
> >
> > Hi everybody,
> >
> > I am using the edismax parser and have noticed a very specific behaviour
> > with how sow=true (default) handles multiword keywords.
> >
> > We have a field called 'keywords', which uses the general
> > KeywordTokenizerFactory. There are also other text fields like title and
> > description, etc.
> >
> > When we index a document with a keyword "ice cream", for example, we know
> > it gets indexed into that field as "ice cream".
> >
> > However, at query time, I noticed that if we run an Edismax query:
> > q=ice cream
> > qf=keywords
> >
> > I do not get that document back as a match. This is due to sow=true
> > splitting the user's query and the final tokens not being present in the
> > keywords field.
> >
> > I was wondering what the best practice around this was? Some thoughts I
> > have had:
> >
> > 1. Index multi-word keywords with hyphens or something similar. E.g. "ice
> > cream" -> "ice-cream"
> > 2. Additionally index the separate words as keywords also. E.g. "ice cream"
> > -> "ice cream", "ice", "cream". However, this method will result in the loss
> > of intent (q=ice would return this document).
> > 3. Add a boost query which is an edismax query where we explicitly set
> > sow=false and add a huge boost. E.g. bq={!edismax qf=keywords^1000
> > sow=false bq="" boost="" pf="" tie=1.00 v="ice cream"}
> >
> > Is there an industry best practice for handling this type of problem? Keep
> > in mind that the other text fields may also include these terms. E.g.
> > title="This is ice cream", which would match the query.
> > This specific problem affects the keywords field for the obvious reason
> > that the indexing pipeline does not tokenize keywords.
> >
> > Thank you for all your amazing help,
> >
> > Regards,
> >
> > Ash
> >
> > --
> > *P.S. We've launched a new blog to share the latest ideas and case studies
> > from our team. Check it out here: product.canva.com
> > <https://product.canva.com/>.*
> > <https://www.canva.com/> Empowering the world to design
> > Also, we're hiring. Apply here! <https://about.canva.com/careers/>
> > <https://twitter.com/canva> <https://facebook.com/canva>
> > <https://au.linkedin.com/company/canva> <https://instagram.com/canva>