Re: Query generation is different for search terms with and without "-"

2020-11-25 Walter Underwood
Ages ago at Netflix, I fixed this with a few hundred synonyms. If you are working with a fixed vocabulary (movie titles, product names), that can work just fine.

babysitter, baby-sitter, baby sitter
fullmetal, full-metal, full metal
manhunter, man-hunter, man hunter
spiderman, spider-man, spider
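Walter's fix amounts to one line of equivalent spellings per compound. The following is a minimal sketch of generating such lines. It assumes Solr's default comma-separated synonym file format, where the terms on one line are treated as equivalent. The helpers `synonym_line` and `build_synonyms` are hypothetical and are not part of Solr.

```python
def synonym_line(first: str, second: str) -> str:
    """Build one line of equivalent synonyms for a two-part compound.

    Hypothetical helper: emits the closed, hyphenated and open spellings
    as comma-separated equivalents, as used in a Solr-style synonyms file.
    """
    variants = [f"{first}{second}", f"{first}-{second}", f"{first} {second}"]
    return ", ".join(variants)


def build_synonyms(pairs: list[tuple[str, str]]) -> str:
    """Join one synonym line per compound into a file body."""
    return "\n".join(synonym_line(a, b) for a, b in pairs) + "\n"


if __name__ == "__main__":
    compounds = [("baby", "sitter"), ("full", "metal"), ("man", "hunter")]
    print(build_synonyms(compounds), end="")
```

Generating the file from a list of compounds keeps the three spellings consistent as a fixed vocabulary grows to a few hundred entries.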

Re: Query generation is different for search terms with and without "-"

2020-11-25 Erick Erickson
Parameters, no. You could use a PatternReplaceCharFilterFactory. NOTE: *FilterFactory classes are _not_ what you want in this case; they are applied to individual tokens after parsing. *CharFilterFactory classes are invoked on the entire input to the field, although I can’t say for certain that even that’s
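The ordering Erick describes can be modeled in a few lines. This is a toy sketch and not Solr code. `pattern_replace_char_filter` mimics what a PatternReplaceCharFilterFactory configured with something like `pattern="(?<=\w)-(?=\w)"` and `replacement=""` might do to the raw field text. `toy_tokenizer` is a crude regex approximation of the standard tokenizer, and `lowercase_filter` stands in for a token filter. All of these function names are hypothetical.

```python
import re


def pattern_replace_char_filter(text: str) -> str:
    """Char-filter stage (toy): sees the whole field input before tokenizing.

    Removes a hyphen only when it sits between two word characters,
    so "high-tech" becomes "hightech" but "high - tech" is left alone.
    """
    return re.sub(r"(?<=\w)-(?=\w)", "", text)


def toy_tokenizer(text: str) -> list[str]:
    """Rough stand-in for a standard tokenizer: split on non-word characters."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens: list[str]) -> list[str]:
    """Token-filter stage (toy): only ever sees individual tokens."""
    return [t.lower() for t in tokens]


def analyze(text: str, use_char_filter: bool = True) -> list[str]:
    """Run char filter -> tokenizer -> token filter, in that order."""
    if use_char_filter:
        text = pattern_replace_char_filter(text)
    return lowercase_filter(toy_tokenizer(text))


if __name__ == "__main__":
    print(analyze("High-Tech", use_char_filter=False))  # ['high', 'tech']
    print(analyze("High-Tech"))                         # ['hightech']
```

A token filter only ever sees "high" and "tech" separately, so it cannot rejoin them. A char filter can, because it runs before the tokenizer. Whether the query parser has already split the input before the char filter runs is exactly the caveat Erick raises.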

Re: Query generation is different for search terms with and without "-"

2020-11-24 Samuel Gutierrez
Are there any good workarounds/parameters we can use to fix this so it doesn't have to be solved client-side?

On Tue, Nov 24, 2020 at 7:50 AM matthew sporleder wrote:
> Is the normal/standard solution here to regex remove the '-'s and
> combine them into a single token?
>
> On Tue, Nov 24, 2020

Re: Query generation is different for search terms with and without "-"

2020-11-24 matthew sporleder
Is the normal/standard solution here to regex remove the '-'s and combine them into a single token?

On Tue, Nov 24, 2020 at 8:00 AM Erick Erickson wrote:
>
> This is a common point of confusion. There are two phases for creating a
> query,
> query _parsing_ first, then the analysis chain for

Re: Query generation is different for search terms with and without "-"

2020-11-24 Erick Erickson
This is a common point of confusion. There are two phases in creating a query: query _parsing_ first, then the analysis chain for the parsed result. So what e-dismax sees in the two cases is:

Name_enUS:“high tech” -> two tokens; since there are two of them, pf2 comes into play.
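The following is a toy model of the two phases. For illustration only, it assumes that the parser splits the raw query on whitespace before any analysis runs. It uses a regex split as a crude approximation of the standard tokenizer. `parse`, `parse_then_analyze` and `pf2_pairs` are hypothetical names, and real e-dismax behavior also depends on parameters such as `sow`.

```python
import re


def parse(query: str) -> list[str]:
    """Phase 1 (toy): split the raw query into clauses on whitespace."""
    return query.split()


def analyze(clause: str) -> list[str]:
    """Phase 2 (toy): analyze one clause, splitting on non-word characters
    as a rough approximation of a standard tokenizer, then lowercasing."""
    return [t.lower() for t in re.findall(r"\w+", clause)]


def parse_then_analyze(query: str) -> list[list[str]]:
    """Return the analyzed tokens grouped by parsed clause."""
    return [analyze(c) for c in parse(query)]


def pf2_pairs(query: str) -> list[tuple[str, str]]:
    """Adjacent clause pairs, roughly the input a pf2-style bigram boost uses.

    Toy assumption: each parsed clause counts as one unit, so a query
    that parses to a single clause yields no pairs at all.
    """
    clauses = parse(query)
    return list(zip(clauses, clauses[1:]))


if __name__ == "__main__":
    for q in ["high tech", "high-tech"]:
        print(q, parse_then_analyze(q), pf2_pairs(q))
```

When the grouped tokens are flattened, both queries yield the same tokens. That is why the standard tokenizer alone suggests they should be equivalent. The difference lies in how many clauses the parser produced before analysis ran.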

Query generation is different for search terms with and without "-"

2020-11-23 Samuel Gutierrez
I am troubleshooting an issue with ranking for search terms that contain a "-" vs. the same query without the dash, e.g., "high-tech" vs. "high tech". The field I am querying uses the standard tokenizer, so I would expect the underlying Lucene query to be the same for