Hi,

Apologies if you are receiving this a second time; I am having a tough time with the mail server.

I take a user-entered query as is and run it with the dismax query handler. The document fields have been filled from structured data, where different fields hold different attributes such as number of beds, number of baths, city name, etc. A sample user query would look like "3 bed homes in new york". I would like this to match against city:new york and beds:3 beds.

When I use the dismax handler with boosts and the tie parameter, I do not always get the most relevant top 10 results, because there seem to be many factors in play. One is not being able to recognize the presence of sub-phrases; another is not being able to ignore unwanted matches in unwanted fields.

What are your thoughts on having one more request handler like dismax, but one that uses a sub-phrase query instead of a dismax query? It would also provide the parameters below, on a per-field basis, to help customize the behavior of the request handler and give more flexibility in different scenarios.

phraseBoost - how much better a 3-word sub-phrase match is than a 2-word sub-phrase match.
useOnlyMaxMatch - if many sub-phrases match in the field, only the best score is used.
ignoreDuplicates - if a field has duplicate matches, pick only one match for scoring.
matchOnlyOneField - if a match is found in the first field, remove the matched terms while querying the other fields. For example, for me a city match is more important than a match in other fields, so I do not want the "new" in new york to match all the other fields and skew the results, which is what I am seeing with dismax, irrespective of the high boosts.
ignoreSomeLuceneScorefactors - ignore the Lucene tf, idf, query norm or any other such criteria that are not needed for this field. If I want exact matches only, they are really not important, and they also seem to play a big role in my not being able to get the most relevant top 10 results.

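To make the idea concrete, here is a rough sketch in Python of how such scoring might behave. It is not Solr or Lucene code: the function names, field weights and sample documents are all made up for illustration, and it assumes fields are visited in priority order, that only the longest sub-phrase per field is scored (as with useOnlyMaxMatch), that repeated occurrences count once (as with ignoreDuplicates), and that tf, idf and norms are ignored entirely.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def contains_phrase(field: Sequence[str], phrase: Sequence[Optional[str]]) -> bool:
    """True if `phrase` occurs as a contiguous run of tokens in `field`."""
    n = len(phrase)
    return any(list(field[i:i + n]) == list(phrase)
               for i in range(len(field) - n + 1))


def best_sub_phrase(query: Sequence[Optional[str]],
                    field: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (start, length) of the longest query sub-phrase found in `field`.

    Only the best match is reported, and a phrase that occurs several
    times in the field still counts once.
    """
    for length in range(len(query), 0, -1):          # longest first
        for start in range(len(query) - length + 1):
            if contains_phrase(field, query[start:start + length]):
                return start, length
    return None


def score_document(query: Sequence[str],
                   doc: Dict[str, List[str]],
                   fields: Sequence[Tuple[str, float]],
                   phrase_boost: float = 2.0,
                   match_only_one_field: bool = True) -> float:
    """Score `doc` by sub-phrase matches; `fields` is (name, weight) in priority order."""
    remaining: List[Optional[str]] = list(query)     # copy, caller's query is untouched
    total = 0.0
    for name, weight in fields:
        match = best_sub_phrase(remaining, doc.get(name, []))
        if match is None:
            continue
        start, length = match
        # Each extra word in the matched sub-phrase multiplies the score.
        total += weight * phrase_boost ** (length - 1)
        if match_only_one_field:
            # Blank out consumed terms with None so they cannot match later
            # fields and so their neighbours do not become falsely adjacent.
            remaining[start:start + length] = [None] * length
    return total


# Hypothetical sample data, invented for this example.
QUERY = "3 bed homes in new york".split()
FIELDS = [("city", 10.0), ("beds", 5.0), ("description", 1.0)]
DOC_A = {"city": ["new", "york"], "beds": ["3", "bed"],
         "description": ["new", "kitchen"]}
DOC_B = {"city": ["buffalo"], "beds": ["3", "bed"],
         "description": ["new", "york", "style", "loft"]}

print(score_document(QUERY, DOC_A, FIELDS))   # 30.0
print(score_document(QUERY, DOC_B, FIELDS))   # 12.0
print(score_document(QUERY, DOC_A, FIELDS, match_only_one_field=False))  # 31.0
```

With match_only_one_field turned off, the stray "new" in the description of DOC_A adds to its score, which is the kind of noise match described above.
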
I see this handler being useful in the use cases below:

a) the data is mostly exact, in that I am not trying to search free text such as mails, reviews, articles, web pages, etc.
b) numbers and their binding are important
c) exact phrase or sub-phrase matches are more important than rankings derived from tf, idf, query norm, etc.
d) there is a need to make sure that in some cases some fields affect the scoring and in others they don't. I found this the most difficult task: telling the noise matches from the required ones for my use case.

Your thoughts and suggestions on alternatives are welcome. I have also posted a question on sub-phrase matching to lucene-user; that question is not related to having a Solr handler with additional features, like sub-phrase matching, for user-entered queries.

Thanks
Preetam