Hi Doug,

I'm back to this topic. Unfortunately, due to my DB structure and business needs, I will not be able to search against a single field (i.e., using copyField). Thus, I have to use a list of fields via "qf". Given this, I see you said above to use "tie=1.0". Will that, more or less, address this scoring issue? Should "tie=1.0" be set on the request handler like so:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">20</int>
      <str name="defType">edismax</str>
      <str name="qf">F1 F2 F3 F4 ... ... ...</str>
      <float name="tie">1.0</float>
      <str name="fl">_UNIQUE_FIELD_,score</str>
      <str name="wt">xml</str>
      <str name="indent">true</str>
    </lst>
  </requestHandler>

Or must "tie" be passed as part of the URL?

Thanks

Steve

On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:

> Yeah a copyField into one could be a good space/time tradeoff. It can be
> more manageable to use an all field for both relevancy and performance, if
> you can handle the duplication of data.
>
> You could set tie=1.0, which effectively sums all the matches instead of
> picking the best match. You'll still have cases where one field's score
> might just happen to be far off of another, and thus dominating the
> summation. But something easy to try if you want to keep playing with
> dismax.
>
> -Doug
>
> On Wed, May 20, 2015 at 2:56 PM, Steven White <swhite4...@gmail.com> wrote:
>
> > Hi Doug,
> >
> > Your blog write-up on relevancy is very interesting, I didn't know this.
> > Looks like I have to go back to my drawing board and figure out an
> > alternative solution: somehow get those group-based-fields data into a
> > single field using copyField.
> >
> > Thanks
> >
> > Steve
> >
> > On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
> >
> > > Steven,
> > >
> > > I'd be concerned about your relevance with that many qf fields. Dismax
> > > takes a "winner takes all" point of view to search. Field scores can vary
> > > by an order of magnitude (or even two) despite the attempts of query
> > > normalization. You can read more here
> > >
> > > http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/
> > >
> > > I'm about to win the "blasphemer" merit badge, but ad-hoc all-field like
> > > searching over many fields is actually a good use case for Elasticsearch's
> > > cross field queries.
> > >
> > > https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html
> > >
> > > http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/
> > >
> > > It wouldn't be hard (and actually a great feature for the project) to get
> > > the Lucene query associated with cross field search into Solr. You could
> > > easily write a plugin to integrate it into a query parser:
> > >
> > > https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java
> > >
> > > Hope that helps
> > > -Doug
> > > --
> > > *Doug Turnbull* | Search Relevance Consultant | OpenSource Connections,
> > > LLC | 240.476.9983 | http://www.opensourceconnections.com
> > > Author: Relevant Search <http://manning.com/turnbull> from Manning
> > > Publications
> > > This e-mail and all contents, including attachments, is considered to be
> > > Company Confidential unless explicitly stated otherwise, regardless
> > > of whether attachments are marked as such.
> > >
> > > On Wed, May 20, 2015 at 8:27 AM, Steven White <swhite4...@gmail.com> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > My solution requires that users in group-A can only search against a set
> > > > of fields-A and users in group-B can only search against a set of
> > > > fields-B, etc. There can be several groups, as many as 100 or even more.
> > > > To meet this need, I build my search by passing in the list of fields via
> > > > "qf". What goes into "qf" can be large: as many as 1500 fields, and each
> > > > field name averages 15 characters long, so in effect the data passed via
> > > > "qf" will be over 20K characters.
> > > >
> > > > Given the above, besides the fact that a search for "apple" translates to
> > > > 20K characters passing over the network, what else within Solr and Lucene
> > > > should I be worried about, if anything? Will I hit some kind of a limit?
> > > > Will each search now require more CPU cycles? Memory? Etc.
> > > >
> > > > If the network traffic becomes an issue, my alternative solution is to
> > > > create a /select handler for each group and in that handler list the
> > > > fields under "qf".
> > > >
> > > > I have considered creating pseudo-fields for each group and then using
> > > > copyField into that group. During search, I then can "qf" against that
> > > > one field. Unfortunately, this is not ideal for my solution because the
> > > > fields that go into each group dynamically change (at least once a month)
> > > > and when they do change, I have to re-index everything (this I have to
> > > > avoid) to sync that group-field.
> > > >
> > > > I'm using "qf" with edismax and my Solr version is 5.1.
> > > >
> > > > Thanks
> > > >
> > > > Steve
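
As a rough illustration of what "tie" does to scoring in the discussion above: a disjunction-max query scores a document as the best per-field score plus tie times the sum of the other matching fields' scores. The sketch below is plain Python, not Solr code. The function name dismax_score and all the per-field scores are made up for illustration only.

```python
def dismax_score(field_scores, tie=0.0):
    """Best per-field score plus tie times the sum of the other field scores."""
    scores = list(field_scores)
    if not scores:
        return 0.0
    best = max(scores)
    return best + tie * (sum(scores) - best)


# Hypothetical per-field scores for one document (made-up numbers).
per_field = {"F1": 2.0, "F2": 0.5, "F3": 0.25}

winner_takes_all = dismax_score(per_field.values(), tie=0.0)  # best field only
summed = dismax_score(per_field.values(), tie=1.0)            # all fields added up

# One field whose score is far off the others still dominates the sum at tie=1.0.
skewed = dismax_score([40.0, 0.5, 0.25], tie=1.0)

print(winner_takes_all, summed, skewed)
```

With tie=0.0 only the best-matching field counts, and with tie=1.0 every matching field contributes. As the skewed case suggests, summing does not remove the problem of one unusually large field score dominating the total.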