Re: When is too many fields in "qf" is too many?

Steven White Tue, 26 May 2015 10:43:31 -0700

Hi Doug,

I'm back to this topic.  Unfortunately, due to my DB structure and business
needs, I will not be able to search against a single field (i.e., using
copyField).  Thus, I have to use a list of fields via "qf".  Given this, I
see you suggested "tie=1.0" above.  Will that, more or less, address this
scoring issue?  Should "tie=1.0" be set on the request handler like so:


  <requestHandler name="/select" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">20</int>
       <str name="defType">edismax</str>
       <str name="qf">F1 F2 F3 F4 ... ... ...</str>
       <float name="tie">1.0</float>
       <str name="fl">_UNIQUE_FIELD_,score</str>
       <str name="wt">xml</str>
       <str name="indent">true</str>
     </lst>
  </requestHandler>

Or must "tie" be passed as part of the URL?
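
For what it's worth, Solr generally accepts "tie" either way: as a handler
default like the one above, or as a parameter on the request, in which case
the request value overrides the entry under "defaults".  Here is a minimal
Python sketch of the request-side form.  The host, port and collection name
are placeholders, not taken from this thread, and the helper name is made up
for illustration.

```python
from urllib.parse import urlencode

# Placeholder endpoint: host, port and collection name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def build_select_url(query, fields, tie=1.0):
    """Build a /select URL that passes qf and tie as request parameters."""
    params = {
        "q": query,
        "defType": "edismax",
        "qf": " ".join(fields),  # qf is a space-separated field list
        "tie": tie,              # 1.0 sums field scores, 0.0 keeps only the best
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = build_select_url("apple", ["F1", "F2", "F3", "F4"])
print(url)
```

If the value should never change per request, keeping it in the handler
avoids repeating it on every query.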

Thanks

Steve


On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> Yeah, a copyField into one field could be a good space/time tradeoff. It
> can be more manageable to use an all field for both relevancy and
> performance, if you can handle the duplication of data.
>
> You could set tie=1.0, which effectively sums all the matches instead of
> picking the best match. You'll still have cases where one field's score
> just happens to be far off from another's, and thus dominates the
> summation. But it's something easy to try if you want to keep playing with
> dismax.
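
The effect of "tie" described above can be sketched in a few lines of
Python.  This assumes the usual disjunction-max combination (the best field
score plus tie times the sum of the remaining field scores).  The per-field
scores are invented numbers for illustration, not output from Solr.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores the way a disjunction-max query does.

    tie=0.0 keeps only the best field ("winner takes all");
    tie=1.0 is equivalent to summing every matching field.
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Hypothetical scores for one document matching in three fields.
scores = [2.0, 0.5, 0.25]
print(dismax_score(scores, tie=0.0))  # 2.0   (best field only)
print(dismax_score(scores, tie=1.0))  # 2.75  (plain sum)

# One outlier field still dominates the sum, as noted above.
print(dismax_score([20.0, 0.5, 0.25], tie=1.0))  # 20.75
```
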
>
> -Doug
>
> On Wed, May 20, 2015 at 2:56 PM, Steven White <swhite4...@gmail.com>
> wrote:
>
> > Hi Doug,
> >
> > Your blog write-up on relevancy is very interesting; I didn't know this.
> > Looks like I have to go back to the drawing board and figure out an
> > alternative solution: somehow get the data for those group-based fields
> > into a single field using copyField.
> >
> > Thanks
> >
> > Steve
> >
> > On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull <
> > dturnb...@opensourceconnections.com> wrote:
> >
> > > Steven,
> > >
> > > I'd be concerned about your relevance with that many qf fields. Dismax
> > > takes a "winner takes all" point of view to search. Field scores can
> > > vary by an order of magnitude (or even two) despite the attempts of
> > > query normalization. You can read more here:
> > >
> > > http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/
> > >
> > > I'm about to win the "blasphemer" merit badge, but ad-hoc, all-field-like
> > > searching over many fields is actually a good use case for
> > > Elasticsearch's cross field queries.
> > >
> > > https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html
> > >
> > > http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/
> > >
> > > It wouldn't be hard (and actually a great feature for the project) to
> > > get the Lucene query associated with cross field search into Solr. You
> > > could easily write a plugin to integrate it into a query parser:
> > >
> > > https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java
> > >
> > > Hope that helps
> > > -Doug
> > > --
> > > *Doug Turnbull **| *Search Relevance Consultant | OpenSource
> > > Connections, LLC | 240.476.9983 | http://www.opensourceconnections.com
> > > Author: Relevant Search <http://manning.com/turnbull> from Manning
> > > Publications
> > > This e-mail and all contents, including attachments, is considered to
> > > be Company Confidential unless explicitly stated otherwise, regardless
> > > of whether attachments are marked as such.
> > >
> > > On Wed, May 20, 2015 at 8:27 AM, Steven White <swhite4...@gmail.com>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > > My solution requires that users in group-A can only search against
> > > > > a set of fields-A, users in group-B can only search against a set
> > > > > of fields-B, etc.  There can be several groups, as many as 100 or
> > > > > even more.  To meet this need, I build my search by passing in the
> > > > > list of fields via "qf".  What goes into "qf" can be large: as many
> > > > > as 1500 fields, with each field name averaging 15 characters, so
> > > > > the data passed via "qf" will be over 20K characters.
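
As a rough check on that size estimate, here is a short Python sketch.  The
field names are made up (each exactly 15 characters long); only the
arithmetic matters.

```python
# Hypothetical field names, each exactly 15 characters ("field_" + 9 digits).
fields = [f"field_{i:09d}" for i in range(1500)]

# qf is a space-separated list, so one separator sits between each pair of names.
qf = " ".join(fields)

print(len(fields[0]))  # 15
print(len(qf))         # 1500 * 15 + 1499 = 23999
```

A parameter of this size may exceed the URL or header size limit of the
servlet container if sent with GET, so sending the query as a POST body is
probably the safer choice.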
> > > >
> > > > > Given the above, besides the fact that a search for "apple"
> > > > > translates to 20K characters passing over the network, what else
> > > > > within Solr and Lucene should I be worried about, if anything?
> > > > > Will I hit some kind of limit?  Will each search now require more
> > > > > CPU cycles?  Memory?  Etc.
> > > >
> > > > > If the network traffic becomes an issue, my alternative solution is
> > > > > to create a /select handler for each group and in that handler list
> > > > > the fields under "qf".
> > > >
> > > > > I have considered creating a pseudo-field for each group and then
> > > > > using copyField into that field.  During search, I can then "qf"
> > > > > against that one field.  Unfortunately, this is not ideal for my
> > > > > solution because the fields that go into each group change
> > > > > dynamically (at least once a month), and when they do change, I
> > > > > have to re-index everything (which I have to avoid) to sync that
> > > > > group field.
> > > >
> > > > I'm using "qf" with edismax and my Solr version is 5.1.
> > > >
> > > > Thanks
> > > >
> > > > Steve
> > > >
> > >
> >
>
>
>
