Still, it seems like the right direction.   

Does it "smell" OK to have a few hundred request handlers?  Again, my logic 
is that if any given view requires no more than 50 fields, one request handler 
per view would work.  This is different from a request handler per user 
category (which requires access to any number of views and, thus, many more 
fields).

This does require a design change for Steven's application ...

Steven, do you have tables of the many-to-many relationships between fields and 
views, and between users and views?  If so, you should be able to 
programmatically generate the request handlers.
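
To make that concrete, here is a rough sketch of what I have in mind, not a 
finished tool.  It assumes the view-to-fields relationships have already been 
read out of your tables into a dictionary.  The view names, field names and the 
"/select_<view>" handler naming are made up for illustration; the defaults 
(edismax, tie) are borrowed from the handler you posted earlier in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical view -> fields mapping; in practice this would be loaded from
# the application's fields-to-views relationship tables.
VIEW_FIELDS = {
    "orders": ["order_id", "customer_name", "status"],
    "parts": ["part_no", "description"],
}


def build_handler(view, fields):
    """Build one <requestHandler> element whose qf lists only this view's fields."""
    handler = ET.Element(
        "requestHandler",
        # The "/select_<view>" naming scheme is an assumption, not a Solr convention.
        {"name": f"/select_{view}", "class": "solr.SearchHandler"},
    )
    defaults = ET.SubElement(handler, "lst", {"name": "defaults"})
    for tag, name, value in [
        ("str", "defType", "edismax"),
        ("str", "qf", " ".join(fields)),
        ("float", "tie", "1.0"),
    ]:
        ET.SubElement(defaults, tag, {"name": name}).text = value
    return handler


def render_handlers(view_fields):
    """Return the XML text for all handlers, one per line, sorted by view name."""
    return "\n".join(
        ET.tostring(build_handler(view, fields), encoding="unicode")
        for view, fields in sorted(view_fields.items())
    )


if __name__ == "__main__":
    print(render_handlers(VIEW_FIELDS))
```

The output would still have to be merged into solrconfig.xml and the core 
reloaded whenever the relationships change, which is why this only fits if 
those tables change rarely.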

If these relationships change frequently, then some custom plugin will be 
required to access these tables at query time.

See what I mean?

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, May 28, 2015 12:07 PM
To: solr-user@lucene.apache.org
Subject: Re: When is too many fields in "qf" is too many?

Gotta agree with Jack here. This is an insane number of fields; query 
performance on any significant corpus will be "fraught", etc. The very first 
thing I'd look at is having that many fields. You have 3,500 different fields! 
Whatever the motivation for having that many fields, that is the place I'd start.

Best,
Erick

On Thu, May 28, 2015 at 5:50 AM, Jack Krupansky <jack.krupan...@gmail.com> 
wrote:
> This does not even pass a basic smell test for reasonability of 
> matching the capabilities of Solr and the needs of your application. 
> I'd like to hear from others, but I personally would be -1 on this 
> approach to misusing qf. I'd simply say that you need to go back to 
> the drawing board, and that your primary focus should be on working 
> with your application product manager to revise your application 
> requirements to more closely match the capabilities of Solr.
>
> To put it simply, if you have more than a dozen fields in qf, you're 
> probably doing something wrong. In this case horribly wrong.
>
> Focus on designing your app to exploit the capabilities of Solr, not 
> to misuse them.
>
> In short, to answer the original question, more than a couple dozen 
> fields in qf is indeed too many. More than a dozen raises a yellow flag for 
> me.
>
>
> -- Jack Krupansky
>
> On Thu, May 28, 2015 at 8:13 AM, Steven White <swhite4...@gmail.com> wrote:
>
>> Hi Charles,
>>
>> That is what I have done.  At the moment, I have 22 request handlers; 
>> the largest has 3490 field items in "qf" (its qf line spans over 
>> 95,000 characters in the solrconfig.xml file) and the smallest has 1341 
>> fields.  I'm working on seeing if I can use copyField to copy the 
>> data of that view's fields into a single pseudo-view-field and use 
>> that pseudo field for "qf" of that view's request handler.  The issue I 
>> still have outstanding with using copyField in this way is that it 
>> could lead to a complete re-indexing of all the data in that view 
>> when a field is added to / removed from that view.
>>
>> Thanks
>>
>> Steve
>>
>> On Wed, May 27, 2015 at 6:02 PM, Reitzel, Charles < 
>> charles.reit...@tiaa-cref.org> wrote:
>>
>> > One request handler per view?
>> >
>> > I think if you are able to make the actual view in use for the 
>> > current request a single value (vs. all views that the user could 
>> > use over time), it would keep the qf list down to a manageable size 
>> > (e.g. specified within the request handler XML).  Not sure if this is 
>> > feasible for you, but it seems like a reasonable approach given the 
>> > use case you describe.
>> >
>> > Just a thought ...
>> >
>> > -----Original Message-----
>> > From: Steven White [mailto:swhite4...@gmail.com]
>> > Sent: Tuesday, May 26, 2015 4:48 PM
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: When is too many fields in "qf" is too many?
>> >
>> > Thanks Doug.  I might have to take you up on the hangout offer.  Let 
>> > me refine the requirement further and if I still see the need, I 
>> > will let you know.
>> >
>> > Steve
>> >
>> > On Tue, May 26, 2015 at 2:01 PM, Doug Turnbull < 
>> > dturnb...@opensourceconnections.com> wrote:
>> >
>> > > How you have tie is fine. Setting tie to 1 might give you 
>> > > reasonable results. You could easily still have scores that are 
>> > > just always an order of magnitude or two higher, but try it out!
>> > >
>> > > BTW, anything you put in the URL can also be put into a request handler.
>> > >
>> > > If you ever just want to have a 15 minute conversation via 
>> > > hangout, happy to chat with you :) Might be fun to think through 
>> > > your problem together.
>> > >
>> > > -Doug
>> > >
>> > > On Tue, May 26, 2015 at 1:42 PM, Steven White 
>> > > <swhite4...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi Doug,
>> > > >
>> > > > I'm back to this topic.  Unfortunately, due to my DB structure 
>> > > > and business need, I will not be able to search against a single 
>> > > > field (i.e., using copyField).  Thus, I have to use a list of 
>> > > > fields via "qf".  Given this, I see you said above to use 
>> > > > "tie=1.0".  Will that, more or less, address this scoring issue?  
>> > > > Should "tie=1.0" be set on the request handler like so:
>> > > >
>> > > >   <requestHandler name="/select" class="solr.SearchHandler">
>> > > >      <lst name="defaults">
>> > > >        <str name="echoParams">explicit</str>
>> > > >        <int name="rows">20</int>
>> > > >        <str name="defType">edismax</str>
>> > > >        <str name="qf">F1 F2 F3 F4 ... ... ...</str>
>> > > >        <float name="tie">1.0</float>
>> > > >        <str name="fl">_UNIQUE_FIELD_,score</str>
>> > > >        <str name="wt">xml</str>
>> > > >        <str name="indent">true</str>
>> > > >      </lst>
>> > > >   </requestHandler>
>> > > >
>> > > > Or must "tie" be passed as part of the URL?
>> > > >
>> > > > Thanks
>> > > >
>> > > > Steve
>> > > >
>> > > >
>> > > > On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull < 
>> > > > dturnb...@opensourceconnections.com> wrote:
>> > > >
>> > > > > Yeah, a copyField into one could be a good space/time 
>> > > > > tradeoff. It can be more manageable to use an all field for 
>> > > > > both relevancy and performance, if you can handle the 
>> > > > > duplication of data.
>> > > > >
>> > > > > You could set tie=1.0, which effectively sums all the matches 
>> > > > > instead of picking the best match. You'll still have cases 
>> > > > > where one field's score might just happen to be far off from 
>> > > > > another's, thus dominating the summation. But it's something 
>> > > > > easy to try if you want to keep playing with dismax.
>> > > > >
>> > > > > -Doug
>> > > > >
>> > > > > On Wed, May 20, 2015 at 2:56 PM, Steven White 
>> > > > > <swhite4...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi Doug,
>> > > > > >
>> > > > > > Your blog write-up on relevancy is very interesting; I 
>> > > > > > didn't know this.  Looks like I have to go back to my 
>> > > > > > drawing board and figure out an alternative solution: 
>> > > > > > somehow get those group-based fields' data into a single 
>> > > > > > field using copyField.
>> > > > > >
>> > > > > > Thanks
>> > > > > >
>> > > > > > Steve
>> > > > > >
>> > > > > > On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull < 
>> > > > > > dturnb...@opensourceconnections.com> wrote:
>> > > > > >
>> > > > > > > Steven,
>> > > > > > >
>> > > > > > > I'd be concerned about your relevance with that many qf fields.
>> > > > > > > Dismax takes a "winner takes all" point of view to search. 
>> > > > > > > Field scores can vary by an order of magnitude (or even 
>> > > > > > > two) despite the attempts of query normalization. You can 
>> > > > > > > read more here:
>> > > > > > >
>> > > > > > > http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/
>> > > > > > >
>> > > > > > > I'm about to win the "blasphemer" merit badge, but ad-hoc 
>> > > > > > > all-field-like searching over many fields is actually a 
>> > > > > > > good use case for Elasticsearch's cross field queries.
>> > > > > > >
>> > > > > > > https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html
>> > > > > > >
>> > > > > > > http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/
>> > > > > > >
>> > > > > > > It wouldn't be hard (and actually a great feature for the 
>> > > > > > > project) to get the Lucene query associated with cross 
>> > > > > > > field search into Solr. You could easily write a plugin to 
>> > > > > > > integrate it into a query parser:
>> > > > > > >
>> > > > > > > https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java
>> > > > > > >
>> > > > > > > Hope that helps
>> > > > > > > -Doug
>> > > > > > > --
>> > > > > > > *Doug Turnbull **| *Search Relevance Consultant | OpenSource 
>> > > > > > > Connections, LLC | 240.476.9983 | http://www.opensourceconnections.com
>> > > > > > > Author: Relevant Search <http://manning.com/turnbull> from 
>> > > > > > > Manning Publications
>> > > > > > > This e-mail and all contents, including attachments, is 
>> > > > > > > considered to be Company Confidential unless explicitly 
>> > > > > > > stated otherwise, regardless of whether attachments are 
>> > > > > > > marked as such.
>> > > > > > >
>> > > > > > > On Wed, May 20, 2015 at 8:27 AM, Steven White 
>> > > > > > > <swhite4...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Hi everyone,
>> > > > > > > >
>> > > > > > > > My solution requires that users in group-A can only 
>> > > > > > > > search against a set of fields-A and users in group-B 
>> > > > > > > > can only search against a set of fields-B, etc.  There 
>> > > > > > > > can be several groups, as many as 100 or even more.  To 
>> > > > > > > > meet this need, I build my search by passing in the list 
>> > > > > > > > of fields via "qf".  What goes into "qf" can be large: as 
>> > > > > > > > many as 1500 fields, and each field name averages 15 
>> > > > > > > > characters long; in effect the data passed via "qf" will 
>> > > > > > > > be over 20K characters.
>> > > > > > > >
>> > > > > > > > Given the above, besides the fact that a search for 
>> > > > > > > > "apple" translates to 20K characters passing over the 
>> > > > > > > > network, what else within Solr and Lucene should I be 
>> > > > > > > > worried about, if anything?  Will I hit some kind of a 
>> > > > > > > > limit?  Will each search now require more CPU cycles?  
>> > > > > > > > Memory?  Etc.
>> > > > > > > >
>> > > > > > > > If the network traffic becomes an issue, my alternative 
>> > > > > > > > solution is to create a /select handler for each group 
>> > > > > > > > and in that handler list the fields under "qf".
>> > > > > > > >
>> > > > > > > > I have considered creating pseudo-fields for each group 
>> > > > > > > > and then using copyField into that group.  During search, 
>> > > > > > > > I can then "qf" against that one field.  Unfortunately, 
>> > > > > > > > this is not ideal for my solution because the fields 
>> > > > > > > > that go into each group change dynamically (at least 
>> > > > > > > > once a month), and when they do change, I have to 
>> > > > > > > > re-index everything (this I have to avoid) to sync that 
>> > > > > > > > group-field.
>> > > > > > > >
>> > > > > > > > I'm using "qf" with edismax and my Solr version is 5.1.
>> > > > > > > >
>> > > > > > > > Thanks
>> > > > > > > >
>> > > > > > > > Steve
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > >
>> >
>> > *************************************************************************
>> > This e-mail may contain confidential or privileged information.
>> > If you are not the intended recipient, please notify the sender 
>> > immediately and then delete it.
>> >
>> > TIAA-CREF
>> > *************************************************************************
>> >
>>

