I use about that many qf fields in Solr 1.4.1, and it works. I'm not sure whether it has performance implications: my searches are somewhat slower than I'd like, but I don't know whether the lengthy qf is a contributing factor or whether it's other things I'm doing (like a dozen different facet.fields too!). I haven't profiled everything. But it doesn't grind my Solr to a halt or anything.

Separately, I've also been thinking about other ways to get highlighting behavior like you describe, where the highlight response tells you which 'field' the match was in, but I haven't come up with anything great. If your approach works, that's cool. I've been trying to think of a way to keep a single stored field in a structured format (CSV? XML?) and somehow have the highlighter return the complete 'field' that matches, not just the surrounding X words. I haven't gotten anywhere on that; it's just an idle thought.

Jonathan

On 3/4/2011 10:09 AM, Jeff Schmidt wrote:
Hello:

I'm working on implementing a requirement where when a document is returned, we want to 
pithily tell the end user why. That is, say, with five documents returned, they may be so 
for similar or different reasons. These "reasons" are the field(s) in which 
matches occurred.  Some are more important than others, and I'll have to return just the 
most relevant one or two reasons to not overwhelm the user.

This is a separate goal from Solr's scoring of the returned documents. That is, 
index/query time boosting can indicate which fields are more significant in computing the 
overall document score, but then I need to know which fields were matched, and with what 
terms. I do have an application that stands between Solr and the end user (RESTful API), 
so I figured I can rank the "reasons" and return more domain-specific names 
rather than the Solr field names.

So, I've turned to highlighting, and in the results I can see for each document ID 
the fields matched, and the text in the field etc. Great. But,  to get that to work, 
I have to specifically query individual fields. That is, the approach 
of <copyField>'ing a bunch of fields to a common text field for efficiency 
purposes is no longer an option. And, using the dismax request handler, I'm querying 
a lot of fields:

      <str name="qf">
         n_nameExact^4.0
         n_macromolecule_nameExact^3.0
         n_macromolecule_name^2.0
         n_macromolecule_id^1.8
         n_pathway_nameExact^1.5
         n_top_regulates
         n_top_regulated_by
         n_top_binds
         n_top_role_in_cell
         n_top_disease
         n_molecular_function
         n_protein_family
         n_subcell_location
         n_pathway_name
         n_cell_component
         n_bio_process
         n_synonym^0.5
         n_macromolecule_summary^0.6
         p_nameExact^4.0
         p_name^2.0
         p_description^0.6
      </str>

Is that crazy?  Is telling Solr to look at so many individual fields going to 
be a performance problem?  I'm only prototyping at this stage and it works 
great. :)  I've not run anything yet at scale handling lots of requests.

There are two document types in that shared index, demarcated using a field 
named type.  So, when configuring the SolrJ SolrQuery, I do set up 
addFilterQuery() to select one or the other type.
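
For illustration, here is a minimal sketch of the request parameters the setup above amounts to: dismax over boosted fields, a filter query on the type field, and highlighting on the same fields. It builds a plain query string rather than using SolrJ so that it runs without dependencies. The class name, the build() helper, and the type value "node" are hypothetical; the message configures qf in the request handler, whereas this sketch assumes it is passed per request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DismaxHighlightParams {

    /** Builds a URL query string for a dismax search with highlighting enabled. */
    static String build(String userQuery, String docType, List<String> boostedFields) {
        // hl.fl takes plain field names, so strip any "^boost" suffix.
        List<String> plainFields = new ArrayList<>();
        for (String field : boostedFields) {
            int caret = field.indexOf('^');
            plainFields.add(caret < 0 ? field : field.substring(0, caret));
        }
        return "q=" + enc(userQuery)
                + "&defType=dismax"
                + "&qf=" + enc(String.join(" ", boostedFields))
                + "&fq=" + enc("type:" + docType) // restricts to one document type
                + "&hl=true"
                + "&hl.fl=" + enc(String.join(",", plainFields));
    }

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "node" is a made-up value for the type field.
        System.out.println(build("p53", "node",
                List.of("n_nameExact^4.0", "n_synonym^0.5")));
    }
}
```

Deriving hl.fl from the qf list keeps the two in step, so every field that can match is also a field that can be reported as a reason.
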

Anyway, using dismax with all of those query fields along with highlighting, I 
get the information I need to render meaningful results for the end user.  But, 
it has a sort of smell to it. :)   Shall I look for another way, or am I 
worrying about nothing?
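
The ranking of "reasons" described above could be sketched roughly as follows. This is only an assumption about how the intermediate application might do it, not what it actually does. The input is one document's entry from the highlighting section (field name to snippets), which is the inner map of what SolrJ's QueryResponse.getHighlighting() returns. The class name, the weights (copied from a few of the qf boosts), and the display labels are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MatchReasons {
    // Hypothetical priority table; weights mirror a few of the qf boosts.
    private static final Map<String, Double> WEIGHT = new HashMap<>();
    // Hypothetical domain-specific names shown instead of Solr field names.
    private static final Map<String, String> LABEL = new HashMap<>();

    static {
        register("n_nameExact", 4.0, "Exact name");
        register("n_macromolecule_name", 2.0, "Macromolecule name");
        register("n_synonym", 0.5, "Synonym");
    }

    private static void register(String field, double weight, String label) {
        WEIGHT.put(field, weight);
        LABEL.put(field, label);
    }

    // Fields missing from the table get 1.0, like an unboosted qf entry.
    private static double weightOf(String field) {
        return WEIGHT.getOrDefault(field, 1.0);
    }

    /** Returns up to `limit` labels for the highest-weighted fields that have snippets. */
    static List<String> topReasons(Map<String, List<String>> fieldSnippets, int limit) {
        List<String> fields = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : fieldSnippets.entrySet()) {
            if (entry.getValue() != null && !entry.getValue().isEmpty()) {
                fields.add(entry.getKey());
            }
        }
        // Highest weight first; ties broken by field name for stable output.
        fields.sort((a, b) -> {
            int byWeight = Double.compare(weightOf(b), weightOf(a));
            return byWeight != 0 ? byWeight : a.compareTo(b);
        });
        List<String> reasons = new ArrayList<>();
        for (String field : fields.subList(0, Math.min(limit, fields.size()))) {
            reasons.add(LABEL.getOrDefault(field, field));
        }
        return reasons;
    }

    public static void main(String[] args) {
        Map<String, List<String>> oneDocument = new LinkedHashMap<>();
        oneDocument.put("n_synonym", List.of("<em>p53</em>"));
        oneDocument.put("n_nameExact", List.of("<em>TP53</em>"));
        System.out.println(topReasons(oneDocument, 2));
    }
}
```

Keeping the weights in the application rather than reusing Solr's scores leaves the "reason" ranking independent of relevance tuning, which matches the separation of goals described earlier in the message.
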

I am currently using Solr 3.1 trunk.

Thanks!

Jeff
--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com

