I use about that many qf fields in Solr 1.4.1, and it works. I'm not
entirely sure whether it has performance implications. My searches are
somewhat slower than I'd like, but I don't know whether the lengthy qf
is a contributing factor or whether it's other things I'm doing (like a
dozen different facet.fields too!). I haven't profiled everything. But
it doesn't grind my Solr to a halt or anything.
Separately, I've also been thinking about other ways to get highlighting
behavior similar to what you describe, where the highlight response
reports the 'field' the match was in, but I haven't come up with
anything great. If your approach works, that's cool. I've been trying to
think of a way to keep everything in a single stored field in a
structured format (CSV? XML?), and somehow have the highlighter return
the complete 'field' that matches, not just the surrounding X words. But
I haven't gotten anywhere on that; it's just an idle thought.
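To make the idle thought a bit more concrete, here is a rough client-side sketch in Java. It is not a Solr highlighter feature. It assumes the stored value is a delimited string, that the highlighter wraps hits in its default <em>...</em> markers, and that the class and method names (WholeEntryMatch, matchingEntries) are made up for illustration. It post-processes a snippet and returns the whole delimited entry that contains the hit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeEntryMatch {

    // Assumes hits are wrapped in <em>...</em> (the default highlighter markers).
    private static final Pattern HIT = Pattern.compile("<em>(.*?)</em>");

    /**
     * Returns every delimited entry of storedValue that contains at least one
     * highlighted term found in snippet (case-insensitive substring match).
     */
    public static List<String> matchingEntries(String storedValue, String delimiter, String snippet) {
        // Collect the highlighted terms from the snippet.
        List<String> terms = new ArrayList<>();
        Matcher m = HIT.matcher(snippet);
        while (m.find()) {
            terms.add(m.group(1).toLowerCase(Locale.ROOT));
        }

        // Keep each entry of the structured value that contains any of those terms.
        List<String> result = new ArrayList<>();
        for (String entry : storedValue.split(Pattern.quote(delimiter))) {
            String lower = entry.toLowerCase(Locale.ROOT);
            for (String term : terms) {
                if (lower.contains(term)) {
                    result.add(entry.trim());
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data, purely for illustration.
        String stored = "disease: breast cancer | role: tumor suppressor | location: nucleus";
        System.out.println(matchingEntries(stored, "|", "role: <em>tumor</em> suppressor"));
    }
}
```

A substring match like this can return more entries than the one that actually matched if the same term appears in several of them, so it is only an approximation.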
Jonathan
On 3/4/2011 10:09 AM, Jeff Schmidt wrote:
Hello:
I'm working on implementing a requirement where when a document is returned, we want to
pithily tell the end user why. That is, say, with five documents returned, they may be so
for similar or different reasons. These "reasons" are the field(s) in which
matches occurred. Some are more important than others, and I'll have to return just the
most relevant one or two reasons to not overwhelm the user.
This is a separate goal from Solr's scoring of the returned documents. That is,
index/query time boosting can indicate which fields are more significant in computing the
overall document score, but I also need to know which fields were matched, and with which
terms. I do have an application that stands between Solr and the end user (RESTful API),
so I figured I can rank the "reasons" and return more domain-specific names
rather than the Solr field names.
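A minimal sketch of that ranking step in Java, under some assumptions: the application already has, for one document, a map from matched Solr field name to snippets (the per-document shape that SolrJ's QueryResponse.getHighlighting() exposes), and the class name, priority order, and user-facing labels below are hypothetical, not part of the real application.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MatchReasons {

    // Hypothetical labels, inserted from most to least important.
    // The field names come from the qf list; the labels and order are assumptions.
    private static final Map<String, String> LABELS = new LinkedHashMap<>();
    static {
        LABELS.put("n_nameExact", "Name (exact)");
        LABELS.put("n_macromolecule_name", "Macromolecule name");
        LABELS.put("n_top_disease", "Disease");
        LABELS.put("n_synonym", "Synonym");
    }

    /**
     * Returns at most maxReasons labels for the fields that produced snippets
     * for one document, most important first. Unlabeled fields are skipped.
     */
    public static List<String> topReasons(Map<String, List<String>> fieldSnippets, int maxReasons) {
        List<String> reasons = new ArrayList<>();
        // Walk the labels in priority order and keep those that actually matched.
        for (Map.Entry<String, String> label : LABELS.entrySet()) {
            if (reasons.size() >= maxReasons) {
                break;
            }
            List<String> snippets = fieldSnippets.get(label.getKey());
            if (snippets != null && !snippets.isEmpty()) {
                reasons.add(label.getValue());
            }
        }
        return reasons;
    }

    public static void main(String[] args) {
        // Made-up matches for a single document.
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("n_synonym", List.of("<em>kinase</em>"));
        doc.put("n_top_disease", List.of("linked to <em>kinase</em> activity"));
        doc.put("n_nameExact", List.of("<em>kinase</em>"));
        System.out.println(topReasons(doc, 2)); // prints [Name (exact), Disease]
    }
}
```

Keeping the priority order in the application, rather than reusing the qf boosts, lets the "reasons" shown to the user change without touching how Solr scores documents.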
So, I've turned to highlighting, and in the results I can see for each document ID
the fields matched, and the text in the field etc. Great. But, to get that to work,
I have to specifically query individual fields. That is, the approach
of <copyField>'ing a bunch of fields to a common text field for efficiency
purposes is no longer an option. And, using the dismax request handler, I'm querying
a lot of fields:
<str name="qf">
n_nameExact^4.0
n_macromolecule_nameExact^3.0
n_macromolecule_name^2.0
n_macromolecule_id^1.8
n_pathway_nameExact^1.5
n_top_regulates
n_top_regulated_by
n_top_binds
n_top_role_in_cell
n_top_disease
n_molecular_function
n_protein_family
n_subcell_location
n_pathway_name
n_cell_component
n_bio_process
n_synonym^0.5
n_macromolecule_summary^0.6
p_nameExact^4.0
p_name^2.0
p_description^0.6
</str>
Is that crazy? Is telling Solr to look at so many individual fields going to
be a performance problem? I'm only prototyping at this stage and it works
great. :) I've not run anything yet at scale handling lots of requests.
There are two document types in that shared index, demarcated using a field
named type. So, when configuring the SolrJ SolrQuery, I call
addFilterQuery() to select one type or the other.
Anyway, using dismax with all of those query fields along with highlighting, I
get the information I need to render meaningful results for the end user. But,
it has a sort of smell to it. :) Shall I look for another way, or am I
worrying about nothing?
I am currently using Solr 3.1 trunk.
Thanks!
Jeff
--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com