On Aug 13, 2009, at 11:58 AM, Bill Au wrote:
Thanks for the tip on BFTQ. I have been using a nightly build from
before that was committed. I have upgraded to the latest nightly build
and will use that instead of BTQ.
I got DelimitedPayloadTokenFilter to work and see that the terms and
payloads of the field are correct, but the delimiter and payload are
stored, so they appear in the response also. Here is an example:
XML for indexing:
<field name="title">Solr|2.0 In|2.0 Action|2.0</field>
XML response:
<doc>
<str name="title">Solr|2.0 In|2.0 Action|2.0</str>
</doc>
Correct.
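
To make the indexed-versus-stored distinction concrete, here is a rough
sketch in Python of the split described above. It is an illustration
only, not Lucene's DelimitedPayloadTokenFilter; the function name and
the choice to split on the first delimiter in each token are
assumptions.

```python
def split_delimited_tokens(text, delimiter="|"):
    """Split whitespace-separated tokens into (term, payload) pairs.

    Illustration only: mimics the term/payload split discussed in the
    thread, it is not the Lucene filter itself.
    """
    pairs = []
    for token in text.split():
        # Assumption: everything after the first delimiter is the payload.
        term, sep, raw = token.partition(delimiter)
        payload = float(raw) if sep else None  # no delimiter, no payload
        pairs.append((term, payload))
    return pairs


stored_value = "Solr|2.0 In|2.0 Action|2.0"
indexed = split_delimited_tokens(stored_value)
# The index sees the terms with their payloads, while the stored value
# is kept verbatim, which is why the delimiters show up in the response.
```
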
I want to set a payload on a field that has a variable number of words.
So I guess I can use a copy field with a PatternTokenizerFactory to
filter out the delimiter and payload.
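
The stripping step itself might look something like the sketch below,
done here on the client side in Python rather than in Solr. The regular
expression and the helper name are assumptions for illustration; it
treats a "|" and the non-whitespace run after it as the payload suffix.

```python
import re

# Assumption: a payload suffix is "|" followed by non-whitespace characters.
PAYLOAD_SUFFIX = re.compile(r"\|\S+")


def strip_payloads(text):
    """Remove the delimiter and payload from every token in the text."""
    return PAYLOAD_SUFFIX.sub("", text)


clean_title = strip_payloads("Solr|2.0 In|2.0 Action|2.0")
```

Tokens without a delimiter pass through unchanged, so the same helper
can be applied to fields that carry no payloads.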
I am thinking maybe I can do this instead when indexing:
XML for indexing:
<field name="title" payload="2.0">Solr In Action</field>
Hmmm, interesting, what's your motivation vs. boosting the field?
This will simplify indexing as I don't have to repeat the payload for
each word in the field. I do have to write a payload-aware update
handler. It looks like I can use Lucene's NumericPayloadTokenFilter in
my custom update handler to
Any thoughts/comments/suggestions?
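
For comparison, repeating the payload for each word can also be done by
a small client-side helper before the XML is built, without any custom
update handler. The sketch below is hypothetical: the function name and
its arguments are not part of Solr or Lucene.

```python
def with_payload(text, payload, delimiter="|"):
    """Append the same payload to every whitespace-separated word."""
    return " ".join(f"{word}{delimiter}{payload}" for word in text.split())


field_value = with_payload("Solr In Action", 2.0)
```
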
Bill
On Wed, Aug 12, 2009 at 7:13 AM, Grant Ingersoll <gsing...@apache.org> wrote:
On Aug 11, 2009, at 5:30 PM, Bill Au wrote:
It looks like things have changed a bit since this subject was last
brought up here. I see that there is support in Solr/Lucene for
indexing payload data (DelimitedPayloadTokenFilterFactory and
DelimitedPayloadTokenFilter). Overriding the Similarity class is
straightforward. So the last piece of the puzzle is to use a
BoostingTermQuery when searching. Solr's LuceneQParserPlugin uses
SolrQueryParser under the covers. I think all I need to do is to write
my own query parser plugin that uses a custom query parser, with the
only difference being in the getFieldQuery() method, where a
BoostingTermQuery is used instead of a TermQuery.
The BTQ is now deprecated in favor of the BoostingFunctionTermQuery,
which gives some more flexibility in terms of how the spans in a single
document are scored.
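
As a rough illustration of what that flexibility means, a term can
match at several positions in one document, each with its own payload,
and those payloads have to be collapsed into a single factor. The
Python sketch below is not Lucene code; the function name and the three
strategies shown (average, max, min) are assumptions chosen for the
example.

```python
def combine_payloads(payloads, mode="average"):
    """Collapse the payloads of all matching positions in one document.

    Expects a non-empty list of floats.
    """
    combiners = {
        "average": lambda values: sum(values) / len(values),
        "max": max,
        "min": min,
    }
    return combiners[mode](payloads)


# One document where the term matched twice, with payloads 2.0 and 4.0.
average_factor = combine_payloads([2.0, 4.0])
max_factor = combine_payloads([2.0, 4.0], mode="max")
```
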
Am I on the right track?
Yes.
Has anyone done something like this already?
I intend to, but haven't started.
Since Solr already has indexing support for payloads, I was hoping that
query support is already in the works, if not available already. If
not, I am willing to contribute but will probably need some guidance
since my knowledge of the Solr query parser is weak.
https://issues.apache.org/jira/browse/SOLR-1337
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/