Re: Using Lucene's payload in Solr

Bill Au Thu, 13 Aug 2009 10:17:41 -0700

I need to boost a field differently according to the content of the field.
Here is an example:


<doc>
  <field name="name">Solr</field>
  <field name="category" payload="3.0">information retrieval</field>
  <field name="category" payload="2.0">webapp</field>
  <field name="category" payload="2.0">java</field>
  <field name="category" payload="1.0">xml</field>
</doc>
<doc>
  <field name="name">Tomcat</field>
  <field name="category" payload="3.0">webapp</field>
  <field name="category" payload="2.0">java</field>
</doc>
<doc>
  <field name="name">XMLSpy</field>
  <field name="category" payload="3.0">xml</field>
  <field name="category" payload="2.0">ide</field>
</doc>

A search on category:webapp should return Tomcat before Solr.  A search on
category:xml should return XMLSpy before Solr.
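
As a rough illustration of the ordering described above, here is a minimal
sketch in plain Java.  It does not use any Lucene or Solr classes; the class
and method names (PayloadRankingSketch, parseCategories, rank, sampleDocs) are
hypothetical, and it assumes each category value is written in the
"value|payload" delimiter style and that the payload alone decides the order.
Real Lucene scoring would combine the payload with the other Similarity
factors, so treat this only as a statement of the intended result.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: payload-only ranking, no Lucene/Solr API involved.
class PayloadRankingSketch {

    // Parse entries written as "value|payload" into value -> payload weight.
    // Entries without a delimiter are assumed to carry a weight of 1.0.
    static Map<String, Float> parseCategories(String... delimited) {
        Map<String, Float> weights = new LinkedHashMap<>();
        for (String entry : delimited) {
            int bar = entry.lastIndexOf('|');
            if (bar < 0) {
                weights.put(entry, 1.0f);
            } else {
                weights.put(entry.substring(0, bar),
                            Float.parseFloat(entry.substring(bar + 1)));
            }
        }
        return weights;
    }

    // Names of the documents that contain the category, highest payload first.
    static List<String> rank(Map<String, Map<String, Float>> docs, String category) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Float>> doc : docs.entrySet()) {
            if (doc.getValue().containsKey(category)) {
                hits.add(doc.getKey());
            }
        }
        hits.sort(Comparator
                .comparing((String name) -> docs.get(name).get(category))
                .reversed());
        return hits;
    }

    // The three example documents from this message.
    static Map<String, Map<String, Float>> sampleDocs() {
        Map<String, Map<String, Float>> docs = new LinkedHashMap<>();
        docs.put("Solr", parseCategories(
                "information retrieval|3.0", "webapp|2.0", "java|2.0", "xml|1.0"));
        docs.put("Tomcat", parseCategories("webapp|3.0", "java|2.0"));
        docs.put("XMLSpy", parseCategories("xml|3.0", "ide|2.0"));
        return docs;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Float>> docs = sampleDocs();
        System.out.println(rank(docs, "webapp")); // [Tomcat, Solr]
        System.out.println(rank(docs, "xml"));    // [XMLSpy, Solr]
    }
}
```

Keeping the payload per value, instead of one boost for the whole field, is
what lets the same document rank high for one category and low for another.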

Bill

On Thu, Aug 13, 2009 at 12:13 PM, Grant Ingersoll <gsing...@apache.org> wrote:

>
> On Aug 13, 2009, at 11:58 AM, Bill Au wrote:
>
>> Thanks for the tip on BFTQ.  I have been using a nightly build from before
>> that was committed.  I have upgraded to the latest nightly build and will
>> use that instead of BTQ.
>>
>> I got DelimitedPayloadTokenFilter to work and see that the terms and
>> payload of the field are correct, but the delimiter and payload are stored,
>> so they appear in the response also.  Here is an example:
>>
>> XML for indexing:
>> <field name="title">Solr|2.0 In|2.0 Action|2.0</field>
>>
>>
>> XML response:
>> <doc>
>> <str name="title">Solr|2.0 In|2.0 Action|2.0</str>
>> </doc>
>>
>
>
> Correct.
>
>
>>
>> I want to set payload on a field that has a variable number of words.  So I
>> guess I can use a copy field with a PatternTokenizerFactory to filter out
>> the delimiter and payload.
>>
>> I am thinking maybe I can do this instead when indexing:
>>
>> XML for indexing:
>> <field name="title" payload="2.0">Solr In Action</field>
>>
>
> Hmmm, interesting, what's your motivation vs. boosting the field?
>
>
>
>
>> This will simplify indexing as I don't have to repeat the payload for each
>> word in the field.  I do have to write a payload aware update handler.  It
>> looks like I can use Lucene's NumericPayloadTokenFilter in my custom
>> update handler to
>>
>> Any thoughts/comments/suggestions?
>>
>>
>
>> Bill
>>
>>
>> On Wed, Aug 12, 2009 at 7:13 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>>
>>
>>> On Aug 11, 2009, at 5:30 PM, Bill Au wrote:
>>>
>>>> It looks like things have changed a bit since this subject was last
>>>> brought up here.  I see that there is support in Solr/Lucene for
>>>> indexing payload data (DelimitedPayloadTokenFilterFactory and
>>>> DelimitedPayloadTokenFilter).  Overriding the Similarity class is
>>>> straightforward.  So the last piece of the puzzle is to use a
>>>> BoostingTermQuery when searching.  Solr's LuceneQParserPlugin uses
>>>> SolrQueryParser under the cover.  I think all I need to do is to write
>>>> my own query parser plugin that uses a custom query parser, with the only
>>>> difference being in the getFieldQuery() method where a BoostingTermQuery
>>>> is used instead of a TermQuery.
>>>>
>>>>
>>> The BTQ is now deprecated in favor of the BoostingFunctionTermQuery, which
>>> gives some more flexibility in terms of how the spans in a single document
>>> are scored.
>>>
>>>
>>>> Am I on the right track?
>>>>
>>>>
>>> Yes.
>>>
>>>> Has anyone done something like this already?
>>>
>>>>
>>>>
>>> I intend to, but haven't started.
>>>
>>>> Since Solr already has indexing support for payload, I was hoping that
>>>> query support is already in the works if not available already.  If not,
>>>> I am willing to contribute but will probably need some guidance since my
>>>> knowledge of the Solr query parser is weak.
>>>>
>>>>
>>>
>>> https://issues.apache.org/jira/browse/SOLR-1337
>>>
>>>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
