I need to boost a field differently according to the content of the field. Here is an example:
<doc>
  <field name="name">Solr</field>
  <field name="category" payload="3.0">information retrieval</field>
  <field name="category" payload="2.0">webapp</field>
  <field name="category" payload="2.0">java</field>
  <field name="category" payload="1.0">xml</field>
</doc>
<doc>
  <field name="name">Tomcat</field>
  <field name="category" payload="3.0">webapp</field>
  <field name="category" payload="2.0">java</field>
</doc>
<doc>
  <field name="name">XMLSpy</field>
  <field name="category" payload="3.0">xml</field>
  <field name="category" payload="2.0">ide</field>
</doc>

A search on category:webapp should return Tomcat before Solr. A search on
category:xml should return XMLSpy before Solr.

Bill

On Thu, Aug 13, 2009 at 12:13 PM, Grant Ingersoll <gsing...@apache.org> wrote:

> On Aug 13, 2009, at 11:58 AM, Bill Au wrote:
>
>> Thanks for the tip on BFTQ. I have been using a nightly build from before
>> that was committed. I have upgraded to the latest nightly build and will
>> use that instead of BTQ.
>>
>> I got DelimitedPayloadTokenFilter to work and see that the terms and
>> payload of the field are correct, but the delimiter and payload are stored,
>> so they appear in the response also. Here is an example:
>>
>> XML for indexing:
>> <field name="title">Solr|2.0 In|2.0 Action|2.0</field>
>>
>> XML response:
>> <doc>
>> <str name="title">Solr|2.0 In|2.0 Action|2.0</str>
>> </doc>
>
> Correct.
>
>> I want to set payload on a field that has a variable number of words. So I
>> guess I can use a copy field with a PatternTokenizerFactory to filter out
>> the delimiter and payload.
>>
>> I am thinking maybe I can do this instead when indexing:
>>
>> XML for indexing:
>> <field name="title" payload="2.0">Solr In Action</field>
>
> Hmmm, interesting, what's your motivation vs. boosting the field?
>
>> This will simplify indexing as I don't have to repeat the payload for each
>> word in the field. I do have to write a payload-aware update handler.
>> It looks like I can use Lucene's NumericPayloadTokenFilter in my custom
>> update handler to
>>
>> Any thoughts/comments/suggestions?
>>
>> Bill
>>
>> On Wed, Aug 12, 2009 at 7:13 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>>
>>> On Aug 11, 2009, at 5:30 PM, Bill Au wrote:
>>>
>>>> It looks like things have changed a bit since this subject was last
>>>> brought up here. I see that there is support in Solr/Lucene for indexing
>>>> payload data (DelimitedPayloadTokenFilterFactory and
>>>> DelimitedPayloadTokenFilter). Overriding the Similarity class is
>>>> straightforward. So the last piece of the puzzle is to use a
>>>> BoostingTermQuery when searching. Solr's LuceneQParserPlugin uses
>>>> SolrQueryParser under the cover. I think all I need to do is to write my
>>>> own query parser plugin that uses a custom query parser, with the only
>>>> difference being in the getFieldQuery() method, where a BoostingTermQuery
>>>> is used instead of a TermQuery.
>>>
>>> The BTQ is now deprecated in favor of the BoostingFunctionTermQuery, which
>>> gives some more flexibility in terms of how the spans in a single document
>>> are scored.
>>>
>>>> Am I on the right track?
>>>
>>> Yes.
>>>
>>>> Has anyone done something like this already?
>>>
>>> I intend to, but haven't started.
>>>
>>>> Since Solr already has indexing support for payload, I was hoping that
>>>> query support is already in the works if not available already. If not, I
>>>> am willing to contribute but will probably need some guidance since my
>>>> knowledge of the Solr query parser is weak.
>>>
>>> https://issues.apache.org/jira/browse/SOLR-1337
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
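
To make the ordering I expect from the example at the top of this message concrete, here is a small toy model in plain Python. It is not Solr or Lucene code: the function name rank_by_payload and the DOCS structure are made up for illustration, and it assumes the score of a match is simply the payload attached to the matching category value, which real scoring would combine with other factors.

```python
# Toy model only: this is not Solr or Lucene code.
# Each document maps its category values to the payloads from the example.
DOCS = {
    "Solr": {"information retrieval": 3.0, "webapp": 2.0, "java": 2.0, "xml": 1.0},
    "Tomcat": {"webapp": 3.0, "java": 2.0},
    "XMLSpy": {"xml": 3.0, "ide": 2.0},
}


def rank_by_payload(docs, category):
    """Return names of documents matching `category`, highest payload first.

    Assumption: the score of a match is just the payload value.
    """
    matches = [(cats[category], name) for name, cats in docs.items() if category in cats]
    # Sort on the payload only; ties keep their original document order.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in matches]


print(rank_by_payload(DOCS, "webapp"))  # ['Tomcat', 'Solr']
print(rank_by_payload(DOCS, "xml"))     # ['XMLSpy', 'Solr']
```

With per-value payloads, the same document (Solr) can rank lower for one category and still match others, which is what a single field-level boost cannot express.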