Re: Lucene indexing questions

Roman Chyla Thu, 7 Oct 2010 16:52:49 +0200

Hi,

Apologies for the spam. I was tired, so I took a break from normal
work and read the Lucene white paper:
http://www.lucidimagination.com/files/file/whitepaper/LIWP_WhatsNew_Lucene3.0%2B2.9.pdf


On page 17, they actually say that in the new Lucene it is possible to
use the payload for searching:
http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/search/payloads/PayloadNearQuery.html

which contradicts the info below and got me wondering... oh yeah,
panta rhei!
cheers,


roman

On Wed, Oct 6, 2010 at 10:57 PM, Jay Luker <[email protected]> wrote:
> On Wed, Oct 6, 2010 at 4:08 PM, Roman Chyla <[email protected]> wrote:
>
>> 1)
>> -- is it possible to use payload for search? [I know it can influence
>> scoring and be useful for display, but as I understand it, it is
>> metadata about the given position]
>>
>> example: assume a situation where we index authors and add
>> payloads to them
>>
>> field:author | payload [affiliation, field_of_study, email]
>> -------------+---------------------------------------------
>> ellis        | cern,umi  hep-theory  [email protected]
>> swank        | umi       hep-ex      [email protected]
>>
>> is it possible to query this structure directly? ex.
>>
>> "author:swink~4 and author:affiliation:cern"
>>
>> (I want to find all names similar to swink, schwink, sink... and I
>> also know the person is working at cern. I am not interested in a
>> record that was written by swink@umi and ellis@cern; I want only
>> swink@cern, and for that I need the payload.)
>
>
> The answer to the specific question is no, you can't query the payload
> directly.
> Suggested alternatives:
>   * Index the author and the affiliation at the same position. There
> should then be a way to query for "swink~4" and "cern" and specify
> there must be zero distance between the terms.
>   * Index the author and the affiliation with a delimiter, like "swink_cern".
>
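The delimiter suggestion above can be illustrated without Lucene. What follows is a toy sketch in plain Python, not Lucene code: the records, the `levenshtein` helper, and the `search` function are all hypothetical, and the edit-distance threshold only loosely stands in for what a fuzzy query such as swink~4 is meant to do. Each author is joined with each of that author's affiliations into one token (the "swink_cern" idea), so the fuzzy name match and the affiliation match are forced to apply to the same author.

```python
# Toy illustration only: plain Python, not Lucene. All names are hypothetical.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Each record lists (author, affiliations) pairs; the data is made up.
records = {
    1: [("ellis", ["cern", "umi"]), ("swank", ["umi"])],
    2: [("swink", ["cern"])],
}

def index_tokens(authors):
    """Emit one 'author_affiliation' token per author/affiliation pair."""
    return [f"{name}_{aff}" for name, affs in authors for aff in affs]

index = {recid: index_tokens(authors) for recid, authors in records.items()}

def search(name, affiliation, max_edits=2):
    """Return recids that have an author close to `name` at `affiliation`."""
    hits = []
    for recid, tokens in index.items():
        for token in tokens:
            # Split on the last delimiter: left part is author, right is affiliation.
            author, _, aff = token.rpartition("_")
            if aff == affiliation and levenshtein(author, name) <= max_edits:
                hits.append(recid)
                break
    return hits

print(search("swink", "cern"))
```

Record 1 contains both a name close to "swink" (swank) and the affiliation cern (via ellis), yet it is not returned for the cern query, because no single token pairs the two. The sketch assumes the delimiter never occurs inside an affiliation; a real index would need a delimiter that is guaranteed not to appear in the data.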
>>
>> 2)
>> What would be the best strategy for having several separate indexes, i.e.
>> a separate index each for metadata, recently-changed metadata,
>> fulltext, and citation pairs?
>>
>> Presumably, all those indexes contain only records (so their results
>> can be merged on matching recid), but the scoring function obviously
>> makes sense only within a single index. If one wanted to combine
>> results from several indexes in a meaningful way, what would be the
>> best strategy?
>>
>
> Grant says something called ParallelReader could be used in this case.
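To make the merging problem in question 2 concrete, here is a toy sketch in plain Python. It is not Lucene and not ParallelReader; it only shows one possible way to combine hits from separate indexes on recid, under the assumption that each index's raw scores are first rescaled by that index's top score and then summed with per-index weights. The index names, scores, weights, and the `normalize` and `merge` helpers are all made up for illustration.

```python
# Toy illustration only: plain Python, not Lucene and not ParallelReader.

def normalize(hits):
    """Scale one index's scores into [0, 1] by dividing by its top score."""
    top = max(hits.values(), default=0.0)
    return {recid: (score / top if top else 0.0) for recid, score in hits.items()}

def merge(results, weights):
    """Combine per-index hits into one ranking keyed by recid."""
    combined = {}
    for name, hits in results.items():
        for recid, score in normalize(hits).items():
            combined[recid] = combined.get(recid, 0.0) + weights[name] * score
    # Highest combined score first; ties broken by recid for determinism.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))

# Made-up raw scores from two hypothetical indexes, keyed by recid.
results = {
    "metadata": {101: 4.0, 102: 2.0},
    "fulltext": {102: 0.9, 103: 0.3},
}
weights = {"metadata": 0.5, "fulltext": 0.5}

ranking = merge(results, weights)
print(ranking)
```

Dividing by the top score is only one of several possible normalizations, and the weights are arbitrary; whether the combined number is meaningful depends on how comparable the per-index scores are to begin with.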
>
> I need more time to digest your first answer to the original question.
>
> --
> ******************************************************
> Jay Luker               Astrophysics Data System (ADS)
> [email protected]  Center for Astrophysics
> 617-495-4588            60 Garden Street  MS 67
> 617-495-7356 fax        Cambridge, MA  02138
> ******************************************************
>
