Re: [lucy-user] Lucy Benchmarking

Nick Wellnhofer Thu, 09 Feb 2017 09:15:06 -0800

On 09/02/2017 01:46, Kasi Lakshman Karthi Anbumony wrote:

(1) Plan is to report the below metrics:


   - Index creation: tokens/second
      - Can I know how to obtain the tokens in the lucy_index created? Do
      you think a better metric will be (Number of terms in the posting
      list/second)? If so, how to obtain the number of terms in the
      posting list?

AFAIK, the total number of terms in all input documents isn't available
because the term frequencies aren't stored separately. I'd simply use the
total size of the input documents in bytes.
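
A rough sketch of that bytes-per-second measurement, in Python purely for illustration. It does not call any Lucy API: `index_fn` is a hypothetical callable standing in for whatever actually runs your indexer (for example, a wrapper that launches your C program), and both helper names are made up for this example.

```python
import os
import time


def total_input_bytes(paths):
    """Sum the on-disk size, in bytes, of all input documents."""
    return sum(os.path.getsize(p) for p in paths)


def indexing_throughput(paths, index_fn):
    """Time index_fn(paths) and return (bytes, seconds, bytes_per_second).

    index_fn is a hypothetical placeholder for the code that builds the
    index; it is not part of Lucy.
    """
    n_bytes = total_input_bytes(paths)
    start = time.perf_counter()
    index_fn(paths)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very fast runs.
    rate = n_bytes / elapsed if elapsed > 0 else float("inf")
    return n_bytes, elapsed, rate
```

Measuring the input size before the timed region keeps the file-system stat calls out of the reported indexing time.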

(2) What are the different query types possible?

   - vary document weighting
      - Is it possible or is it fixed for a given lucy_index generated?


You can apply a boost to queries at query time:

    http://lucy.apache.org/docs/c/Lucy/Search/Query.html#func_Set_Boost

And to fields and documents at indexing time:

    http://lucy.apache.org/docs/c/Lucy/Plan/FieldType.html#func_Set_Boost
    http://lucy.apache.org/docs/c/Lucy/Index/Indexer.html#func_Add_Doc

But for benchmarking purposes, it mostly matters whether you sort by score,
document id, or a field value. See:

    http://lucy.apache.org/docs/c/Lucy/Search/SortSpec.html

   - vary relationship of terms (e.g., proximity)
      - How to do it? Is there an operator like NEAR?


There's ProximityQuery, but I'm not sure how it works:

    http://lucy.apache.org/docs/c/LucyX/Search/ProximityQuery.html

   - vary operations (e.g., AND, OR)
      - I see that the support is available for boolean query parser. Can I
      know whether for a given search instance I can have multiple boolean
      queries like below?


Yes, that's possible.

Nick
