Hi,

Thanks for the help. From my investigations, it doesn't seem possible to meet the requirements in a straightforward way: a query that uses several text indexes adds the individual scores together, so the only output available is the total score.


Separating the scores out (for example into a tuple (title_score, description_score, body_text_score) that I can sort on) looks quite hard, as it would mean changing the indexes to return the scores in this different format.
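For illustration only, here is a minimal sketch of what sorting on such a tuple could look like. It assumes the three per-index scores can be obtained separately, which the combined query does not give you. Plain dicts of {docid: score} stand in for the index results, and the function name rank_by_score_tuple and the sample data are made up.

```python
def rank_by_score_tuple(title, description, body):
    """Order docids by (title_score, description_score, body_text_score).

    Each argument is a plain dict {docid: score}, a hypothetical stand-in
    for one index's results. A missing docid counts as a score of 0.
    """
    docids = set(title) | set(description) | set(body)
    # Negate the scores so that higher scores sort first; the trailing
    # docid makes the order deterministic when all three scores tie.
    return sorted(
        docids,
        key=lambda d: (-title.get(d, 0), -description.get(d, 0), -body.get(d, 0), d),
    )


title = {1: 3, 2: 3, 3: 1}
description = {2: 4, 4: 2}
body = {1: 7, 5: 6}

print(rank_by_score_tuple(title, description, body))
```

Documents 1 and 2 tie on the title score here, so the description score decides between them. The body score only matters when both of the earlier scores are equal.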

My latest approach is to do something like the following (untested):

from BTrees.IIBTree import difference
from Products.ZCatalog.Lazy import LazyMap

def specialSearch(words):

   # I'm going to manipulate the indexes directly
   # (portal_catalog is assumed to be available in this scope)
   catalog = portal_catalog._catalog
   getIndex = catalog.getIndex

   # each result maps docid -> score for that index
   r1, id1 = getIndex('Title')._apply_index( {'Title':words} )
   r2, id2 = getIndex('Description')._apply_index( {'Description':words} )
   r3, id3 = getIndex('SearchableText')._apply_index( {'SearchableText':words} )

   # de-dupe: drop from each set the docids already in a higher-ranked set
   r3 = difference(difference(r3, r1), r2)
   r2 = difference(r2, r1)

   # byValue(0) gives a list of (score, docid) tuples, highest score first
   r1 = r1.byValue(0)
   r2 = r2.byValue(0)
   r3 = r3.byValue(0)

   # concatenate them, preserving the order, and keep only the docids
   rs = [docid for score, docid in r1 + r2 + r3]

   # return something catalog brain-like
   return LazyMap(catalog.__getitem__, rs, len(rs))

My debug-prompt tests seem to indicate that this should work. If anyone who knows more about lists and BTrees can suggest a better way to do the sorting and concatenation of the different result sets, I'd be glad to hear it.
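To make the intended ordering concrete without needing a Zope instance, here is a small self-contained sketch of the same de-dupe, sort and concatenate idea. It is an illustration under assumptions, not the catalog's API: plain dicts of {docid: score} stand in for the IIBuckets, and the function name tiered_ranking and the sample data are made up.

```python
def tiered_ranking(*tiers):
    """Concatenate per-index results, highest-priority tier first.

    Each tier is a plain dict {docid: score}, a hypothetical stand-in for
    the mapping an index returns. A docid is kept only in the first tier
    it appears in; within a tier, docids are ordered by descending score.
    """
    seen = set()
    ranked = []
    for tier in tiers:
        # Skip docids already claimed by a higher-priority tier.
        fresh = [(score, docid) for docid, score in tier.items() if docid not in seen]
        # Highest score first; ties broken by docid for a deterministic order.
        fresh.sort(key=lambda pair: (-pair[0], pair[1]))
        ranked.extend(docid for score, docid in fresh)
        seen.update(tier)
    return ranked


title = {1: 5, 2: 3}
description = {2: 9, 3: 4}
body = {1: 7, 3: 8, 4: 2, 5: 6}

print(tiered_ranking(title, description, body))
```

Tracking the docids already seen in a single set means each tier is filtered against every higher-priority tier, not only the one immediately above it.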

Thanks,

Miles




Jonathan wrote:

----- Original Message -----
From: "Miles Waller" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, May 31, 2006 10:59 AM
Subject: [Zope] relevance ranking in ZCTextIndex or equivalent


Hi,

I'm planning to implement a text search where

(match against the title)
 ranks more highly than
(match in the description)
 ranks more highly than
(matches against the body text).

Titles and descriptions are short bits of text, so results in these
categories can be ranked just by the frequency that the word appears in
that part of the text.  Matches against the body text should ideally be
ranked more like ZCTextIndex rather than plain frequency.

My ideas are:

- do three separate searches, and then concatenate the result sets
together.
problem: making sure there are no duplicates in the list without parsing
all the results in their entirety.

- hijack the 'scoring' part of the index, so those results with matches
in the title can have their scores artificially heightened to achieve
the ordering i want
problem: it's completely opaque without a lot of study whether this
would achieve what i want.  i'd also need to index the items so the
index knew what was in the title, which could be a problem.

- index title, description and text separately, and then use dieter's
AdvancedQuery product to do the query and combine results
problem: is it possible to get at the scores when the documents are
returned from the index to be able to order them?  are the scores
returned separately, or will each query overwrite the last one?

Has anyone ever tried to do this - or got any pointers - at all?


A definitely non-trivial task, but here are some ideas to get you pointed in the right (I hope) direction:

Try googling, or looking in the zope source for:

data_record_normalized_score_
BaseIndex.py
OkapiIndex.py
SetOps.py
okascore.c


Good Luck!

Jonathan
_______________________________________________
Zope maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists - http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope-dev )

