Re: [PERFORM] Search for fixed set of keywords

Oleg Bartunov Wed, 09 Jan 2008 03:01:39 -0800

Did you try integer arrays with GIN (inverted index)?
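For readers unfamiliar with the idea, here is a minimal Python sketch of what an inverted index does conceptually. It is not PostgreSQL's GIN implementation. The toy data and the function names (build_inverted_index, ids_with_all, ids_with_any) are made up for illustration. In PostgreSQL the matching queries on an int[] column would presumably use the array operators @> (contains, for C1) and && (overlaps, for C2).

```python
from collections import defaultdict

# Hypothetical toy data: id -> set of keyword numbers attached to that id.
ITEMS = {1: {1, 2}, 2: {2}, 3: {1, 2, 3}, 4: {3}}

def build_inverted_index(items):
    """Map each keyword number to the set of ids that carry it."""
    index = defaultdict(set)
    for id_, keywords in items.items():
        for kw in keywords:
            index[kw].add(id_)
    return index

def ids_with_all(index, query):
    """C1: ids that carry every keyword in `query` (intersect posting sets)."""
    postings = [index.get(kw, set()) for kw in query]
    return set.intersection(*postings) if postings else set()

def ids_with_any(index, query):
    """C2: ids that carry at least one keyword in `query` (union of posting sets)."""
    return set().union(*(index.get(kw, set()) for kw in query))

INDEX = build_inverted_index(ITEMS)
```

The point of the structure is that a query only touches the posting sets of the requested keywords instead of scanning every ID.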


Oleg
On Wed, 9 Jan 2008, Jörg Kiegeland wrote:

Hello,
I have an interesting generic search task, for which I have run different performance tests, and I would like to share and discuss my results on this newsgroup.
Let me begin by describing the search task:

=========
You have a set of N unique IDs. Every ID is associated with an integer scoring value. Every ID is also associated with up to K different keywords (there are K different keywords in total, K1 ... Kn). Now find the first Z best-scored IDs which are associated with a given set of keywords in one of two ways:
(C1) The ID must be associated with all keywords of the given set of keywords.
(C2) The ID must be associated with at least one keyword of the given set of keywords.
=========
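To pin down the semantics of C1 and C2, here is a brute-force reference sketch in Python. It scans every ID, so it is linear in N by construction and is only meant as a specification to check faster approaches against. The toy data and the name top_z are assumptions for illustration, not part of the task description.

```python
import heapq

# Hypothetical toy data: id -> (score, set of keyword numbers).
ITEMS = {
    1: (50, {1, 2}),
    2: (90, {2}),
    3: (70, {1, 2, 3}),
    4: (10, {3}),
}

def top_z(items, query, z, mode):
    """Return the z best-scored ids matching `query` under C1 ("all") or C2 ("any")."""
    query = set(query)

    def matches(keywords):
        if mode == "all":   # C1: every query keyword is attached to the id
            return query <= keywords
        if mode == "any":   # C2: at least one query keyword is attached
            return bool(query & keywords)
        raise ValueError(f"unknown mode: {mode!r}")

    hits = ((score, id_) for id_, (score, kws) in items.items() if matches(kws))
    # Highest score first; this is a full scan over all N ids.
    return [id_ for _score, id_ in heapq.nlargest(z, hits)]
```

For example, top_z(ITEMS, {1, 2}, 2, "all") considers only IDs 1 and 3, while the same call with "any" also considers ID 2.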
My tests showed that only a Multiple-Column-approach resulted in an acceptable query response time. I also tried out an int-array approach using gist, a sub-string approach, a bit-column approach, and even a sub-string approach using Solr.
Actually, the int-array approach was 20% faster for Z=infinity, but it became linear for the test case [Z=1000 and *all* IDs match the search condition].
(To avoid misunderstanding, "acceptable time" means: with a fixed Z, a fixed set of keywords K, a fixed query, and an increasing N, the response time is constant up to logarithmic; linear or worse-than-linear time is not acceptable.)
In the Multiple-Column-approach, there is one table. The table has a boolean column for each keyword. It also has a column for the ID and one for the scoring. Now, for each keyword column and for the scoring column a separate index is created.
C1 is implemented by an AND query on the keyword columns, C2 by an OR query, and the result is sorted by the scoring column, cutting off after the first Z results.
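A minimal sketch of this table layout and the two query shapes follows. It uses Python's sqlite3 module purely so the example is self-contained; the tests described here were against PostgreSQL, whose planner will behave differently. The table name items, the columns k1..k3, and the helper names are assumptions.

```python
import sqlite3

ALLOWED = ("k1", "k2", "k3")  # hypothetical keyword columns, one per keyword

def build_db():
    """Create the one-table layout with a separate index per column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, score INTEGER, "
        "k1 INTEGER, k2 INTEGER, k3 INTEGER)"
    )
    for col in ("score",) + ALLOWED:
        conn.execute(f"CREATE INDEX items_{col}_idx ON items ({col})")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?, ?, ?)",
        [(1, 50, 1, 1, 0), (2, 90, 0, 1, 0), (3, 70, 1, 1, 1), (4, 10, 0, 0, 1)],
    )
    return conn

def search(conn, keyword_cols, z, mode):
    """C1 ("all") joins the column tests with AND, C2 ("any") with OR."""
    if not keyword_cols or not set(keyword_cols) <= set(ALLOWED):
        raise ValueError("unknown keyword column")
    joiner = " AND " if mode == "all" else " OR "
    where = joiner.join(f"{col} = 1" for col in keyword_cols)
    # Sort by the scoring column and cut off after the first z rows.
    sql = f"SELECT id FROM items WHERE {where} ORDER BY score DESC LIMIT ?"
    return [row[0] for row in conn.execute(sql, (z,))]
```

Column names are checked against a fixed list before being placed into the SQL text, since they cannot be passed as bound parameters.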
However, our requirements for the search task have changed, and I have not yet managed to find a search approach with acceptable response time for the following variation:
One uses C2 and does not sort by a scoring column, but instead uses as the scoring value the number of matched keywords for a given ID.
The difficulty in this query type is that the scoring depends on the query itself.
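To make the variation concrete, here is a sketch of the query shape only, again in Python with sqlite3 as a self-contained stand-in. It does not answer the performance question: the match count is computed per row at query time, so no precomputed index on a scoring column can be used for the sort. The table and column names are assumptions, as is the tie-break on id. In PostgreSQL, boolean columns would need a cast such as k1::int before they can be summed.

```python
import sqlite3

ALLOWED = ("k1", "k2", "k3")  # hypothetical keyword columns

def build_db():
    """Same one-table layout as in the Multiple-Column-approach, with toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, "
        "k1 INTEGER, k2 INTEGER, k3 INTEGER)"
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 0), (2, 0, 1, 0), (3, 1, 1, 1), (4, 0, 0, 1)],
    )
    return conn

def search_by_match_count(conn, keyword_cols, z):
    """C2 search ranked by how many of the queried keywords each id carries."""
    if not keyword_cols or not set(keyword_cols) <= set(ALLOWED):
        raise ValueError("unknown keyword column")
    matched = " + ".join(keyword_cols)            # query-dependent score
    where = " OR ".join(f"{col} = 1" for col in keyword_cols)
    sql = (
        f"SELECT id, {matched} AS matched FROM items "
        f"WHERE {where} ORDER BY matched DESC, id ASC LIMIT ?"
    )
    return conn.execute(sql, (z,)).fetchall()
```

Each returned row is a pair (id, number of matched keywords).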
Does anyone have an idea how to solve this query type with acceptable response time, or can anybody tell me, or prove, that this is theoretically not possible?
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend


        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

