Re: [HACKERS] ts_rank
I'm sorry, my plane to Nepal is waiting for me :) I'll be back in mid-November.

In short, ts_rank is based only on the frequencies of lexemes and does not take the distance between query lexemes into account. Also, it supports only primitive queries.

Oleg

On Wed, 12 Oct 2011, Bruce Momjian wrote:
> Bruce Momjian wrote:
> > Oleg, Teodor, can you give me a description of how ts_rank decides how to rank items? Thanks.
>
> Any news on this question?

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
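Oleg's summary can be illustrated with a minimal Python sketch of a frequency-only rank: each query lexeme contributes the number of times it occurs in the document, and positions are never compared with each other. This illustrates the idea as stated in the message above; it is not the code in tsrank.c. The dict-based document representation and the name frequency_rank are hypothetical.

```python
from typing import Dict, List


def frequency_rank(doc: Dict[str, List[int]], query_lexemes: List[str]) -> float:
    """Hypothetical frequency-only rank.

    `doc` maps each lexeme to its positions, loosely like a tsvector
    such as 'fat':1 'cat':2. Positions are only counted, never compared,
    so the distance between query lexemes cannot affect the result.
    """
    rank = 0.0
    for lexeme in query_lexemes:
        # Only the number of occurrences matters, not where they are.
        rank += len(doc.get(lexeme, []))
    return rank


if __name__ == "__main__":
    near = {"fat": [1], "cat": [2]}   # query lexemes next to each other
    far = {"fat": [1], "cat": [50]}   # same lexemes, far apart
    print(frequency_rank(near, ["fat", "cat"]))  # 2.0
    print(frequency_rank(far, ["fat", "cat"]))   # 2.0
```

Both documents get the same rank even though only the first contains "fat" and "cat" side by side; telling them apart requires a proximity-aware method such as the cover density ranking discussed in the "Ts_rank internals" thread.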
Re: [HACKERS] ts_rank
Oleg Bartunov wrote:
> I'm sorry, my plane to Nepal is waiting for me :) I'll be back in mid-November.
>
> In short, ts_rank is based only on the frequencies of lexemes and does not take the distance between query lexemes into account. Also, it supports only primitive queries.

Thanks. The attached doc patch has been applied to head and 9.1.X.

--
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +

diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
new file mode 100644
index ef228e3..46db103
*** a/doc/src/sgml/textsearch.sgml
--- b/doc/src/sgml/textsearch.sgml
*** ts_rank(<optional> <replaceable class="P
*** 867,873
  <listitem>
  <para>
! Standard ranking function.<!-- TODO document this better -->
  </para>
  </listitem>
  </varlistentry>
--- 867,873
  <listitem>
  <para>
! Ranks vectors based on the frequency of their matching lexems.
  </para>
  </listitem>
  </varlistentry>
Re: [HACKERS] ts_rank
Bruce Momjian wrote:
> Oleg, Teodor, can you give me a description of how ts_rank decides how to rank items? Thanks.

Any news on this question?

--
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
Re: [HACKERS] ts_rank
Mark wrote:
> > There's some potentially useful information here:
> > http://www.postgresql.org/docs/9.0/interactive/textsearch-controls.html#TEXTSEARCH-RANKING
>
> Thanks for the reply. I was reading the PostgreSQL documentation, but it does not name the methods that are used. Everywhere it says that ts_rank uses a standard ranking function, but it is hard to tell which function that is. Somewhere I found that it may be based on the vector space model, and that seems to be true, because the code in tsrank.c counts the frequency of words, but I am not sure of that :-(

Oleg, Teodor, can you give me a description of how ts_rank decides how to rank items? Thanks.

--
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB  http://enterprisedb.com

  + It's impossible for everything to be true. +
Re: [HACKERS] ts_rank
> There's some potentially useful information here:
> http://www.postgresql.org/docs/9.0/interactive/textsearch-controls.html#TEXTSEARCH-RANKING

Thanks for the reply. I was reading the PostgreSQL documentation, but it does not name the methods that are used. Everywhere it says that ts_rank uses a standard ranking function, but it is hard to tell which function that is. Somewhere I found that it may be based on the vector space model, and that seems to be true, because the code in tsrank.c counts the frequency of words, but I am not sure of that :-(

--
View this message in context: http://postgresql.1045698.n5.nabble.com/ts-rank-tp4384614p4414631.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] ts_rank
On Thu, May 19, 2011 at 10:42 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
> Robert Haas wrote:
>> Mark wrote:
>>> Could somebody explain to me which methods ts_rank is based on and how it works? I would appreciate any articles, if they exist.
>>
>> As far as I can tell, our documentation contains no useful information on this topic whatsoever. :-(
>
> There's some potentially useful information here:
>
> http://www.postgresql.org/docs/9.0/interactive/textsearch-controls.html#TEXTSEARCH-RANKING

Ah, yes. I didn't read that carefully enough. That is useful, but it doesn't really explain how it works.

> Although I don't know if it addresses Mark's question very well.
>
> Personally, I wonder how relevant ts_rank will be after knn-gist is out.

I don't see why it would be any less useful... though if someone could find a way to KNN-ify such searches, I'm sure there would be a lot of very happy users. It seems pretty difficult, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] ts_rank
On Tue, May 10, 2011 at 6:21 AM, Mark marek.bal...@seznam.cz wrote:
> Could somebody explain to me which methods ts_rank is based on and how it works? I would appreciate any articles, if they exist.
> Thanks a lot for your reply.

As far as I can tell, our documentation contains no useful information on this topic whatsoever. :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] ts_rank
Robert Haas wrote:
> Mark wrote:
>> Could somebody explain to me which methods ts_rank is based on and how it works? I would appreciate any articles, if they exist.
>
> As far as I can tell, our documentation contains no useful information on this topic whatsoever. :-(

There's some potentially useful information here:

http://www.postgresql.org/docs/9.0/interactive/textsearch-controls.html#TEXTSEARCH-RANKING

Although I don't know if it addresses Mark's question very well.

Personally, I wonder how relevant ts_rank will be after knn-gist is out.

-Kevin
[HACKERS] ts_rank
Could somebody explain to me which methods ts_rank is based on and how it works? I would appreciate any articles, if they exist. Thanks a lot for your reply.

Mark

--
View this message in context: http://postgresql.1045698.n5.nabble.com/ts-rank-tp4384120p4384120.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] Ts_rank internals
On Tue, 11 Sep 2007, Teodor Sigaev wrote:
> It's a variation on the CDR (Cover Density Ranking) algorithms, based on the paper by Clarke et al., "Relevance Ranking for One to Three Term Queries" (http://citeseer.ist.psu.edu/clarke00relevance.html). Sorry, I lost the article itself, but maybe Oleg has it. A short, simple description is at http://www2002.org/CDROM/refereed/643/node7.html. We changed the original algorithm to support lexeme weights; details are on Oleg's site: http://www.sai.msu.su/~megera/wiki/NewExtentsBasedRanking

Actually, we used two papers: http://citeseer.ist.psu.edu/clarke00relevance.html and http://portal.acm.org/ft_gateway.cfm?id=333137&type=pdf&dl=GUIDE&dl=ACM. I can send you the latter if you have no access to the ACM.
Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
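For readers without access to the papers, the per-cover scoring commonly attributed to Clarke et al. can be sketched in Python as we understand it: each cover (p, q), a span of positions containing the query terms, scores 1 if it is at most k positions long and k / (q - p + 1) otherwise, and the document's score is the sum over its covers. This is a paraphrase, not a quotation of either paper. The default k = 16 and the function name are assumptions, and the sketch leaves out the lexeme weights that Teodor says were added, so it is not what PostgreSQL itself computes.

```python
from typing import Iterable, Tuple


def cover_density_score(covers: Iterable[Tuple[int, int]], k: int = 16) -> float:
    """Sum a per-cover score over covers given as inclusive (p, q) positions.

    Covers no longer than k positions score 1.0; longer covers score
    k / length, so widely spread query terms contribute less.
    (k = 16 is an assumed default, not taken from the thread.)
    """
    score = 0.0
    for p, q in covers:
        length = q - p + 1
        score += 1.0 if length <= k else k / length
    return score


print(cover_density_score([(3, 4), (10, 41)]))  # 1.0 + 16/32 = 1.5
```

The covers themselves have to be found first; the short description linked in the thread outlines how covers are located.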
[HACKERS] Ts_rank internals
Hi,

I tried to understand how ts_rank works, but I failed. What does the Cover function do? How does it work? What is the DocRepresentation data structure like? I can see the definition of the struct, and the get_docrep function to convert to that format, but by reading those I can't figure out what the resulting DocRepresentation looks like.

I wonder if we could get rid of the istrue flag in QueryOperand, and use a local BitmapSet variable instead? It seems wrong to have a temporary flag that's only used in one function, in a struct that's used everywhere.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
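The alternative Heikki suggests, passing the set of operands that are true in the current extent into the evaluator instead of storing an istrue flag in every QueryOperand, might look like the following Python sketch. The node classes and evaluate() are hypothetical stand-ins for the tsquery tree and its C evaluation code, and a frozenset of operand indexes plays the role of the local BitmapSet.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Operand:
    index: int  # position of this operand in the query's operand list


@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Or:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Not:
    arg: "Node"


Node = Union[Operand, And, Or, Not]


def evaluate(node: Node, matched: FrozenSet[int]) -> bool:
    """Evaluate the query tree, reading "is this operand true?" from a
    caller-owned set instead of a flag stored in each operand."""
    if isinstance(node, Operand):
        return node.index in matched
    if isinstance(node, And):
        return evaluate(node.left, matched) and evaluate(node.right, matched)
    if isinstance(node, Or):
        return evaluate(node.left, matched) or evaluate(node.right, matched)
    if isinstance(node, Not):
        return not evaluate(node.arg, matched)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# 'fat & (cat | !dog)' with operands fat=0, cat=1, dog=2
query = And(Operand(0), Or(Operand(1), Not(Operand(2))))
print(evaluate(query, frozenset({0, 1})))  # True
print(evaluate(query, frozenset({0, 2})))  # False
```

Because the matched set belongs to the caller, the query tree stays read-only. Teodor's reply in this thread explains what else would be needed to drop the flag in practice.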
Re: [HACKERS] Ts_rank internals
> I tried to understand how ts_rank works, but I failed. What does the Cover function do? How does it work? What is the DocRepresentation data structure like? I can see the definition of the struct, and the get_docrep function to convert to that format, but by reading those I can't figure out what the resulting DocRepresentation looks like.
>
> I wonder if we could get rid of the istrue flag in QueryOperand, and use a local BitmapSet variable instead? It seems wrong to have a temporary flag that's only used in one function, in a struct that's used everywhere.

It's a variation on the CDR (Cover Density Ranking) algorithms, based on the paper by Clarke et al., “Relevance Ranking for One to Three Term Queries” (http://citeseer.ist.psu.edu/clarke00relevance.html). Sorry, I lost the article itself, but maybe Oleg has it. A short, simple description is at http://www2002.org/CDROM/refereed/643/node7.html. We changed the original algorithm to support lexeme weights; details are on Oleg's site: http://www.sai.msu.su/~megera/wiki/NewExtentsBasedRanking

The array of DocRepresentation is a representation of the document: it contains only the lexemes that occur in both the tsvector and the tsquery, ordered by position as in the original document. Each DocRepresentation entry has a link to the corresponding QueryOperand, to speed up query evaluation during the extent search. When we extend the current extent by one word, we set the istrue flag on the corresponding QueryOperand, and evaluating the tsquery over the cover becomes a very simple task.

It's possible to eliminate the istrue flag, but that would require implementing an algorithm that evaluates the tsquery over a contiguous part of the document rather than over the whole document.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]
                    WWW: http://www.sigaev.ru/
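To make Teodor's description of DocRepresentation more concrete, here is a Python sketch that keeps only the document positions of query lexemes, sorts them by position, and tags each entry with the index of its query operand (standing in for the link to QueryOperand). It then finds covers for the simplest case, a query that ANDs all of its lexemes, using a sliding window. The names build_doc_representation and find_covers are hypothetical. PostgreSQL's Cover() handles arbitrary tsquery expressions by evaluating the query over the extent, which this sketch does not attempt.

```python
from typing import Dict, List, Tuple

# One entry per occurrence: (position in document, index of the query operand).
DocEntry = Tuple[int, int]


def build_doc_representation(
    tsvector: Dict[str, List[int]], query_lexemes: List[str]
) -> List[DocEntry]:
    """Keep only lexemes present in both the document and the query,
    ordered by position as in the original text."""
    entries: List[DocEntry] = []
    for operand_index, lexeme in enumerate(query_lexemes):
        for position in tsvector.get(lexeme, []):
            entries.append((position, operand_index))
    entries.sort()
    return entries


def find_covers(doc_rep: List[DocEntry], n_operands: int) -> List[Tuple[int, int]]:
    """Return minimal (start, end) position spans containing every operand,
    i.e. the covers of an AND of all query lexemes."""
    covers: List[Tuple[int, int]] = []
    counts = [0] * n_operands  # occurrences of each operand inside the window
    covered = 0                # number of operands with a nonzero count
    left = 0
    for right, (_, operand) in enumerate(doc_rep):
        if counts[operand] == 0:
            covered += 1
        counts[operand] += 1
        # The entry at `right` just completed the window; shrink it from the left.
        while covered == n_operands:
            left_operand = doc_rep[left][1]
            if counts[left_operand] == 1:
                # Dropping the left entry would lose an operand: window is minimal.
                covers.append((doc_rep[left][0], doc_rep[right][0]))
            counts[left_operand] -= 1
            if counts[left_operand] == 0:
                covered -= 1
            left += 1
    return covers


doc = {"fat": [2, 9], "cat": [3, 12], "sat": [4]}  # 'sat' is not in the query
rep = build_doc_representation(doc, ["fat", "cat"])
print(rep)                  # [(2, 0), (3, 1), (9, 0), (12, 1)]
print(find_covers(rep, 2))  # [(2, 3), (3, 9), (9, 12)]
```

Scoring the covers (for example by favouring short ones) is a separate step, and the lexeme weights mentioned above are not modelled here.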