Re: [HACKERS] ts_rank

2011-10-13 Thread Oleg Bartunov

I'm sorry, my plane to Nepal is waiting for me :) I'll be back in the
middle of November. In short, ts_rank is based only on the frequencies of
lexemes and doesn't take the distance between query lexemes into account.
Also, it supports only primitive queries.
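
To illustrate the difference between counting frequencies and measuring
distance, here is a minimal sketch. It is not the formula used in tsrank.c
(which also applies weights and normalization); frequency_rank and the sample
documents are hypothetical names chosen only for this example.

```python
from collections import Counter

def frequency_rank(doc_lexemes, query_lexemes):
    """Hypothetical helper: score a document by counting how often each
    distinct query lexeme occurs in it. Positions are never consulted."""
    counts = Counter(doc_lexemes)
    return sum(counts[lexeme] for lexeme in set(query_lexemes))

# Same lexemes, different word order: "cat" and "mat" are adjacent in the
# first document and far apart in the second.
doc_adjacent = ["cat", "mat", "dog", "bird", "fish"]
doc_spread = ["cat", "dog", "bird", "fish", "mat"]
query = ["cat", "mat"]

print(frequency_rank(doc_adjacent, query))
print(frequency_rank(doc_spread, query))
```

Both calls give the same score, because a purely frequency-based rank cannot
tell the two documents apart.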

Oleg
On Wed, 12 Oct 2011, Bruce Momjian wrote:

> Bruce Momjian wrote:
> > Mark wrote:
> > > There's some potentially useful information here:
> > > http://www.postgresql.org/docs/9.0/interactive/textsearch-controls.html#TEXTSEARCH-RANKING
> > >
> > > Thanks for reply. I was reading the documentation of PostgreSQL, but there
> > > it is not written the name of the used methods. Everywhere there is written,
> > > that ts_rank use standard ranking function. But it is difficult to say which
> > > is the standard function.
> > > Somewhere I found that it is maybe based on Vector space model and it seems
> > > to be truth, because in the code of tsrank.c is counted the frequency of
> > > words, but I am not sure of that :-(
> >
> > Oleg, Teodor, can you give me a description of how ts_rank decided how
> > to rank items?  Thanks.
>
> Any news on this question?




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ts_rank

2011-10-13 Thread Bruce Momjian
Oleg Bartunov wrote:
> I'm sorry, my plane to Nepal is waiting me :) I'll be back in the
> midst of November. In short, ts_rank is based only on frequencies of lexems
> and doesn't count distance between query lexems. Also, it supports only
> primitive queries.

Thanks.  Attached doc patch applied to head and 9.1.X.


-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
new file mode 100644
index ef228e3..46db103
*** a/doc/src/sgml/textsearch.sgml
--- b/doc/src/sgml/textsearch.sgml
*** ts_rank(optional replaceable class=P
*** 867,873 ****
  
        <listitem>
         <para>
!         Standard ranking function.<!-- TODO document this better -->
         </para>
        </listitem>
       </varlistentry>
--- 867,873 ----
  
        <listitem>
         <para>
!         Ranks vectors based on the frequency of their matching lexems.
         </para>
        </listitem>
       </varlistentry>



Re: [HACKERS] ts_rank

2011-10-12 Thread Bruce Momjian
Bruce Momjian wrote:
> Mark wrote:
> > There's some potentially useful information here:
> > http://www.postgresql.org/docs/9.0/interactive/textsearch-controls.html#TEXTSEARCH-RANKING
> >
> > Thanks for reply. I was reading the documentation of PostgreSQL, but there
> > it is not written the name of the used methods. Everywhere there is written,
> > that ts_rank use standard ranking function. But it is difficult to say which
> > is the standard function.
> > Somewhere I found that it is maybe based on Vector space model and it seems
> > to be truth, because in the code of tsrank.c is counted the frequency of
> > words, but I am not sure of that :-(
>
> Oleg, Teodor, can you give me a description of how ts_rank decided how
> to rank items?  Thanks.

Any news on this question?



Re: [HACKERS] ts_rank

2011-09-10 Thread Bruce Momjian
Mark wrote:
> There's some potentially useful information here:
> http://www.postgresql.org/docs/9.0/interactive/textsearch-controls.html#TEXTSEARCH-RANKING
>
> Thanks for reply. I was reading the documentation of PostgreSQL, but there
> it is not written the name of the used methods. Everywhere there is written,
> that ts_rank use standard ranking function. But it is difficult to say which
> is the standard function.
> Somewhere I found that it is maybe based on Vector space model and it seems
> to be truth, because in the code of tsrank.c is counted the frequency of
> words, but I am not sure of that :-(

Oleg, Teodor, can you give me a description of how ts_rank decided how
to rank items?  Thanks.



Re: [HACKERS] ts_rank

2011-05-21 Thread Mark
> There's some potentially useful information here:
> http://www.postgresql.org/docs/9.0/interactive/textsearch-controls.html#TEXTSEARCH-RANKING

Thanks for the reply. I have read the PostgreSQL documentation, but it does
not name the methods used. Everywhere it says only that ts_rank uses a
standard ranking function, and it is difficult to tell which function is
"the standard" one.
Somewhere I found that it may be based on the vector space model, which seems
plausible because the code in tsrank.c counts word frequencies, but I am
not sure about that :-(



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/ts-rank-tp4384614p4414631.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [HACKERS] ts_rank

2011-05-20 Thread Robert Haas
On Thu, May 19, 2011 at 10:42 PM, Kevin Grittner
<kevin.gritt...@wicourts.gov> wrote:
> Robert Haas wrote:
> > Mark wrote:
> >
> > > Could somebody explain me on which methods is based ts_rank and
> > > how it works?  I would appreciate some articles, if exist.
> >
> > As far as I can tell, our documentation contains no useful
> > information on this topic whatsoever. :-(
>
> There's some potentially useful information here:
>
> http://www.postgresql.org/docs/9.0/interactive/textsearch-controls.html#TEXTSEARCH-RANKING

Ah, yes.  I didn't read that carefully enough.  That is useful, but it
doesn't really explain how it works.

> Although I don't know if it addresses Mark's question very well.
> Personally, I wonder how relevant ts_rank will be after KNN-GiST
> is out

I don't see why it would be any less useful... though if someone could
find a way to KNN-ify such searches, I'm sure there would be a lot of
very happy users.  Seems pretty difficult, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ts_rank

2011-05-19 Thread Robert Haas
On Tue, May 10, 2011 at 6:21 AM, Mark <marek.bal...@seznam.cz> wrote:
> Could somebody explain me on which methods is based ts_rank and how it works?
> I would appreciate some articles, if exist.
> Thanks a lot for reply.

As far as I can tell, our documentation contains no useful information
on this topic whatsoever.  :-(



Re: [HACKERS] ts_rank

2011-05-19 Thread Kevin Grittner
Robert Haas wrote:
> Mark wrote:
>
> > Could somebody explain me on which methods is based ts_rank and
> > how it works?  I would appreciate some articles, if exist.
>
> As far as I can tell, our documentation contains no useful
> information on this topic whatsoever. :-(
 
There's some potentially useful information here:
 
http://www.postgresql.org/docs/9.0/interactive/textsearch-controls.html#TEXTSEARCH-RANKING
 
Although I don't know if it addresses Mark's question very well.
Personally, I wonder how relevant ts_rank will be after KNN-GiST
is out.
 
-Kevin





Re: [HACKERS] Ts_rank internals

2007-09-11 Thread Oleg Bartunov

On Tue, 11 Sep 2007, Teodor Sigaev wrote:


> > I tried to understand how ts_rank works, but I failed. What does Cover
> > function do? How does it work? What is the DocRepresentation data
> > structure like? I can see the definition of the struct, and the
> > get_docrep function to convert to that format, but by reading those I
> > can't figure out what the resulting DocRepresentation looks like.
> > I wonder if we could get rid of the istrue flag in QueryOperand, and use
> > a local BitmapSet variable instead? It seems wrong to have a temporary
> > flag that's only used in one function, in a struct that's used everywhere.
>
> It's a play around CDR algorithms (Cover Density Ranking).
>
> Based on paper Clarke et al., "Relevance Ranking for One to Three Term
> Queries." (http://citeseer.ist.psu.edu/clarke00relevance.html). Sorry, I
> lost the article itself, but may be Oleg has it. Simple and short description
> is placed at http://www2002.org/CDROM/refereed/643/node7.html.
>
> We change original algorithm to support weight of lexeme, details are on
> Oleg's site: http://www.sai.msu.su/~megera/wiki/NewExtentsBasedRanking


Actually, we used two papers
http://citeseer.ist.psu.edu/clarke00relevance.html
and 
http://portal.acm.org/ft_gateway.cfm?id=333137&type=pdf&dl=GUIDE&dl=ACM

I can send you the latter if you have no access to the ACM.




> Array of DocRepresentation is a representation of document, it contains only
> lexemes from both tsvector and tsquery, and lexemes are ordered by position -
> as in original doc. Each DocRepresentation has links to corresponding
> QueryOperand to optimize query execution while extent search. When we
> enlarge current extent for one word then we set istrue flag for corresponding
> QueryOperand and execution tsquery from cover becomes very simple task.
>
> It's possible to eliminate istrue flag, but it's needed to implement
> algorithm to execute tsquery over continuos part of document, not over whole
> document.







Regards,
Oleg

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Ts_rank internals

2007-09-10 Thread Teodor Sigaev

> I tried to understand how ts_rank works, but I failed. What does Cover
> function do? How does it work? What is the DocRepresentation data
> structure like? I can see the definition of the struct, and the
> get_docrep function to convert to that format, but by reading those I
> can't figure out what the resulting DocRepresentation looks like.
> I wonder if we could get rid of the istrue flag in QueryOperand, and use
> a local BitmapSet variable instead? It seems wrong to have a temporary
> flag that's only used in one function, in a struct that's used everywhere.

It's a variation on CDR (Cover Density Ranking) algorithms.

It is based on the paper by Clarke et al., “Relevance Ranking for One to Three
Term Queries” (http://citeseer.ist.psu.edu/clarke00relevance.html). Sorry, I
lost the article itself, but maybe Oleg has it. A short and simple description
is at http://www2002.org/CDROM/refereed/643/node7.html.
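
As a rough illustration of the cover idea, the sketch below finds the shortest
position intervals that contain every term of an AND-only query and lets
shorter intervals contribute more to the score. This is a simplification under
my own assumptions, not the code in tsrank.c and not the exact formula from
the paper: find_covers, cover_density_score, and the 1/length contribution are
hypothetical choices made only for this example.

```python
def find_covers(positions_by_term):
    """Return (start, end) intervals that contain every term and do not
    contain a shorter such interval. positions_by_term maps each query
    term to the list of positions where it occurs in the document."""
    events = sorted((pos, term)
                    for term, positions in positions_by_term.items()
                    for pos in positions)
    needed = len(positions_by_term)
    in_window = {}   # term -> number of occurrences in the current window
    covers = []
    left = 0         # index in events of the window's left edge
    for pos, term in events:
        in_window[term] = in_window.get(term, 0) + 1
        if len(in_window) < needed:
            continue
        # Shrink from the left while the leftmost term is still duplicated.
        while in_window[events[left][1]] > 1:
            in_window[events[left][1]] -= 1
            left += 1
        covers.append((events[left][0], pos))
        # Drop the leftmost entry so the search moves on to the next cover.
        dropped = events[left][1]
        in_window[dropped] -= 1
        if in_window[dropped] == 0:
            del in_window[dropped]
        left += 1
    return covers

def cover_density_score(covers):
    """Assumed scoring rule: each cover adds 1 / (its length in positions)."""
    return sum(1.0 / (end - start + 1) for start, end in covers)

covers = find_covers({"cat": [1, 7], "mat": [5]})
print(covers)
print(cover_density_score(covers))
```

With "cat" at positions 1 and 7 and "mat" at position 5, the covers are the
intervals 1..5 and 5..7, and the shorter one contributes more to the score.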


We changed the original algorithm to support lexeme weights; details are on
Oleg's site: http://www.sai.msu.su/~megera/wiki/NewExtentsBasedRanking


An array of DocRepresentation is a representation of the document. It contains
only the lexemes that occur in both the tsvector and the tsquery, ordered by
position as in the original document. Each DocRepresentation has a link to the
corresponding QueryOperand to optimize query execution during the extent
search. When we enlarge the current extent by one word, we set the istrue flag
on the corresponding QueryOperand, so executing the tsquery against the cover
becomes a very simple task.
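
A minimal sketch of such an array, assuming a tsvector can be modeled as a
dict from lexeme to positions and a tsquery as a plain list of lexemes. The
names DocEntry and build_doc_representation are hypothetical and do not match
the C structs; a string stands in for the link to the QueryOperand.

```python
from typing import NamedTuple

class DocEntry(NamedTuple):
    position: int   # position of the lexeme in the original document
    operand: str    # stand-in for the link to the matching query operand

def build_doc_representation(tsvector, query_lexemes):
    """Keep only lexemes present in both the document and the query,
    one entry per occurrence, ordered by position in the document."""
    wanted = set(query_lexemes)
    entries = [DocEntry(pos, lexeme)
               for lexeme, positions in tsvector.items()
               if lexeme in wanted
               for pos in positions]
    return sorted(entries)   # tuples sort by position first

tsvector = {"cat": [1, 7], "sat": [2], "mat": [5]}
for entry in build_doc_representation(tsvector, ["cat", "mat"]):
    print(entry.position, entry.operand)
```

The lexeme "sat" is dropped because it is not in the query, and the remaining
entries come out in document order: cat at 1, mat at 5, cat at 7.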


It's possible to eliminate the istrue flag, but that would require implementing
an algorithm that executes a tsquery over a contiguous part of the document
rather than over the whole document.




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/
