I have applied the attached documentation patch to show ts_headline() using a configuration name.
---------------------------------------------------------------------------

Oleg Bartunov wrote:
> On Sat, 23 Feb 2008, Stephen Davies wrote:
>
> > As it turns out, all I needed was in the doco but the key element - the
> > first config arg to ts_headline - was not in any of the examples so I
> > missed it.
>
> aha, Original one were based on default
> configuration, but then concept was changed, but the examples were not
> modified.
>
> > Would it be possible for ts_headline to work with the pre-parsed ts_vector?
>
> it's impossible, Richard already explained you the reasons.
>
> > I see references to future plans for phrase searching in ts. Is there a
> > date for this?
>
> Not yet. The problem mostly algebraical :) Simple 'exact search' is doable,
> but we need something more, since we support boolean operators,
> pluggable dictionaries (which could produce several lexemes, for example),
> and document structure (lexem weights). So, we need to define consistent
> algebra for text, to have predictable results. This is quite a complex task,
> which require a lot of dedicated time, which we don't have.
>
> > Cheers and thanks,
> > Stephen Davies
> >
> > On Friday 22 February 2008 22:54, Oleg Bartunov wrote:
> >> On Fri, 22 Feb 2008, Stephen Davies wrote:
> >>> Hmmmm!
> >>> I think I now understand the ts position better, thank you.
> >>>
> >>> Part of my problem has been that I am used to the functionality of Open
> >>> Text's LCS (aka BASIS) product which handles text differently.
> >>>
> >>> It includes the position (and context) information in the index and does
> >>> "remember" how the text was parsed so does not need to reparse to insert
> >>> hit navigation tags nor need pointers as to how to parse queries. (It
> >>> also supports phrase searching.)
> >>>
> >>> Now that I have a better understanding of ts, I think I will be able to
> >>> make it do at least most of what I hoped for.
> >>
> >> I'm wondering if it was not described in the text search documentation :)
> >>
> >>> Thank you again for your help with this.
> >>>
> >>> Cheers,
> >>> Stephen Davies
> >>>
> >>> On Friday 22 February 2008 20:45, Richard Huxton wrote:
> >>>> Stephen Davies wrote:
> >>>>> Unfortunately, my link to the box with the test database is down due to
> >>>>> lack of maintenance by our local telco (Telstra) but I think that I
> >>>>> also missed the optional config arg to ts_headline.
> >>>>>
> >>>>> The lack of link also means that I cannot confirm your findings but
> >>>>> your logic looks good.
> >>>>
> >>>> Looks like ALTER DATABASE SET default_text_config='english' is what you
> >>>> need.
> >>>>
> >>>>> It begs the question, however, as to why ts-headline needs to reparse
> >>>>> the raw text.
> >>>>
> >>>> It needs to line up tsvector lexemes with actual characters in the text.
> >>>> The tsvector is missing punctuation, any stopwords (the, it, a) as well
> >>>> as being stemmed (if your dictionary does that).
> >>>>
> >>>> Also, it's looking for a short span of words that provide the best
> >>>> match. That might not be a complete match of course, and is different to
> >>>> how you'd normally look to use a tsvector.
> >>>>
> >>>>> At least in my case, I am using a trigger to parse the combination of
> >>>>> Title and Abstract to a ts_vector field in the table row (as suggested
> >>>>> in 12.2.2 and 12.4.3 in the doco) so that the ts_vector is already
> >>>>> available to ts_headline.
> >>>>>
> >>>>> If ts_headline had the ability to use that pre-parsed ts_vector, my
> >>>>> problem would never have arisen - and the performance of ts_headline
> >>>>> would be improved.
> >>>>
> >>>> Maybe. It would still have to parse the text to some degree though, just
> >>>> to get the original words & punctuation into the headline.
> >>
> >> Regards,
> >> Oleg
> >> _____________________________________________________________
> >> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> >> Sternberg Astronomical Institute, Moscow University, Russia
> >> Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
> >> phone: +007(495)939-16-83, +007(495)939-23-83
> >
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
  Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Index: doc/src/sgml/textsearch.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/textsearch.sgml,v
retrieving revision 1.40
diff -c -c -r1.40 textsearch.sgml
*** doc/src/sgml/textsearch.sgml	13 Dec 2007 06:32:47 -0000	1.40
--- doc/src/sgml/textsearch.sgml	4 Mar 2008 02:55:17 -0000
***************
*** 1102,1108 ****
  For example:
  
  <programlisting>
! SELECT ts_headline('The most common type of search
  is to find all documents containing given query terms
  and return them in order of their similarity to the
  query.', to_tsquery('query & similarity'));
--- 1102,1108 ----
  For example:
  
  <programlisting>
! SELECT ts_headline('english', 'The most common type of search
  is to find all documents containing given query terms
  and return them in order of their similarity to the
  query.', to_tsquery('query & similarity'));
***************
*** 1112,1118 ****
  and return them in order of their <b>similarity</b> to the
  <b>query</b>.
  
! SELECT ts_headline('The most common type of search
  is to find all documents containing given query terms
  and return them in order of their similarity to the
  query.',
--- 1112,1118 ----
  and return them in order of their <b>similarity</b> to the
  <b>query</b>.
  
! SELECT ts_headline('english', 'The most common type of search
  is to find all documents containing given query terms
  and return them in order of their similarity to the
  query.',