Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2012-08-15 Thread Sushant Sinha
: Sushant Sinha sushant...@gmail.com writes: Doesn't this force the headline to be taken from the first N words of the document, independent of where the match was? That seems rather unworkable, or at least unhelpful. In headline generation function, we don't have any index or knowledge

Re: [HACKERS] TS: Limited cover density ranking

2012-01-27 Thread Sushant Sinha
The rank counts 1/coversize. So bigger covers will not have much impact anyway. What is the need of the patch? -Sushant. On Fri, 2012-01-27 at 18:06 +0200, karave...@mail.bg wrote: Hello, I have developed a variation of cover density ranking functions that counts only covers that are
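A rough sketch of the 1/coversize point made in this reply, assuming a cover is an inclusive (start, end) pair of word positions and that its size is end - start + 1. The function name is hypothetical, and the sketch ignores lexeme weights and normalization, so it is not the actual ts_rank_cd code.

```python
from typing import List, Tuple


def cover_density_score(covers: List[Tuple[int, int]]) -> float:
    """Sum 1/size over all covers (hypothetical simplification).

    Each cover is an inclusive (start, end) pair of word positions.
    """
    return sum(1.0 / (end - start + 1) for start, end in covers)


# A tight 2-word cover contributes 0.5; a 16-word cover only 0.0625.
print(cover_density_score([(3, 4)]))            # 0.5
print(cover_density_score([(3, 4), (10, 25)]))  # 0.5625
```

Under these assumptions the large cover adds little to the total, which is the argument being made against the need for a size limit.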

[HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries

2011-12-19 Thread Sushant Sinha
I recently upgraded my postgres server from 9.0 to 9.1.2 and I am finding a peculiar problem. I have a program that periodically adds rows to this table using INSERT. Typically the number of rows is just 1-2 thousand when the table already has 500K rows. Whenever the program is adding rows, the

Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries

2011-12-19 Thread Sushant Sinha
On Mon, 2011-12-19 at 19:08 +0200, Marti Raudsepp wrote: Another thought -- have you read about the GIN fast updates feature? This existed in 9.0 too. Instead of updating the index directly, GIN appends all changes to a sequential list, which needs to be scanned in whole for read queries. The

Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries

2011-12-19 Thread Sushant Sinha
On Mon, 2011-12-19 at 12:41 -0300, Euler Taveira de Oliveira wrote: On 19-12-2011 12:30, Sushant Sinha wrote: I recently upgraded my postgres server from 9.0 to 9.1.2 and I am finding a peculiar problem. I have a program that periodically adds rows to this table using INSERT. Typically

Re: [HACKERS] lexemes in prefix search going through dictionary modifications

2011-11-08 Thread Sushant Sinha
, 2011-10-25 at 23:45 +0530, Sushant Sinha wrote: On Tue, 2011-10-25 at 19:27 +0200, Florian Pflug wrote: Assume, for example, that the postgres mailing list archive search used tsearch (which I think it does, but I'm not sure). It'd then probably make sense to add postgres to the list

Re: [HACKERS] a tsearch issue

2011-11-06 Thread Sushant Sinha
On Fri, 2011-11-04 at 11:22 +0100, Pavel Stehule wrote: Hello I found an interesting issue when I checked a tsearch prefix searching. We use a ispell based dictionary CREATE TEXT SEARCH DICTIONARY cspell (template=ispell, dictfile = czech, afffile=czech, stopwords=czech); CREATE TEXT

[HACKERS] lexemes in prefix search going through dictionary modifications

2011-10-25 Thread Sushant Sinha
I am currently using the prefix search feature in text search. I find that the prefix characters are treated the same as a normal lexeme and passed through stemming and stopword dictionaries. This seems like a bug to me. db=# select to_tsquery('english', 's:*'); NOTICE: text-search query

Re: [HACKERS] lexemes in prefix search going through dictionary modifications

2011-10-25 Thread Sushant Sinha
On Tue, 2011-10-25 at 18:05 +0200, Florian Pflug wrote: On Oct25, 2011, at 17:26 , Sushant Sinha wrote: I am currently using the prefix search feature in text search. I find that the prefix characters are treated the same as a normal lexeme and passed through stemming and stopword

Re: [HACKERS] lexemes in prefix search going through dictionary modifications

2011-10-25 Thread Sushant Sinha
On Tue, 2011-10-25 at 19:27 +0200, Florian Pflug wrote: Assume, for example, that the postgres mailing list archive search used tsearch (which I think it does, but I'm not sure). It'd then probably make sense to add postgres to the list of stopwords, because it's bound to appear in nearly

[HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Sushant Sinha
Given a document and a query, the goal of headline generation is to produce text excerpts in which the query appears. Currently the headline generation in postgres follows the following steps: 1. Tokenize the documents and obtain the lexemes 2. Decide on lexemes that should be the part of the

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Sushant Sinha
Here is a simple patch that limits the number of words during the tokenization phase and puts an upper-bound on the headline generation. Doesn't this force the headline to be taken from the first N words of the document, independent of where the match was? That seems rather unworkable,

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Sushant Sinha
Actually, this code seems probably flat-out wrong: won't every successful call of hlCover() on a given document return exactly the same q value (end position), namely the last token occurrence in the document? How is that helpful? regards, tom lane There is a line

[HACKERS] PL/Python: No stack trace for an exception

2011-07-21 Thread Sushant Sinha
I am using plpythonu on postgres 9.0.2. One of my python functions was throwing a TypeError exception. However, I only see the exception in the database and not the stack trace. It becomes difficult to debug if the stack trace is absent in Python. logdb=# select get_words(forminput) from fi;

Re: [HACKERS] PL/Python: No stack trace for an exception

2011-07-21 Thread Sushant Sinha
On Thu, 2011-07-21 at 15:31 +0200, Jan Urbański wrote: On 21/07/11 15:27, Sushant Sinha wrote: I am using plpythonu on postgres 9.0.2. One of my python functions was throwing a TypeError exception. However, I only see the exception in the database and not the stack trace. It becomes

[HACKERS] pg_trgm: unicode string not working

2011-06-12 Thread Sushant Sinha
I am using pg_trgm for spelling correction as prescribed in the documentation. But I see that it does not work for unicode strings. The database was initialized with utf8 encoding and the C locale. Here is the table: \d words Table public.words Column | Type | Modifiers

Re: [HACKERS] tsearch Parser Hacking

2011-02-14 Thread Sushant Sinha
I agree that it will be a good idea to rewrite the entire thing. However, in the mean time, I sent a proposal earlier http://archives.postgresql.org/pgsql-hackers/2010-08/msg00019.php And a patch later: http://archives.postgresql.org/pgsql-hackers/2010-09/msg00476.php Tom asked me to look into

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2011-01-06 Thread Sushant Sinha
Do not know if this mail got lost in between or no one noticed it! On Thu, 2010-12-23 at 11:05 +0530, Sushant Sinha wrote: Just a reminder that this patch is discussing how to break url, emails etc into its components. On Mon, Oct 4, 2010 at 3:54 AM, Tom Lane t...@sss.pgh.pa.us wrote

Re: [HACKERS] bug in ts_rank_cd

2010-12-22 Thread Sushant Sinha
Sorry for sounding the false alarm. I was not running the vanilla postgres and that is why I was seeing that problem. Should have checked with the vanilla one. -Sushant On Tue, 2010-12-21 at 23:03 -0500, Tom Lane wrote: Sushant Sinha sushant...@gmail.com writes: There is a bug in ts_rank_cd

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-12-22 Thread Sushant Sinha
Just a reminder that this patch is discussing how to break url, emails etc into its components. On Mon, Oct 4, 2010 at 3:54 AM, Tom Lane t...@sss.pgh.pa.us wrote: [ sorry for not responding on this sooner, it's been hectic the last couple weeks ] Sushant Sinha sushant...@gmail.com writes

[HACKERS] bug in ts_rank_cd

2010-12-21 Thread Sushant Sinha
There is a bug in ts_rank_cd. It does not correctly give rank when the query lexeme is the first one in the tsvector. Example: select ts_rank_cd(to_tsvector('english', 'abc sdd'), plainto_tsquery('english', 'abc')); ts_rank_cd 0 select

[HACKERS] bug in ts_rank_cd

2010-12-21 Thread Sushant Sinha
MY PREV EMAIL HAD A PROBLEM. Please reply to this one == There is a bug in ts_rank_cd. It does not correctly give rank when the query lexeme is the first one in the tsvector. Example: select ts_rank_cd(to_tsvector('english', 'abc sdd'),

[HACKERS] planner row-estimates for tsvector seems horribly wrong

2010-10-24 Thread Sushant Sinha
I am using gin index on a tsvector and doing basic search. I see the row-estimate of the planner to be horribly wrong. It is returning row-estimate as 4843 for all queries whether it matches zero rows, a medium number of rows (88,000) or a large number of rows (726,000). The table has roughly a

Re: [HACKERS] planner row-estimates for tsvector seems horribly wrong

2010-10-24 Thread Sushant Sinha
: On 24/10/10 14:44, Sushant Sinha wrote: I am using gin index on a tsvector and doing basic search. I see the row-estimate of the planner to be horribly wrong. It is returning row-estimate as 4843 for all queries whether it matches zero rows, a medium number of rows (88,000) or a large number

Re: [HACKERS] Re: [GENERAL] Text search parser's treatment of URLs and emails

2010-10-12 Thread Sushant Sinha
On Tue, 2010-10-12 at 19:31 -0400, Tom Lane wrote: This seems much of a piece with the existing proposal to allow individual words of a URL to be reported separately: https://commitfest.postgresql.org/action/patch_view?id=378 As I said in that thread, this could be done in a

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-28 Thread Sushant Sinha
Any updates on this? On Tue, Sep 21, 2010 at 10:47 PM, Sushant Sinha sushant...@gmail.com wrote: I looked at this patch a bit. I'm fairly unhappy that it seems to be inventing a brand new mechanism to do something the ts parser can already do. Why didn't you code the url-part mechanism

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-21 Thread Sushant Sinha
I looked at this patch a bit. I'm fairly unhappy that it seems to be inventing a brand new mechanism to do something the ts parser can already do. Why didn't you code the url-part mechanism using the existing support for compound words? I am not familiar with compound word implementation

Re: [HACKERS] Configuring Text Search parser?

2010-09-21 Thread Sushant Sinha
Your changes are somewhat fine. It will get you tokens with _ characters in it. However, it is not nice to mix your new token with existing token like NUMWORD. Give a new name to your new type of token .. probably UnderscoreWord. Then on seeing _, move to a state that can identify the new token.

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-08 Thread Sushant Sinha
For the headline generation to work properly, email/file/url/host need to become skip tokens. Updating the patch with that change. -Sushant. On Sat, 2010-09-04 at 13:25 +0530, Sushant Sinha wrote: Updating the patch with emitting parttoken and registering it with snowball config. -Sushant

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-04 Thread Sushant Sinha
Updating the patch with emitting parttoken and registering it with snowball config. -Sushant. On Fri, 2010-09-03 at 09:44 -0400, Robert Haas wrote: On Wed, Sep 1, 2010 at 2:42 AM, Sushant Sinha sushant...@gmail.com wrote: I have attached a patch that emits parts of a host token, a url token

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-01 Thread Sushant Sinha
complicate the patch with that, I wanted to get feedback on any other major problem with the patch. -Sushant. On Mon, 2010-08-02 at 10:20 -0400, Tom Lane wrote: Sushant Sinha sushant...@gmail.com writes: This would needlessly increase the number of tokens. Instead you'd better make it work like

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Thread Sushant Sinha
On 08/01/2010 08:04 PM, Sushant Sinha wrote: 1. We do not have separate tokens wikipedia and org 2. If we have the two tokens we should have them at adjacent position so that a phrase search for wikipedia org should work. This would needlessly increase the number of tokens. Instead you'd

Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-08-02 Thread Sushant Sinha
On Mon, 2010-08-02 at 09:32 -0400, Robert Haas wrote: On Mon, Aug 2, 2010 at 9:12 AM, Sushant Sinha sushant...@gmail.com wrote: The current text parser already returns url and url_path. That already increases the number of unique tokens. I am only asking for adding of normal english words

[HACKERS] english parser in text search: support for multiple words in the same position

2010-08-01 Thread Sushant Sinha
Currently the english parser in text search does not support multiple words in the same position. Consider a word wikipedia.org. The text search would return a single token wikipedia.org. However if someone searches for wikipedia org then there will not be a match. There are two problems here: 1.

[HACKERS] lexeme ordering in tsvector

2009-11-30 Thread Sushant Sinha
It seems like the ordering of lexemes in tsvector has changed from 8.3 to 8.4. For example in 8.3.1, postgres=# select to_tsvector('english', 'quit everytime'); to_tsvector --- 'quit':1 'everytim':2 The lexemes are arranged by length and then by string

Re: [HACKERS] Very bad FTS performance with the Polish config

2009-11-18 Thread Sushant Sinha
ts_headline calls the equivalent of ts_lexize to break the text. Of course, there is an algorithm to process the tokens and generate the headline. I would be really surprised if the algorithm to generate the headline is somehow dependent on language (as it only processes the tokens). So Oleg is right when

Re: [HACKERS] dot to be considered as a word delimiter?

2009-06-02 Thread Sushant Sinha
Thanks, Sushant. On Tue, Jun 2, 2009 at 8:47 AM, Kenneth Marshall k...@rice.edu wrote: On Mon, Jun 01, 2009 at 08:22:23PM -0500, Kevin Grittner wrote: Sushant Sinha sushant...@gmail.com wrote: I think that dot should be considered as a word delimiter because when dot is not followed

Re: [HACKERS] It's June 1; do you know where your release is?

2009-06-02 Thread Sushant Sinha
On Tue, 2009-06-02 at 17:26 -0700, Josh Berkus wrote: * possible bug in cover density ranking? -- From Teodor's response, this is maybe a doc patch and not a code patch. Teodor? Oleg? I personally think that this is a bug, because we are assigning very high rank when we are not

[HACKERS] dot to be considered as a word delimiter?

2009-05-30 Thread Sushant Sinha
Currently it seems that dot is not considered as a word delimiter by the english parser. lawdb=# select to_tsvector('english', 'Mr.J.Sai Deepak'); to_tsvector - 'deepak':2 'mr.j.sai':1 (1 row) So the word obtained is mr.j.sai rather than three words

Re: [HACKERS] possible bug in cover density ranking?

2009-05-01 Thread Sushant Sinha
I see this as open items here http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items Any interest in fixing this? -Sushant. On Thu, 2009-01-29 at 13:54 -0500, Sushant Sinha wrote: On Thu, Jan 29, 2009 at 12:38 PM, Teodor Sigaev teo...@sigaev.ru wrote: Is this what

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2009-04-13 Thread Sushant Sinha
:57 -0400, Tom Lane wrote: Sushant Sinha sushant...@gmail.com writes: Sorry for the delay. Here is the patch with FragmentDelimiter option. It requires an extra option in HeadlineParsedText and uses that option during generateHeadline. I did some editing of the documentation for this patch

[HACKERS] patch for space around the FragmentDelimiter

2009-03-01 Thread Sushant Sinha
FragmentDelimiter is an argument for ts_headline function to separate different headline fragments. The default delimiter is ... . Currently if someone specifies the delimiter as an option to the function, no extra space is added around the delimiter. However, it does not look good without space

Re: [HACKERS] patch for space around the FragmentDelimiter

2009-03-01 Thread Sushant Sinha
yeah you are right. I did not know that you can pass space using double quotes. -Sushant. On Sun, 2009-03-01 at 20:49 -0500, Tom Lane wrote: Sushant Sinha sushant...@gmail.com writes: FragmentDelimiter is an argument for ts_headline function to separate different headline fragments

Re: [HACKERS] Ellipses around result fragment of ts_headline

2009-02-14 Thread Sushant Sinha
I think we currently do that. We add ellipses only when we encounter a new fragment. So there should not be ellipses if we are at the end of the document or if that is the first fragment (includes the beginning of the document). Here is the code in generateHeadline, ts_parse.c that adds the

Re: [HACKERS] Ellipses around result fragment of ts_headline

2009-02-14 Thread Sushant Sinha
the fragments. I hope that you're correct and that it is implemented, and not documented -Original Message- From: Sushant Sinha [mailto:sushant...@gmail.com] Sent: Saturday, February 14, 2009 4:07 PM To: Asher Snyder Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Ellipses around

Re: [HACKERS] Ellipses around result fragment of ts_headline

2009-02-14 Thread Sushant Sinha
Sorry ... I thought you were running the development branch. -Sushant. On Sat, 2009-02-14 at 16:34 -0500, Tom Lane wrote: Sushant Sinha sushant...@gmail.com writes: I think we currently do that. ... since about four months ago. 2008-10-17 14:05 teodor * doc/src/sgml

Re: [HACKERS] possible bug in cover density ranking?

2009-01-29 Thread Sushant Sinha
On Thu, Jan 29, 2009 at 12:38 PM, Teodor Sigaev teo...@sigaev.ru wrote: Is this what is desired? It seems to me that Wdoc is getting a high ranking even when we are not sure of the position information. 0.1 is not very high rank, and we could not suggest any reasonable rank in this case.

[HACKERS] possible bug in cover density ranking?

2009-01-28 Thread Sushant Sinha
I am running postgres 8.3.1. In tsrank.c I am looking at the cover density function used for ranking while doing text search: float4 calc_rank_cd(float4 *arrdata, TSVector txt, TSQuery query, int method) Here is the excerpt of code that I think may possibly have bug when document is big enough

Re: [HACKERS] text search patch status update?

2009-01-07 Thread Sushant Sinha
. ;-) --- Heikki Linnakangas wrote: Sushant Sinha wrote: Patch #2. I think this is a straightforward bug fix. Yes, I think you're right. In hlCover(), *q is 0 when the only match is the first item in the text, and we shouldn't bail out with return false in that case

[HACKERS] text search patch status update?

2008-09-16 Thread Sushant Sinha
Any status updates on the following patches? 1. Fragments in tsearch2 headlines: http://archives.postgresql.org/pgsql-hackers/2008-08/msg00043.php 2. Bug in hlCover: http://archives.postgresql.org/pgsql-hackers/2008-08/msg00089.php -Sushant.

Re: [HACKERS] text search patch status update?

2008-09-16 Thread Sushant Sinha
PROTECTED] wrote: Sushant Sinha escribió: Any status updates on the following patches? 1. Fragments in tsearch2 headlines: http://archives.postgresql.org/pgsql-hackers/2008-08/msg00043.php 2. Bug in hlCover: http://archives.postgresql.org/pgsql-hackers/2008-08/msg00089.php

Re: [HACKERS] small bug in hlCover

2008-08-03 Thread Sushant Sinha
Has anyone noticed this? -Sushant. On Wed, 2008-07-16 at 23:01 -0400, Sushant Sinha wrote: I think there is a slight bug in hlCover function in wparser_def.c If there is only one query item and that is the first word in the text, then hlCover does not return any cover. This is evident

Re: [HACKERS] small bug in hlCover

2008-08-03 Thread Sushant Sinha
On Mon, 2008-08-04 at 00:36 -0300, Euler Taveira de Oliveira wrote: Sushant Sinha escreveu: I think there is a slight bug in hlCover function in wparser_def.c The bug is not in the hlCover. In prsd_headline, if we didn't find a suitable bestlen (i.e. = 0), then it includes up to document

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-08-02 Thread Sushant Sinha
file that tests different aspects of the new headline generation function. Let me know if anything else is needed. -Sushant. On Thu, 2008-07-24 at 00:28 +0400, Oleg Bartunov wrote: On Wed, 23 Jul 2008, Sushant Sinha wrote: I guess it is more readable to add cover separator at the end

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-23 Thread Sushant Sinha
I guess it is more readable to add cover separator at the end of a fragment than in the front. Let me know what you think and I can update it. I think the right place for cover separator is in the structure HeadlineParsedText just like startsel and stopsel. This will enable users to specify their

Re: [HACKERS] phrase search

2008-07-18 Thread Sushant Sinha
I looked at query operators for tsquery and here are some of the new query operators for position based queries. I am just proposing some changes and the questions I have. 1. What is the meaning of such a query operator? foo #5 bar - true if the document has word foo followed by bar at 5th

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-17 Thread Sushant Sinha
, Sushant Sinha wrote: I will add test queries and their results for the corner cases in a separate file. I guess the only thing I am confused about is what should be the behavior of headline generation when Query items have words of size less than ShortWord. I guess the answer is to ignore

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-16 Thread Sushant Sinha
'::tsquery,'maxfragments=2'); ts_headline -- ... 2 ... and so on Oleg On Tue, 15 Jul 2008, Sushant Sinha wrote: Attached a new patch that: 1. fixes previous bug 2. better handles the case when cover size is greater than the MaxWords. Basically it divides

[HACKERS] small bug in hlCover

2008-07-16 Thread Sushant Sinha
I think there is a slight bug in hlCover function in wparser_def.c If there is only one query item and that is the first word in the text, then hlCover does not return any cover. This is evident in this example when ts_headline only generates the min_words: testdb=# select ts_headline('1 2 3 4

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-15 Thread Sushant Sinha
attached are two patches: 1. documentation 2. regression tests for headline with fragments. -Sushant. On Tue, 2008-07-15 at 13:29 +0400, Teodor Sigaev wrote: Attached a new patch that: 1. fixes previous bug 2. better handles the case when cover size is greater than the MaxWords.

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-14 Thread Sushant Sinha
Attached a new patch that: 1. fixes previous bug 2. better handles the case when cover size is greater than the MaxWords. Basically it divides a cover greater than MaxWords into fragments of MaxWords, resizes each such fragment so that each end of the fragment contains a query word and then
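The fragment-splitting idea described in this message could be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch itself: the function name split_cover, the list-of-words document model, and the choice to drop chunks without any query word are all hypothetical, and whatever the patch does after resizing is not covered here.

```python
from typing import List, Set, Tuple


def split_cover(words: List[str], query: Set[str],
                start: int, end: int, max_words: int) -> List[Tuple[int, int]]:
    """Split the cover words[start..end] into fragments of at most max_words,
    then shrink each fragment so both of its ends sit on a query word.

    Hypothetical helper; indexes are inclusive positions in `words`.
    """
    fragments = []
    for chunk_start in range(start, end + 1, max_words):
        chunk_end = min(chunk_start + max_words - 1, end)
        # Positions of query words inside this chunk.
        hits = [i for i in range(chunk_start, chunk_end + 1) if words[i] in query]
        if hits:  # assumption: chunks with no query word are skipped
            fragments.append((hits[0], hits[-1]))
    return fragments


words = "a cat sat on the mat near a cat and a dog".split()
# The cover runs from the first "cat" (1) to "dog" (11), longer than max_words=4.
print(split_cover(words, {"cat", "dog"}, 1, 11, 4))  # [(1, 1), (8, 8), (11, 11)]
```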

[HACKERS] initdb in current cvs head broken?

2008-07-10 Thread Sushant Sinha
I am trying to generate a patch with respect to the current CVS head. So I rsynced the tree, then did cvs up and installed the db. However, when I did initdb on a data directory it is stuck: It is stuck after printing creating template1 creating template1 database in /home/postgres/data/base/1

Re: [HACKERS] initdb in current cvs head broken?

2008-07-10 Thread Sushant Sinha
You are right. I did not do make clean last time. After make clean, make all, and make install it works fine. -Sushant. On Thu, 2008-07-10 at 17:55 +0530, Pavan Deolasee wrote: On Thu, Jul 10, 2008 at 5:36 PM, Sushant Sinha [EMAIL PROTECTED] wrote: Seems like a bug to me. Is the tree

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-06-21 Thread Sushant Sinha
I have attached an updated patch with following changes: 1. Respects ShortWord and MinWords 2. Uses hlCover instead of Cover 3. Does not store norm (or lexeme) for headline marking 4. Removes ts_rank.h 5. Earlier it was counting even NONWORDTOKEN in the headline. Now it only counts the actual

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-06-03 Thread Sushant Sinha
My main argument for using Cover instead of hlCover was that Cover will be faster. I tested the default headline generation that uses hlCover with the current patch that uses Cover. There was not much difference. So I think you are right in that we do not need norms and we can just use hlCover. I

Re: [HACKERS] phrase search

2008-06-03 Thread Sushant Sinha
On Tue, 2008-06-03 at 22:16 +0400, Teodor Sigaev wrote: This is far more complicated than I thought. Of course, phrase search should be able to use indexes. I can probably look into how to use index. Any pointers on this? src/backend/utils/adt/tsginidx.c, if you invent operation # in

Re: [HACKERS] phrase search

2008-06-02 Thread Sushant Sinha
On Mon, 2008-06-02 at 19:39 +0400, Teodor Sigaev wrote: I have attached a patch for phrase search with respect to the cvs head. Basically it takes a phrase (text) and a TSVector. It checks if the relative positions of lexeme in the phrase are same as in their positions in TSVector.

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-06-02 Thread Sushant Sinha
Efficiency: I realized that we do not need to store all norms. We need to only store norms that are in the query. So I moved the addition of norms from addHLParsedLex to hlfinditem. This should add very little memory overhead to existing headline generation. If this is still not acceptable

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-05-31 Thread Sushant Sinha
I have attached a new patch with respect to the current cvs head. This produces headline in a document for a given query. Basically it identifies fragments of text that contain the query and displays them. DESCRIPTION HeadlineParsedText contains an array of actual words but not information

[HACKERS] phrase search

2008-05-31 Thread Sushant Sinha
I have attached a patch for phrase search with respect to the cvs head. Basically it takes a phrase (text) and a TSVector. It checks if the relative positions of lexeme in the phrase are same as in their positions in TSVector. If the configuration for text search is simple, then this will
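The position check described in this message might look something like the sketch below. It assumes the tsvector can be modelled as a mapping from lexeme to a list of positions and that the phrase lexemes must appear at consecutive positions; the name phrase_matches is hypothetical and this is not the code from the patch.

```python
from typing import Dict, List


def phrase_matches(phrase: List[str], tsvector: Dict[str, List[int]]) -> bool:
    """Return True if the phrase lexemes occur at consecutive positions.

    `tsvector` is a simplified stand-in: lexeme -> positions in the document.
    """
    if not phrase:
        return False
    # Every lexeme of the phrase must occur somewhere in the document.
    if any(lexeme not in tsvector for lexeme in phrase):
        return False
    position_sets = [set(tsvector[lexeme]) for lexeme in phrase]
    # Try each occurrence of the first lexeme as the start of the phrase.
    for start in position_sets[0]:
        if all(start + offset in position_sets[offset]
               for offset in range(1, len(phrase))):
            return True
    return False


doc = {"the": [1, 8], "quick": [2], "brown": [3], "fox": [4, 9]}
print(phrase_matches(["brown", "fox"], doc))  # True
print(phrase_matches(["fox", "brown"], doc))  # False
```

A plain AND query would accept both examples, since each lexeme is present; only the position comparison tells them apart.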

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-05-24 Thread Sushant Sinha
to pass TSVector to headline function? -Sushant. On Sat, 2008-05-24 at 07:57 +0400, Teodor Sigaev wrote: [moved to -hackers, because talk is about implementation details] I've ported the patch of Sushant Sinha for fragmented headlines to pg8.3.1 (http://archives.postgresql.org/pgsql-general/2007