Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2014-08-06 Thread Bruce Momjian
FYI, I have kept this email from 2011 about poor performance of parsed words in headline generation. If someone wants to research it, please do so: http://www.postgresql.org/message-id/1314117620.3700.12.camel@dragflick

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2013-01-24 Thread Bruce Momjian
On Wed, Aug 15, 2012 at 11:09:18PM +0530, Sushant Sinha wrote:
> I will do the profiling and present the results.

Sushant, do you have any profiling results on this issue from August?

---

> On Wed, 2012-08-15 at 12:45

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2012-08-15 Thread Bruce Momjian
Is this a TODO?

---

On Tue, Aug 23, 2011 at 10:31:42PM -0400, Tom Lane wrote:
> Sushant Sinha sushant...@gmail.com writes:
>>> Doesn't this force the headline to be taken from the first N words of the document, independent

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2012-08-15 Thread Bruce Momjian
This might indicate that the hlCover() item is resolved.

---

On Wed, Aug 24, 2011 at 10:08:11AM +0530, Sushant Sinha wrote:
>> Actually, this code seems probably flat-out wrong: won't every successful call of

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2012-08-15 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
> Is this a TODO?

AFAIR nothing's been done about the speed issue, so yes. I didn't like the idea of creating a user-visible knob when the speed issue might be fixable with internal algorithm improvements, but we never followed up on this in either fashion.

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2012-08-15 Thread Sushant Sinha
I will do the profiling and present the results.

On Wed, 2012-08-15 at 12:45 -0400, Tom Lane wrote:
> Bruce Momjian br...@momjian.us writes:
>> Is this a TODO?
>
> AFAIR nothing's been done about the speed issue, so yes. I didn't like the idea of creating a user-visible knob when the speed issue

[HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Sushant Sinha
Given a document and a query, the goal of headline generation is to produce text excerpts in which the query appears. Currently the headline generation in postgres follows the following steps:

1. Tokenize the documents and obtain the lexemes
2. Decide on lexemes that should be the part of the

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Tom Lane
Sushant Sinha sushant...@gmail.com writes:
> Given a document and a query, the goal of headline generation is to produce text excerpts in which the query appears.

... right ...

> Here is a simple patch that limits the number of words during the tokenization phase and puts an upper-bound on the

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Tue Aug 23 15:59:18 -0300 2011:
> Sushant Sinha sushant...@gmail.com writes:
>> Given a document and a query, the goal of headline generation is to produce text excerpts in which the query appears.
>
> ... right ...
>
>> Here is a simple patch that limits the

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Sushant Sinha
>> Here is a simple patch that limits the number of words during the tokenization phase and puts an upper-bound on the headline generation.
>
> Doesn't this force the headline to be taken from the first N words of the document, independent of where the match was? That seems rather unworkable,
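
[Editor's illustration] The objection quoted above can be shown with a toy sketch. This is not the patch under discussion and not PostgreSQL's headline code; `tokenize`, `headline`, `max_words`, and `context` are hypothetical names chosen only for this example. It assumes whitespace tokenization and an exact word match, and shows that capping the number of tokens before matching can hide a match that occurs later in the document.

```python
def tokenize(document, max_words=None):
    """Split on whitespace; optionally keep only the first max_words tokens.

    max_words is a hypothetical stand-in for the proposed word limit.
    """
    words = document.split()
    return words if max_words is None else words[:max_words]


def headline(document, query_terms, max_words=None, context=2):
    """Return a short excerpt around the first matching word.

    If no query term is found among the (possibly truncated) tokens,
    fall back to the leading words of the document.
    """
    words = tokenize(document, max_words)
    terms = {term.lower() for term in query_terms}
    for i, word in enumerate(words):
        if word.lower() in terms:
            start = max(0, i - context)
            return " ".join(words[start:i + context + 1])
    return " ".join(words[:2 * context + 1])


doc = "alpha beta gamma delta epsilon zeta match eta theta"

# Without a limit, the excerpt is centered on the match.
print(headline(doc, ["match"]))

# With the first 5 words only, the match is never seen and the
# excerpt is just the start of the document.
print(headline(doc, ["match"], max_words=5))
```

In this toy version the limit trades work for recall: fewer tokens are examined, but any match beyond the limit cannot appear in the excerpt.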

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Tom Lane
Sushant Sinha sushant...@gmail.com writes:
>> Doesn't this force the headline to be taken from the first N words of the document, independent of where the match was? That seems rather unworkable, or at least unhelpful.
>
> In headline generation function, we don't have any index or knowledge of

Re: [HACKERS] text search: restricting the number of parsed words in headline generation

2011-08-23 Thread Sushant Sinha
> Actually, this code seems probably flat-out wrong: won't every successful call of hlCover() on a given document return exactly the same q value (end position), namely the last token occurrence in the document? How is that helpful?
>
> regards, tom lane

There is a line