Re: Missing CFI in hlCover()?

2020-07-31 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> Here's a proposed patch along that line.
> 
> > I've back-patched this to 11 (which was just a bit of fuzz) and tested
> > it out with a couple of different queries that were causing issues
> > previously on the archive server, and they finish in a much more
> > reasonable time and react faster to cancel requests/signals.
> 
> Yeah, I'd tried this locally using the data from the one test case you
> showed me, and it seemed to fix that.

Good stuff.

> > So, looks good to me, and would certainly be nice to get this into the
> > next set of releases, so the archive server doesn't get stuck anymore.
> 
> I'll push this tomorrow if nobody has objected to it.

Sounds good.

> BTW, I had noticed last night that hlFirstIndex is being unreasonably
> stupid.  Many of the "words" have null item pointers and hence can't
> possibly match any query item (I think because we have "words" for
> inter-word spaces/punctuation as well as the actual words).  Checking
> that, as in the attached v2 patch, makes things a bit faster yet.

Nice, looks good to me.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Missing CFI in hlCover()?

2020-07-30 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Here's a proposed patch along that line.

> I've back-patched this to 11 (which was just a bit of fuzz) and tested
> it out with a couple of different queries that were causing issues
> previously on the archive server, and they finish in a much more
> reasonable time and react faster to cancel requests/signals.

Yeah, I'd tried this locally using the data from the one test case you
showed me, and it seemed to fix that.

> So, looks good to me, and would certainly be nice to get this into the
> next set of releases, so the archive server doesn't get stuck anymore.

I'll push this tomorrow if nobody has objected to it.

BTW, I had noticed last night that hlFirstIndex is being unreasonably
stupid.  Many of the "words" have null item pointers and hence can't
possibly match any query item (I think because we have "words" for
inter-word spaces/punctuation as well as the actual words).  Checking
that, as in the attached v2 patch, makes things a bit faster yet.

regards, tom lane

diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index 76b6f9aef0..92517e4c17 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -2011,13 +2011,18 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos)
 	for (i = pos; i < prs->curwords; i++)
 	{
 		/* ... scan the query to see if this word matches any operand */
-		QueryItem  *item = GETQUERY(query);
+		QueryOperand *worditem = prs->words[i].item;
+		QueryItem  *item;
 		int			j;
 
+		if (worditem == NULL)
+			continue;			/* certainly not a match */
+
+		item = GETQUERY(query);
 		for (j = 0; j < query->size; j++)
 		{
 			if (item->type == QI_VAL &&
-				prs->words[i].item == &item->qoperand)
+				worditem == &item->qoperand)
 				return i;
 			item++;
 		}
@@ -2028,8 +2033,14 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos)
 /*
  * hlCover: try to find a substring of prs' word list that satisfies query
  *
- * At entry, *p must be the first word index to consider (initialize this to
- * zero, or to the next index after a previous successful search).
+ * At entry, *p must be the first word index to consider (initialize this
+ * to zero, or to the next index after a previous successful search).
+ * We will consider all substrings starting at or after that word, and
+ * containing no more than max_cover words.  (We need a length limit to
+ * keep this from taking O(N^2) time for a long document with many query
+ * words but few complete matches.  Actually, since checkcondition_HL is
+ * roughly O(N) in the length of the substring being checked, it's even
+ * worse than that.)
  *
  * On success, sets *p to first word index and *q to last word index of the
  * cover substring, and returns true.
@@ -2038,7 +2049,8 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos)
  * words used in the query.
  */
 static bool
-hlCover(HeadlineParsedText *prs, TSQuery query, int *p, int *q)
+hlCover(HeadlineParsedText *prs, TSQuery query, int max_cover,
+		int *p, int *q)
 {
 	int			pmin,
 				pmax,
@@ -2084,7 +2096,7 @@ hlCover(HeadlineParsedText *prs, TSQuery query, int *p, int *q)
 nextpmin = nextpmax;
 			pmax = nextpmax;
 		}
-		while (pmax >= 0);
+		while (pmax >= 0 && pmax - pmin < max_cover);
 		/* No luck here, so try next feasible startpoint */
 		pmin = nextpmin;
 	}
@@ -2186,7 +2198,7 @@ get_next_fragment(HeadlineParsedText *prs, int *startpos, int *endpos,
 static void
 mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall,
   int shortword, int min_words,
-  int max_words, int max_fragments)
+  int max_words, int max_fragments, int max_cover)
 {
 	int32		poslen,
 				curlen,
@@ -2213,7 +2225,7 @@ mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall,
 	covers = palloc(maxcovers * sizeof(CoverPos));
 
 	/* get all covers */
-	while (hlCover(prs, query, &p, &q))
+	while (hlCover(prs, query, max_cover, &p, &q))
 	{
 		startpos = p;
 		endpos = q;
@@ -2368,7 +2380,7 @@ mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall,
  */
 static void
 mark_hl_words(HeadlineParsedText *prs, TSQuery query, bool highlightall,
-			  int shortword, int min_words, int max_words)
+			  int shortword, int min_words, int max_words, int max_cover)
 {
 	int			p = 0,
 				q = 0;
@@ -2386,7 +2398,7 @@ mark_hl_words(HeadlineParsedText *prs, TSQuery query, bool highlightall,
 	if (!highlightall)
 	{
 		/* examine all covers, select a headline using the best one */
-		while (hlCover(prs, query, &p, &q))
+		while (hlCover(prs, query, max_cover, &p, &q))
 		{
 			/*
 			 * Count words (curlen) and interesting words (poslen) within
@@ -2542,6 +2554,7 @@ prsd_headline(PG_FUNCTION_ARGS)
 	int			shortword = 3;
 	int			max_fragments = 0;
 	bool		highlightall = false;
+	int			max_cover;
 	ListCell   *l;
 
 	/* Extract configuration option values */
@@ -2581,6 +2594,15 @@ 

Re: Missing CFI in hlCover()?

2020-07-30 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> Stephen Frost  writes:
> >>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >>>> We could hard-code a rule like that, or we could introduce a new
> >>>> explicit parameter for the maximum cover length.  The latter would be
> >>>> more flexible, but we need something back-patchable and I'm concerned
> >>>> about the compatibility hazards of adding a new parameter in minor
> >>>> releases.  So on the whole I propose hard-wiring a multiplier of,
> >>>> say, 10 for both these cases.
> 
> >>> That sounds alright to me, though I do think we should probably still
> >>> toss a CFI (or two) in this path somewhere as we don't know how long
> >>> some of these functions might take...
> 
> >> Yeah, of course.  I'm still leaning to doing that in TS_execute_recurse.
> 
> > Works for me.
> 
> Here's a proposed patch along that line.

I've back-patched this to 11 (which was just a bit of fuzz) and tested
it out with a couple of different queries that were causing issues
previously on the archive server, and they finish in a much more
reasonable time and react faster to cancel requests/signals.

If you'd like to play with it more, the PG11 installed on ark2 now has
this patch built into it.

So, looks good to me, and would certainly be nice to get this into the
next set of releases, so the archive server doesn't get stuck anymore.
:)

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Missing CFI in hlCover()?

2020-07-30 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Stephen Frost  writes:
>>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>>>> We could hard-code a rule like that, or we could introduce a new
>>>> explicit parameter for the maximum cover length.  The latter would be
>>>> more flexible, but we need something back-patchable and I'm concerned
>>>> about the compatibility hazards of adding a new parameter in minor
>>>> releases.  So on the whole I propose hard-wiring a multiplier of,
>>>> say, 10 for both these cases.

>>> That sounds alright to me, though I do think we should probably still
>>> toss a CFI (or two) in this path somewhere as we don't know how long
>>> some of these functions might take...

>> Yeah, of course.  I'm still leaning to doing that in TS_execute_recurse.

> Works for me.

Here's a proposed patch along that line.

regards, tom lane

diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index 76b6f9aef0..1df1c0362d 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -2028,8 +2028,10 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos)
 /*
  * hlCover: try to find a substring of prs' word list that satisfies query
  *
- * At entry, *p must be the first word index to consider (initialize this to
- * zero, or to the next index after a previous successful search).
+ * At entry, *p must be the first word index to consider (initialize this
+ * to zero, or to the next index after a previous successful search).
+ * We will consider all substrings starting at or after that word, and
+ * containing no more than max_cover words.
  *
  * On success, sets *p to first word index and *q to last word index of the
  * cover substring, and returns true.
@@ -2038,7 +2040,8 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos)
  * words used in the query.
  */
 static bool
-hlCover(HeadlineParsedText *prs, TSQuery query, int *p, int *q)
+hlCover(HeadlineParsedText *prs, TSQuery query, int max_cover,
+		int *p, int *q)
 {
 	int			pmin,
 				pmax,
@@ -2084,7 +2087,7 @@ hlCover(HeadlineParsedText *prs, TSQuery query, int *p, int *q)
 nextpmin = nextpmax;
 			pmax = nextpmax;
 		}
-		while (pmax >= 0);
+		while (pmax >= 0 && pmax - pmin < max_cover);
 		/* No luck here, so try next feasible startpoint */
 		pmin = nextpmin;
 	}
@@ -2186,7 +2189,7 @@ get_next_fragment(HeadlineParsedText *prs, int *startpos, int *endpos,
 static void
 mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall,
   int shortword, int min_words,
-  int max_words, int max_fragments)
+  int max_words, int max_fragments, int max_cover)
 {
 	int32		poslen,
 				curlen,
@@ -2213,7 +2216,7 @@ mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall,
 	covers = palloc(maxcovers * sizeof(CoverPos));
 
 	/* get all covers */
-	while (hlCover(prs, query, &p, &q))
+	while (hlCover(prs, query, max_cover, &p, &q))
 	{
 		startpos = p;
 		endpos = q;
@@ -2368,7 +2371,7 @@ mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall,
  */
 static void
 mark_hl_words(HeadlineParsedText *prs, TSQuery query, bool highlightall,
-			  int shortword, int min_words, int max_words)
+			  int shortword, int min_words, int max_words, int max_cover)
 {
 	int			p = 0,
 				q = 0;
@@ -2386,7 +2389,7 @@ mark_hl_words(HeadlineParsedText *prs, TSQuery query, bool highlightall,
 	if (!highlightall)
 	{
 		/* examine all covers, select a headline using the best one */
-		while (hlCover(prs, query, &p, &q))
+		while (hlCover(prs, query, max_cover, &p, &q))
 		{
 			/*
 			 * Count words (curlen) and interesting words (poslen) within
@@ -2542,6 +2545,7 @@ prsd_headline(PG_FUNCTION_ARGS)
 	int			shortword = 3;
 	int			max_fragments = 0;
 	bool		highlightall = false;
+	int			max_cover;
 	ListCell   *l;
 
 	/* Extract configuration option values */
@@ -2581,6 +2585,15 @@ prsd_headline(PG_FUNCTION_ARGS)
 			defel->defname)));
 	}
 
+	/*
+	 * We might eventually make max_cover a user-settable parameter, but for
+	 * now, just compute a reasonable value based on max_words and
+	 * max_fragments.
+	 */
+	max_cover = Max(max_words * 10, 100);
+	if (max_fragments > 0)
+		max_cover *= max_fragments;
+
 	/* in HighlightAll mode these parameters are ignored */
 	if (!highlightall)
 	{
@@ -2605,10 +2618,10 @@ prsd_headline(PG_FUNCTION_ARGS)
 	/* Apply appropriate headline selector */
 	if (max_fragments == 0)
 		mark_hl_words(prs, query, highlightall, shortword,
-	  min_words, max_words);
+	  min_words, max_words, max_cover);
 	else
 		mark_hl_fragments(prs, query, highlightall, shortword,
-		  min_words, max_words, max_fragments);
+		  min_words, max_words, max_fragments, max_cover);
 
 	/* Fill in default values for string options */
 	if (!prs->startsel)
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index f01b1ee253..756a48a167 100644
--- 
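
For concreteness, the max_cover heuristic in the prsd_headline hunk above works out as follows with the default option values (max_words = 35, max_fragments = 0; the defaults are visible earlier in the diff and in the backtrace downthread).  This is only the patch's arithmetic restated as an illustration, not additional patch code:

	/* illustration only: the heuristic from the hunk above, with defaults */
	int		max_words = 35;			/* "MaxWords" default */
	int		max_fragments = 0;		/* "MaxFragments" default */
	int		max_cover;

	max_cover = Max(max_words * 10, 100);	/* Max(350, 100) = 350 */
	if (max_fragments > 0)
		max_cover *= max_fragments;			/* no-op here, since MaxFragments = 0 */

So with the defaults, hlCover() considers covers of at most 350 words; with, say, MaxFragments=4 the limit would be 1400.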

Re: Missing CFI in hlCover()?

2020-07-30 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> We could hard-code a rule like that, or we could introduce a new
> >> explicit parameter for the maximum cover length.  The latter would be
> >> more flexible, but we need something back-patchable and I'm concerned
> >> about the compatibility hazards of adding a new parameter in minor
> >> releases.  So on the whole I propose hard-wiring a multiplier of,
> >> say, 10 for both these cases.
> 
> > That sounds alright to me, though I do think we should probably still
> > toss a CFI (or two) in this path somewhere as we don't know how long
> > some of these functions might take...
> 
> Yeah, of course.  I'm still leaning to doing that in TS_execute_recurse.

Works for me.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Missing CFI in hlCover()?

2020-07-30 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> We could hard-code a rule like that, or we could introduce a new
>> explicit parameter for the maximum cover length.  The latter would be
>> more flexible, but we need something back-patchable and I'm concerned
>> about the compatibility hazards of adding a new parameter in minor
>> releases.  So on the whole I propose hard-wiring a multiplier of,
>> say, 10 for both these cases.

> That sounds alright to me, though I do think we should probably still
> toss a CFI (or two) in this path somewhere as we don't know how long
> some of these functions might take...

Yeah, of course.  I'm still leaning to doing that in TS_execute_recurse.

regards, tom lane
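
The tsvector_op.c half of the proposed patch is cut off in this archive, so purely as a sketch of the idea being agreed on here (not the patch text itself), a CFI at the top of the recursive executor would look roughly like this:

	static bool
	TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags,
					   TSExecuteCallback chkcond)
	{
		/* since this function recurses, guard against stack overflow */
		check_stack_depth();

		/* and since one call can walk a lot of data, allow query cancel */
		CHECK_FOR_INTERRUPTS();

		/* ... rest of the function unchanged ... */
	}

As noted upthread, putting the check here also guards the other TS_execute() callers (e.g. during index scans), not just hlCover().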




Re: Missing CFI in hlCover()?

2020-07-30 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> We could hard-code a rule like that, or we could introduce a new
> explicit parameter for the maximum cover length.  The latter would be
> more flexible, but we need something back-patchable and I'm concerned
> about the compatibility hazards of adding a new parameter in minor
> releases.  So on the whole I propose hard-wiring a multiplier of,
> say, 10 for both these cases.

That sounds alright to me, though I do think we should probably still
toss a CFI (or two) in this path somewhere as we don't know how long
some of these functions might take...

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Missing CFI in hlCover()?

2020-07-29 Thread Tom Lane
I wrote:
> (I wonder if we need to try to make it faster.  I'd supposed that the
> loop was cheap enough to be a non-problem, but with large enough
> documents maybe not?  It seems like converting to a hash table could
> be worthwhile for a large doc.)

OK, I dug into Stephen's test case off-list.  While some CFIs would
be a good idea, that's just band-aid'ing the symptom.  The actual
problem is that hlCover() is taking way too much time.  The test case
boils down to "common_word & rare_word" matched to a very long document,
wherein the rare_word appears only near the front.  Once we're past
that match, hlCover() tries all the remaining matches for common_word,
of which there are plenty ... and for each one, it scans clear to the
end of the document, looking vainly for a substring that will satisfy
the "common_word & rare_word" query.  So obviously, this is O(N^2)
in the length of the document :-(.

I have to suppose that I introduced this problem in c9b0c678d, since
AFAIR we weren't getting ts_headline-takes-forever complaints before
that.  It's not immediately obvious why the preceding algorithm doesn't
have a similar issue, but then there's not anything at all that was
obvious about the preceding algorithm.

The most obvious way to fix the O(N^2) hazard is to put a limit on the
length of "cover" (matching substring) that we'll consider.  For the
mark_hl_words caller, we won't highlight more than max_words words
anyway, so it would be reasonable to bound covers to that length or
some small multiple of it.  The other caller mark_hl_fragments is
willing to highlight up to max_fragments of up to max_words each, and
there can be some daylight between the fragments, so it's not quite
clear what the longest reasonable match is.  Still, I doubt it's
useful to show a headline consisting of a few words from the start of
the document and a few words from thousands of words later, so a limit
of max_fragments times max_words times something would probably be
reasonable.

We could hard-code a rule like that, or we could introduce a new
explicit parameter for the maximum cover length.  The latter would be
more flexible, but we need something back-patchable and I'm concerned
about the compatibility hazards of adding a new parameter in minor
releases.  So on the whole I propose hard-wiring a multiplier of,
say, 10 for both these cases.

regards, tom lane
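
Rough numbers for the quadratic behaviour, using prs->curwords = 583766 from the gdb session quoted further down the thread and assuming (purely for illustration) that common_word occurs about once every 100 words:

	583766 words / 1 match per 100 words   ->  ~5800 cover start points to retry
	~5800 restarts x up to ~580000 words   ->  on the order of 10^9 word positions visited
	~5800 restarts x at most 350 words     ->  ~2 million word positions visited
	                                           (350 = Max(35 * 10, 100), default MaxWords times 10)

That is the difference between a headline call that appears to hang and one that finishes promptly, even before counting checkcondition_HL() being roughly linear in the length of the substring it checks.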




Re: Missing CFI in hlCover()?

2020-07-24 Thread Tom Lane
I tried to duplicate a multiple-second ts_headline call here, and
failed to, so there must be something I'm missing.  Can you provide
a concrete example?  I'd like to do some analysis with perf.

regards, tom lane




Re: Missing CFI in hlCover()?

2020-07-24 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Hm.  I'd vote for a CFI within the recursion in TS_execute(), if there's
>> not one there yet.  Maybe hlFirstIndex needs one too --- if there are
>> a lot of words in the query, maybe that could be slow?  Did you pin the
>> blame down any more precisely than hlCover?

> I've definitely seen hlFirstIndex take a few seconds to run (while
> running this under gdb and stepping through), so that could be a good
> choice to place one (perhaps even additionally to this...).  I have to
> admit to wondering if we shouldn't consider having one in
> check_stack_depth() to try and reduce the risk of us forgetting to have
> one in sensible places, though I've not really looked at all the callers
> and that might not be reasonable in some cases (though I wonder if maybe
> we consider having a 'default' version that has a CFI, and an alternate
> that doesn't...).

Adding it to check_stack_depth doesn't really seem like a reasonable
proposal to me; aside from failing to separate concerns, running a
long time is quite distinct from taking a lot of stack.

The reason I'm eyeing TS_execute is that it involves callbacks to
functions that might be pretty complex in themselves, eg during index
scans.  So that would guard a lot of territory besides hlCover.  But
hlFirstIndex could use a CFI too, if you've seen it take that long.
(I wonder if we need to try to make it faster.  I'd supposed that the
loop was cheap enough to be a non-problem, but with large enough
documents maybe not?  It seems like converting to a hash table could
be worthwhile for a large doc.)

regards, tom lane




Re: Missing CFI in hlCover()?

2020-07-24 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > I'm looking into an issue that we're seeing on the PG archives server
> > with runaway queries that don't seem to ever want to end- and ignore
> > signals.
> 
> > This is PG11, 11.8-1.pgdg100+1 specifically on Debian/buster and what
> > we're seeing is the loop in hlCover() (wparser_def.c:2071 to 2093) is
> > lasting an awful long time without any CFI call.  It's possible the CFI
> > call should actually go elsewhere, but the complete lack of any CFI in
> > wparser_def.c or tsvector_op.c seems a bit concerning.
> 
> > I'm suspicious there's something else going on here that's causing this
> > to take a long time but I don't have any smoking gun as yet.
> 
> Hm.  I'd vote for a CFI within the recursion in TS_execute(), if there's
> not one there yet.  Maybe hlFirstIndex needs one too --- if there are
> a lot of words in the query, maybe that could be slow?  Did you pin the
> blame down any more precisely than hlCover?

I've definitely seen hlFirstIndex take a few seconds to run (while
running this under gdb and stepping through), so that could be a good
choice to place one (perhaps even additionally to this...).  I have to
admit to wondering if we shouldn't consider having one in
check_stack_depth() to try and reduce the risk of us forgetting to have
one in sensible places, though I've not really looked at all the callers
and that might not be reasonable in some cases (though I wonder if maybe
we consider having a 'default' version that has a CFI, and an alternate
that doesn't...).

The depth of recursion for TS_execute_recurse() is actually not all that
bad either though (only 6 or so, as the query string here is:
"ERROR: The required file is not available"), so maybe that isn't really
the right thing to be thinking here.

Down in checkcondition_HL(), checkval->len is 213601, and it seems to
pretty much always end up with a result of TS_NO, but doesn't seem to
take all that long to arrive at that.

Over in hlFirstIndex():

hlFirstIndex (prs=0x7ffc2d4b16c0, prs=0x7ffc2d4b16c0, pos=219518, query=0x559619f81528)
    at ./build/../src/backend/tsearch/wparser_def.c:2013
2013	hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos)
(gdb) n
2026			if (item->type == QI_VAL &&
(gdb) 
2029			item++;
(gdb) p pos
$72 = 219518
(gdb) p prs->curwords
$73 = 583766
(gdb) 

Here's a full backtrace down to the checkcondition_HL():

(gdb) i s
#0  checkcondition_HL (opaque=0x7ffc2d4b11f0, val=0x559619f815c0, data=0x0) at ./build/../src/backend/tsearch/wparser_def.c:1981
#1  0x559617eced2b in TS_execute_recurse (curitem=0x559619f815c0, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0, chkcond=chkcond@entry=0x559617df0120 <checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1872
#2  0x559617ecedd1 in TS_execute_recurse (curitem=0x559619f815a8, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0, chkcond=chkcond@entry=0x559617df0120 <checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1892
#3  0x559617ecedd1 in TS_execute_recurse (curitem=0x559619f81590, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0, chkcond=chkcond@entry=0x559617df0120 <checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1892
#4  0x559617ecedd1 in TS_execute_recurse (curitem=0x559619f81578, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0, chkcond=chkcond@entry=0x559617df0120 <checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1892
#5  0x559617ecedd1 in TS_execute_recurse (curitem=0x559619f81560, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0, chkcond=chkcond@entry=0x559617df0120 <checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1892
#6  0x559617ecedd1 in TS_execute_recurse (curitem=0x559619f81548, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0, chkcond=chkcond@entry=0x559617df0120 <checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1892
#7  0x559617ecedd1 in TS_execute_recurse (curitem=curitem@entry=0x559619f81530, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0, chkcond=chkcond@entry=0x559617df0120 <checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1892
#8  0x559617ed26d9 in TS_execute (curitem=curitem@entry=0x559619f81530, arg=arg@entry=0x7ffc2d4b11f0, flags=flags@entry=0, chkcond=chkcond@entry=0x559617df0120 <checkcondition_HL>) at ./build/../src/backend/utils/adt/tsvector_op.c:1854
#9  0x559617df041e in hlCover (prs=prs@entry=0x7ffc2d4b16c0, query=query@entry=0x559619f81528, p=p@entry=0x7ffc2d4b12a0, q=q@entry=0x7ffc2d4b12a4) at ./build/../src/backend/tsearch/wparser_def.c:2075
#10 0x559617df1a2d in mark_hl_words (max_words=35, min_words=15, shortword=3, highlightall=<optimized out>, query=<optimized out>, prs=0x7ffc2d4b16c0) at 

Re: Missing CFI in hlCover()?

2020-07-24 Thread Tom Lane
Stephen Frost  writes:
> I'm looking into an issue that we're seeing on the PG archives server
> with runaway queries that don't seem to ever want to end- and ignore
> signals.

> This is PG11, 11.8-1.pgdg100+1 specifically on Debian/buster and what
> we're seeing is the loop in hlCover() (wparser_def.c:2071 to 2093) is
> lasting an awful long time without any CFI call.  It's possible the CFI
> call should actually go elsewhere, but the complete lack of any CFI in
> wparser_def.c or tsvector_op.c seems a bit concerning.

> I'm suspicious there's something else going on here that's causing this
> to take a long time but I don't have any smoking gun as yet.

Hm.  I'd vote for a CFI within the recursion in TS_execute(), if there's
not one there yet.  Maybe hlFirstIndex needs one too --- if there are
a lot of words in the query, maybe that could be slow?  Did you pin the
blame down any more precisely than hlCover?

regards, tom lane