On Thursday 10 December 2009 19:10:24 Kevin Grittner wrote:
> I wrote:
> > Thanks for the sample which shows the difference.
> 
> Ah, once I got on the right track, there is no problem seeing
> dramatic improvements with the patch.  It changes some nasty O(N^2)
> cases to O(N).  In particular, the fixes affect parsing of large
> strings encoded with multi-byte character encodings and containing
> email addresses or URLs with a non-IP-address host component.  It
> strikes me as odd that URLs without a slash following the host
> portion, or with an IP address, are treated so differently in the
> parser, but if we want to address that, it's a matter for another
> patch.
Same here. Generally I do have to say that I dont find that parser very nice - 
and actually I think its not very efficient as well because it searches a big 
part of the search space all the time.
I think a generated parser would be more efficient and way much easier to 
read...

> I'm inclined to think that the minimal differences found in some of
> my tests probably have more to do with happenstance of code
> alignment than the particulars of the patch.
Yes, I think that as well.

> I did find one significant (although easily solved) problem.  In the
> patch, the recursive setup of usewide, pgwstr, and wstr are not
> conditioned by #ifdef USE_WIDE_UPPER_LOWER in the non-patched
> version.  Unless there's a good reason for that, the #ifdef should
> be added.
Uh. Sorry. Fixed.

> Less critical, but worth fixing one way or the other, TParserClose
> does not drop breadcrumbs conditioned on #ifdef WPARSER_TRACE, but
> TParserCopyClose does.  I think this should be consistent.
Actually there was *some* thinking behind that: The end of parsing is quite 
obvious (the call returns, and its so verbose you will never do more than one 
call anyway) - but knowing where the copy ends is quite important to 
understand the parse flow.
I added a log there because in the end that line is not going to make any 
difference ;-)

> Sorry for the delay in review.  I hope there's still time to get
> this committed in this CF.
Thanks for your reviewing!

Actually I dont mind very much if it gets delayed or not. Its trivial enough 
that it shouldnt cause much work/conflicts/whatever next round and I am running 
patched versions anyway, so ...



Andres
From e01bac641f318b378c4353aa6ccebc76b3071166 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Thu, 10 Dec 2009 19:54:22 +0100
Subject: [PATCH] Fix TSearch inefficiency because of repeated copying of strings

---
 src/backend/tsearch/wparser_def.c |   69 ++++++++++++++++++++++++++++++++++--
 1 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index 72b435c..46e86ee 100644
*** a/src/backend/tsearch/wparser_def.c
--- b/src/backend/tsearch/wparser_def.c
*************** TParserInit(char *str, int len)
*** 328,333 ****
--- 328,371 ----
  	return prs;
  }
  
+ /*
+  * As an alternative to a full TParserInit one can create a
+  * TParserCopy which basically is a normally TParser without a private
+  * copy of the string - instead it uses the one from another TParser.
+  * This is useful because at some places TParsers are created
+  * recursively and the repeated copying around of the strings can
+  * cause major inefficiency.
+  * Obviously one may not close the original TParser before the copy.
+  */
+ static TParser *
+ TParserCopyInit(const TParser const* orig)
+ {
+ 	TParser    *prs = (TParser *) palloc0(sizeof(TParser));
+ 
+ 	prs->charmaxlen = orig->charmaxlen;
+ 	prs->str = orig->str + orig->state->posbyte;
+ 	prs->lenstr = orig->lenstr - orig->state->posbyte;
+ 
+ #ifdef USE_WIDE_UPPER_LOWER
+ 	prs->usewide = orig->usewide;
+ 
+ 	if(orig->pgwstr)
+ 		prs->pgwstr = orig->pgwstr + orig->state->poschar;
+ 	if(orig->wstr)
+ 		prs->wstr = orig->wstr + orig->state->poschar;
+ #endif
+ 
+ 	prs->state = newTParserPosition(NULL);
+ 	prs->state->state = TPS_Base;
+ 
+ #ifdef WPARSER_TRACE
+ 	fprintf(stderr, "parsing copy of \"%.*s\"\n", len, str);
+ #endif
+ 
+ 	return prs;
+ }
+ 
+ 
  static void
  TParserClose(TParser *prs)
  {
*************** TParserClose(TParser *prs)
*** 346,354 ****
--- 384,415 ----
  		pfree(prs->pgwstr);
  #endif
  
+ #ifdef WPARSER_TRACE
+ 	fprintf(stderr, "closing parser");
+ #endif
+ 	pfree(prs);
+ }
+ 
+ /*
+  * See TParserCopyInit
+  */
+ static void
+ TParserCopyClose(TParser *prs)
+ {
+ 	while (prs->state)
+ 	{
+ 		TParserPosition *ptr = prs->state->prev;
+ 
+ 		pfree(prs->state);
+ 		prs->state = ptr;
+ 	}
+ #ifdef WPARSER_TRACE
+ 	fprintf(stderr, "closing parser copy");
+ #endif
  	pfree(prs);
  }
  
+ 
  /*
   * Character-type support functions, equivalent to is* macros, but
   * working with any possible encodings and locales. Notes:
*************** p_isignore(TParser *prs)
*** 617,623 ****
  static int
  p_ishost(TParser *prs)
  {
! 	TParser    *tmpprs = TParserInit(prs->str + prs->state->posbyte, prs->lenstr - prs->state->posbyte);
  	int			res = 0;
  
  	tmpprs->wanthost = true;
--- 678,684 ----
  static int
  p_ishost(TParser *prs)
  {
! 	TParser *tmpprs = TParserCopyInit(prs);
  	int			res = 0;
  
  	tmpprs->wanthost = true;
*************** p_ishost(TParser *prs)
*** 631,637 ****
  		prs->state->charlen = tmpprs->state->charlen;
  		res = 1;
  	}
! 	TParserClose(tmpprs);
  
  	return res;
  }
--- 692,698 ----
  		prs->state->charlen = tmpprs->state->charlen;
  		res = 1;
  	}
! 	TParserCopyClose(tmpprs);
  
  	return res;
  }
*************** p_ishost(TParser *prs)
*** 639,645 ****
  static int
  p_isURLPath(TParser *prs)
  {
! 	TParser    *tmpprs = TParserInit(prs->str + prs->state->posbyte, prs->lenstr - prs->state->posbyte);
  	int			res = 0;
  
  	tmpprs->state = newTParserPosition(tmpprs->state);
--- 700,706 ----
  static int
  p_isURLPath(TParser *prs)
  {
! 	TParser *tmpprs = TParserCopyInit(prs);
  	int			res = 0;
  
  	tmpprs->state = newTParserPosition(tmpprs->state);
*************** p_isURLPath(TParser *prs)
*** 654,660 ****
  		prs->state->charlen = tmpprs->state->charlen;
  		res = 1;
  	}
! 	TParserClose(tmpprs);
  
  	return res;
  }
--- 715,721 ----
  		prs->state->charlen = tmpprs->state->charlen;
  		res = 1;
  	}
! 	TParserCopyClose(tmpprs);
  
  	return res;
  }
-- 
1.6.5.12.gd65df24

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to