On Thursday 10 December 2009 19:10:24 Kevin Grittner wrote: > I wrote: > > Thanks for the sample which shows the difference. > > Ah, once I got on the right track, there is no problem seeing > dramatic improvements with the patch. It changes some nasty O(N^2) > cases to O(N). In particular, the fixes affect parsing of large > strings encoded with multi-byte character encodings and containing > email addresses or URLs with a non-IP-address host component. It > strikes me as odd that URLs without a slash following the host > portion, or with an IP address, are treated so differently in the > parser, but if we want to address that, it's a matter for another > patch. Same here. Generally I do have to say that I dont find that parser very nice - and actually I think its not very efficient as well because it searches a big part of the search space all the time. I think a generated parser would be more efficient and way much easier to read...
> I'm inclined to think that the minimal differences found in some of > my tests probably have more to do with happenstance of code > alignment than the particulars of the patch. Yes, I think that as well. > I did find one significant (although easily solved) problem. In the > patch, the recursive setup of usewide, pgwstr, and wstr are not > conditioned by #ifdef USE_WIDE_UPPER_LOWER in the non-patched > version. Unless there's a good reason for that, the #ifdef should > be added. Uh. Sorry. Fixed. > Less critical, but worth fixing one way or the other, TParserClose > does not drop breadcrumbs conditioned on #ifdef WPARSER_TRACE, but > TParserCopyClose does. I think this should be consistent. Actually there was *some* thinking behind that: The end of parsing is quite obvious (the call returns, and its so verbose you will never do more than one call anyway) - but knowing where the copy ends is quite important to understand the parse flow. I added a log there because in the end that line is not going to make any difference ;-) > Sorry for the delay in review. I hope there's still time to get > this committed in this CF. Thanks for your reviewing! Actually I dont mind very much if it gets delayed or not. Its trivial enough that it shouldnt cause much work/conflicts/whatever next round and I am running patched versions anyway, so ... Andres
From e01bac641f318b378c4353aa6ccebc76b3071166 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Thu, 10 Dec 2009 19:54:22 +0100 Subject: [PATCH] Fix TSearch inefficiency because of repeated copying of strings --- src/backend/tsearch/wparser_def.c | 69 ++++++++++++++++++++++++++++++++++-- 1 files changed, 65 insertions(+), 4 deletions(-) diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c index 72b435c..46e86ee 100644 *** a/src/backend/tsearch/wparser_def.c --- b/src/backend/tsearch/wparser_def.c *************** TParserInit(char *str, int len) *** 328,333 **** --- 328,371 ---- return prs; } + /* + * As an alternative to a full TParserInit one can create a + * TParserCopy which basically is a normally TParser without a private + * copy of the string - instead it uses the one from another TParser. + * This is useful because at some places TParsers are created + * recursively and the repeated copying around of the strings can + * cause major inefficiency. + * Obviously one may not close the original TParser before the copy. + */ + static TParser * + TParserCopyInit(const TParser const* orig) + { + TParser *prs = (TParser *) palloc0(sizeof(TParser)); + + prs->charmaxlen = orig->charmaxlen; + prs->str = orig->str + orig->state->posbyte; + prs->lenstr = orig->lenstr - orig->state->posbyte; + + #ifdef USE_WIDE_UPPER_LOWER + prs->usewide = orig->usewide; + + if(orig->pgwstr) + prs->pgwstr = orig->pgwstr + orig->state->poschar; + if(orig->wstr) + prs->wstr = orig->wstr + orig->state->poschar; + #endif + + prs->state = newTParserPosition(NULL); + prs->state->state = TPS_Base; + + #ifdef WPARSER_TRACE + fprintf(stderr, "parsing copy of \"%.*s\"\n", len, str); + #endif + + return prs; + } + + static void TParserClose(TParser *prs) { *************** TParserClose(TParser *prs) *** 346,354 **** --- 384,415 ---- pfree(prs->pgwstr); #endif + #ifdef WPARSER_TRACE + fprintf(stderr, "closing parser"); + #endif + pfree(prs); + } + + /* + * See TParserCopyInit + */ + static void + TParserCopyClose(TParser *prs) + { + while (prs->state) + { + TParserPosition *ptr = prs->state->prev; + + pfree(prs->state); + prs->state = ptr; + } + #ifdef WPARSER_TRACE + fprintf(stderr, "closing parser copy"); + #endif pfree(prs); } + /* * Character-type support functions, equivalent to is* macros, but * working with any possible encodings and locales. Notes: *************** p_isignore(TParser *prs) *** 617,623 **** static int p_ishost(TParser *prs) { ! TParser *tmpprs = TParserInit(prs->str + prs->state->posbyte, prs->lenstr - prs->state->posbyte); int res = 0; tmpprs->wanthost = true; --- 678,684 ---- static int p_ishost(TParser *prs) { ! TParser *tmpprs = TParserCopyInit(prs); int res = 0; tmpprs->wanthost = true; *************** p_ishost(TParser *prs) *** 631,637 **** prs->state->charlen = tmpprs->state->charlen; res = 1; } ! TParserClose(tmpprs); return res; } --- 692,698 ---- prs->state->charlen = tmpprs->state->charlen; res = 1; } ! TParserCopyClose(tmpprs); return res; } *************** p_ishost(TParser *prs) *** 639,645 **** static int p_isURLPath(TParser *prs) { ! TParser *tmpprs = TParserInit(prs->str + prs->state->posbyte, prs->lenstr - prs->state->posbyte); int res = 0; tmpprs->state = newTParserPosition(tmpprs->state); --- 700,706 ---- static int p_isURLPath(TParser *prs) { ! TParser *tmpprs = TParserCopyInit(prs); int res = 0; tmpprs->state = newTParserPosition(tmpprs->state); *************** p_isURLPath(TParser *prs) *** 654,660 **** prs->state->charlen = tmpprs->state->charlen; res = 1; } ! TParserClose(tmpprs); return res; } --- 715,721 ---- prs->state->charlen = tmpprs->state->charlen; res = 1; } ! TParserCopyClose(tmpprs); return res; } -- 1.6.5.12.gd65df24
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers