Re: [PATCHES] Win32 shmem
On Tue, Mar 20, 2007 at 12:12:45PM -0400, Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: I think you do still need the on_shmem_exit detach callback. Ok, will look into that. Haven't tested that scenario. That was indeed so. Added in new version, attached. If it handles the restart-after-backend-crash scenario and correctly locks out starting a fresh postmaster (after postmaster crash) until all old backends are gone, then it's probably ready to commit for more-widespread testing. It does, at least in my tests. I have found one thing that needs to be chagned for terminal server sessions, and then I need to update the build system to use it on mingw as well. Will do that and then commit. I note that sysv_shmem contains some #ifdef WIN32 and #ifdef __CYGWIN__ kluges; will it now be possible to remove those, or will the Cygwin build still be using that code? I *think* so. I *think* the CYGWIN port does not rely on #ifdef WIN32s anymore (which is corret given that it's not really win32). If I do a grep of the sourcecode, I get a bunch of things like ./utils/mmgr/mcxt.c:#if defined(WIN32) || defined(__CYGWIN__) which would indicate that at least some places know they're different. I can include removal of those in my change, but I'm not in a position to test them myself. I guess we could do it and see if the buildfarm breaks, and if that revert it. //Magnus ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] patch adding new regexp functions
Jeremy Drake [EMAIL PROTECTED] writes: BTW, should I be calling get_typlenbyvalalign on TEXTOID or are there macros for those also? Hardcoding -1 for typlen of varlenas is one of the few (the only?) magic constants used throughout the source code. I'm surprised there isn't a macro for it though. Do you need the alignment? If so I want to check the code against the packed varlena patch. Just in case. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] patch adding new regexp functions
Gregory Stark [EMAIL PROTECTED] writes: Jeremy Drake [EMAIL PROTECTED] writes: BTW, should I be calling get_typlenbyvalalign on TEXTOID or are there macros for those also? Hardcoding -1 for typlen of varlenas is one of the few (the only?) magic constants used throughout the source code. I'm surprised there isn't a macro for it though. Do you need the alignment? If so I want to check the code against the packed varlena patch. Just in case. Ah, it's just to construct an array, that's not a concern at all. And you're detoasting the text data types before using or storing them so that's fine. The only thing I would say is that this should maybe be a TextGetDatum() just for code hygiene. It should be exactly equivalent though: + PointerGetDatum(matchctx-orig_str), -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] patch adding new regexp functions
Jeremy Drake [EMAIL PROTECTED] writes: BTW, should I be calling get_typlenbyvalalign on TEXTOID or are there macros for those also? By and large we tend to hard-wire those properties too, eg there are plenty of places that do things like this: /* XXX: this hardcodes assumptions about the regtype type */ result = construct_array(tmp_ary, num_params, REGTYPEOID, 4, true, 'i'); Some are better commented than others ... regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Make CLUSTER MVCC-safe
On Tue, 2007-03-20 at 18:20 +, Heikki Linnakangas wrote: This patch makes CLUSTER MVCC-safe. Visibility information and update chains are preserved like in VACUUM FULL. Sounds good. CLUSTER is currently the only user of the facility, but I'm envisioning we might have other users in the future. For example, a version of VACUUM FULL that rewrites the whole table. I would be very much in favour of that, as discussed on -hackers. There might be some requirement for the older VACUUM FULL behaviour, so I'd like to suggest the syntax: VACUUM FULL tablename [REPLACE | PRESERVE [STORAGE]]; where VACUUM FULL foo REPLACE STORAGE would be the new default, using your new functions, and PRESERVE STORAGE would implement the old method. One complication in the implementation was the fact that heap_insert overwrites the visibility information, and it doesn't write the full tuple header to WAL. I ended up implementing a special-purpose raw_heap_insert function instead, which is optimized for bulk inserting a lot of tuples, knowing that we have exclusive access to the heap. raw_heap_insert keeps the current buffer locked over calls, until it gets full, and inserts the whole page to WAL as a single record using the existing XLOG_HEAP_NEWPAGE record type. I submitted Fast CLUSTER patch earlier which avoided writing WAL in the same way that has been done for COPY, CREATE INDEX and CTAS. Would you like to update your patch to do this also, or would you like me to re-write the patch to fit with yours? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] Make CLUSTER MVCC-safe
Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. --- Heikki Linnakangas wrote: This patch makes CLUSTER MVCC-safe. Visibility information and update chains are preserved like in VACUUM FULL. I created a new generic rewriteheap-facility to handle rewriting tables in a visibility-preserving manner. All the update chain tracking is done in rewriteheap.c, the caller is responsible for supplying the stream of tuples. CLUSTER is currently the only user of the facility, but I'm envisioning we might have other users in the future. For example, a version of VACUUM FULL that rewrites the whole table. We could also use it to make ALTER TABLE MVCC-safe, but there's some issues with that. For example, what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint. One complication in the implementation was the fact that heap_insert overwrites the visibility information, and it doesn't write the full tuple header to WAL. I ended up implementing a special-purpose raw_heap_insert function instead, which is optimized for bulk inserting a lot of tuples, knowing that we have exclusive access to the heap. raw_heap_insert keeps the current buffer locked over calls, until it gets full, and inserts the whole page to WAL as a single record using the existing XLOG_HEAP_NEWPAGE record type. This makes CLUSTER a more viable alternative to VACUUM FULL. One motivation for making CLUSTER MVCC-safe is that if some poor soul runs pg_dump to make a backup concurrently with CLUSTER, the clustered tables will appear to be empty in the dump file. The documentation doesn't anything about CLUSTER not being MVCC-safe, so I suppose there's no need to touch the docs. I sent a doc patch earlier to add a note about it, that doc patch should still be applied to older release branches, IMO. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] patch adding new regexp functions
On Wed, 21 Mar 2007, Tom Lane wrote: Jeremy Drake [EMAIL PROTECTED] writes: BTW, should I be calling get_typlenbyvalalign on TEXTOID or are there macros for those also? By and large we tend to hard-wire those properties too, eg there are plenty of places that do things like this: /* XXX: this hardcodes assumptions about the regtype type */ result = construct_array(tmp_ary, num_params, REGTYPEOID, 4, true, 'i'); Some are better commented than others ... So, what action (if any) should be taken? Should all (or some) of these values be hardcoded, or should the current calls to determine them be left in place? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org -- Fortune's Real-Life Courtroom Quote #19: Q: Doctor, how many autopsies have you performed on dead people? A: All my autopsies have been performed on dead people. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] [HACKERS] Stats processor not restarting
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Bah, sorry about the noise. It was the effect of PGSTAT_RESTART_INTERVAL. Do we want to add some logging when we don't restart it due to repeated failures? Not really, but maybe it would be sensible to reset last_pgstat_start_time when doing a database-wide restart? The motivation for the timeout was to reduce cycle wastage if pgstat crashed by itself, but when you've deliberately SIGQUITed it, that hardly seems to apply ... You mean like this, attached? -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + Index: src/backend/postmaster/pgstat.c === RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v retrieving revision 1.149 diff -c -c -r1.149 pgstat.c *** src/backend/postmaster/pgstat.c 16 Mar 2007 17:57:36 - 1.149 --- src/backend/postmaster/pgstat.c 22 Mar 2007 02:03:31 - *** *** 1929,1934 --- 1929,1937 static void pgstat_exit(SIGNAL_ARGS) { + /* allow stats to restart immediately after SIGQUIT */ + last_pgstat_start_time = 0; + need_exit = true; } ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] [HACKERS] Stats processor not restarting
Bruce Momjian [EMAIL PROTECTED] writes: Tom Lane wrote: Not really, but maybe it would be sensible to reset last_pgstat_start_time when doing a database-wide restart? You mean like this, attached? I'd be fairly surprised if resetting the variable in the child process had any effect on the postmaster... I think a correct fix would involve exposing either the variable or a routine to zero it from pgstat.c, and having postmaster.c do that at the points where it intentionally SIGQUITs the stats collector. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] patch adding new regexp functions
Jeremy Drake [EMAIL PROTECTED] writes: On Wed, 21 Mar 2007, Tom Lane wrote: By and large we tend to hard-wire those properties too, eg there are plenty of places that do things like this: So, what action (if any) should be taken? Should all (or some) of these values be hardcoded, or should the current calls to determine them be left in place? I'd vote for making this new code look like the rest of it, to wit hardwire the values. As-is, you're expending code space and cycles on flexibility that's entirely illusory --- if we ever decided to change these properties of TEXT, we'd have way more work to do than just fixing the new regexp functions. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] patch adding new regexp functions
On Wed, 21 Mar 2007, Gregory Stark wrote: The only thing I would say is that this should maybe be a TextGetDatum() just for code hygiene. It should be exactly equivalent though: I could not find such a thing using ctags, nor TextPGetDatum(). I looked at PG_RETURN_TEXT_P and it just uses PG_RETURN_POINTER, which in turn uses PointerGetDatum. If there is such a thing, it is well camoflaged... -- If you think the problem is bad now, just wait until we've solved it. -- Arthur Kasspe ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] patch adding new regexp functions
On Thu, 22 Mar 2007, Tom Lane wrote: I'd vote for making this new code look like the rest of it, to wit hardwire the values. Attached please find a patch which does this. -- Although written many years ago, Lady Chatterley's Lover has just been reissued by the Grove Press, and this pictorial account of the day-to-day life of an English gamekeeper is full of considerable interest to outdoor minded readers, as it contains many passages on pheasant-raising, the apprehending of poachers, ways to control vermin, and other chores and duties of the professional gamekeeper. Unfortunately, one is obliged to wade through many pages of extraneous material in order to discover and savour those sidelights on the management of a midland shooting estate, and in this reviewer's opinion the book cannot take the place of J. R. Miller's Practical Gamekeeping. -- Ed Zern, Field and Stream (Nov. 1959)Index: src/backend/utils/adt/regexp.c === RCS file: /home/jeremyd/local/postgres/cvsuproot/pgsql/src/backend/utils/adt/regexp.c,v retrieving revision 1.70 diff -c -r1.70 regexp.c *** src/backend/utils/adt/regexp.c 20 Mar 2007 05:44:59 - 1.70 --- src/backend/utils/adt/regexp.c 22 Mar 2007 05:17:15 - *** *** 30,35 --- 30,36 #include postgres.h #include access/heapam.h + #include catalog/pg_type.h #include funcapi.h #include regex/regex.h #include utils/builtins.h *** *** 95,106 size_toffset; re_comp_flags flags; - - /* text type info */ - Oid param_type; - int16 typlen; - bool typbyval; - char typalign; } regexp_matches_ctx; typedef struct regexp_split_ctx --- 96,101 *** *** 835,845 matchctx-pmatch = palloc(sizeof(regmatch_t) * (matchctx-cpattern-re_nsub + 1)); matchctx-offset = 0; - /* get text type oid, too lazy to do it some other way */ - matchctx-param_type = get_fn_expr_argtype(fcinfo-flinfo, 0); - get_typlenbyvalalign(matchctx-param_type, matchctx-typlen, -matchctx-typbyval, matchctx-typalign); - matchctx-wide_str = palloc(sizeof(pg_wchar) * (matchctx-orig_len + 1)); matchctx-wide_len = pg_mb2wchar_with_len(VARDATA(matchctx-orig_str), matchctx-wide_str, matchctx-orig_len); --- 830,835 *** *** 915,923 dims[0] = 1; } return construct_md_array(elems, nulls, ndims, dims, lbs, ! matchctx-param_type, matchctx-typlen, ! matchctx-typbyval, matchctx-typalign); } Datum --- 905,913 dims[0] = 1; } + /* XXX: this hardcodes assumptions about the text type */ return construct_md_array(elems, nulls, ndims, dims, lbs, ! TEXTOID, -1, false, 'i'); } Datum *** *** 976,991 { ArrayBuildState *astate = NULL; regexp_split_ctx*splitctx; - Oid param_type; int nitems; splitctx = setup_regexp_split(PG_GETARG_TEXT_P(0), PG_GETARG_TEXT_P(1), PG_GETARG_TEXT_P_IF_EXISTS(2)); - /* get text type oid, too lazy to do it some other way */ - param_type = get_fn_expr_argtype(fcinfo-flinfo, 0); - for (nitems = 0; splitctx-offset splitctx-wide_len; nitems++) { if (nitems splitctx-wide_len) --- 966,977 *** *** 995,1001 astate = accumArrayResult(astate, get_next_split(splitctx), false, ! param_type, CurrentMemoryContext); } --- 981,987 astate = accumArrayResult(astate, get_next_split(splitctx), false, ! TEXTOID, CurrentMemoryContext); } ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] patch adding new regexp functions
Jeremy Drake [EMAIL PROTECTED] writes: I could not find such a thing using ctags, nor TextPGetDatum(). I looked at PG_RETURN_TEXT_P and it just uses PG_RETURN_POINTER, which in turn uses PointerGetDatum. If there is such a thing, it is well camoflaged... AFAIR, the reason there's no TextPGetDatum (and ditto for lots of other datatypes) is lack of obvious usefulness. A function dealing with a text * doesn't normally have reason to convert that to a Datum until it returns --- and at that point PG_RETURN_TEXT_P is the thing to use. Do you have a counterexample, or does this just suggest that the regexp function patch needs some refactoring? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PATCHES] LIKE optimization in UTF-8 and locale-C
Hello, I found LIKE operators are slower on multi-byte encoding databases than single-byte encoding ones. It comes from difference between MatchText() and MBMatchText(). We've had an optimization for single-byte encodings using pg_database_encoding_max_length() == 1 test. I'll propose to extend it in UTF-8 with locale-C case. All of trailing bytes are different from first bytes in UTF-8 multi-byte characters, so we can use functions for single-bytes and byte-wise comparison in the case. With the attached patch, the performance of UTF-8 LIKE operators are pushed up to near other single-bytes encodings. Databases initialized with locale-C are widely used in Japan, because Japanese locale are broken in almost of platforms. Japanese user can choose EUC-jp or UTF-8 as a server encoding, but I think UTF-8 will be more and more used in the future. test initdb --no-locale --encoding=encoding [HEAD] SQL_ASCII : 7171 ms / 7203 ms / 7187 ms LATIN1: 7172 ms / 7156 ms / 7141 ms UTF8 : 16235 ms / 16281 ms / 16281 ms EUC_JP: 17454 ms / 17453 ms / 17438 ms [with patch] SQL_ASCII : 7062 ms / 7125 ms / 7125 ms LATIN1: 7047 ms / 7063 ms / 7047 ms UTF8 : 7188 ms / 7234 ms / 7235 ms EUC_JP: 17468 ms / 17453 ms / 17453 ms CREATE OR REPLACE FUNCTION test() RETURNS INTEGER AS $$ DECLARE cnt integer; BEGIN FOR i IN 1..1000 LOOP SELECT count(*) INTO cnt FROM item WHERE i_title LIKE '%BABABABABARIBA%' LIMIT 50; END LOOP; RETURN cnt; END; $$ LANGUAGE plpgsql; SELECT count(*) FROM item; -- borrowed from DBT-1 (TPC-W) count --- 1 (1 row) patch Index: src/backend/utils/adt/like.c === --- src/backend/utils/adt/like.c(head) +++ src/backend/utils/adt/like.c(working copy) @@ -21,6 +21,7 @@ #include mb/pg_wchar.h #include utils/builtins.h +#include utils/pg_locale.h #define LIKE_TRUE 1 @@ -119,6 +120,13 @@ /* + * true iff match functions for single-byte characters are available. + */ +#define sb_match_available() \ + (pg_database_encoding_max_length() == 1 || \ +(lc_collate_is_c() GetDatabaseEncoding() == PG_UTF8)) + +/* * interface routines called by the function manager */ @@ -138,7 +146,7 @@ p = VARDATA(pat); plen = (VARSIZE(pat) - VARHDRSZ); - if (pg_database_encoding_max_length() == 1) + if (sb_match_available()) result = (MatchText(s, slen, p, plen) == LIKE_TRUE); else result = (MBMatchText(s, slen, p, plen) == LIKE_TRUE); @@ -162,7 +170,7 @@ p = VARDATA(pat); plen = (VARSIZE(pat) - VARHDRSZ); - if (pg_database_encoding_max_length() == 1) + if (sb_match_available()) result = (MatchText(s, slen, p, plen) != LIKE_TRUE); else result = (MBMatchText(s, slen, p, plen) != LIKE_TRUE); @@ -186,7 +194,7 @@ p = VARDATA(pat); plen = (VARSIZE(pat) - VARHDRSZ); - if (pg_database_encoding_max_length() == 1) + if (sb_match_available()) result = (MatchText(s, slen, p, plen) == LIKE_TRUE); else result = (MBMatchText(s, slen, p, plen) == LIKE_TRUE); @@ -210,7 +218,7 @@ p = VARDATA(pat); plen = (VARSIZE(pat) - VARHDRSZ); - if (pg_database_encoding_max_length() == 1) + if (sb_match_available()) result = (MatchText(s, slen, p, plen) != LIKE_TRUE); else result = (MBMatchText(s, slen, p, plen) != LIKE_TRUE); @@ -275,7 +283,7 @@ int slen, plen; - if (pg_database_encoding_max_length() == 1) + if (sb_match_available()) { s = NameStr(*str); slen = strlen(s); @@ -316,7 +324,7 @@ int slen, plen; - if (pg_database_encoding_max_length() == 1) + if (sb_match_available()) { s = NameStr(*str); slen = strlen(s); @@ -357,7 +365,7 @@ int slen, plen; - if (pg_database_encoding_max_length() == 1) + if (sb_match_available()) { s = VARDATA(str); slen = (VARSIZE(str) - VARHDRSZ); @@ -393,7 +401,7 @@ int slen, plen; - if (pg_database_encoding_max_length() == 1) + if (sb_match_available()) { s = VARDATA(str); slen = (VARSIZE(str) - VARHDRSZ); @@ -429,7 +437,7 @@ text *esc = PG_GETARG_TEXT_P(1); text *result; - if (pg_database_encoding_max_length() == 1) + if (sb_match_available()) result = do_like_escape(pat, esc); else
Re: [PATCHES] patch adding new regexp functions
On Thu, 22 Mar 2007, Tom Lane wrote: AFAIR, the reason there's no TextPGetDatum (and ditto for lots of other datatypes) is lack of obvious usefulness. A function dealing with a text * doesn't normally have reason to convert that to a Datum until it returns --- and at that point PG_RETURN_TEXT_P is the thing to use. Do you have a counterexample, or does this just suggest that the regexp function patch needs some refactoring? If you are asking why I have reason to convert text * to a Datum in cases other than PG_RETURN_TEXT_P, it is used for calling text_substr functions using DirectFunctionCallN. BTW, this usage of text_substr using PointerGetDatum was copied from the pre-existing textregexsubstr function. -- Malek's Law: Any simple idea will be worded in the most complicated way. ---(end of broadcast)--- TIP 6: explain analyze is your friend