[HACKERS] Re: patch-3 (3-allow-wal-record-header-to-be-split.patch)WAL Format Changes
From: Heikki Linnakangas [heikki.linnakan...@enterprisedb.com]
Sent: Sunday, July 01, 2012 1:54 AM

> On 30.06.2012 10:11, Amit kapila wrote:
>> 3. General observation, not related to your changes
>>
>> XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
>> ...
>>     if (freespace == 0)
>>     {
>>         updrqst = AdvanceXLInsertBuffer(false);
>>
>> In the code, AdvanceXLInsertBuffer(false) is called after starting a critical section and acquiring WALInsertLock. Now, if any error occurs inside AdvanceXLInsertBuffer() (in one of its paths it calls XLogWrite(), from which it can call XLogFileInit(), where an error can occur), how will it release WALInsertLock or end the critical section?
>
> Yep, if an I/O error happens while writing a WAL record - running out of disk space is the typical example - we PANIC. Nope, it's not ideal. Even if we somehow managed to avoid doing I/O in the critical section in XLogInsert(), most callers of XLogInsert() surround the call in a critical section anyway. For example, when a new tuple is inserted to heap, once you've modified the page to add the new tuple, it's too late to back out. The WAL record is written while holding the lock on the page, and if something goes wrong with writing the WAL record, we have no choice but PANIC.

PANIC is understandable, as after it the user cannot perform operations without a restart. However, if the level is ERROR, there might be other problems, as the user can keep performing operations through the same session. The case which I am highlighting is of ERROR; please refer to the code of XLogFileInit(). For example:

    fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method),
                       S_IRUSR | S_IWUSR);
    if (fd < 0)
    {
        if (errno != ENOENT)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not open file \"%s\" (log file %u, segment %u): %m",
                            path, log, seg)));
    }

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte
On Wed, Jun 27, 2012 at 11:35 PM, Robert Haas robertmh...@gmail.com wrote:
> It looks to me like pg_wchar2utf_with_len will not work, because unicode_to_utf8 returns its second argument unmodified - not, as your code seems to assume, the byte following what was already written.

Fixed.

> MULE also looks problematic. The code that you've written isn't symmetric with the opposite conversion, unlike what you did in all other cases, and I don't understand why. I'm also somewhat baffled by the reverse conversion: it treats a multi-byte sequence beginning with a byte for which IS_LCPRV1(x) returns true as invalid if there are fewer than 3 bytes available, but it only reads two; similarly, for IS_LCPRV2(x), it demands 4 bytes but converts only 3.

Should we preserve the existing pg_wchar representation for the MULE encoding? Probably we can modify it as in the 0.1 version of the patch in order to make it more transparent.

-- With best regards, Alexander Korotkov.

wchar2mb-0.4.patch Description: Binary data
[HACKERS] [PATCH] Make pg_basebackup configure and start standby
Hi,

attached is a patch that does $SUBJECT. It's a usability enhancement, to take a backup, write a minimalistic recovery.conf and start the streaming standby in one go. Comments?

Best regards,
Zoltán Böszörményi

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de http://www.postgresql.at/

diff -durpN postgresql.orig/doc/src/sgml/ref/pg_basebackup.sgml postgresql/doc/src/sgml/ref/pg_basebackup.sgml
--- postgresql.orig/doc/src/sgml/ref/pg_basebackup.sgml	2012-06-18 07:06:42.774009894 +0200
+++ postgresql/doc/src/sgml/ref/pg_basebackup.sgml	2012-07-01 12:51:51.050652170 +0200
@@ -186,6 +186,28 @@ PostgreSQL documentation
     </varlistentry>

     <varlistentry>
+     <term><option>-R</option></term>
+     <term><option>--write-recovery-conf</option></term>
+     <listitem>
+      <para>
+       Write a minimal recovery.conf into the output directory using
+       the connection parameters from the command line to ease
+       setting up the standby. Conflicts with <option>--xlog</option>
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><option>-S</option></term>
+     <term><option>--start-standby</option></term>
+     <listitem>
+      <para>
+       Start the standby database server. Implies <option>-R</option>.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
      <term><option>-X <replaceable class="parameter">method</replaceable></option></term>
      <term><option>--xlog-method=<replaceable class="parameter">method</replaceable></option></term>
      <listitem>
diff -durpN postgresql.orig/src/bin/pg_basebackup/pg_basebackup.c postgresql/src/bin/pg_basebackup/pg_basebackup.c
--- postgresql.orig/src/bin/pg_basebackup/pg_basebackup.c	2012-06-26 09:10:21.301759598 +0200
+++ postgresql/src/bin/pg_basebackup/pg_basebackup.c	2012-07-01 12:44:36.920813960 +0200
@@ -46,6 +46,8 @@
 int		compresslevel = 0;
 bool	includewal = false;
 bool	streamwal = false;
 bool	fastcheckpoint = false;
+bool	writerecoveryconf = false;
+bool	startstandby = false;
 int		standby_message_timeout = 10 * 1000;	/* 10 sec = default */

 /* Progress counters */
@@ -107,6 +109,10 @@ usage(void)
 	printf(_("\nOptions controlling the output:\n"));
 	printf(_("  -D, --pgdata=DIRECTORY receive base backup into directory\n"));
 	printf(_("  -F, --format=p|t       output format (plain (default), tar)\n"));
+	printf(_("  -R, --write-recovery-conf\n"
+			 "                         write recovery.conf after backup\n"));
+	printf(_("  -S, --start-standby    start standby after backup, implies -R\n"
+			 "                         conflicts with -x\n"));
 	printf(_("  -x, --xlog             include required WAL files in backup (fetch mode)\n"));
 	printf(_("  -X, --xlog-method=fetch|stream\n"
 			 "                         include required WAL files with specified method\n"));
@@ -1193,6 +1199,85 @@ BaseBackup(void)
 		fprintf(stderr, "%s: base backup completed\n", progname);
 }

+static void WriteRecoveryConf(void)
+{
+	char	filename[MAXPGPATH];
+	char	cfhost[MAXPGPATH] = "";
+	char	cfport[MAXPGPATH] = "";
+	char	cfuser[MAXPGPATH] = "";
+	char	cfpass[MAXPGPATH] = "";
+	FILE   *cf;
+
+	if (!writerecoveryconf)
+		return;
+
+	memset(cfhost, 0, MAXPGPATH);
+	if (dbhost)
+		snprintf(cfhost, MAXPGPATH-1, "host=%s", dbhost);
+
+	memset(cfport, 0, MAXPGPATH);
+	if (dbport)
+		snprintf(cfport, MAXPGPATH-1, "port=%s", dbport);
+
+	memset(cfuser, 0, MAXPGPATH);
+	if (dbuser)
+		snprintf(cfuser, MAXPGPATH-1, "user=%s", dbuser);
+
+	memset(cfpass, 0, MAXPGPATH);
+	if (dbpassword)
+		snprintf(cfpass, MAXPGPATH-1, "password='%s'", dbpassword);
+	else
+		printf("add password to primary_conninfo in %s if needed\n", filename);
+
+	sprintf(filename, "%s/recovery.conf", basedir);
+
+	cf = fopen(filename, "w");
+	if (cf == NULL)
+	{
+		fprintf(stderr, _("cannot create %s"), filename);
+		exit(1);
+	}
+
+	fprintf(cf, "standby_mode = 'on'\n");
+	fprintf(cf, "primary_conninfo = '%s %s %s'\n", cfhost, cfport, cfuser);
+
+	fclose(cf);
+}
+
+static void StartStandby(const char *command)
+{
+	char   *path;
+	char   *args[5];
+	int		pos, len;
+
+	if (!startstandby)
+		return;
+
+	path = xstrdup(command);
+	len = strlen(path);
+
+	for (pos = len; pos > 0; pos--)
+	{
+		if (path[pos - 1] == '/'
+#ifdef WIN32
+			|| path[pos - 1] == '\\'
+#endif
+			)
+			break;
+	}
+
+	sprintf(&path[pos], "pg_ctl");
+
+	args[0] = path;
+	args[1] = "-D";
+	args[2] = basedir;
+	args[3] = "start";
+	args[4] = NULL;
+
+	if (execvp(path, args) < 0)
+		fprintf(stderr, _("Cannot execute pg_ctl: %s"),
+				strerror(errno));
+}

 int
 main(int argc, char **argv)
@@ -1203,6 +1288,8 @@ main(int argc, char **argv)
 		{"pgdata", required_argument, NULL, 'D'},
 		{"format", required_argument, NULL, 'F'},
 		{"checkpoint", required_argument, NULL, 'c'},
+		{"write-recovery-conf", no_argument, NULL, 'R'},
+		{"start-standby", no_argument, NULL, 'S'},
 		{"xlog", no_argument, NULL, 'x'},
 		{"xlog-method",
Re: [HACKERS] [PATCH 14/16] Add module to apply changes from an apply-cache using low-level functions
Hi, Andres! Here is my review of this patch.

1) The patches don't apply cleanly to head, so I used commit bed88fceac04042f0105eb22a018a4f91d64400d as the base; all the patches up to this one then apply cleanly. Regression tests pass OK, but it seems that the new functionality isn't covered by regression tests.

2) The patch needs more comments. I think we need at least one comment at the head of each function describing its behaviour, even if it is evident from the function name.

4) There is significant code duplication in the APPLY_CACHE_CHANGE_UPDATE and APPLY_CACHE_CHANGE_DELETE branches of the switch in the apply_change function. I think this could be refactored to reduce the duplication.

5) The apply mechanism requires a PK on each table, so throwing an error here if we don't find a PK is necessary. But we need to prevent the user from setting up logical replication before it fails while applying received messages. AFAICS the patch which creates the corresponding log messages is here: http://archives.postgresql.org/message-id/1339586927-13156-7-git-send-email-and...@2ndquadrant.com. And it merely throws a warning if it fails to find a PK. At which stage do we prevent the user from running logical replication on tables which don't have a PK?

6) I've seen the comment /* FIXME: locking */. You open the index with index_rel = index_open(indexoid, AccessShareLock); and close it with heap_close(index_rel, NoLock); Shouldn't we use the same lock level on close as on open? Also, heap_close doesn't look right to me for closing an index. Why not use index_close or relation_close?

7) We find each updated and deleted tuple by PK. Imagine we update a significant part of the table (for example, 10%) in a single query for which the planner chose a sequential scan. Then applying these changes could be more expensive than performing the original changes. That is probably ok. But we could apply some heuristics: detect that a sequential scan is cheaper because of the large number of updates or deletes on one table.
-- With best regards, Alexander Korotkov.
Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux
On Fri, Jun 29, 2012 at 1:07 PM, Nils Goroll sl...@schokola.de wrote:
>> FWIW, I kicked off a looong benchmarking run on this a couple of days ago on the IBM POWER7 box, testing pgbench -S, regular pgbench, and pgbench --unlogged-tables at various client counts with and without the patch; three half-hour test runs for each test configuration. It should be done tonight and I will post the results once they're in.
> Sounds great! I am really curious.

Here are the results. Each result is the median of three 30-minute test runs on an IBM POWER7 system with 16 cores, 64 hardware threads. Configuration was shared_buffers = 8GB, maintenance_work_mem = 1GB, synchronous_commit = off, checkpoint_segments = 300, checkpoint_timeout = 15min, checkpoint_completion_target = 0.9, wal_writer_delay = 20ms, log_line_prefix = '%t [%p] '. Lines beginning with "m" show performance on master; lines beginning with "p" show performance with the patch; the following number is the # of clients used for the test.

Permanent Tables
================
m01 tps = 1364.521373 (including connections establishing)
m08 tps = 9175.281381 (including connections establishing)
m32 tps = 14770.652793 (including connections establishing)
m64 tps = 14183.495875 (including connections establishing)
p01 tps = 1366.447001 (including connections establishing)
p08 tps = 9406.181857 (including connections establishing)
p32 tps = 14608.766540 (including connections establishing)
p64 tps = 14182.576636 (including connections establishing)

Unlogged Tables
===============
m01 tps = 1459.649000 (including connections establishing)
m08 tps = 11872.102025 (including connections establishing)
m32 tps = 32834.258026 (including connections establishing)
m64 tps = 33404.988834 (including connections establishing)
p01 tps = 1481.876584 (including connections establishing)
p08 tps = 11787.657258 (including connections establishing)
p32 tps = 32959.342248 (including connections establishing)
p64 tps = 33672.008244 (including connections establishing)

SELECT-only
===========
m01 tps = 8777.971832 (including connections establishing)
m08 tps = 70695.558964 (including connections establishing)
m32 tps = 201762.696020 (including connections establishing)
m64 tps = 310137.544470 (including connections establishing)
p01 tps = 8914.165586 (including connections establishing)
p08 tps = 71351.501358 (including connections establishing)
p32 tps = 201946.425301 (including connections establishing)
p64 tps = 305627.413716 (including connections establishing)

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [v9.3] Row-Level Security
2012/6/28 Tom Lane t...@sss.pgh.pa.us:
> Kohei KaiGai kai...@kaigai.gr.jp writes:
>> 2012/6/27 Florian Pflug f...@phlo.org:
>>> Hm, what happens if a SECURITY DEFINER function returns a refcursor?
>> My impression is that there is no problem even if a SECURITY DEFINER function returns a refcursor.
> I think Florian has a point: it *should* work, but *will* it? I believe it works today, because the executor only applies permissions checks during query startup. So those checks are executed while still within the SECURITY DEFINER context, and should behave as expected. Subsequently, the cursor portal is returned to the caller and the caller can execute it to completion, no problem. However, with RLS, security-related checks will happen throughout the execution of the portal. They might do the wrong thing once the SECURITY DEFINER function has been exited.

I tried the scenario that Florian pointed out.

postgres=# CREATE OR REPLACE FUNCTION f_test(refcursor) RETURNS refcursor
postgres-# SECURITY DEFINER LANGUAGE plpgsql
postgres-# AS 'BEGIN OPEN $1 FOR SELECT current_user, * FROM t1; RETURN $1; END';
CREATE FUNCTION
postgres=# BEGIN;
BEGIN
postgres=# SELECT f_test('abc');
 f_test
--------
 abc
(1 row)

postgres=# FETCH abc;
 current_user | a |  b
--------------+---+-----
 kaigai       | 1 | aaa
(1 row)

postgres=# SET SESSION AUTHORIZATION alice;
SET
postgres=> FETCH abc;
 current_user | a |  b
--------------+---+-----
 alice        | 2 | bbb
(1 row)

postgres=> SET SESSION AUTHORIZATION bob;
SET
postgres=> FETCH abc;
 current_user | a |  b
--------------+---+-----
 bob          | 3 | ccc
(1 row)

Indeed, the output of current_user depends on the setting of session authorization, even though the cursor was declared within a security definer function (thus, its security checks are applied with the privileges of the function owner).

> We might need to consider that a portal has a local value of current_user, which is kind of a pain, but probably doable.

It seems to me this is an individual matter to be fixed independently of the RLS feature. How about your opinion? If we go ahead, one idea to tackle this matter is to switch the user-id to the one saved in the Portal at the beginning of PortalRun or PortalRunFetch.

Thanks,
-- KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux
Thank you, Robert. As this patch was not targeted towards increasing tps, I am happy to hear that your benchmarks also suggest that performance is comparable. But my main question is: how about resource consumption? For the issue I am working on, my current working hypothesis is that spinning on locks saturates resources and brings down overall performance in a high-contention situation. Do you have any getrusage figures or anything equivalent?

Thanks, Nils
Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux
> test runs on an IBM POWER7 system with 16 cores, 64 hardware threads.

Could you add the CPU type / clock speed please?
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby
On Sun, Jul 1, 2012 at 8:02 PM, Boszormenyi Zoltan z...@cybertec.at wrote:
> Hi, attached is a patch that does $SUBJECT. It's a usability enhancement, to take a backup, write a minimalistic recovery.conf and start the streaming standby in one go. Comments?

Could you add the patch to the next CommitFest?

If the backup is taken from the standby server, the standby's recovery.conf is included in the backup. What happens in this case?

Regards,
-- Fujii Masao
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby
Hi,

On 2012-07-01 17:38, Fujii Masao wrote:
> On Sun, Jul 1, 2012 at 8:02 PM, Boszormenyi Zoltan z...@cybertec.at wrote:
>> Hi, attached is a patch that does $SUBJECT. It's a usability enhancement, to take a backup, write a minimalistic recovery.conf and start the streaming standby in one go. Comments?
> Could you add the patch to the next CommitFest?

I will.

> If the backup is taken from the standby server, the standby's recovery.conf is included in the backup. What happens in this case?

As documented, the command line parameters of pg_basebackup will be used for recovery.conf. So, the new standby will replicate the previous one. Cascading replication works since 9.2.

Best regards,
Zoltán Böszörményi

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de http://www.postgresql.at/
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby
On Sun, Jul 1, 2012 at 1:02 PM, Boszormenyi Zoltan z...@cybertec.at wrote:
> Hi, attached is a patch that does $SUBJECT. It's a usability enhancement, to take a backup, write a minimalistic recovery.conf and start the streaming standby in one go.

I like the writing of recovery.conf. In fact, I had it in my code at one very early point and took it out in order to get a clean patch ready :)

But I think that part is lacking in functionality: AFAICT it's hardcoded to only handle host, port, user and password. What about other connection parameters, likely passed to pg_basebackup through the environment in that case? Isn't that quite likely to break the server later? Maybe the proper way around that is to provide the ability for pg_basebackup to take a full connection string, just like we allow psql to do?

I'm not sure we should go the way of providing the start-standby functionality, given that how you want to start the standby differs so much between platforms. The most glaring example is that on Windows you really need to *start the service* rather than use pg_ctl. Sure, you can document your way around that, but I'm not sure the functionality added is really worth it. What about all the other potential connection parameters?

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] [PATCH 14/16] Add module to apply changes from an apply-cache using low-level functions
On Sun, Jul 1, 2012 at 3:11 PM, Alexander Korotkov aekorot...@gmail.com wrote:
> [... review points 1) through 7) snipped ...]

8) If we can't find the tuple for an update or delete, we likely need to put the PK value into the log message.

-- With best regards, Alexander Korotkov.
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby
On Mon, Jul 2, 2012 at 12:42 AM, Boszormenyi Zoltan z...@cybertec.at wrote:
>> If the backup is taken from the standby server, the standby's recovery.conf is included in the backup. What happens in this case?
> As documented, the command line parameters of pg_basebackup will be used for recovery.conf. So, the new standby will replicate the previous one. Cascading replication works since 9.2.

So pg_basebackup overwrites the recovery.conf which was backed up from the standby with the recovery.conf which was created using the command line parameters of pg_basebackup?

Regards,
-- Fujii Masao
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby
On Mon, Jul 2, 2012 at 12:44 AM, Magnus Hagander mag...@hagander.net wrote:
> I like the writing of recovery.conf.

Agreed.

> In fact, I had it in my code at one very early point and took it out in order to get a clean patch ready :) But I think that part is lacking in functionality: AFAICT it's hardcoded to only handle host, port, user and password. What about other connection parameters, likely passed to pg_basebackup through the environment in that case? Isn't that quite likely to break the server later?

What about something like PQconninfo, which returns the connection string value established at connection time?

> Maybe the proper way around that is to provide the ability for pg_basebackup to take a full connection string, just like we allow psql to do?

+1

> I'm not sure we should go the way of providing the start slave. Given that how you want to start the slave differs so much on platforms. The most glaring example is on windows you really need to *start the service* rather than use pg_ctl. Sure, you can document your way around that, but I'm not sure the functionality added is really worth it.

Agreed.

Regards,
-- Fujii Masao
[HACKERS] compiler warnings on the buildfarm
Seeing some of the latest commits fixing compiler warnings, I took a look at the buildfarm to see if there are any interesting ones there (in total we have thousands of warnings on the buildfarm, but most of those are from very noisy compilers). So, in case anybody is interested, here is a selection of the ones that at least look somewhat interesting (duplicates mostly removed, Windows ignored):

animal: grebe
Snapshot: 2012-07-01 150224
scan.c: In function 'yy_try_NUL_trans':
scan.c:16243: warning: unused variable 'yyg'
auth.c: In function 'auth_peer':
auth.c:1775: warning: implicit declaration of function 'getpeereid'
ip.c: In function 'getaddrinfo_unix':
ip.c:228: warning: large integer implicitly truncated to unsigned type
Extra instructions are being generated for each reference to a TOC symbol if the symbol is in the TOC overflow area.
fe-connect.c: In function 'PQconnectPoll':
fe-connect.c:1913: warning: implicit declaration of function 'getpeereid'
ip.c: In function 'getaddrinfo_unix':
ip.c:228: warning: large integer implicitly truncated to unsigned type

animal: spoonbill
Snapshot: 2012-07-01 110005
tuptoaster.c: In function 'heap_tuple_untoast_attr_slice':
tuptoaster.c:198: warning: array size (1) smaller than bound length (16)
tuptoaster.c:198: warning: array size (1) smaller than bound length (16)
tuptoaster.c: In function 'toast_raw_datum_size':
tuptoaster.c:275: warning: array size (1) smaller than bound length (16)
tuptoaster.c:275: warning: array size (1) smaller than bound length (16)
tuptoaster.c: In function 'toast_datum_size':
tuptoaster.c:320: warning: array size (1) smaller than bound length (16)
tuptoaster.c:320: warning: array size (1) smaller than bound length (16)
tuptoaster.c: In function 'toast_save_datum':
tuptoaster.c:1344: warning: array size (1) smaller than bound length (16)
tuptoaster.c:1344: warning: array size (1) smaller than bound length (16)
tuptoaster.c:1458: warning: array size (1) smaller than bound length (16)
tuptoaster.c:1458: warning: array size (1) smaller than bound length (16)
tuptoaster.c: In function 'toast_delete_datum':
tuptoaster.c:1485: warning: array size (1) smaller than bound length (16)
tuptoaster.c:1485: warning: array size (1) smaller than bound length (16)
tuptoaster.c: In function 'toast_fetch_datum':
tuptoaster.c:1610: warning: array size (1) smaller than bound length (16)
tuptoaster.c:1610: warning: array size (1) smaller than bound length (16)
tuptoaster.c: In function 'toast_fetch_datum_slice':
tuptoaster.c:1779: warning: array size (1) smaller than bound length (16)
tuptoaster.c:1779: warning: array size (1) smaller than bound length (16)
relmapper.c: In function 'relmap_redo':
relmapper.c:876: warning: array size (1) smaller than bound length (512)
relmapper.c:876: warning: array size (1) smaller than bound length (512)
elog.c: In function 'write_pipe_chunks':
elog.c:2541: warning: array size (1) smaller than bound length (503)
elog.c:2541: warning: array size (1) smaller than bound length (503)

animal: jaguarundi
Snapshot: 2012-07-01 031500
plpy_exec.c: In function 'PLy_procedure_call':
plpy_exec.c:818: warning: null format string

animal: rover_firefly
Snapshot: 2012-07-01 030400
float.c: In function 'is_infinite':
float.c:167:2: warning: implicit declaration of function 'isinf' [-Wimplicit-function-declaration]
geo_ops.c: In function 'pg_hypot':
geo_ops.c:5455:2: warning: implicit declaration of function 'isinf' [-Wimplicit-function-declaration]
execute.c: In function 'sprintf_double_value':
execute.c:473:2: warning: implicit declaration of function 'isinf' [-Wimplicit-function-declaration]

animal: nightjar
Snapshot: 2012-07-01 023700
In file included from /usr/home/andrew/bf/root/HEAD/pgsql.66715/../pgsql/src/backend/parser/gram.y:13338:
scan.c: In function 'yy_try_NUL_trans':
scan.c:16243: warning: unused variable 'yyg'
/usr/home/andrew/bf/root/HEAD/pgsql.66715/../pgsql/src/pl/plpython/plpy_exec.c: In function 'PLy_procedure_call':
/usr/home/andrew/bf/root/HEAD/pgsql.66715/../pgsql/src/pl/plpython/plpy_exec.c:818: warning: null format string
/usr/home/andrew/bf/root/HEAD/pgsql.66715/../pgsql/src/pl/tcl/pltcl.c: In function '_PG_init':
/usr/home/andrew/bf/root/HEAD/pgsql.66715/../pgsql/src/pl/tcl/pltcl.c:343: warning: assignment from incompatible pointer type
/usr/home/andrew/bf/root/HEAD/pgsql.66715/../pgsql/src/pl/tcl/pltcl.c:344: warning: assignment from incompatible pointer type

animal: locust
Snapshot: 2012-07-01 023122
xlog.c: In function 'StartupXLOG':
xlog.c:5988: warning: 'checkPointLoc' may be used uninitialized in this function
pgstat.c: In function 'pgstat_report_activity':
pgstat.c:2538: warning: passing argument 1 of '__dtrace_probe$postgresql$statement__status$v1$63686172202a' discards qualifiers from pointer target type
In file included from repl_gram.y:172:
postgres.c: In function 'pg_parse_query':
postgres.c:559: warning: passing argument 1 of
Re: [HACKERS] [ADMIN] pg_basebackup blocking all queries with horrible performance
Thanks for the review! On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander mag...@hagander.net wrote: On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander mag...@hagander.net wrote: You agreed to add something like NOSYNC option into START_REPLICATION command? I'm on the fence. I was hoping somebody else would chime in with an opinion as well. +1 Nobody else with any opinion on this? :( I don't think we really need a NOSYNC flag at this point. Just not setting the flush location in clients that make a point of flushing in a timely fashion seems fine. Okay, I'm in the minority, so I'm writing the patch that way. WIP patch attached. In the patch, pg_basebackup background process and pg_receivexlog always return invalid location as flush one, and will never become sync standby even if their name is in synchronous_standby_names. The timing of their sending That doesn't match with the patch, afaics. The patch always sets the correct write location, which means it can become a remote_write synchronous standby, no? It will only send it back when timeout expires, but it will be sent back. No. Though correct write location is sent back, they don't become sync standby because flush location is always invalid. While flush location is invalid, the master will never regard the remote server as sync one even if synchronous_commit is set to remote_write. 
I wonder if that might actually be a more reasonable mode of operation in general: * always send back the write position, at the write interval * always send back the flush position, when we're flushing (meaning when we switch xlog) have an option that makes it possible to: * always send back the write position as soon as it changes (making for a reasonable remote_write sync standby) * actually flush the log after each write instead of end of file (making for a reasonable full sync standby) meaning you'd have something like pg_receivexlog --sync=write and pg_receivexlog --sync=flush controlling it instead. Yeah, in this way, pg_receivexlog can become sync even if synchronous_commit is on, which seems more useful. But I'm thinking that the synchronous pg_receivexlog stuff should be postponed to 9.3 because its patch seems to become too big to apply at this beta stage. So, in 9.2, to fix the problem, what about just applying the simple patch which prevents pg_basebackup background process and pg_receivexlog from becoming sync standby whatever synchronous_standby_names and synchronous_commit are set to? And deal with the user put * in synchronous_standby_names and accidentally got pg_receivexlog as the sync standby by more clearly warning people not to use * for that parameter... Since it's simply dangerous :) Yep. the reply depends on the standby_message_timeout specified in -s option. So the write position may lag behind the true position. pg_receivexlog accepts new option -S (better option character?). If this option is specified, pg_receivexlog returns true flush position, and can become sync standby. It sends back the reply to the master each time the write position changes or the timeout passes. If synchronous_commit is set to remote_write, synchronous replication to pg_receivexlog would work well. Yeah, I hadn't considered the remote_write mode, but I guess that's why you have to track the current write position across loads, which first confused me. 
The patch has to track the current write location to decide whether to send back the reply to the master, IOW to know whether the write location has changed, IOW to know whether we've already sent the reply about the latest write location. Looking at some other usecases for this, I wonder if we should also force a status message whenever we switch xlog files, even if we aren't running in sync mode, even if the timeout hasn't expired. I think that would be a reasonable thing to do, since you often want to track things based on files. You mean that the pg_receivexlog should send back the correct flush location whenever it switches xlog files? The patch needs more documentation. But I think that it's worth reviewing the code in advance, so I attached the WIP patch. Comments? Objections? Looking at the code, what exactly prompts the changes to the backend side? That seems unrelated? Are we actually considering picking a standby with InvalidXlogRecPtr as a sync standby today? Isn't it enough to just send the proper write and flush locations from the frontend? No, unless I'm missing something. The problem that we should address first is that the master can pick up pg_basebackup background process and pg_receivexlog as a sync standby even if they always return an invalid write and flush positions. Since they don't send any correct write and flush positions, if they are accidentally regarded as
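The bookkeeping Fujii describes above — remember the last write position we reported, and reply only when it advances or the standby message timeout fires — can be sketched roughly like this. This is a minimal illustration, not the actual pg_receivexlog code: `ReceiverState`, `need_status_reply`, and the flattened `XLogRecPtr` are all assumed names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;            /* simplified; the real type is a struct */

typedef struct
{
    XLogRecPtr  last_reported_write;    /* write position in the last status reply */
    int64_t     last_reply_time;        /* when we last replied, in ms */
} ReceiverState;

/*
 * Decide whether to send a status reply now.  We reply when the write
 * position has advanced past what the master already knows about, or
 * when the standby message timeout has expired.
 */
static bool
need_status_reply(ReceiverState *st, XLogRecPtr write_pos,
                  int64_t now, int64_t timeout_ms)
{
    if (write_pos > st->last_reported_write)
        return true;
    if (timeout_ms > 0 && now - st->last_reply_time >= timeout_ms)
        return true;
    return false;
}
```

The "track the current write location" requirement falls out directly: without `last_reported_write` there is no way to tell whether a reply about the latest position has already been sent.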
Re: [HACKERS] XX000: enum value 117721 not found in cache for enum enumcrash
On Sat, Jun 30, 2012 at 5:51 AM, Andres Freund and...@2ndquadrant.com wrote: Currently it's possible to cause transactions to fail with ALTER ENUM ADD AFTER/BEFORE:
psql 1:
CREATE TYPE enumcrash AS ENUM('a', 'b');
CREATE FUNCTION randenum() RETURNS enumcrash LANGUAGE sql AS $$SELECT * FROM unnest(enum_range('a'::enumcrash)) ORDER BY random() LIMIT 1$$;
CREATE TABLE enumcrash_table(id serial primary key, val enumcrash);
CREATE INDEX enumcrash_table__val__id ON enumcrash_table (val, id);
INSERT INTO enumcrash_table (val) SELECT randenum() FROM generate_series(1, 1);
INSERT 0 1
psql 2:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
INSERT INTO enumcrash_table (val) SELECT randenum() FROM generate_series(1, 1);
psql 1:
ALTER TYPE enumcrash ADD VALUE 'a_1' AFTER 'a';
INSERT INTO enumcrash_table (val) SELECT randenum() FROM generate_series(1, 1);
psql 2:
INSERT INTO enumcrash_table (val) SELECT randenum() FROM generate_series(1, 1);
ERROR: XX000: enum value 117745 not found in cache for enum enumcrash
LOCATION: compare_values_of_enum, typcache.c:957
This is not surprising. psql 2's backend finds rows in the index with enum values that are not visible in its mvcc snapshot. That mvcc snapshot is needed because a_1 is an enum value with an odd-numbered OID, because it's inserted after the initial creation. I think the problem is that load_enum_cache_data() uses GetTransactionSnapshot() rather than GetLatestSnapshot(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
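For readers following the odd-OID detail above: as I understand the enum code, values assigned at CREATE TYPE time get even OIDs in sort order, so comparing two even OIDs is just an OID comparison; a value added later with ALTER TYPE ... ADD VALUE BEFORE/AFTER can get an odd OID, which forces a sort-order lookup in the typcache — the lookup that fails here when the cache was built from a snapshot that cannot see the new value. A sketch of that fast-path test (illustrative, mirroring the logic rather than reproducing the actual code):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

/*
 * Fast path used by enum comparison: if both OIDs are even, the values
 * were assigned in sort order at CREATE TYPE time and compare directly
 * by OID.  An odd OID means the value was added later with ALTER TYPE
 * ... ADD VALUE BEFORE/AFTER, so its sort position must be looked up
 * in the typcache -- which is where the stale-snapshot failure above
 * occurs.  (Sketch only; see enum.c / typcache.c for the real logic.)
 */
static bool
enum_cmp_needs_cache(Oid arg1, Oid arg2)
{
    return (arg1 & 1) != 0 || (arg2 & 1) != 0;
}
```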
Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux
On Sun, Jul 1, 2012 at 11:13 AM, Nils Goroll sl...@schokola.de wrote: as this patch was not targeted towards increasing tps, I am happy to hear that your benchmarks also suggest that performance is comparable. But my main question is: how about resource consumption? For the issue I am working on, my current working hypothesis is that spinning on locks saturates resources and brings down overall performance in a high-contention situation. Do you have any getrusage figures or anything equivalent? Spinlock contention causes tps to go down. The fact that tps didn't change much in this case suggests that either these workloads don't generate enough spinlock contention to benefit from your patch, or your patch doesn't meaningfully reduce it, or both. We might need a test case that is more spinlock-bound to observe an effect. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux
On Sun, Jul 1, 2012 at 11:18 AM, Nils Goroll sl...@schokola.de wrote: test runs on an IBM POWER7 system with 16 cores, 64 hardware threads. Could you add the CPU Type / clock speed please?
cpu      : POWER7 (architected), altivec supported
clock    : 3550.00MHz
revision : 2.1 (pvr 003f 0201)
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new --maintenance-db options
On Fri, Jun 29, 2012 at 3:32 PM, Bruce Momjian br...@momjian.us wrote: On Mon, Jun 25, 2012 at 11:57:36AM -0400, Robert Haas wrote: In retrospect, it seems as though it might have been a good idea to make the postgres database read-only and undroppable, so that all client utilities could count on being able to connect to it and get a list of databases in the cluster without the need for all this complexity. Or else having some other way for a client to authenticate and list out all the available databases. In the absence of such a mechanism, I don't think we can turn around and say that not having a postgres database is an unsupported configuration, and therefore we need some way to cope with it when it happens. Well, we certainly don't allow 'template1' to be dropped: test=# DROP DATABASE template1; ERROR: cannot drop a template database so you could make the argument that making 'postgres' undroppable seems reasonable. I should point out that it was EnterpriseDB that complained about this related to their Advanced Server product, which doesn't have a 'postgres' database, but an 'edb' one. I said that was their problem, but when community users said they also dropped the 'postgres' database, it became a community problem too. Where are we going on this for PG 9.2? 9.3? I hate to ship options in 9.2 that will be gone in 9.3. FYI, we do allow the 'template1' database to be renamed: test=# ALTER DATABASE template1 RENAME TO template2; ALTER DATABASE Oops. TODO? Not only that, but you can change datistemplate and then drop it OR rename it. We don't have a rule that says you can't drop template1. We have a rule that says you can't drop template databases. template1 is merely the default template database, but the user can create more, and they can get rid of, rename, or modify that one. I imagine that most people don't, but let's not make up an imaginary rule that template1 always has to exist, because it doesn't.
Also, even if it does exist, it may have datallowconn = false (I think I've actually seen this, on a system that also had no postgres database), or pg_hba.conf may exclude connections to it, or it may be screwed up in a hundred other ways (databases that can't be connected to because the system catalogs are screwed up are not terribly rare). So in my opinion, any code that relies on the existence of, ability to connect to, or sane state of a database with any particular name is plain broken, because somebody somewhere is going to have an installation where it isn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] pg_basebackup blocking all queries with horrible performance
On Sun, Jul 1, 2012 at 7:14 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander mag...@hagander.net wrote: On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander mag...@hagander.net wrote: You agreed to add something like NOSYNC option into START_REPLICATION command? I'm on the fence. I was hoping somebody else would chime in with an opinion as well. +1 Nobody else with any opinion on this? :( I don't think we really need a NOSYNC flag at this point. Just not setting the flush location in clients that make a point of flushing in a timely fashion seems fine. Okay, I'm in the minority, so I'm writing the patch that way. WIP patch attached. In the patch, pg_basebackup background process and pg_receivexlog always return invalid location as flush one, and will never become sync standby even if their name is in synchronous_standby_names. The timing of their sending That doesn't match with the patch, afaics. The patch always sets the correct write location, which means it can become a remote_write synchronous standby, no? It will only send it back when timeout expires, but it will be sent back. No. Though correct write location is sent back, they don't become sync standby because flush location is always invalid. While flush location is invalid, the master will never regard the remote server as sync one even if synchronous_commit is set to remote_write. Oh. I wasn't aware of that part. 
I wonder if that might actually be a more reasonable mode of operation in general:
* always send back the write position, at the write interval
* always send back the flush position, when we're flushing (meaning when we switch xlog)
have an option that makes it possible to:
* always send back the write position as soon as it changes (making for a reasonable remote_write sync standby)
* actually flush the log after each write instead of at end of file (making for a reasonable full sync standby)
meaning you'd have something like pg_receivexlog --sync=write and pg_receivexlog --sync=flush controlling it instead. Yeah, in this way, pg_receivexlog can become sync even if synchronous_commit is on, which seems more useful. But I'm thinking that the synchronous pg_receivexlog stuff should be postponed to 9.3 because its patch seems to become too big to apply at this beta stage. So, in 9.2, to fix the problem, what about just applying the simple patch which prevents pg_basebackup background process and pg_receivexlog from becoming sync standby whatever synchronous_standby_names and synchronous_commit are set to? Agreed. With the addition that we should set the write location, because that's very useful and per what you said above should be perfectly safe. And deal with the case where the user put * in synchronous_standby_names and accidentally got pg_receivexlog as the sync standby by more clearly warning people not to use * for that parameter... Since it's simply dangerous :) Yep. What would be good wording? Something along the lines of "Using the * entry is not recommended since it can lead to unexpected results when new standbys are added", or something like that? The timing of their sending the reply depends on the standby_message_timeout specified in the -s option. So the write position may lag behind the true position. pg_receivexlog accepts a new option -S (better option character?). If this option is specified, pg_receivexlog returns the true flush position, and can become a sync standby.
It sends back the reply to the master each time the write position changes or the timeout passes. If synchronous_commit is set to remote_write, synchronous replication to pg_receivexlog would work well. Yeah, I hadn't considered the remote_write mode, but I guess that's why you have to track the current write position across loads, which first confused me. The patch has to track the current write location to decide whether to send back the reply to the master, IOW to know whether the write location has changed, IOW to know whether we've already sent the reply about the latest write location. Yeah, makes perfect sense. Looking at some other usecases for this, I wonder if we should also force a status message whenever we switch xlog files, even if we aren't running in sync mode, even if the timeout hasn't expired. I think that would be a reasonable thing to do, since you often want to track things based on files. You mean that the pg_receivexlog should send back the correct flush location whenever it switches xlog files? No, I mean just send back a status message. Meaning that without specifying the sync modes per above, it would send back the *write* location. This would be useful for tracking xlog filenames between master and pg_receivexlog, without extra delay. The patch needs more documentation. But I think
Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux
Hi Robert, Spinlock contention causes tps to go down. The fact that tps didn't change much in this case suggests that either these workloads don't generate enough spinlock contention to benefit from your patch, or your patch doesn't meaningfully reduce it, or both. We might need a test case that is more spinlock-bound to observe an effect. Agree. My understanding is that:
- for no contention, acquiring a futex should almost be as fast as acquiring a spinlock, so we should observe
  - comparable tps
  - comparable resource consumption
  I believe this is what your test has shown for the low concurrency tests.
- for light contention, spinning will be faster than syscalling, so we should observe with the patch
  - slightly worse tps
  - more syscalls, otherwise comparable resource consumption
  I believe your test supports the first point for high concurrency tests.
- for high contention, spinning should be
  - unfair (because the time to acquire a lock is not deterministic - individual threads could starve)
  - much less efficient
  and we should see with the patch
  - slightly better tps if the system is not saturated, because the next process to acquire a contended futex gets scheduled immediately, rather than when a process returns from sleeping
  - much better tps if the system is saturated / oversubscribed, due to increased scheduling latency for spinning processes
  - significantly lower resource consumption, so we should have much more headroom before running into saturation as described above
So would it be possible for you to record resource consumption and rerun the test? Thank you, Nils -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
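The spinlock half of the trade-off Nils lays out can be sketched with C11 atomics. PostgreSQL's real s_lock.h uses per-platform assembly plus a tuned delay/backoff loop, so this is only a minimal illustration of why a contended spinlock burns CPU cycles where a futex-based lock would sleep:

```c
#include <stdatomic.h>

/*
 * Minimal test-and-set spinlock in the style of s_lock.  Every failed
 * test-and-set under contention is a wasted CPU cycle charged to the
 * waiter's rusage; a futex-based lock instead blocks the waiter in
 * the kernel, trading a syscall for those cycles -- the resource
 * consumption difference being discussed in this thread.
 */
typedef atomic_flag slock_t;

static void
s_lock_sketch(slock_t *lock, unsigned long *spin_count)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        (*spin_count)++;        /* real code inserts delays/backoff here */
}

static void
s_unlock_sketch(slock_t *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

Under the fairness point above: nothing orders the spinning waiters, so which one wins the next test-and-set is up to the hardware, and an unlucky thread can starve; a sleeping futex waiter is woken by the kernel's queue instead.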
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby
On Jul 1, 2012, at 5:44 PM, Magnus Hagander wrote: On Sun, Jul 1, 2012 at 1:02 PM, Boszormenyi Zoltan z...@cybertec.at wrote: Hi, attached is a patch that does $SUBJECT. It's a usability enhancement, to take a backup, write a minimalistic recovery.conf and start the streaming standby in one go. I like the writing of recovery.conf. In fact, I had it in my code at one very early point and took it out in order to get a clean patch ready :) But I think that part is lacking in functionality: AFAICT it's hardcoded to only handle host, port, user and password. What about other connection parameters, likely passed to pg_basebackup through the environment in that case? isn't that quite likely to break the server later? one option would be to check the environments and take them if needed. however, i am not sure if this is a good idea either - just think of PGPASSWORD or so. do we really want to take it and write it to a file straight away? i guess there are arguments for both ideas. still, i guess your argument is a reasonable one. Maybe the proper way around that is to provide the ability for pg_basebackup to take a full connection string, just like we allow psql to do? this would make things redundant. i am quite sure some users might not get the distinction straight away. I'm not sure we should go the way of providing the start slave. Given that how you want to start the slave differs so much on platforms. The most glaring example is on windows you really need to *start the service* rather than use pg_ctl. Sure, you can document your way around that, but I'm not sure the functionality added is really worth it. What about all the other potential connection parameters. regards, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for array_remove and array_replace functions
On Sat, Jun 30, 2012 at 3:28 PM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: On 30/06/2012 04:16, Alex Hunsaker wrote: Hi, I've been reviewing this patch. Good documentation, and regression tests. The code looked fine but I didn't care for the code duplication between array_replace and array_remove so I merged those into a helper function, array_replace_internal(). Thoughts? It looks reasonable. There was a typo in array_replace which was caught by regression tests. I've fixed the typo and changed a comment in array_replace_internal. Patch v3 attached. Looks good to me, marked ready for committer. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] XX000: enum value 117721 not found in cache for enum enumcrash
Robert Haas robertmh...@gmail.com writes: On Sat, Jun 30, 2012 at 5:51 AM, Andres Freund and...@2ndquadrant.com wrote: This is not surprising. psql 2's backend finds rows in the index with enum values that are not visible in its mvcc snapshot. I think the problem is that load_enum_cache_data() uses GetTransactionSnapshot() rather than GetLatestSnapshot(). That would only make the race condition window smaller (ie, hard to reproduce manually like this, but not gone). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] XX000: enum value 117721 not found in cache for enum enumcrash
I wrote: Robert Haas robertmh...@gmail.com writes: I think the problem is that load_enum_cache_data() uses GetTransactionSnapshot() rather than GetLatestSnapshot(). That would only make the race condition window smaller (ie, hard to reproduce manually like this, but not gone). No, wait, we made ALTER TYPE ADD VALUE PreventTransactionChain so that uncommitted enum OIDs could never get into tables or indexes. So I think you're right, forcing a new snapshot to be used would fix this. However, I'm a bit worried by the if (!FirstSnapshotSet) restriction in GetLatestSnapshot. Are we sure that enum comparisons could never happen without a snapshot already being set? What's the point of throwing an error there anyway, as opposed to letting it redirect to GetTransactionSnapshot? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux
On Fri, Jun 29, 2012 at 10:07 AM, Nils Goroll sl...@schokola.de wrote: On 06/28/12 05:21 PM, Jeff Janes wrote: It looks like the hacked code is slower than the original. That doesn't seem so good to me. Am I misreading this? No, you are right - in a way. This is not about maximizing tps, this is about maximizing efficiency under load situations. But why wouldn't this maximized efficiency present itself as increased TPS? Also, 20 transactions per connection is not enough of a run to make any evaluation on. As you can see I've repeated the tests 10 times. I've tested slight variations as mentioned above, so I was looking for quick results with acceptable variation. Testing it 10 times doesn't necessarily improve things. You have ten times as many transactions, but you also have 10 times as many start-up and tear-down events polluting the numbers. (Unless this start-up and tear-down are exactly the things you are trying to measure.) And once you change your benchmark so that it is testing what you want to be testing, it will probably get even worse. You should use at least -T30, rather than -t20. Anyway, your current benchmark speed of around 600 TPS over such short time periods suggests you are limited by fsyncs. It is going to be pretty hard to get a spinlock bottleneck in simple queries like pgbench does as long as that is the case. You could turn --fsync=off, or just change your benchmark to a read-only one like -S, or better the -P option I've been trying to get into pgbench. Does your production server have fast fsyncs (BBU) while your test server does not? Regarding the actual production issue, I did not manage to synthetically provoke the saturation we are seeing in production using pgbench - I could not even get anywhere near the production load. What metrics/tools are you using to compare the two loads? We've got cpu + load avg statistics for the old+new machine and compared values before/after the migration.
The user load presumably is comparable and the main metric is users complaining vs. users happy. The users probably don't care about the load average. Presumably they are unhappy because of lowered throughput (TPS) or higher peak latency (-l switch in pgbench). So I think the only use of load average is to verify that your benchmark is nothing like your production workload. (But it doesn't go the other way around, just because the load averages are similar doesn't mean the actual workloads are.) I wish we had a synthetic benchmark close to the actual load, and I hope that one of the insights from this will be that the customer should have one. If they could simulate a workload close to what they actually do, that would be great. But surely just with a fairly simple pgbench configuration you can get much closer to it than what you are currently. What is the production load like? Here's an anonymized excerpt from a pgFouine analysis of 137 seconds worth of query logs at average production user load.
Type    Count   Percentage
SELECT  80,217  27.1
INSERT  6,248   2.1
UPDATE  37,159  12.6
DELETE  4,579   1.5
Without knowing how complicated the joins involved in the various statements are, I don't think I can get much info out of this. But I'm not familiar with pgFouine, maybe there is another way to summarize its output that is more informative.
Queries that took up the most time (N)
Rank  Total duration  Times executed  Av. duration s  Query
1     3m39s           83,667          0.00            COMMIT;
So fsyncs probably are not totally free on production, but I still think they must be much cheaper than on your test box.
2     54.4s           2               27.18           SELECT ...
That is interesting. Maybe those two queries are hammering everything else to death.
3     41.1s           281             0.15            UPDATE ...
4     25.4s           18,960          0.00            UPDATE ...
5     21.9s           ...
(the 9th rank is already below 10 seconds total duration)
But how does the 9th rank through the final rank, cumulatively, stack up?
In other words, how many query-seconds worth of time transpired during the 137 wall seconds? That would give an estimate of how many simultaneously active connections the production server has. Each transaction has to update one of ten pgbench_branches rows, so you can't have more than ten transactions productively active at any given time, even though you have 768 connections. So you need to jack up the pgbench scale, or switch to using -N mode. Sorry for having omitted that detail. I had initialized pgbench with -i -s 100 Are you sure? In an earlier email you reported the entire output of pgbench, and it said it was using 10. Maybe you've changed it since then... Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench--new transaction type
On Wed, Jun 20, 2012 at 12:32 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 01.06.2012 03:02, Jeff Janes wrote: I've attached a new patch which addresses several of your concerns, and adds the documentation. The description is much longer than the descriptions of other nearby options, which mostly just give a simple statement of what they do rather than a description of why that is useful. I don't know if that means I'm starting a good trend, or a bad one, or I'm just putting the exposition in the wrong place. In addition to showing the benefits of coding things on the server side when that is applicable, it also allows hackers to stress parts of the server code that are not easy to stress otherwise. As you mentioned in your original email over a year ago, most of this could be done as a custom script. It's nice to have another workload to test, but then again, there's an infinite number of workloads that might be interesting. I would argue that my specific proposed transaction does occupy a somewhat privileged place, as it uses the default table structure, and it fits on a natural progression along the default, -N, -S continuum. In each step one (or more) source of bottleneck is removed, to allow different ones to bubble up to the top to be inspected and addressed. I've written dozens of custom benchmark scripts, and so far this is the only one I've thought was generally useful enough to think that it belongs inside the core. Do I need to make a better argument for why this particular transaction is as generally useful as I think it is; or is the objection to the entire idea that any new transactions should be added at all? 
You can achieve the same with this custom script:
\set loops 512
do $$
DECLARE
  sum integer default 0;
  amount integer;
  account_id integer;
BEGIN
  FOR i IN 1..:loops LOOP
    account_id = 1 + floor(random() * :scale);
    SELECT abalance into strict amount FROM pgbench_accounts WHERE aid = account_id;
    sum := sum + amount;
  END LOOP;
END;
$$;
It's a bit awkward because it has to be all on one line, and you don't get the auto-detection of scale. But those are really the only differences AFAICS. True. And we could fix the one-line thing by allowing lines ending in \ to be continued. And this might be mostly backwards compatible, as I don't think there is much of a reason to end a statement with a literal \. We could make the scale be auto-detected, but that would be more dangerous as a backward compatibility thing, as people with existing custom scripts might rely on :scale being set to 1. But it would defeat my primary purpose of making it easy for us to ask someone else posting on this list or on performance to run this. Copy this file someplace, then make sure you use the correct scale, and remember where you put it, etc. is a much higher overhead. Also, my patch changes the output formatting in a way that couldn't be done with a custom script, and might be very confusing if it were not changed. I don't know how good of an argument this is. Remembering to multiply the reported TPS by 512 is another barrier to easy use. I think we would be better off improving pgbench to support those things in custom scripts. It would be nice to be able to write initialization steps that only run once in the beginning of the test. I can think of 3 possibly desirable behaviors: run once at -i time, run once per pgbench run, and run once per connection. Should all of those be implemented? You could then put the SELECT COUNT(*) FROM pgbench_branches there, to do the scale auto-detection. So, we should add a way to do \set variable_name SQL QUERY?
Would we use a command other than \set to do that, or look at the thing after the variable to decide if it is a query rather than a simple expression? Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
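The trailing-backslash continuation idea discussed above is simple to sketch. `consume_continuation` is a hypothetical helper for illustration, not actual pgbench code:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper for the line-continuation proposal: a custom-script
 * line whose last character (before the newline) is a backslash continues
 * on the next line.  Trims the newline, and the backslash if present, in
 * place; returns true if the statement continues.
 */
static bool
consume_continuation(char *line)
{
    size_t len = strlen(line);

    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        line[--len] = '\0';
    if (len > 0 && line[len - 1] == '\\')
    {
        line[--len] = '\0';
        return true;
    }
    return false;
}
```

The backward-compatibility argument in the thread holds here: a script that never ends a statement with a literal backslash behaves identically before and after such a change.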
Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux
Hi Jeff, It looks like the hacked code is slower than the original. That doesn't seem so good to me. Am I misreading this? No, you are right - in a way. This is not about maximizing tps, this is about maximizing efficiency under load situations. But why wouldn't this maximized efficiency present itself as increased TPS? Because the latency of lock acquisition influences TPS, but this is only marginally related to the cost in terms of CPU cycles to acquire the locks. See my posting as of Sun, 01 Jul 2012 21:02:05 +0200 for an overview of my understanding. Also, 20 transactions per connection is not enough of a run to make any evaluation on. As you can see I've repeated the tests 10 times. I've tested slight variations as mentioned above, so I was looking for quick results with acceptable variation. Testing it 10 times doesn't necessarily improve things. My intention was to average over the imperfections of rusage accounting because I was mainly interested in lowering rusage, not maximizing tps. Yes, in order to get reliable results, I'd have to run longer tests, but interestingly the results from my quick tests already approximated those from the huge tests Robert has run with respect to the differences between unpatched and patched. You should use at least -T30, rather than -t20. Thanks for the advice - it is really appreciated and I will take it when I run more tests. But I don't understand yet how to best provoke high spinlock concurrency with pgbench. Or are there any other test tools out there for this case? Anyway, your current benchmark speed of around 600 TPS over such short time periods suggests you are limited by fsyncs. Definitely. I described the setup in my initial posting (why roll-your-own s_lock? / improving scalability - Tue, 26 Jun 2012 19:02:31 +0200). It is going to be pretty hard to get a spinlock bottleneck in simple queries like pgbench does as long as that is the case. You could turn --fsync=off, or just change your benchmark to a read-only one like -S, or better the -P option I've been trying to get into pgbench.
I don't like to make assumptions which I haven't validated. The system showing the behavior is designed to write to persistent SSD storage in order to reduce the risk of data loss by a (BBU) cache failure. Running a test with fsync=off would divert even further from reality. Does your production server have fast fsyncs (BBU) while your test server does not? No, we're writing directly to SSDs (ref: initial posting). The users probably don't care about the load average. Presumably they are unhappy because of lowered throughput (TPS) or higher peak latency (-l switch in pgbench). So I think the only use of load average is to verify that your benchmark is nothing like your production workload. (But it doesn't go the other way around, just because the load averages are similar doesn't mean the actual workloads are.) Fully agree.
Rank  Total duration  Times executed  Av. duration s  Query
1     3m39s           83,667          0.00            COMMIT;
So fsyncs probably are not totally free on production, but I still think they must be much cheaper than on your test box. Oh, the two are the same. I ran the tests on the prod machine during quiet periods.
2     54.4s           2               27.18           SELECT ...
That is interesting. Maybe those two queries are hammering everything else to death. With 64 cores? I should have mentioned that these were simply the result of a missing index when the data was collected. But how does the 9th rank through the final rank, cumulatively, stack up? In other words, how many query-seconds worth of time transpired during the 137 wall seconds? That would give an estimate of how many simultaneously active connections the production server has. Sorry, I should have given you the stats from pgFouine:
Number of unique normalized queries: 507
Number of queries: 295,949
Total query duration: 8m38s
First query: 2012-06-23 14:51:01
Last query: 2012-06-23 14:53:17
Query peak: 6,532 queries/s at 2012-06-23 14:51:33
Sorry for having omitted that detail. I had initialized pgbench with -i -s 100 Are you sure?
> In an earlier email you reported the entire output of pgbench, and it
> said it was using 10. Maybe you've changed it since then...

Good catch, I was wrong in the email you quoted. Sorry.

-bash-4.1$ rsync -av --delete /tmp/test_template_data/ /tmp/data/
...
-bash-4.1$ ./postgres -D /tmp/data -p 55502 &
[1] 38303
-bash-4.1$ LOG: database system was shut down at 2012-06-26 23:18:42 CEST
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
-bash-4.1$ ./psql -p 55502
psql (9.1.3)
Type "help" for help.

postgres=# select count(*) from pgbench_branches;
 count
-------
    10
(1 row)

Thank you very much, Jeff! The one question remains: Do we really have all we need to provoke very high lock contention?

Nils
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
On Thu, Jun 28, 2012 at 6:57 PM, Josh Berkus j...@agliodbs.com wrote:

> A second obstacle to opportunistic wraparound vacuum is that wraparound
> vacuum is not interruptable. If you have to kill it off and do something
> else for a couple hours, it can't pick up where it left off; it needs to
> scan the whole table from the beginning again.

Would recording a different relfrozenxid for each 1GB chunk of the relation solve that?

>> Since your users weren't complaining about performance with one or two
>> autovac workers running (were they?),

> No, it's when we hit 3 that it fell over. Thresholds vary with memory
> and table size, of course.

Does that mean it worked fine with 2 workers simultaneously in large tables, or did that situation not occur, so that it is not known whether it would have worked fine or not?

> BTW, the primary reason I think (based on a glance at system stats) this
> drove the system to its knees was that the simultaneous wraparound
> vacuum of 3 old-cold tables evicted all of the current data out of the
> FS cache, forcing user queries which would normally hit the FS cache
> onto disk. I/O throughput was NOT at 100% capacity.

Do you know if it was the input or the output that caused that to happen? I would think the kernel has logic similar to BAS (our buffer access strategy) to prevent reading a huge amount of data sequentially from evicting all the other data. But that logic might be defeated if all that data is dirtied right after being read. If the partitions had not been touched since the last freeze, then it should generate no dirty blocks (right?), but if they were touched since then, you could basically be writing out the entire table.

Cheers,

Jeff
[HACKERS] Proof of concept: auto updatable views
Hi,

I've been playing around with the idea of supporting automatically updatable views, and I have a working proof of concept. I've taken a different approach than the previous attempts to implement this feature (e.g., http://archives.postgresql.org/pgsql-hackers/2009-01/msg01746.php), instead doing all the work in the rewriter, substituting the view for its base relation rather than attempting to auto-generate any rules or triggers.

Basically what it does is this: in the first stage of query rewriting, just after any non-SELECT rules are applied, the new code kicks in - if the target relation is a view, and there were no unqualified INSTEAD rules, and there are no INSTEAD OF triggers, it tests if the view is simply updatable. If so, the target view is replaced by its base relation and columns are re-mapped. Then the remainder of the rewriting process continues, recursively handling any further non-SELECT rules or additional simply updatable views. This handles the case of views on top of views, with or without rules and/or triggers.
Here's a simple example:

CREATE TABLE my_table(id int primary key, val text);
CREATE VIEW my_view AS SELECT * FROM my_table WHERE id > 0;

Then any modifications to the view get redirected to the underlying table:

EXPLAIN ANALYSE INSERT INTO my_view VALUES(1, 'Test row');
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Insert on my_table  (cost=0.00..0.01 rows=1 width=0) (actual time=0.208..0.208 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=1)
 Total runtime: 0.327 ms
(3 rows)

EXPLAIN ANALYSE UPDATE my_view SET val='Updated' WHERE id=1;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Update on my_table  (cost=0.00..8.27 rows=1 width=10) (actual time=0.039..0.039 rows=0 loops=1)
   ->  Index Scan using my_table_pkey on my_table  (cost=0.00..8.27 rows=1 width=10) (actual time=0.014..0.015 rows=1 loops=1)
         Index Cond: ((id > 0) AND (id = 1))
 Total runtime: 0.090 ms
(4 rows)

EXPLAIN ANALYSE DELETE FROM my_view;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Delete on my_table  (cost=0.00..1.01 rows=1 width=6) (actual time=0.030..0.030 rows=0 loops=1)
   ->  Seq Scan on my_table  (cost=0.00..1.01 rows=1 width=6) (actual time=0.015..0.016 rows=1 loops=1)
         Filter: (id > 0)
 Total runtime: 0.063 ms
(4 rows)

The patch is currently very strict about what kinds of views can be updated (based on SQL-92), and there is no support for WITH CHECK OPTION, because I wanted to keep this patch as simple as possible. The consensus last time seemed to be that backwards compatibility should be offered through a new GUC variable to allow this feature to be disabled globally, which I've not implemented yet.

I'm also aware that my new function ChangeVarAttnos() is almost identical to the function map_variable_attnos() that Tom recently added, but I couldn't immediately see a neat way to merge the two. My function handles whole-row references to the view by mapping them to a generic RowExpr based on the view definition. I don't think a ConvertRowtypeExpr can be used in this case, because the column names don't necessarily match.
Obviously there's still more work to do, but the early signs seem to be encouraging. Thoughts?

Regards,
Dean

auto-update-views.patch
Description: Binary data
Re: [HACKERS] Update on the spinlock-pthread_mutex patch experimental: replace s_lock spinlock code with pthread_mutex on linux
On Sun, Jul 1, 2012 at 2:28 PM, Nils Goroll sl...@schokola.de wrote:

> Hi Jeff,
>
>> It looks like the hacked code is slower than the original. That doesn't
>> seem so good to me. Am I misreading this?
>
> No, you are right - in a way. This is not about maximizing tps, this is
> about maximizing efficiency under load situations.
>
>> But why wouldn't this maximized efficiency present itself as increased
>> TPS?
>
> Because the latency of lock acquisition influences TPS, but this is only
> marginally related to the cost in terms of CPU cycles to acquire the
> locks. See my posting as of Sun, 01 Jul 2012 21:02:05 +0200 for an
> overview of my understanding.

I still don't see how improving that could fail to improve TPS. But let's focus on reproducing the problem first, otherwise it is all just talking in the dark.

> But I don't understand yet how to best provoke high spinlock concurrency
> with pgbench. Or are there any other test tools out there for this case?

Use pgbench -S, or apply my patch from "pgbench--new transaction type" and then run pgbench -P. Make sure that the scale is such that all of your data fits in shared_buffers (I find on 64 bit that pgbench takes about 15MB * scale).

>> Anyway, your current benchmark speed of around 600 TPS over such a
>> short time period suggests you are limited by fsyncs.
>
> Definitely. I described the setup in my initial posting ("why
> roll-your-own s_lock? / improving scalability" - Tue, 26 Jun 2012
> 19:02:31 +0200).

OK. It looks like several things changed simultaneously. How likely do you think it is that the turning off of the write cache caused the problem?

>> pgbench does as long as that is the case. You could turn fsync=off, or
>> just change your benchmark to a read-only one like -S, or better the -P
>> option I've been trying to get into pgbench.
>
> I don't like to make assumptions which I haven't validated. The system
> showing the behavior is designed to write to persistent SSD storage in
> order to reduce the risk of data loss by a (BBU) cache failure.
> Running a test with fsync=off would divert even further from reality.

I think that you can't get much farther from reality than your current benchmarks are, I'm afraid. If your goal is to get pgbench closer to being limited by spinlock contention, then fsync=off, or using -S or -P, will certainly do that. So if you have high confidence that spinlock contention is really the problem, fsync=off will get you closer to the thing you want to focus on, even if it takes you further away from the holistic big-picture production environment. And since you went to the trouble of making patches for spinlocks, I assume you are fairly confident that that is the problem.

If you are not confident that spinlocks are really the problem, then I agree it would be a mistake to try to craft a simple pgbench run which focuses in on one tiny area which might not actually be the correct area. In that case, you would instead want to either create a very complicated workload that closely simulates your production load (a huge undertaking) or find a way to capture an oprofile of the production server while it is actually in distress. Also, it would help if you could get oprofile to do a call graph so you can see which call sites the contended spinlocks are coming from (sorry, I don't know how to do this successfully with oprofile).

>> Does your production server have fast fsyncs (BBU) while your test
>> server does not?
>
> No, we're writing directly to SSDs (ref: initial posting).

OK. So it seems like the pgbench workload you are doing is limited by fsyncs, and the CPU is basically idle because of that limit, while your real workload needs a much larger amount of processing power per fsync, so it is closer to both limits at the same time. But, since the stats you posted were for the normal rather than the distressed state, maybe I'm way off here. Anyway, the easiest way to increase pgbench's CPU need per fsync is to turn off fsync or synchronous_commit, or to switch to read-only queries.
>>> 2  54.4s  2  27.18  SELECT ...
>>
>> That is interesting. Maybe those two queries are hammering everything
>> else to death.
>
> With 64 cores?

Maybe. That is the nature of spinlocks: the more cores you have, the more other things each one interferes with. Except that the duration is not long enough to cover the entire run period. But then again, maybe in the distressed state those same queries did cover the entire duration. But yeah, now that I think about it, this would not be my top hypothesis.

>> In other words, how many query-seconds worth of time transpired during
>> the 137 wall seconds? That would give an estimate of how many
>> simultaneously active connections the production server has.
>
> Sorry, I should have given you the stats from pgFouine:
>
> Number of unique normalized queries: 507
> Number of queries: 295,949
> Total query duration: 8m38s
> First query: 2012-06-23 14:51:01
> Last query:
Re: [HACKERS] Proof of concept: auto updatable views
My thoughts on this are that it would be a very valuable feature to have. It would make Postgres views behave more like they were always intended to behave: indistinguishable from tables to users wherever possible, with the reverse mapping derived automatically by the DBMS from just the view-defining SELECT, where possible. -- Darren Duncan

Dean Rasheed wrote:

> I've been playing around with the idea of supporting automatically
> updatable views, and I have a working proof of concept. I've taken a
> different approach than the previous attempts to implement this feature
> (e.g., http://archives.postgresql.org/pgsql-hackers/2009-01/msg01746.php),
> instead doing all the work in the rewriter, substituting the view for
> its base relation rather than attempting to auto-generate any rules or
> triggers.
> [snip]
> Obviously there's still more work to do but the early signs seem to be
> encouraging. Thoughts?
Re: [HACKERS] XX000: enum value 117721 not found in cache for enum enumcrash
On Jul 1, 2012, at 4:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> However, I'm a bit worried by the if (!FirstSnapshotSet) restriction in
> GetLatestSnapshot. Are we sure that enum comparisons could never happen
> without a snapshot already being set? What's the point of throwing an
> error there anyway, as opposed to letting it redirect to
> GetTransactionSnapshot?

I don't know whether it should set the transaction snapshot or just r
Re: [HACKERS] XX000: enum value 117721 not found in cache for enum enumcrash
On Jul 2, 2012, at 12:04 AM, Robert Haas robertmh...@gmail.com wrote:

>> However, I'm a bit worried by the if (!FirstSnapshotSet) restriction in
>> GetLatestSnapshot. Are we sure that enum comparisons could never happen
>> without a snapshot already being set? What's the point of throwing an
>> error there anyway, as opposed to letting it redirect to
>> GetTransactionSnapshot?
>
> I don't know whether it should set the transaction snapshot or just r

Argh, sorry. ...or just return a current snapshot, and I also don't know whether it needs to be changed because of this; but I agree with changing it. Erroring out always seemed kind of pointless to me...

...Robert
Re: [HACKERS] XX000: enum value 117721 not found in cache for enum enumcrash
Robert Haas robertmh...@gmail.com writes:

> ...or just return a current snapshot, and I also don't know whether it
> needs to be changed because of this; but I agree with changing it.
> Erroring out always seemed kind of pointless to me...

I think it was coded like that because the sole original use was in ri_triggers.c, in which it would have been a red flag if no transaction snapshot already existed. However, the restriction clearly doesn't fit with GetLatestSnapshot's API spec, so it seems to me to be sensible to change it (as opposed to, say, inventing a new snapshot function with a subtly different API spec).

As for creating an MVCC snapshot without causing a transaction snapshot to be established, no thanks --- that would create a path of control that exists nowhere today and has gotten no testing at all. I suspect that it might actively break some assumptions in snapshot management.

regards, tom lane
Re: [HACKERS] XX000: enum value 117721 not found in cache for enum enumcrash
Excerpts from Tom Lane's message of Mon Jul 02 00:24:06 -0400 2012:

> I think it was coded like that because the sole original use was in
> ri_triggers.c, in which it would have been a red flag if no transaction
> snapshot already existed.

Yeah, that's correct. I guess I could have made the check an Assert() instead of elog().

> However, the restriction clearly doesn't fit with GetLatestSnapshot's
> API spec, so it seems to me to be sensible to change it (as opposed to,
> say, inventing a new snapshot function with a subtly different API
> spec).

Sounds sensible.

> As for creating an MVCC snapshot without causing a transaction snapshot
> to be established, no thanks --- that would create a path of control
> that exists nowhere today and has gotten no testing at all. I suspect
> that it might actively break some assumptions in snapshot management.

I think it should work fine as long as the snapshot is registered, but as you say it is untested. Maybe it'd have interactions with the way snapshots connect with resource owners.

-- Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support