Re: A new function to wait for the backend exit after termination
On Fri, Jun 11, 2021 at 09:37:50PM -0700, Noah Misch wrote:
> On Fri, Jun 11, 2021 at 08:54:08PM -0500, Justin Pryzby wrote:
> > On Sat, Jun 05, 2021 at 12:08:01PM -0700, Noah Misch wrote:
> > > > > My preference is to remove pg_wait_for_backend_termination().  The use case
> > > > > that prompted this thread used pg_terminate_backend(pid, 18); it doesn't
> > > > > need pg_wait_for_backend_termination().
> > Is this an Opened Issue ?
> An Open Item?  Not really, since there's no objective defect.  Nonetheless,
> the attached is what I'd like to use.

I think of the Open Items page as a list of things that need to be addressed or settled before the release, kept so nothing is forgotten.  If the value of the new function is marginal, it may be good to remove it; otherwise we're committed to supporting it.

Even if it's not removed, the descriptions should be cleaned up:

| src/include/catalog/pg_proc.dat- descr => 'terminate a backend process and if timeout is specified, wait for its exit or until timeout occurs',

=> I think this doesn't need to change, or mention the optional timeout at all.

| src/include/catalog/pg_proc.dat-{ oid => '2137', descr => 'wait for a backend process exit or timeout occurs',

=> This should just say "wait for a backend process to exit".  The timeout has a default.
Re: use `proc->pgxactoff` as the value of `index` in `ProcArrayRemove()`
Ok! Thanks

-- Original Message --
From: "Andres Freund "
Re: use `proc->pgxactoff` as the value of `index` in `ProcArrayRemove()`
Hi,

On 2021-05-07 04:36:25 +0800, 盏一 wrote:
> Sounds like a plan! Do you want to write a patch?
> Add the patch.

Thanks for the patch.  I finally pushed an edited version of it.  There were other loops over ->pgprocnos, so I put assertions in those - that gains us a good bit more checking than we had before...

I also couldn't resist doing some small formatting cleanups - I found the memmove calls just too hard to read.

I took the authorship information as you had it in the diff you attached - I hope that's OK?

Greetings,

Andres Freund
Re: A new function to wait for the backend exit after termination
On Fri, Jun 11, 2021 at 08:54:08PM -0500, Justin Pryzby wrote:
> On Sat, Jun 05, 2021 at 12:08:01PM -0700, Noah Misch wrote:
> > > > My preference is to remove pg_wait_for_backend_termination().  The use case
> > > > that prompted this thread used pg_terminate_backend(pid, 18); it doesn't
> > > > need pg_wait_for_backend_termination().
> Is this an Opened Issue ?

An Open Item?  Not really, since there's no objective defect.  Nonetheless, the attached is what I'd like to use.

Author: Noah Misch
Commit: Noah Misch

    Remove pg_wait_for_backend_termination().

    It was unable to wait on a backend that had already left the procarray.
    Users tolerant of that limitation can poll pg_stat_activity.  Other
    users can employ the "timeout" argument of pg_terminate_backend().

    Reviewed by FIXME.

    Discussion: https://postgr.es/m/20210605013236.ga208...@rfd.leadboat.com

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fbc80c1..d387a32 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25002,23 +25002,6 @@ SELECT collation for ('foo' COLLATE "de_DE");
         false is returned.
 [removed lines (SGML markup lost in extraction): the documentation entry for
  pg_wait_for_backend_termination ( pid integer, timeout bigint DEFAULT 5000 ) -> boolean:
- Waits for the backend process with the specified Process ID to
- terminate.  If the process terminates before
- the timeout (in milliseconds)
- expires, true is returned.  On timeout, a warning
- is emitted and false is returned.]
diff --git a/doc/src/sgml/release-14.sgml b/doc/src/sgml/release-14.sgml
index a2ad120..045f2b0 100644
--- a/doc/src/sgml/release-14.sgml
+++ b/doc/src/sgml/release-14.sgml
@@ -611,13 +611,7 @@
 Author: Magnus Hagander
 -->
-      Add function pg_wait_for_backend_termination()
-      that waits for session exit (Bharath Rupireddy)
-
-      Also add a similar
       optional wait parameter to
       pg_terminate_backend()

diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index a4373b1..a416e94 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -397,11 +397,6 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend'
   PARALLEL SAFE;

-CREATE OR REPLACE FUNCTION
-  pg_wait_for_backend_termination(pid integer, timeout int8 DEFAULT 5000)
-  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_for_backend_termination'
-  PARALLEL SAFE;
-
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)

diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 8376994..5ed8b2e 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -231,42 +231,6 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 }

 /*
- * Wait for a backend process with the given PID to exit or until the given
- * timeout milliseconds occurs.  Returns true if the backend has exited.  On
- * timeout a warning is emitted and false is returned.
- *
- * We allow any user to call this function, consistent with any user being
- * able to view the pid of the process in pg_stat_activity etc.
- */
-Datum
-pg_wait_for_backend_termination(PG_FUNCTION_ARGS)
-{
-	int			pid;
-	int64		timeout;
-	PGPROC	   *proc = NULL;
-
-	pid = PG_GETARG_INT32(0);
-	timeout = PG_GETARG_INT64(1);
-
-	if (timeout <= 0)
-		ereport(ERROR,
-				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-				 errmsg("\"timeout\" must not be negative or zero")));
-
-	proc = BackendPidGetProc(pid);
-
-	if (proc == NULL)
-	{
-		ereport(WARNING,
-				(errmsg("PID %d is not a PostgreSQL server process", pid)));
-
-		PG_RETURN_BOOL(false);
-	}
-
-	PG_RETURN_BOOL(pg_wait_until_termination(pid, timeout));
-}
-
-/*
 * Signal to reload the database configuration
 *
 * Permission checking for this function is managed through the normal

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index acbcae4..2d45765 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6191,10 +6191,6 @@
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
   prosrc => 'pg_terminate_backend' },
-{ oid => '2137', descr => 'wait for a backend process exit or timeout occurs',
-  proname => 'pg_wait_for_backend_termination',
Re: Add client connection check during the execution of the query
On Fri, Apr 30, 2021 at 2:23 PM Thomas Munro wrote:
> Here's something I wanted to park here to look into for the next
> cycle: it turns out that kqueue's EV_EOF flag also has the right
> semantics for this.  That leads to the idea of exposing the event via
> the WaitEventSet API, and would then bring the
> client_connection_check_interval feature to 6/10 of our OSes, up from
> 2/10.  Maybe Windows' FD_CLOSE event could get us up to 7/10, not
> sure.

Rebased.  Added documentation tweak and a check to reject the GUC on unsupported OSes.

From 356b0dcbc15353b9bd349972c80a7f2e5c516a0e Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Fri, 30 Apr 2021 10:38:40 +1200
Subject: [PATCH v2 1/2] Add WL_SOCKET_CLOSED for socket shutdown events.

Provide a way for WaitEventSet to report that the remote peer has shut
down its socket, independently of whether there is any buffered data
remaining to be read.  This works only on systems where the kernel
exposes that information, namely:

* WAIT_USE_POLL builds, using (non-standard) POLLRDHUP
* WAIT_USE_EPOLL builds, using EPOLLRDHUP
* WAIT_USE_KQUEUE builds, using EV_EOF

Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
---
 src/backend/storage/ipc/latch.c | 79 +
 src/include/storage/latch.h     |  6 ++-
 2 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 1d893cf863..54e928c564 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -841,6 +841,7 @@ FreeWaitEventSet(WaitEventSet *set)
  * - WL_SOCKET_CONNECTED: Wait for socket connection to be established,
  *   can be combined with other WL_SOCKET_* events (on non-Windows
  *   platforms, this is the same as WL_SOCKET_WRITEABLE)
+ * - WL_SOCKET_CLOSED: Wait for socket to be closed by remote peer.
 * - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies
 *
 * Returns the offset in WaitEventSet->events (starting from 0), which can be
@@ -1043,12 +1044,16 @@ WaitEventAdjustEpoll(WaitEventSet *set, WaitEvent *event, int action)
 	else
 	{
 		Assert(event->fd != PGINVALID_SOCKET);
-		Assert(event->events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE));
+		Assert(event->events & (WL_SOCKET_READABLE |
+								WL_SOCKET_WRITEABLE |
+								WL_SOCKET_CLOSED));

 		if (event->events & WL_SOCKET_READABLE)
 			epoll_ev.events |= EPOLLIN;
 		if (event->events & WL_SOCKET_WRITEABLE)
 			epoll_ev.events |= EPOLLOUT;
+		if (event->events & WL_SOCKET_CLOSED)
+			epoll_ev.events |= EPOLLRDHUP;
 	}

 	/*
@@ -1087,12 +1092,18 @@ WaitEventAdjustPoll(WaitEventSet *set, WaitEvent *event)
 	}
 	else
 	{
-		Assert(event->events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE));
+		Assert(event->events & (WL_SOCKET_READABLE |
+								WL_SOCKET_WRITEABLE |
+								WL_SOCKET_CLOSED));

 		pollfd->events = 0;
 		if (event->events & WL_SOCKET_READABLE)
 			pollfd->events |= POLLIN;
 		if (event->events & WL_SOCKET_WRITEABLE)
 			pollfd->events |= POLLOUT;
+#ifdef POLLRDHUP
+		if (event->events & WL_SOCKET_CLOSED)
+			pollfd->events |= POLLRDHUP;
+#endif
 	}

 	Assert(event->fd != PGINVALID_SOCKET);
@@ -1165,7 +1176,9 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
 	Assert(event->events != WL_LATCH_SET || set->latch != NULL);
 	Assert(event->events == WL_LATCH_SET ||
 		   event->events == WL_POSTMASTER_DEATH ||
-		   (event->events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)));
+		   (event->events & (WL_SOCKET_READABLE |
+							 WL_SOCKET_WRITEABLE |
+							 WL_SOCKET_CLOSED)));

 	if (event->events == WL_POSTMASTER_DEATH)
 	{
@@ -1188,9 +1201,9 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
 	 * old event mask to the new event mask, since kevent treats readable
 	 * and writable as separate events.
 	 */
-	if (old_events & WL_SOCKET_READABLE)
+	if (old_events & (WL_SOCKET_READABLE | WL_SOCKET_CLOSED))
 		old_filt_read = true;
-	if (event->events & WL_SOCKET_READABLE)
+	if (event->events & (WL_SOCKET_READABLE | WL_SOCKET_CLOSED))
 		new_filt_read = true;
 	if (old_events & WL_SOCKET_WRITEABLE)
 		old_filt_write = true;
@@ -1210,7 +1223,10 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
 							 event);
 	}

-	Assert(count > 0);
+	/* For WL_SOCKET_READ -> WL_SOCKET_CLOSED, no change needed. */
+	if (count == 0)
+		return;
+	Assert(count <= 2);

 	rc = kevent(set->kqueue_fd, &k_ev[0], count, NULL, 0, NULL);
@@ -1525,7 +1541,9 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 				returned_events++;
 			}
 		}
-		else if (cur_event->events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE))
+		else if (cur_event->events & (WL_SOCKET_READABLE |
+									  WL_SOCKET_WRITEABLE |
+									  WL_SOCKET_CLOSED))
 		{
 			Assert(cur_event->fd != PGINVALID_SOCKET);
@@ -1543,6 +1561,13 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
Re: pg_filenode_relation(0,0) elog
Justin Pryzby writes:
> Per sqlsmith.
> postgres=# SELECT pg_filenode_relation(0,0);
> ERROR:  unexpected duplicate for tablespace 0, relfilenode 0

Ugh.

> The usual expectation is that sql callable functions should return null
> rather than hitting elog().

Agreed, but you should put the short-circuit into the SQL-callable function, ie pg_filenode_relation.  Lower-level callers ought not be passing junk data.  Likely it should check the reltablespace, too.

			regards, tom lane
Re: PG 14 release notes, first draft
| Add Set Server Name Indication (SNI) for SSL connection packets (Peter Eisentraut)

Remove "Set".

| Reduce the default value of vacuum_cost_page_miss from 10 milliseconds to 2 (Peter Geoghegan)

Peter mentioned that this should not say "milliseconds" (but maybe the page I'm looking at is old).

| Cause vacuum operations to be aggressive if the table is near xid or multixact wraparound (Masahiko Sawada, Peter Geoghegan)

Say "become aggressive"?

| Allow the arbitrary collations of partition boundary values (Tom Lane)

Remove "the".

| Generate WAL invalidations message during command completion when using logical replication (Dilip Kumar, Tomas Vondra, Amit Kapila)

Say "invalidation messages".

| Add support for infinity and -infinity values to the numeric data type (Tom Lane)

"-infinity" has markup but not "infinity"?

| Allow vacuum to deallocate space reserved by trailing unused heap line pointers (Matthias van de Meent, Peter Geoghegan)

Say "reclaim space"?
Re: [PATCH] Fix buffer not null terminated on (ecpg lib)
Andres Freund writes:
> On 2021-06-11 19:08:57 -0400, Tom Lane wrote:
>> Anyway, I agree that disabling that was a bit of a stopgap hack.  This
>> 'nonstring' attribute seems like it would help for ECPG's usage, at
>> least.

> nonstring is supported since gcc 8, which also brought the warnings that
> e71658523 is concerned about.  Which makes me think that we should be
> able to get away without a configure test.  The one complication is that
> the relevant ecpg code doesn't include c.h.

Ugh.  And we *can't* include that there.

> But I think we can just do something like:
> -	char		sqlstate[5];
> +	char		sqlstate[5]
> +#if defined(__has_attribute) && __has_attribute(nonstring)
> +	__attribute__((nonstring))
> +#endif
> +	;
> };

Hmm.  Worth a try, anyway.

> Should we also include a pg_attribute_nonstring definition in c.h?
> Probably not, given that we right now don't have another user?

Yeah, no point till there's another use-case.  (I'm not sure there ever will be, so I'm not excited about adding more infrastructure than we have to.)

			regards, tom lane
Re: [PATCH] Fix buffer not null terminated on (ecpg lib)
Hi,

On 2021-06-11 19:08:57 -0400, Tom Lane wrote:
> Andres Freund writes:
> > It might be worth doing something about this, for other reasons.  We have
> > disabled -Wstringop-truncation in 716585235b1.  But I've enabled it in my
> > debug build, because I find it useful.
>
> ITYM e71658523?  I can't find that hash in my repo.

Oops, yes.

> Anyway, I agree that disabling that was a bit of a stopgap hack.  This
> 'nonstring' attribute seems like it would help for ECPG's usage, at
> least.
>
> > I've not looked at how much work it'd be to make a recent-ish gcc not
> > produce lots of false positives in optimized builds.
>
> The discussion that led up to e71658523 seemed to conclude that the
> only reasonable way to suppress the majority of those warnings was
> to get rid of the fixed-length MAXPGPATH buffers we use everywhere.
> Now that we have psprintf(), that might be more workable than before,
> but the effort-to-reward ratio still doesn't seem promising.

Hm - the MAXPGPATH stuff is about -Wno-format-truncation though, right?

I now tried building with optimizations and -Wstringop-truncation, and while it does result in a higher number of warnings, those are all in ecpg and fixed with one __attribute__((nonstring)).

nonstring is supported since gcc 8, which also brought the warnings that e71658523 is concerned about.  Which makes me think that we should be able to get away without a configure test.  The one complication is that the relevant ecpg code doesn't include c.h.
But I think we can just do something like:

diff --git i/src/interfaces/ecpg/include/sqlca.h w/src/interfaces/ecpg/include/sqlca.h
index c5f107dd33c..d909f5ba2de 100644
--- i/src/interfaces/ecpg/include/sqlca.h
+++ w/src/interfaces/ecpg/include/sqlca.h
@@ -50,7 +50,11 @@ struct sqlca_t
 	/* 6: empty */
 	/* 7: empty */

-	char		sqlstate[5];
+	char		sqlstate[5]
+#if defined(__has_attribute) && __has_attribute(nonstring)
+	__attribute__((nonstring))
+#endif
+	;
 };

 struct sqlca_t *ECPGget_sqlca(void);

Not pretty, but I don't immediately see a really better solution?

Should we also include a pg_attribute_nonstring definition in c.h?  Probably not, given that we right now don't have another user?

Greetings,

Andres Freund
Re: An out-of-date comment in nodeIndexonlyscan.c
On Fri, Jun 28, 2019 at 12:55 PM Ashwin Agrawal wrote:
> On Thu, Jun 27, 2019 at 4:33 PM Thomas Munro wrote:
>> Here's a patch I'd like to commit to fix the comment.  We could look
>> at improving the actual code after
>> https://commitfest.postgresql.org/23/2169/ is done.
>
> Change looks good to me.

... and here is the corresponding code change, with a test to demonstrate the change.  I'm working on a proof-of-concept to skip the btree page lock sometimes too, which seems promising, but it requires some planner work which has temporarily pretzeled my brain.

From 1039ec4ffbb9da15812a51f4eb9e8be32520fa63 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Thu, 10 Jun 2021 23:35:16 +1200
Subject: [PATCH] Use tuple-level SIREAD locks for index-only scans.

When index-only scans manage to avoid fetching heap tuples, previously
we'd predicate-lock the whole heap page (since commit cdf91edb).
Commits c01262a8 and 6f38d4da made it possible to lock the tuple
instead with only a TID, to match the behavior of regular index scans
and avoid some false conflicts.
Discussion: https://postgr.es/m/CAEepm%3D2GK3FVdnt5V3d%2Bh9njWipCv_fNL%3DwjxyUhzsF%3D0PcbNg%40mail.gmail.com
---
 src/backend/executor/nodeIndexonlyscan.c      | 12 --
 .../isolation/expected/index-only-scan-2.out  | 15 +++
 .../isolation/expected/index-only-scan-3.out  | 16 
 src/test/isolation/isolation_schedule         |  2 +
 .../isolation/specs/index-only-scan-2.spec    | 39 +++
 .../isolation/specs/index-only-scan-3.spec    | 33 
 6 files changed, 113 insertions(+), 4 deletions(-)
 create mode 100644 src/test/isolation/expected/index-only-scan-2.out
 create mode 100644 src/test/isolation/expected/index-only-scan-3.out
 create mode 100644 src/test/isolation/specs/index-only-scan-2.spec
 create mode 100644 src/test/isolation/specs/index-only-scan-3.spec

diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0754e28a9a..f7185b4519 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -243,12 +243,16 @@ IndexOnlyNext(IndexOnlyScanState *node)

 		/*
 		 * If we didn't access the heap, then we'll need to take a predicate
-		 * lock explicitly, as if we had.  For now we do that at page level.
+		 * lock explicitly, as if we had.  We don't have the inserter's xid,
+		 * but that's only used by PredicateLockTID to check if it matches the
+		 * caller's top level xid, and it can't possibly have been inserted by
+		 * us or the page wouldn't be all visible.
 		 */
 		if (!tuple_from_heap)
-			PredicateLockPage(scandesc->heapRelation,
-							  ItemPointerGetBlockNumber(tid),
-							  estate->es_snapshot);
+			PredicateLockTID(scandesc->heapRelation,
+							 tid,
+							 estate->es_snapshot,
+							 InvalidTransactionId);

 		return slot;
 	}

diff --git a/src/test/isolation/expected/index-only-scan-2.out b/src/test/isolation/expected/index-only-scan-2.out
new file mode 100644
index 00..fef5b8d398
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-2.out
@@ -0,0 +1,15 @@
+Parsed test spec with 2 sessions
+
+starting permutation: r1 r2 w1 w2 c1 c2
+step r1: SELECT id1 FROM account WHERE id1 = 1;
+id1
+
+1
+step r2: SELECT id2 FROM account WHERE id2 = 2;
+id2
+
+2
+step w1: UPDATE account SET balance = 200 WHERE id1 = 1;
+step w2: UPDATE account SET balance = 200 WHERE id2 = 2;
+step c1: COMMIT;
+step c2: COMMIT;

diff --git a/src/test/isolation/expected/index-only-scan-3.out b/src/test/isolation/expected/index-only-scan-3.out
new file mode 100644
index 00..efef162779
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-3.out
@@ -0,0 +1,16 @@
+Parsed test spec with 2 sessions
+
+starting permutation: r1 r2 w1 w2 c1 c2
+step r1: SELECT id1 FROM account WHERE id1 = 2;
+id1
+
+2
+step r2: SELECT id2 FROM account WHERE id2 = 1;
+id2
+
+1
+step w1: UPDATE account SET balance = 200 WHERE id1 = 1;
+step w2: UPDATE account SET balance = 200 WHERE id2 = 2;
+step c1: COMMIT;
+step c2: COMMIT;
+ERROR:  could not serialize access due to read/write dependencies among transactions

diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index f4c01006fc..570aedb900 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -17,6 +17,8 @@ test: partial-index
 test: two-ids
 test: multiple-row-versions
 test: index-only-scan
+test: index-only-scan-2
+test: index-only-scan-3
 test: predicate-lock-hot-tuple
 test: update-conflict-out
 test: deadlock-simple

diff --git a/src/test/isolation/specs/index-only-scan-2.spec b/src/test/isolation/specs/index-only-scan-2.spec
new file mode 100644
index 00..cd3c599af8
--- /dev/null
+++ b/src/test/isolation/specs/index-only-scan-2.spec
@@ -0,0 +1,39 @@
+# Test the
pg_filenode_relation(0,0) elog
Per sqlsmith.

postgres=# SELECT pg_filenode_relation(0,0);
ERROR:  unexpected duplicate for tablespace 0, relfilenode 0
postgres=# \errverbose
ERROR:  XX000: unexpected duplicate for tablespace 0, relfilenode 0
LOCATION:  RelidByRelfilenode, relfilenodemap.c:220

The usual expectation is that sql callable functions should return null rather than hitting elog().  This also means that sqlsmith has one fewer false-positive error.

I think it should return NULL if passed an invalid relfilenode, rather than searching pg_class and then writing a pretty scary message about duplicates.

diff --git a/src/backend/utils/cache/relfilenodemap.c b/src/backend/utils/cache/relfilenodemap.c
index 56d7c73d33..5a5cf853bd 100644
--- a/src/backend/utils/cache/relfilenodemap.c
+++ b/src/backend/utils/cache/relfilenodemap.c
@@ -146,6 +146,9 @@ RelidByRelfilenode(Oid reltablespace, Oid relfilenode)
 	ScanKeyData skey[2];
 	Oid			relid;

+	if (!OidIsValid(relfilenode))
+		return InvalidOid;
+
 	if (RelfilenodeMapHash == NULL)
 		InitializeRelfilenodeMap();
Signed vs. Unsigned (some)
Hi,

Can removing legitimate warnings be worth it?  A -1 cast can be wrong when there is an invalid value defined (InvalidBucket, InvalidBlockNumber); I think that, depending on the compiler, a -1 cast may be different from InvalidBucket or InvalidBlockNumber.

pg_rewind is one special case.  All cases of XLogSegNo (uint64) initialization are zero, but in pg_rewind -1 was used?  I did not find an InvalidXLogSegNo!

Not tested.  Trivial patch attached.

best regards,

Ranier Vilela

fix_signed_vs_unsigned.patch
Description: Binary data
Re: A new function to wait for the backend exit after termination
On Sat, Jun 05, 2021 at 12:08:01PM -0700, Noah Misch wrote:
> > > My preference is to remove pg_wait_for_backend_termination().  The use case
> > > that prompted this thread used pg_terminate_backend(pid, 18); it doesn't
> > > need pg_wait_for_backend_termination().

Is this an Opened Issue ?

-- 
Justin
ProcArrayStruct->pgprocnos vs ->maxProcs vs PGPROC ordering
Hi,

I just tried to add a, seemingly innocuous, assertion to ProcArrayAdd()/Remove() that proc->pgprocno is < arrayP->maxProcs.  That quickly fails.

The reason for that is that PGPROCs are created in the following order:

1) MaxBackends normal (*[1]) backends
2) NUM_AUXILIARY_PROCS auxiliary processes
3) max_prepared_xacts prepared transactions

and pgprocnos are assigned sequentially - they are needed to index into ProcGlobal->allProcs.

In contrast to that, procarray.c initializes maxProcs to

#define PROCARRAY_MAXPROCS	(MaxBackends + max_prepared_xacts)

i.e. without the aux processes.  Which means that some of the prepared transactions have pgprocnos that are bigger than ProcArrayStruct->maxProcs.  Hence my assertion failure.

This is obviously not a bug, but it is quite hard to understand / confusing.  I think I made a similar mistake before.  I'd at least like to add a comment with a warning somewhere in ProcArrayStruct or such.

An alternative approach would be to change the PGPROC order to instead be 1) aux processes, 2) normal backends, 3) prepared xacts, and give aux processes a negative or invalid pgprocno.  One small advantage of that would be that we'd not need to "skip" over the "aux process hole" between normal and prepared-xact PGPROCs in various procarray.c routines that iterate over procs.

Greetings,

Andres Freund

[1] Well, kinda.  It's user backends followed by autovacuum workers, launcher, worker processes and wal senders.
Re: array_cat anycompatible change is breaking xversion upgrade tests (v14 release notes)
On Fri, Jun 11, 2021 at 08:19:48PM -0500, Justin Pryzby wrote:
> On Fri, Jun 11, 2021 at 09:12:55PM -0400, Bruce Momjian wrote:
> > OK, I used some of your ideas and tried for something more general;
> > patch attached.
>
> This is good.
>
> But I wonder if "dropped before upgrading" is too specific to pg_upgrade?
>
> Dropping the aggregate before starting a backup to be restored into a new
> version seems like a bad way to do it.  More likely, I would restore whatever
> backup I had, get errors, and then eventually recreate the aggregates.

I am actually unclear on that.  Do people really restore a dump and just ignore errors, or somehow track them and go back and try to fix them?  Isn't there a cascading effect if other things depend on it?  How do they get the object definitions from a huge dump file?  What process should we recommend?  I have just never seen good documentation on how this is handled.

-- 
  Bruce Momjian  https://momjian.us
  EDB            https://enterprisedb.com

  If only the physical world exists, free will is an illusion.
Re: array_cat anycompatible change is breaking xversion upgrade tests (v14 release notes)
On Fri, Jun 11, 2021 at 09:17:46PM -0400, Tom Lane wrote:
> Bruce Momjian writes:
> > OK, I used some of your ideas and tried for something more general;
> > patch attached.
>
> I think it's a good idea to mention custom aggregates and operators
> specifically, as otherwise people will look at this and have little
> idea what you're on about.  I just want wording like "such as custom
> aggregates and operators", in case somebody has done some other creative
> thing that breaks.

Agreed, updated patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB            https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

diff --git a/doc/src/sgml/release-14.sgml b/doc/src/sgml/release-14.sgml
index 058ba7cd4e..c2d8941206 100644
--- a/doc/src/sgml/release-14.sgml
+++ b/doc/src/sgml/release-14.sgml
@@ -291,6 +291,35 @@ Author: Tom Lane
+      User-defined objects that reference some built-in array functions
+      along with their argument types must be recreated (Tom Lane)
+
+      Specifically, array_append(),
+      array_prepend(),
+      array_cat(),
+      array_position(),
+      array_positions(),
+      array_remove(),
+      array_replace(), or width_bucket()
+      used to take anyarray arguments but now take
+      anycompatiblearray.  Therefore, user-defined objects
+      like aggregates and operators that reference old array function
+      signatures must be dropped before upgrading and recreated once the
+      upgrade completes.
Re: array_cat anycompatible change is breaking xversion upgrade tests (v14 release notes)
On Fri, Jun 11, 2021 at 09:12:55PM -0400, Bruce Momjian wrote:
> OK, I used some of your ideas and tried for something more general;
> patch attached.

This is good.

But I wonder if "dropped before upgrading" is too specific to pg_upgrade?

Dropping the aggregate before starting a backup to be restored into a new version seems like a bad way to do it.  More likely, I would restore whatever backup I had, get errors, and then eventually recreate the aggregates.

> +      User-defined objects that reference some built-in array functions
> +      along with their argument types must be recreated (Tom Lane)
> +
> +      Specifically, array_append(),
...
> +      used to take anyarray arguments but now take
> +      anycompatiblearray.  Therefore, user-defined objects
> +      that reference the old array function signature must be dropped
> +      before upgrading and recreated once the upgrade completes.
Re: array_cat anycompatible change is breaking xversion upgrade tests (v14 release notes)
Bruce Momjian writes:
> OK, I used some of your ideas and tried for something more general;
> patch attached.

I think it's a good idea to mention custom aggregates and operators specifically, as otherwise people will look at this and have little idea what you're on about.  I just want wording like "such as custom aggregates and operators", in case somebody has done some other creative thing that breaks.

			regards, tom lane
Re: array_cat anycompatible change is breaking xversion upgrade tests (v14 release notes)
On Fri, Jun 11, 2021 at 08:56:19PM -0400, Tom Lane wrote:
> Bruce Momjian writes:
> > OK, I came up with the attached patch.  This is one of the few cases
> > where the incompatibility is not clearly related to the feature, so I
> > left the existing item alone and just created a new one with the same
> > commit message in the incompatibilities section.
>
> I think phrasing this as though user-defined aggregates are the only
> pain point is incorrect.  For example, a custom operator based on
> array_cat would have issues too.
>
> I suggest a treatment more like
>
>     Some built-in array-related functions changed signature (Tom Lane)
>
>     Certain functions were redefined to take anycompatiblearray instead
>     of anyarray.  While this does not affect ordinary calls, it does
>     affect code that directly names these functions along with their
>     argument types; for example, custom aggregates and operators based
>     on these functions.  The affected functions are [ blah, blah ]

OK, I used some of your ideas and tried for something more general; patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB            https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

diff --git a/doc/src/sgml/release-14.sgml b/doc/src/sgml/release-14.sgml
index 058ba7cd4e..925690 100644
--- a/doc/src/sgml/release-14.sgml
+++ b/doc/src/sgml/release-14.sgml
@@ -291,6 +291,34 @@ Author: Tom Lane
+      User-defined objects that reference some built-in array functions
+      along with their argument types must be recreated (Tom Lane)
+
+      Specifically, array_append(),
+      array_prepend(),
+      array_cat(),
+      array_position(),
+      array_positions(),
+      array_remove(),
+      array_replace(), or width_bucket()
+      used to take anyarray arguments but now take
+      anycompatiblearray.  Therefore, user-defined objects
+      that reference the old array function signature must be dropped
+      before upgrading and recreated once the upgrade completes.
Re: array_cat anycompatible change is breaking xversion upgrade tests (v14 release notes)
Bruce Momjian writes:
> OK, I came up with the attached patch.  This is one of the few cases
> where the incompatibility is not clearly related to the feature, so I
> left the existing item alone and just created a new one with the same
> commit message in the incompatibilities section.

I think phrasing this as though user-defined aggregates are the only pain point is incorrect.  For example, a custom operator based on array_cat would have issues too.

I suggest a treatment more like

    Some built-in array-related functions changed signature (Tom Lane)

    Certain functions were redefined to take anycompatiblearray instead
    of anyarray.  While this does not affect ordinary calls, it does
    affect code that directly names these functions along with their
    argument types; for example, custom aggregates and operators based
    on these functions.  The affected functions are [ blah, blah ]

			regards, tom lane
Re: cleanup temporary files after crash
On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:
> The current behavior is only useful for debugging purposes.

On Wed, 28 Oct 2020 at 15:42, Tomas Vondra wrote:
> One thing I'm not sure about is whether we should have the GUC as
> proposed, or have a negative "keep_temp_files_after_restart" defaulting
> to false.  But I don't have a very good justification for the alternative
> other than vague personal preference.

On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:
> I thought about not providing a GUC at all or providing it in the developer
> section.  I've never heard someone saying that they use those temporary
> files to investigate an issue.  Regarding a crash, all information is already
> available and temporary files don't provide extra details.  This new GUC is
> just to keep the previous behavior.  I'm fine without the GUC, though.

Should this GUC be classified as a developer option, and removed from postgresql.conf.sample?  That was discussed initially in October but not since.

On Wed, Oct 28, 2020 at 11:16:26AM -0300, Euler Taveira wrote:
> It also has an undesirable behavior: you have to restart the service to
> reclaim the space.

BTW, that's not really true - you can remove the tempfiles while the server is running.  The complainant in bug #16427 was being careful - which is good.  I watch for and remove tempfiles older than 2 days.  The worst consequence of removing a tempfile would be a failed query.

-- 
Justin
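The "remove tempfiles while the server is running" approach can be sketched as follows (a hedged illustration, not an official procedure: the directory layout mimics a data directory's default-tablespace temp area, the 2-day cutoff matches the practice described above, and GNU coreutils/findutils are assumed; a real invocation would point at the actual $PGDATA):

```shell
# Scratch directory standing in for a real data directory
PGDATA=$(mktemp -d)
mkdir -p "$PGDATA/base/pgsql_tmp"

# A recent temp file (should be kept) and an old one (cleanup candidate);
# the pgsql_tmpPID.N names are illustrative
touch "$PGDATA/base/pgsql_tmp/pgsql_tmp999.0"
old="$PGDATA/base/pgsql_tmp/pgsql_tmp111.0"
touch "$old" && touch -d '3 days ago' "$old"

# List temp files older than 2 days; add -delete only after reviewing
# the listing (per-tablespace temp dirs would need a similar pass)
find "$PGDATA/base" -path '*pgsql_tmp*' -name 'pgsql_tmp*' -type f -mtime +2
```

Listing before deleting mirrors the caution attributed to the bug #16427 reporter: the worst consequence of removing a still-live temp file is a failed query, but there is no reason to court even that.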
Re: array_cat anycompatible change is breaking xversion upgrade tests (v14 release notes)
On Tue, Jun 8, 2021 at 07:10:00PM -0500, Justin Pryzby wrote:
> On Tue, Jun 08, 2021 at 08:02:46PM -0400, Bruce Momjian wrote:
> > This involves creating an aggregate that _uses_ these array functions as
> > their state transition function?
>
> Yes

OK, I came up with the attached patch. This is one of the few cases where the incompatibility is not clearly related to the feature, so I left the existing item alone and just created a new one with the same commit message in the incompatibilities section.

-- Bruce Momjian  https://momjian.us
   EDB  https://enterprisedb.com

   If only the physical world exists, free will is an illusion.

diff --git a/doc/src/sgml/release-14.sgml b/doc/src/sgml/release-14.sgml
index 21659bd184..426d502a63 100644
--- a/doc/src/sgml/release-14.sgml
+++ b/doc/src/sgml/release-14.sgml
@@ -291,6 +291,32 @@ Author: Tom Lane
+
+      User-defined aggregates that reference some built-in array functions
+      will need to be recreated (Tom Lane)
+
+      Specifically, user-defined aggregates that use the functions
+      array_append(), array_prepend(), array_cat(), array_position(),
+      array_positions(), array_remove(), array_replace(), or
+      width_bucket() as their state transition function must be dropped
+      before upgrading and recreated once the upgrade is complete.
+
Re: pg_regress.c also sensitive to various PG* environment variables
On Fri, Jun 11, 2021 at 10:08:20AM -0400, Alvaro Herrera wrote:
> I think if they're to be kept in sync, then the exceptions should be
> noted. I mean, where PGCLIENTENCODING would otherwise be, I'd add
>   /* PGCLIENTENCODING set above */
>   /* See below for PGHOSTADDR */
> and so on (PGHOST and PGPORT probably don't need this because they're
> immediately below; not sure; but I would put them in alphabetical order
> in both lists for sure and then that wouldn't apply). Otherwise I would
> think that it's an omission and would set out to fix it.

Good idea, thanks. I'll add comments for each one that cannot be unset.

-- Michael

signature.asc
Description: PGP signature
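The scheme agreed above — one alphabetized unset list, with a comment standing in for each variable that is deliberately left set — can be sketched like this. This is a minimal standalone illustration, not the actual pg_regress.c code, and the exact set of variables listed is an assumption for the example:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical sketch (NOT the real pg_regress.c): scrub libpq's PG*
 * environment variables from one alphabetized list, leaving a comment in
 * place of each variable that must not be unset because it is assigned
 * explicitly elsewhere.
 */
static const char *const pg_env_to_unset[] = {
	/* PGCLIENTENCODING is set explicitly elsewhere, so not listed here */
	"PGDATABASE",
	/* PGHOST is overridden elsewhere, not unset */
	"PGHOSTADDR",
	"PGPASSWORD",
	/* PGPORT is overridden elsewhere, not unset */
	"PGSERVICE",
};

void
scrub_pg_environment(void)
{
	for (size_t i = 0;
		 i < sizeof(pg_env_to_unset) / sizeof(pg_env_to_unset[0]);
		 i++)
		unsetenv(pg_env_to_unset[i]);
}
```

Keeping the "cannot unset" entries as comments in list position makes a later side-by-side comparison against the companion list trivial, which was the point of Alvaro's suggestion.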
Re: Table AM modifications to accept column projection lists
On Sat, 2021-06-05 at 09:47 -0700, Zhihong Yu wrote: > On Fri, Jun 4, 2021 at 4:14 PM Jacob Champion wrote: > > Agreed. I'm going to double-check with Deep that the new calls > > to table_tuple_fetch_row_version() should be projecting the full row, > > then post an updated patch some time next week. (The discussions over the fallout of the inheritance_planner fallout are still going, but in the meantime here's an updated v4 that builds and passes `make check`.) > + return > relation->rd_tableam->scan_begin_with_column_projection(relation, snapshot, > 0, NULL, > + parallel_scan, flags, proj); > > scan_begin_with_column_projection() adds a parameter to scan_begin(). > Can scan_begin() be enhanced with this projection parameter ? > Otherwise in the future we may have > scan_begin_with_column_projection_with_x_y ... Maybe; I agree that would match the current "extension" APIs a little better. I'll let Deep and/or Ashwin chime in on why this design was chosen. > + /* Make sure the the new slot is not dependent on the original tuple */ > > Double 'the' in the comment. More than one place with duplicate 'the' > in the patch. Fixed. > +typedef struct neededColumnContext > +{ > + Bitmapset **mask; > + int n; > > Should field n be named ncol ? 'n' seems too general. Agreed; changed to ncol. > +* TODO: Remove this hack!! This should be done once at the start of > the tid scan. > > Would the above be addressed in the next patch ? I have not had time to get to this in v4, sorry. > Toward the end of extract_scan_columns(): > > + bms_free(rte->scanCols); > + rte->scanCols = bms_make_singleton(0); > + break; > > Should 'goto outer;' be in place of 'break;' (since rte->scanCols has > been assigned for whole-row) ? Agreed and fixed. Thank you! 
--Jacob

From 7ac00847f17e2a0448ae06370598db0f034fcc7c Mon Sep 17 00:00:00 2001
From: Jacob Champion
Date: Tue, 2 Mar 2021 08:59:45 -0800
Subject: [PATCH v4] tableam: accept column projection list

TODO: check the additional bms_make_singleton(0) calls in
src/backend/executor/nodeModifyTable.c

Co-authored-by: Soumyadeep Chakraborty
Co-authored-by: Melanie Plageman
Co-authored-by: Ashwin Agrawal
Co-authored-by: Jacob Champion
---
 src/backend/access/heap/heapam_handler.c |   5 +-
 src/backend/access/nbtree/nbtsort.c      |   3 +-
 src/backend/access/table/tableam.c       |   5 +-
 src/backend/commands/trigger.c           |  19 +++-
 src/backend/executor/execMain.c          |   3 +-
 src/backend/executor/execPartition.c     |   2 +
 src/backend/executor/execReplication.c   |   6 +-
 src/backend/executor/execScan.c          | 108 +
 src/backend/executor/nodeIndexscan.c     |  10 ++
 src/backend/executor/nodeLockRows.c      |   7 +-
 src/backend/executor/nodeModifyTable.c   |  47 +++--
 src/backend/executor/nodeSeqscan.c       |  43 -
 src/backend/executor/nodeTidscan.c       |  11 ++-
 src/backend/nodes/copyfuncs.c            |   2 +
 src/backend/nodes/equalfuncs.c           |   2 +
 src/backend/nodes/outfuncs.c             |   2 +
 src/backend/nodes/readfuncs.c            |   2 +
 src/backend/optimizer/path/allpaths.c    |  85 +++-
 src/backend/optimizer/prep/preptlist.c   |  13 ++-
 src/backend/optimizer/util/inherit.c     |  37 ++-
 src/backend/parser/analyze.c             |  40 ++--
 src/backend/parser/parse_relation.c      |  18
 src/backend/partitioning/partbounds.c    |  14 ++-
 src/backend/rewrite/rewriteHandler.c     |   8 ++
 src/include/access/tableam.h             | 118 +--
 src/include/executor/executor.h          |   6 ++
 src/include/nodes/execnodes.h            |   1 +
 src/include/nodes/parsenodes.h           |  14 +++
 src/include/utils/rel.h                  |  14 +++
 29 files changed, 598 insertions(+), 47 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index e2cd79ec54..e440ae4a69 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -182,7 +182,8 @@ static bool
 heapam_fetch_row_version(Relation relation,
 						 ItemPointer tid,
 						 Snapshot snapshot,
-						 TupleTableSlot *slot)
+						 TupleTableSlot *slot,
+						 Bitmapset *project_cols)
 {
 	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
 	Buffer		buffer;
@@ -350,7 +351,7 @@ static TM_Result
 heapam_tuple_lock(Relation relation, ItemPointer tid, Snapshot snapshot,
 				  TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
 				  LockWaitPolicy wait_policy, uint8 flags,
-				  TM_FailureData *tmfd)
+				  TM_FailureData *tmfd, Bitmapset *project_cols)
 {
 	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
 	TM_Result	result;
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
Re: unnesting multirange data types
On Sat, Jun 12, 2021 at 2:30 AM Justin Pryzby wrote:
> On Fri, Jun 11, 2021 at 11:37:58PM +0300, Alexander Korotkov wrote:
> > On Fri, Jun 11, 2021 at 1:04 AM Justin Pryzby wrote:
> > > > +{ oid => '1293', descr => 'expand mutlirange to set of ranges',
> > >
> > > typo: mutlirange
> >
> > Fixed, thanks.
> >
> > The patch with the implementation of both unnest() and cast to array
> > is attached. It contains both tests and docs.
>
> |+ The multirange could be explicitly cast to the array of corresponding
> should say: "can be cast to an array of corresponding.."
>
> |+ * Cast multirange to the array of ranges.
> I think should be: "an array of ranges"

Thank you for catching this.

> Per sqlsmith, this is causing consistent crashes.
> I took one of its less appalling queries and simplified it to this:
>
> select
>     pg_catalog.multirange_to_array(
>         cast(pg_catalog.int8multirange() as int8multirange)) as c2
> from (select 1)x;

It seems that multirange_to_array() doesn't handle empty multiranges. I'll post an updated version of the patch tomorrow.

-- Regards, Alexander Korotkov
Re: unnesting multirange data types
On Fri, Jun 11, 2021 at 11:37:58PM +0300, Alexander Korotkov wrote:
> On Fri, Jun 11, 2021 at 1:04 AM Justin Pryzby wrote:
> > > +{ oid => '1293', descr => 'expand mutlirange to set of ranges',
> >
> > typo: mutlirange
>
> Fixed, thanks.
>
> The patch with the implementation of both unnest() and cast to array
> is attached. It contains both tests and docs.

|+ The multirange could be explicitly cast to the array of corresponding

should say: "can be cast to an array of corresponding.."

|+ * Cast multirange to the array of ranges.

I think should be: "an array of ranges"

Per sqlsmith, this is causing consistent crashes. I took one of its less appalling queries and simplified it to this:

select
    pg_catalog.multirange_to_array(
        cast(pg_catalog.int8multirange() as int8multirange)) as c2
from (select 1)x;

-- Justin
Re: [PATCH] Fix buffer not null terminated on (ecpg lib)
Andres Freund writes: > It might be worth doing something about this, for other reasons. We have > disabled -Wstringop-truncation in 716585235b1. But I've enabled it in my > debug build, because I find it useful. ITYM e71658523 ? I can't find that hash in my repo. Anyway, I agree that disabling that was a bit of a stopgap hack. This 'nonstring' attribute seems like it would help for ECPG's usage, at least. > I've not looked at how much work it'd be to make a recent-ish gcc not to > produce lots of false positives in optimized builds. The discussion that led up to e71658523 seemed to conclude that the only reasonable way to suppress the majority of those warnings was to get rid of the fixed-length MAXPGPATH buffers we use everywhere. Now that we have psprintf(), that might be more workable than before, but the effort-to-reward ratio still doesn't seem promising. regards, tom lane
Re: [PATCH] Fix buffer not null terminated on (ecpg lib)
Hi,

On 2020-04-23 14:36:15 +0900, Kyotaro Horiguchi wrote:
> At Thu, 23 Apr 2020 01:21:21 -0300, Ranier Vilela wrote in
> > On Wed, Apr 22, 2020 at 23:27, Kyotaro Horiguchi <horikyota@gmail.com> wrote:
> > >
> > > - strncpy(sqlca->sqlerrm.sqlerrmc, message, sizeof(sqlca->sqlerrm.sqlerrmc));
> > > - sqlca->sqlerrm.sqlerrmc[sizeof(sqlca->sqlerrm.sqlerrmc) - 1] = 0;
> > > + sqlca->sqlerrm.sqlerrmc[sizeof(sqlca->sqlerrm.sqlerrmc) - 1] = '\0';
> > > + strncpy(sqlca->sqlerrm.sqlerrmc, message, sizeof(sqlca->sqlerrm.sqlerrmc) - 1);
> > >
> > > The existing strncpy then terminating by NUL works fine. I don't think
> > > there's any point in doing it the reverse way. Actually
> > > sizeof(sqlca->sqlerrm.sqlerrmc) - 1 is enough for the length, but the
> > > existing code is not necessarily a bug.
> >
> > Without understanding, then, why does Coverity claim a bug here?
>
> Well, handling non-terminated strings with str* functions is a sign of a
> bug in most cases. Coverity is very useful, but false positives are
> annoying. I wonder if we could attach Coverity annotations to such code.

It might be worth doing something about this, for other reasons. We have disabled -Wstringop-truncation in 716585235b1. But I've enabled it in my debug build, because I find it useful. The only warning we're getting in non-optimized builds is

/home/andres/src/postgresql/src/interfaces/ecpg/ecpglib/misc.c: In function ‘ECPGset_var’:
/home/andres/src/postgresql/src/interfaces/ecpg/ecpglib/misc.c:565:17: warning: ‘strncpy’ output truncated before terminating nul copying 5 bytes from a string of the same length [-Wstringop-truncation]
  565 |                 strncpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate));
      |                 ^~~~~~~

One way we could address this is to use the 'nonstring' attribute gcc has introduced, signalling that sqlca_t->sqlstate isn't zero-terminated. That removes the above warning.
https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#Common-Variable-Attributes "The nonstring variable attribute specifies that an object or member declaration with type array of char, signed char, or unsigned char, or pointer to such a type is intended to store character arrays that do not necessarily contain a terminating NUL. This is useful in detecting uses of such arrays or pointers with functions that expect NUL-terminated strings, and to avoid warnings when such an array or pointer is used as an argument to a bounded string manipulation function such as strncpy. For example, without the attribute, GCC will issue a warning for the strncpy call below because it may truncate the copy without appending the terminating NUL character. Using the attribute makes it possible to suppress the warning. However, when the array is declared with the attribute the call to strlen is diagnosed because when the array doesn’t contain a NUL-terminated string the call is undefined. To copy, compare, of search non-string character arrays use the memcpy, memcmp, memchr, and other functions that operate on arrays of bytes. In addition, calling strnlen and strndup with such arrays is safe provided a suitable bound is specified, and not diagnosed. " I've not looked at how much work it'd be to make a recent-ish gcc not to produce lots of false positives in optimized builds. Greetings, Andres Freund
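For concreteness, the pattern described above can be sketched as follows. This is a hedged illustration — the struct, macro, and function names are invented for the example and are not ecpg's actual sqlca definition:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical illustration (NOT ecpg's real sqlca): a five-character
 * code stored without a terminating NUL.  Marking the member "nonstring"
 * tells GCC >= 8 that the truncating strncpy below is intentional, which
 * suppresses -Wstringop-truncation.
 */
#if defined(__GNUC__) && __GNUC__ >= 8
#define PG_NONSTRING __attribute__((nonstring))
#else
#define PG_NONSTRING
#endif

struct demo_sqlca
{
	char		sqlstate[5] PG_NONSTRING;	/* 5 chars, deliberately no NUL */
};

void
demo_set_sqlstate(struct demo_sqlca *ca, const char *code)
{
	/* Copies exactly 5 bytes; no NUL is appended, and that is intended. */
	strncpy(ca->sqlstate, code, sizeof(ca->sqlstate));
}

void
demo_print_sqlstate(const struct demo_sqlca *ca)
{
	/* Read back with an explicit bound -- never strlen() or a bare %s. */
	printf("%.*s\n", (int) sizeof(ca->sqlstate), ca->sqlstate);
}
```

As the GCC documentation quoted above notes, once the field is marked nonstring, unbounded consumers such as strlen() are what get diagnosed instead — so readers must use memcpy/memcmp or length-bounded formatting, as demo_print_sqlstate does.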
Re: Fdw batch insert error out when set batch_size > 65535
Tomas Vondra writes: > There's one caveat, though - for regular builds the slowdown is pretty > much eliminated. But with valgrind it's still considerably slower. For > postgres_fdw the "make check" used to take ~5 minutes for me, now it > takes >1h. And yes, this is entirely due to the new test case which is > generating / inserting 70k rows. So maybe the test case is not worth it > after all, and we should get rid of it. I bet the CLOBBER_CACHE animals won't like it much either. I suggest what we do is leave it in place for long enough to get a round of reports from those slow animals, and then (assuming those reports are positive) drop the test. regards, tom lane
Re: Fdw batch insert error out when set batch_size > 65535
On 6/9/21 4:05 PM, Tomas Vondra wrote: > On 6/9/21 3:28 PM, Tom Lane wrote: >> Tomas Vondra writes: >>> Note that the problem here is [1] - we're creating a lot of slots >>> referencing the same tuple descriptor, which inflates the duration. >>> There's a fix in the other thread, which eliminates ~99% of the >>> overhead. I plan to push that fix soon (a day or two). >> >> Oh, okay, as long as there's a plan to bring the time back down. >> > > Yeah. Sorry for not mentioning this in the original message about the > new regression test. > I've pushed a fix addressing the performance issue. There's one caveat, though - for regular builds the slowdown is pretty much eliminated. But with valgrind it's still considerably slower. For postgres_fdw the "make check" used to take ~5 minutes for me, now it takes >1h. And yes, this is entirely due to the new test case which is generating / inserting 70k rows. So maybe the test case is not worth it after all, and we should get rid of it. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: recovery test failures on hoverfly
I wrote:
> Michael Paquier writes:
>> This is the same problem as c757a3da and 6d41dd0, where we write a
>> query to a pipe but the kill, causing a failure, makes the test fail
>> with a SIGPIPE in IPC::Run as a query is sent down to a pipe.

> The precedent of the previous fixes would seem to suggest seeing if
> we can replace 'SELECT 1' with "undef". Not sure if that'll work
> without annoying changes to poll_query_until, though.

I noticed that elver failed this same way today, so that got me annoyed enough to pursue a fix. Using "undef" as poll_query_until's input almost works, except it turns out that it fails to notice psql connection failures in that case! It is *only* looking at psql's stdout, not at either stderr or the exit status, which seems seriously bogus in its own right; not least because poll_query_until's own documentation claims it will continue waiting after an error, which is exactly what it's not doing. So I propose the attached.

(I first tried to make it check $result == 0, but it seems there are a lot of cases where psql returns status 1 in these tests. That seems pretty bogus too, but probably beta is no time to change that behavior.)

regards, tom lane

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 45d1636128..2027cbf43d 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2140,8 +2140,10 @@ sub poll_query_until
 		$stdout =~ s/\r\n/\n/g if $Config{osname} eq 'msys';
 		chomp($stdout);
+		$stderr =~ s/\r\n/\n/g if $Config{osname} eq 'msys';
+		chomp($stderr);

-		if ($stdout eq $expected)
+		if ($stdout eq $expected && $stderr eq '')
 		{
 			return 1;
 		}
@@ -2154,8 +2156,6 @@ sub poll_query_until
 	# The query result didn't change in 180 seconds. Give up. Print the
 	# output from the last attempt, hopefully that's useful for debugging.
-	$stderr =~ s/\r\n/\n/g if $Config{osname} eq 'msys';
-	chomp($stderr);
 	diag qq(poll_query_until timed out executing this query:
 $query
 expecting this output:
diff --git a/src/test/recovery/t/013_crash_restart.pl b/src/test/recovery/t/013_crash_restart.pl
index e1c36abe97..868a50b33d 100644
--- a/src/test/recovery/t/013_crash_restart.pl
+++ b/src/test/recovery/t/013_crash_restart.pl
@@ -136,12 +136,8 @@ ok( pump_until(
 $monitor->finish;

 # Wait till server restarts
-is( $node->poll_query_until(
-		'postgres',
-		'SELECT $$restarted after sigquit$$;',
-		'restarted after sigquit'),
-	"1",
-	"reconnected after SIGQUIT");
+is($node->poll_query_until('postgres', undef, ''),
+	"1", "reconnected after SIGQUIT");

 # restart psql processes, now that the crash cycle finished
@@ -216,7 +212,7 @@ ok( pump_until(
 $monitor->finish;

 # Wait till server restarts
-is($node->poll_query_until('postgres', 'SELECT 1', '1'),
+is($node->poll_query_until('postgres', undef, ''),
 	"1", "reconnected after SIGKILL");

 # Make sure the committed rows survived, in-progress ones not
diff --git a/src/test/recovery/t/022_crash_temp_files.pl b/src/test/recovery/t/022_crash_temp_files.pl
index b912f4b232..157ddba8cf 100644
--- a/src/test/recovery/t/022_crash_temp_files.pl
+++ b/src/test/recovery/t/022_crash_temp_files.pl
@@ -139,7 +139,7 @@ $killme->finish;
 $killme2->finish;

 # Wait till server restarts
-$node->poll_query_until('postgres', 'SELECT 1', '1');
+$node->poll_query_until('postgres', undef, '');

 # Check for temporary files
 is( $node->safe_psql(
@@ -228,7 +228,7 @@ $killme->finish;
 $killme2->finish;

 # Wait till server restarts
-$node->poll_query_until('postgres', 'SELECT 1', '1');
+$node->poll_query_until('postgres', undef, '');

 # Check for temporary files -- should be there
 is( $node->safe_psql(
Re: Teaching users how they can get the most out of HOT in Postgres 14
On Sun, May 30, 2021 at 6:30 PM Masahiko Sawada wrote: > > Another concern with this approach is what it > > means for the VACUUM command itself. I haven't added an 'auto' > > spelling that is accepted by the VACUUM command in this POC version. > > But do I need to at all? Can that just be implied by not having any > > INDEX_CLEANUP option? > > It seems to me that it's better to have INDEX_CLEANUP option of VACUUM > command support AUTO for consistency. Do you have any concerns about > supporting it? I suppose we should have it. But now we have to teach vacuumdb about this new boolean-like enum too. It's a lot more new code than I would have preferred, but I suppose that it makes sense. -- Peter Geoghegan
Re: Replication protocol doc fix
On Fri, 2021-06-11 at 16:57 -0400, Robert Haas wrote: > My impression was that CopyBoth can be initiated either way, The docs say: "In either physical replication or logical replication walsender mode, only the simple query protocol can be used." Is there some way to initiate CopyBoth outside of walsender? > but if > you use the extended query protocol, then the result is a hopeless > mess, because the protocol is badly designed: > > https://www.postgresql.org/message-id/ca+tgmoa4ea+cpxaigqmebp9xisvd3ze9dbvnbzevx9ucmiw...@mail.gmail.com It seems like you're saying that CopyIn and CopyBoth are both equally broken in extended query mode. Is that right? > But I think you're correct in saying that the discard-until-Sync > behavior only happens if the extended query protocol is used, so I > agree that the current text is wrong. Should we just document how CopyBoth works with the simple query protocol, or should we make it match the CopyIn docs? Regards, Jeff Davis
Re: Teaching users how they can get the most out of HOT in Postgres 14
On Thu, Jun 3, 2021 at 11:15 PM Michael Paquier wrote: > I have read through the patch, and I am surprised to see that this > only makes possible to control the optimization at relation level. > The origin of the complaints is that this index cleanup optimization > has been introduced as a new rule that gets enforced at *system* > level, so I think that we should have an equivalent with a GUC to > control the behavior for the whole system. *Why* does it have to work at the system level? I don't understand what you mean about the system level. As Masahiko pointed out, adding a GUC isn't what we've done in other similar cases -- that's how DISABLE_PAGE_SKIPPING works, which was a defensive option that seems similar enough to what we want to add now. To give another example, the TRUNCATE VACUUM option (or the related reloption) can be used to disable relation truncation, a behavior that sometimes causes *big* issues in production. The truncate behavior is determined dynamically in most situations -- which is another similarity to the optimization we've added here. Why is this fundamentally different to those two things? > With what you are > presenting here, one could only disable the optimization for each > relation, one-by-one. If this optimization proves to be a problem, > it's just going to be harder to users to go through all the relations > and re-tune autovacuum. Am I missing something? Why would you expect autovacuum to run even when the optimization is unavailable (e.g. with Postgres 13)? After all, the specifics of when the bypass optimization kicks in make it very unlikely that ANALYZE will ever be able to notice enough dead tuples to trigger an autovacuum (barring antiwraparound and insert-driven autovacuums). There will probably be very few LP_DEAD items remaining. 
Occasionally there will be somewhat more LP_DEAD items, that happen to be concentrated in less than 2% of the table's blocks -- but block-based sampling by ANALYZE is likely to miss most of them and underestimate the total number. The sampling approach taken by acquire_sample_rows() ensures this with larger tables. With small tables the chances of the optimization kicking in are very low, unless perhaps fillfactor has been tuned very aggressively. There has never been a guarantee that autovacuum will be triggered (and do index vacuuming) in cases that have very few LP_DEAD items, no matter how the system has been tuned. The main reason why making the optimization behavior controllable is for the VACUUM command. Principally for hackers. I can imagine myself using the VACUUM option to disable the optimization when I was interested in testing VACUUM or space utilization in some specific, narrow way. Of course it's true that there is still some uncertainty about the optimization harming production workloads -- that is to be expected with an enhancement like this one. But there is still no actual example or test case that shows the optimization doing the wrong thing, or anything like it. Anything is possible, but I am not expecting there to be even one user complaint about the feature. Naturally I don't want to add something as heavyweight as a GUC, given all that. -- Peter Geoghegan
Re: pg_stat_progress_create_index vs. parallel index builds
On 2021-Jun-04, Greg Nancarrow wrote: > On Thu, Jun 3, 2021 at 1:49 AM Matthias van de Meent > wrote: > > > > On Wed, 2 Jun 2021 at 17:42, Tomas Vondra > > wrote: > > > > > > Nice. I gave it a try on the database I'm experimenting with, and it > > > seems to be working fine. Please add it to the next CF. > > > > Thanks, cf available here: https://commitfest.postgresql.org/33/3149/ > > The patch looks OK to me. It seems apparent that the lines added by > the patch are missing from the current source in the parallel case. > > I tested with and without the patch, using the latest PG14 source as > of today, and can confirm that without the patch applied, the "sorting > live tuples" phase is not reported in the parallel-case, but with the > patch applied it then does get reported in that case. I also confirmed > that, as you said, the patch only addresses the usual case where the > parallel leader participates in the parallel operation. So, with Matthias' patch applied and some instrumentation to log (some) parameter updates, this is what I get on building an index in parallel. 
The "subphase" is parameter 10:

2021-06-09 17:04:30.692 -04 19194 WARNING:  updating param 0 to 1
2021-06-09 17:04:30.692 -04 19194 WARNING:  updating param 6 to 0
2021-06-09 17:04:30.692 -04 19194 WARNING:  updating param 8 to 403
2021-06-09 17:04:30.696 -04 19194 WARNING:  updating param 9 to 2
2021-06-09 17:04:30.696 -04 19194 WARNING:  updating param 10 to 1
2021-06-09 17:04:30.696 -04 19194 WARNING:  updating param 11 to 0
2021-06-09 17:04:30.696 -04 19194 WARNING:  updating param 15 to 0
2021-06-09 17:04:30.696 -04 19194 WARNING:  updating param 10 to 2
2021-06-09 17:04:30.696 -04 19194 WARNING:  updating param 15 to 486726
2021-06-09 17:04:37.418 -04 19194 WARNING:  updating param 10 to 3   <-- this one is new
2021-06-09 17:04:42.215 -04 19194 WARNING:  updating param 11 to 11000
2021-06-09 17:04:42.215 -04 19194 WARNING:  updating param 15 to 0
2021-06-09 17:04:42.215 -04 19194 WARNING:  updating param 10 to 3
2021-06-09 17:04:42.237 -04 19194 WARNING:  updating param 10 to 5

The thing to note is that we set the subphase to 3 twice. The first of those is added by the patch to _bt_parallel_scan_and_sort. The second is in _bt_leafbuild, just before setting the subphase to LEAF_LOAD. So the change is that we set the subphase to "sorting live tuples" five seconds ahead of what we were doing previously. Seems ok.

(We could alternatively skip the progress update call in _bt_leafbuild; but those calls are so cheap that adding a conditional jump is almost as expensive.)

(The other potential problem might be to pointlessly invoke the progress update calls when in a worker. But that's already covered because only the leader passes progress=true to _bt_parallel_scan_and_sort.)

I'll push now.

-- Álvaro Herrera  Valdivia, Chile
Re: postgres_fdw batching vs. (re)creating the tuple slots
On 6/9/21 1:08 PM, Tomas Vondra wrote: > > > On 6/9/21 12:50 PM, Bharath Rupireddy wrote: >> On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra >> wrote: >>> >>> Hi, >>> >>> Here's a v2 fixing a silly bug with reusing the same variable in two >>> nested loops (worked for simple postgres_fdw cases, but "make check" >>> failed). >> >> I applied these patches and ran make check in postgres_fdw contrib >> module, I saw a server crash. Is it the same failure you were saying >> above? >> > > Nope, that was causing infinite loop. This is jut a silly mistake on my > side - I forgot to replace the i/j variable inside the loop. Here's v3. > > regards > FWIW I've pushed this, after improving the comments a little bit. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Replication protocol doc fix
On Thu, Jun 10, 2021 at 9:26 PM Jeff Davis wrote: > The docs currently say (introduced in commit 91fa853): > > "In the event of a backend-detected error during copy-both mode, the > backend will issue an ErrorResponse message, discard frontend messages > until a Sync message is received, and then issue ReadyForQuery and > return to normal processing." > > But that doesn't seem to be correct: Sync is only used for the extended > query protocol, and CopyBoth can only be initiated with the simple > query protocol. My impression was that CopyBoth can be initiated either way, but if you use the extended query protocol, then the result is a hopeless mess, because the protocol is badly designed: https://www.postgresql.org/message-id/ca+tgmoa4ea+cpxaigqmebp9xisvd3ze9dbvnbzevx9ucmiw...@mail.gmail.com But I think you're correct in saying that the discard-until-Sync behavior only happens if the extended query protocol is used, so I agree that the current text is wrong. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Question about StartLogicalReplication() error path
On 2021-06-11 12:07:24 -0700, Jeff Davis wrote: > On Fri, 2021-06-11 at 11:56 -0700, Andres Freund wrote: > > That doesn't work as easily in older versions because there was no > > SQL > > support in replication connections until PG 10... > > 9.6 will be EOL this year. I don't really see why such old versions are > relevant to this discussion. It's relevant to understand how we ended up here.
Re: Question about StartLogicalReplication() error path
Hi, On 2021-06-11 16:05:10 -0400, Robert Haas wrote: > You seem to see this as some kind of major problem and I guess I don't > agree. I think it's pretty clear what the motivation was for the > current behavior, because I believe it's well-explained by the comment > and the three people who have tried to answer your question. I also > think it's pretty clear why somebody might find it surprising: someone > might think that fast-forwarding is harmful and risky rather than a > useful convenience. As evidence for the fact that someone might think > that, I offer the fact that you seem to think exactly that thing. I > also think that there's pretty good evidence that the behavior as it > exists is not really so bad. As far as I know, and I certainly might > have missed something, you're the first one to complain about behavior > that we've had for quite a long time now, and you seem to be saying > that it might cause problems for somebody, not that you know it > actually did. So, I don't know, I'm not opposed to talking about > potential improvements here, but to the extent that you're suggesting > this is unreasonable behavior, I think that's too harsh. Yea. I think it'd be a different matter if streaming logical decoding had been added this cycle and we'd started out with supporting queries over replication connection - but it's been long enough that it's likely that people rely on the current behaviour, and I don't see the gain in reliability outweigh the compat issues. Your argument that one can just check kinda goes both ways - you can do that with the current behaviour too... Greetings, Andres Freund
Re: Race condition in InvalidateObsoleteReplicationSlots()
On 2021-06-11 15:52:21 -0400, Álvaro Herrera wrote: > On 2021-Apr-07, Andres Freund wrote: > > > After this I don't see a reason to have SAB_Inquire - as far as I can > > tell it's practically impossible to use without race conditions? Except > > for raising an error - which is "builtin"... > > Pushed 0002. > > Thanks! Thank you for your work on this!
Re: unnesting multirange data types
On Fri, Jun 11, 2021 at 1:04 AM Justin Pryzby wrote: > > +{ oid => '1293', descr => 'expand mutlirange to set of ranges', > > typo: mutlirange Fixed, thanks. The patch with the implementation of both unnest() and cast to array is attached. It contains both tests and docs. -- Regards, Alexander Korotkov multirange_unnest_cast_to_array.patch Description: Binary data
Re: Character expansion with ICU collations
On 11.06.21 22:05, Finnerty, Jim wrote:
> > You can have these queries return both rows if you use an
> > accent-ignoring collation, like this example in the documentation:
> >
> >     CREATE COLLATION ignore_accents (provider = icu,
> >         locale = 'und-u-ks-level1-kc-true', deterministic = false);
>
> Indeed. Is the dependency between the character expansion capability and
> accent-insensitive collations documented anywhere?

The above is merely a consequence of what the default collation elements for 'ß' are. Expansion isn't really a relevant concept in collation. Any character can map to 1..N collation elements. The collation algorithm doesn't care how many it is.

> Can a CI collation be ordered upper case first, or is this a limitation
> of ICU?

I don't know the authoritative answer to that, but to me it doesn't make sense, since the effect of a case-insensitive collation is to throw away the third-level weights, so there is nothing left for "upper case first" to operate on.

> More generally, is there any interest in leveraging the full power of
> ICU tailoring rules to get whatever order someone may need, subject to
> the limitations of ICU itself? What would be required to extend CREATE
> COLLATION to accept an optional sequence of tailoring rules that we
> would store in the pg_collation catalog and apply along with the
> modifiers in the locale string?

Yes.
Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
On Tue, May 25, 2021 at 9:38 AM Alvaro Herrera wrote: > This should be okay, right? Well, almost. The problem here is if you > want to have a variable where you set more than one option, you have to > use bit-or of the enum values ... and the resulting value is no longer > part of the enum. A compiler would be understandably upset if you try > to pass that value in a variable of the enum datatype. Yes. I dislike this style for precisely this reason. I may, however, be in the minority. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [bug?] Missed parallel safety checks, and wrong parallel safety
On Fri, Jun 11, 2021 at 12:13 AM Amit Kapila wrote: > Do we invalidate relcache entry if someone changes say trigger or some > index AM function property via Alter Function (in our case from safe > to unsafe or vice-versa)? Tsunakawa-San has mentioned this as the > reason in his email [1] why we can't rely on caching this property in > relcache entry. I also don't see anything in AlterFunction which would > suggest that we invalidate the relation with which the function might > be associated via trigger. Hmm. I am not sure that index AM functions really need to be checked, but triggers certainly do. I think you are correct that an ALTER FUNCTION wouldn't invalidate the relcache entry, which is I guess pretty much the same problem Tom was pointing out in the thread to which you linked. But ... thinking out of the box as Tom suggests, what if we came up with some new kind of invalidation message that is only sent when a function's parallel-safety marking is changed? And every backend in the same database then needs to re-evaluate the parallel-safety of every relation for which it has cached a value. Such recomputations might be expensive, but they would probably also occur very infrequently. And you might even be able to make it a bit more fine-grained if it's worth the effort to worry about that: say that in addition to caching the parallel-safety of the relation, we also cache a list of the pg_proc OIDs upon which that determination depends. Then when we hear that the flag's been changed for OID 123456, we only need to invalidate the cached value for relations that depended on that pg_proc entry. There are ways that a relation could become parallel-unsafe without changing the parallel-safety marking of any function, but perhaps all of the other ways involve a relcache invalidation? Just brainstorming here. I might be off-track. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add PortalDrop in exec_execute_message
On 2021-Jun-09, Tom Lane wrote: > BTW, to be clear: I think Alvaro's change is also necessary. > If libpq is going to silently do something different in pipeline > mode than it does in normal mode, it should strive to minimize > the semantic difference. exec_simple_query includes a PortalDrop, > so we'd best do the same in pipeline mode. Pushed that patch, thanks. -- Álvaro Herrera Valdivia, Chile "At least to kernel hackers, who really are human, despite occasional rumors to the contrary" (LWN.net)
Re: doc issue missing type name "multirange" in chapter title
Alexander Korotkov writes: > What about "Range/Multirange Functions and Operators"? Better than a comma, I guess. Personally I didn't have a problem with the form with two "ands". regards, tom lane
Re: doc issue missing type name "multirange" in chapter title
On Fri, Jun 11, 2021 at 3:39 AM Tom Lane wrote: > Justin Pryzby writes: > > If it's confusing to say "and" twice, then perhaps you'd say: > > Functions and Operators for Range and Multirange Types > > Uh, that's still two "and"s. In any case, I think it's better > to keep this section heading aligned with all of its siblings. What about "Range/Multirange Functions and Operators"? -- Regards, Alexander Korotkov
Re: Character expansion with ICU collations
>> You can have these queries return both rows if you use an accent-ignoring collation, like this example in the documentation:

CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false);
<<

Indeed. Is the dependency between the character expansion capability and accent-insensitive collations documented anywhere?

Another unexpected dependency appears to be @colCaseFirst=upper. If specified in combination with colStrength=secondary, it appears that the upper/lower case ordering is random within a group of characters that are secondary equal, e.g. 'A' < 'a', but 'b' < 'B', 'c' < 'C', ... , but then 'L' < 'l'. It is not even consistently ordered with respect to case. If I make it a nondeterministic CS_AI collation, then it sorts upper before lower consistently. The rule seems to be that you can't sort by case within a group that is case-insensitive.

Can a CI collation be ordered upper case first, or is this a limitation of ICU?

For example, this is part of the sort order that I'd like to achieve with ICU, with the code point in column 1 and dense_rank() shown in the rightmost column indicating that 'b' = 'B', for example:

 66  B  B  138  151
 98  b  b  138  151   <- so within a group that is CI_AS equal, the sort order needs to be upper case first
 67  C  C  139  152
 99  c  c  139  152
199  Ç  Ç  140  153
231  ç  ç  140  153
 68  D  D  141  154
100  d  d  141  154
208  Ð  Ð  142  199
240  ð  ð  142  199
 69  E  E  143  155
101  e  e  143  155

Can this sort order be achieved with ICU?

More generally, is there any interest in leveraging the full power of ICU tailoring rules to get whatever order someone may need, subject to the limitations of ICU itself? What would be required to extend CREATE COLLATION to accept an optional sequence of tailoring rules that we would store in the pg_collation catalog and apply along with the modifiers in the locale string?

/Jim
Re: Question about StartLogicalReplication() error path
On Fri, Jun 11, 2021 at 2:49 PM Jeff Davis wrote: > It doesn't add any overhead. > > All the client would have to do is "SELECT confirmed_flush_lsn FROM > pg_replication_slots WHERE slot_name='myslot'", and compare it to the > stored value. If the stored value is earlier than the > confirmed_flush_lsn, the *client* can decide to start replication at > the confirmed_flush_lsn. That makes sense because the client knows more > about its behavior and configuration, and whether that's a safe choice > or not. True, but it doesn't seem very nice to force the client to depend on SQL when that wouldn't otherwise be needed. The SQL is a lot more likely to fail than a replication command, for example due to some permissions issue. So I think if we want to make this an optional behavior, it would be better to add a flag to the START_REPLICATION command to say whether it's OK for the server to fast-forward like this. You seem to see this as some kind of major problem and I guess I don't agree. I think it's pretty clear what the motivation was for the current behavior, because I believe it's well-explained by the comment and the three people who have tried to answer your question. I also think it's pretty clear why somebody might find it surprising: someone might think that fast-forwarding is harmful and risky rather than a useful convenience. As evidence for the fact that someone might think that, I offer the fact that you seem to think exactly that thing. I also think that there's pretty good evidence that the behavior as it exists is not really so bad. As far as I know, and I certainly might have missed something, you're the first one to complain about behavior that we've had for quite a long time now, and you seem to be saying that it might cause problems for somebody, not that you know it actually did. So, I don't know, I'm not opposed to talking about potential improvements here, but to the extent that you're suggesting this is unreasonable behavior, I think that's too harsh.
-- Robert Haas EDB: http://www.enterprisedb.com
Re: Race condition in InvalidateObsoleteReplicationSlots()
On 2021-Apr-07, Andres Freund wrote: > After this I don't see a reason to have SAB_Inquire - as far as I can > tell it's practically impossible to use without race conditions? Except > for raising an error - which is "builtin"... Pushed 0002. Thanks! -- Álvaro Herrera39°49'30"S 73°17'W "La persona que no quería pecar / estaba obligada a sentarse en duras y empinadas sillas/ desprovistas, por cierto de blandos atenuantes" (Patricio Vogel)
Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command
On Fri, 2021-06-11 at 15:43 +0530, Amit Kapila wrote: > The new patches look mostly good apart from the below cosmetic > issues. > I think the question is whether we want to do these for PG-14 or > postpone them till PG-15. I think these don't appear to be risky > changes so we can get them in PG-14 as that might help some outside > core solutions as appears to be the case for Jeff. My main interest here is that I'm working on replication protocol support in a rust library. While doing that, I've encountered a lot of rough edges (as you may have seen in my recent posts), and this patch fixes one of them. But at the same time, several small changes to the protocol spread across several releases introduces more opportunity for confusion. If we are confident this is the right direction, then v14 makes sense for consistency. But if waiting for v15 might result in a better change, then we should probably just get it into the July CF for v15. (My apologies if my opinion has drifted a bit since this thread began.) Regards, Jeff Davis
Re: automatically generating node support functions
Hi, On 2021-06-08 19:45:58 +0200, Peter Eisentraut wrote: > On 08.06.21 15:40, David Rowley wrote: > > It's almost 2 years ago now, but I'm wondering if you saw what Andres > > proposed in [1]? The idea was basically to make a metadata array of > > the node structs so that, instead of having to output large amounts of > > .c code to do read/write/copy/equals, instead just have small > > functions that loop over the elements in the array for the given > > struct and perform the required operation based on the type. > > That project was technologically impressive, but it seemed to have > significant hurdles to overcome before it can be useful. My proposal is > usable and useful today. And it doesn't prevent anyone from working on a > more sophisticated solution. I think it's short-sighted to further and further go down the path of parsing "kind of C" without just using a proper C parser. But leaving that aside, a big part of the promise of the approach in that thread isn't actually tied to the specific way the type information is collected: The perl script could output something like the "node type metadata" I generated in that patchset, and then we don't need the large amount of generated code and can much more economically add additional operations handling node types. Greetings, Andres Freund
Re: [Proposal] Add accumulated statistics for wait event
Hi, On 2021-06-05 00:53:44 +0200, Jehan-Guillaume de Rorthais wrote: > From 88c2779679c5c9625ca5348eec0543daab5ccab4 Mon Sep 17 00:00:00 2001 > From: Jehan-Guillaume de Rorthais > Date: Tue, 1 Jun 2021 13:25:57 +0200 > Subject: [PATCH 1/2] Add pg_stat_waitaccum view. > > pg_stat_waitaccum shows counts and duration of each wait events. > Each backend/backgrounds counts and measures the time of wait event > in every pgstat_report_wait_start and pgstat_report_wait_end. They > store those info into their local variables and send to Statistics > Collector. We can get those info via Statistics Collector. > > For reducing overhead, I implemented statistic hash instead of > dynamic hash. I also implemented track_wait_timing which > determines wait event duration is collected or not. I object to adding this overhead. The whole selling point for wait events was that they are low overhead. I since spent time reducing the overhead further, because even just the branches for checking if track_activity is enabled are measurable (225a22b19ed). > From ddb1adc5cd9acc9bc9de16d0cf057124b09fe1e3 Mon Sep 17 00:00:00 2001 > From: Jehan-Guillaume de Rorthais > Date: Fri, 4 Jun 2021 18:14:51 +0200 > Subject: [PATCH 2/2] [POC] Change measuring method of wait event time from > INSTR_TIME to rdtsc. > > This patch changes measuring method of wait event time from INSTR_TIME (which > uses gettimeofday or clock_gettime) to rdtsc. This might reduce the overhead > of measuring overhead. > > Any supports like changing clock cycle to actual time or error handling are > not currently implemented. rdtsc is a serializing (*) instruction - that's the expensive part. On linux clock_gettime() doesn't actually need a syscall. While the vdso call implies a bit of overhead over a raw rdtsc, it's a relatively small part of the overhead.
See https://www.postgresql.org/message-id/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de Greetings, Andres Freund (*) it's not fully serializing, iirc it allows later instructions to be started, but it does wait for all earlier in-flight instructions to finish.
Re: RFC: Table access methods and scans
Hi, On 2021-06-03 17:52:24 -0700, Jeff Davis wrote: > I agree that would be very convenient for non-heap AMs. There's a very > old commit[3] that says: > > + /* > +* Note that unlike IndexScan, SeqScan never use keys > +* in heap_beginscan (and this is very bad) - so, here > +* we have not check are keys ok or not. > +*/ > > and that language has just been carried forward for decades. I wonder > if there's any major reason this hasn't been done yet. Does it just not > improve performance for a heap, or is there some other reason? It's not actually a good idea in general: - Without substantial refactoring more work is done while holding the content lock on the page. Whereas doing it as part of a seqscan only requires a buffer pin (and thus allows for concurrent writes to the same page) - It's hard to avoid repeated work with expressions that can't fully be evaluated as part of the ScanKey. Expression evaluation generally can be a bit smarter about evaluation, e.g. not deforming the tuple one-by-one. Greetings, Andres Freund
Re: Question about StartLogicalReplication() error path
On Fri, 2021-06-11 at 11:56 -0700, Andres Freund wrote: > That doesn't work as easily in older versions because there was no > SQL > support in replication connections until PG 10... 9.6 will be EOL this year. I don't really see why such old versions are relevant to this discussion. Regards, Jeff Davis
Re: Question about StartLogicalReplication() error path
On 2021-06-11 11:49:19 -0700, Jeff Davis wrote: > All the client would have to do is "SELECT confirmed_flush_lsn FROM > pg_replication_slots WHERE slot_name='myslot'", and compare it to the > stored value. That doesn't work as easily in older versions because there was no SQL support in replication connections until PG 10...
Re: Question about StartLogicalReplication() error path
On Fri, 2021-06-11 at 10:40 -0700, Andres Freund wrote: > Especially because it very well might break existing working > setups... Please address my concerns[1]. Regards, Jeff Davis [1] https://www.postgresql.org/message-id/e22a4606333ce1032e29fe2fb1aa9036e6f0ca98.camel%40j-davis.com
Re: Question about StartLogicalReplication() error path
On Fri, 2021-06-11 at 13:15 -0400, Robert Haas wrote: > on it, but it seems to me that the comment does explain this, and > that > it's basically the same explanation as what Amit said. It only addresses the "pro" side of the behavior, not the "con". It's a bit like saying "Given that we are in the U.S., it might seem like we should be driving on the right side of the road, but that side has traffic and we are in a hurry." Why might it seem that we should error out? If we don't error out, what bad things might happen? How do these "con"s weigh against the "pro"s? > I'm not sure that it would be a good > trade-off to have a tighter sanity check at the expense of adding > that > overhead. It doesn't add any overhead. All the client would have to do is "SELECT confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name='myslot'", and compare it to the stored value. If the stored value is earlier than the confirmed_flush_lsn, the *client* can decide to start replication at the confirmed_flush_lsn. That makes sense because the client knows more about its behavior and configuration, and whether that's a safe choice or not. The only difference is whether the server is safe-by-default with intuitive semantics that match the documentation, or unsafe-by-default with unexpected semantics that don't match the documentation. Regards, Jeff Davis
Re: Question about StartLogicalReplication() error path
Hi, On 2021-06-11 13:15:11 -0400, Robert Haas wrote: > On Fri, Jun 11, 2021 at 2:23 AM Jeff Davis wrote: > > * The comment acknowledges that a user might expect an error in that > > case; but doesn't really address why the user would expect an error, > > and why it's OK to violate that expectation. > > This code was written by Andres, so he'd be the best person to comment > on it, but it seems to me that the comment does explain this, and that > it's basically the same explanation as what Amit said. If the client > doesn't have to do anything for a certain range of WAL and just > acknowledges it, it would under your proposal have to also durably > record that it had chosen to do nothing, which might cause extra > fsyncs, potentially lots of extra fsyncs if this happens frequently > e.g. because most tables are filtered out and the replicated ones are > only modified occasionally. Yes, that's the motivation. > I'm not sure that it would be a good trade-off to have a tighter > sanity check at the expense of adding that overhead. Especially because it very well might break existing working setups... Greetings, Andres Freund
Re: Multi-Column List Partitioning
On Thu, Jun 10, 2021 at 8:38 PM Amit Langote wrote:
> Hi Nitin,
>
> On Thu, Jun 3, 2021 at 11:45 PM Nitin Jadhav wrote:
> > > I'll wait for you to post a new patch addressing at least the comments
> > > in my earlier email. Also, please make sure to run `make check`
> > > successfully before posting the patch. :)
> >
> > I have fixed all of the review comments given by you and Jeevan in the
> > attached patch and also the attached patch contains more changes
> > compared to the previous patch. Following are the implementation
> > details.
>
> Thanks for the updated version.
>
> > 1. Regarding syntax, the existing syntax will work fine for the
> > single-column list partitioning. However I have used the new syntax
> > for the multi-column list partitioning as we discussed earlier. I have
> > used a combination of 'AND' and 'OR' logic for the partition
> > constraints as given in the below example.
> >
> > postgres@17503=#create table t(a int, b text) partition by list(a,b);
> > CREATE TABLE
> > postgres@17503=#create table t1 partition of t for values in ((1,'a'), (NULL,'b'));
> > CREATE TABLE
> > postgres@17503=#\d+ t
> >                                   Partitioned table "public.t"
> >  Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
> > --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
> >  a      | integer |           |          |         | plain    |             |              |
> >  b      | text    |           |          |         | extended |             |              |
> > Partition key: LIST (a, b)
> > Partitions: t1 FOR VALUES IN ((1, 'a'), (NULL, 'b'))
> >
> > postgres@17503=#\d+ t1
> >                                        Table "public.t1"
> >  Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
> > --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
> >  a      | integer |           |          |         | plain    |             |              |
> >  b      | text    |           |          |         | extended |             |              |
> > Partition of: t FOR VALUES IN ((1, 'a'), (NULL, 'b'))
> > Partition constraint: (((a = 1) AND (b = 'a'::text)) OR ((a IS NULL) AND (b = 'b'::text)))
> > Access method: heap
>
> The constraint expressions seem to come out correctly, though I
haven't checked your implementation closely yet. > > > 2. In the existing code, NULL values were handled differently. It was > > not added to the 'datums' variable, rather used to store the partition > > index directly in the 'null_index' variable. Now there is a > > possibility of multiple NULL values, hence introducing a new member > > 'isnulls' in the 'PartitionBoundInfoData' struct which indicates > > whether the corresponding element in the 'datums' is NULL. Now > > 'null_index' cannot be used directly to store the partition index, so > > removed it and made the necessary changes in multiple places. > > > > 3. I have added test cases for 'create table' and 'insert' statements > > related to multi-column list partitioning and these are working fine > > with 'make check'. > > > > 4. Handled the partition pruning code to accommodate these changes for > > single-column list partitioning. However it is pending for > > multi-column list partitioning. > > > > 5. I have done necessary changes in partition wise join related code > > to accommodate for single-column list partitioning. However it is > > pending for multi-column list partitioning. > > > > Kindly review the patch and let me know if any changes are required. > > The new list bound binary search and related comparison support > function look a bit too verbose to me. I was expecting > partition_list_bsearch() to look very much like > partition_range_datum_bsearch(), but that is not the case. The > special case code that you wrote in partition_list_bsearch() seems > unnecessary, at least in that function. I'm talking about the code > fragment starting with this comment: > > /* >* Once we find the matching for the first column but if it does > not >* match for the any of the other columns, then the binary search >* will not work in all the cases. We should traverse just below >* and above the mid index until we find the match or we reach > the >* first mismatch. 
>*/ > > I guess you're perhaps trying to address the case where the caller > does not specify the values for all of the partition key columns, > which can happen when the partition pruning code needs to handle a set > of clauses matching only some of the partition key columns. But > that's a concern of the partition pruning code and so the special case > should be handled there (if at all), not in the binary search function > that is shared with other callers. Regarding that,
Re: Question about StartLogicalReplication() error path
On Fri, Jun 11, 2021 at 2:23 AM Jeff Davis wrote: > * The comment acknowledges that a user might expect an error in that > case; but doesn't really address why the user would expect an error, > and why it's OK to violate that expectation. This code was written by Andres, so he'd be the best person to comment on it, but it seems to me that the comment does explain this, and that it's basically the same explanation as what Amit said. If the client doesn't have to do anything for a certain range of WAL and just acknowledges it, it would under your proposal have to also durably record that it had chosen to do nothing, which might cause extra fsyncs, potentially lots of extra fsyncs if this happens frequently e.g. because most tables are filtered out and the replicated ones are only modified occasionally. I'm not sure that it would be a good trade-off to have a tighter sanity check at the expense of adding that overhead. -- Robert Haas EDB: http://www.enterprisedb.com
Re: "an SQL" vs. "a SQL"
On Thu, Jun 10, 2021 at 05:39:00PM -0400, Andrew Dunstan wrote: > I suspect "an historic" is bordering on archaic even in the UK these days. Don't trigger me on the difference between "historic" and "historical"! ;-) (Hey, not every day I get to trim quoted text to one line --- see recent pgsql-general discussion of the topic.) -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Race condition in InvalidateObsoleteReplicationSlots()
On 2021-Jun-10, Álvaro Herrera wrote: > Here's a version that I feel is committable (0001). There was an issue > when returning from the inner loop, if in a previous iteration we had > released the lock. In that case we need to return with the lock not > held, so that the caller can acquire it again, but weren't. This seems > pretty hard to hit in practice (I suppose somebody needs to destroy the > slot just as checkpointer killed the walsender, but before checkpointer > marks it as its own) ... but if it did happen, I think checkpointer > would block with no recourse. Also added some comments and slightly > restructured the code. > > There are plenty of conflicts in pg13, but it's all easy to handle. Pushed, with additional minor changes. > I wrote a test (0002) to cover the case of signalling a walsender, which > is currently not covered (we only deal with the case of a standby that's > not running). There are some sharp edges in this code -- I had to make > it use background_psql() to send a CHECKPOINT, which hangs, because I > previously send a SIGSTOP to the walreceiver. Maybe there's a better > way to achieve a walreceiver that remains connected but doesn't consume > input from the primary, but I don't know what it is. Anyway, the code > becomes covered with this. I would like to at least see it in master, > to gather some reactions from buildfarm. I tried hard to make this stable, but it just isn't (it works fine one thousand runs, then I grab some coffee and run it once more and that one fails. Why? that's not clear to me). Attached is the last one I have, in case somebody wants to make it better. Maybe there's some completely different approach that works better, but I'm out of ideas for now. 
-- Álvaro Herrera Valdivia, Chile "La experiencia nos dice que el hombre peló millones de veces las patatas, pero era forzoso admitir la posibilidad de que en un caso entre millones, las patatas pelarían al hombre" (Ijon Tichy) >From 6b6cb174452c553437ba7949aa25f305f599d6b7 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 11 Jun 2021 12:21:45 -0400 Subject: [PATCH v3] Add test case for invalidating an active replication slot MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The code to signal a running walsender was completely uncovered before. This test involves sending SIGSTOP to a walreceiver and running a checkpoint while advancing WAL. I'm not very certain that this test is stable, so it's in master only, and separate from the code-fix commit so that it can be reverted easily if need be. Author: Álvaro Herrera Discussion: https://postgr.es/m/202106102202.mjw4huiix7lo@alvherre.pgsql --- src/test/recovery/t/019_replslot_limit.pl | 79 ++- 1 file changed, 76 insertions(+), 3 deletions(-) diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index 7094aa0704..dcadfe1252 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -11,7 +11,7 @@ use TestLib; use PostgresNode; use File::Path qw(rmtree); -use Test::More tests => 14; +use Test::More tests => $TestLib::windows_os ? 
14 : 17; use Time::HiRes qw(usleep); $ENV{PGDATABASE} = 'postgres'; @@ -211,8 +211,8 @@ for (my $i = 0; $i < 1; $i++) } ok($failed, 'check that replication has been broken'); -$node_primary->stop('immediate'); -$node_standby->stop('immediate'); +$node_primary->stop; +$node_standby->stop; my $node_primary2 = get_new_node('primary2'); $node_primary2->init(allows_streaming => 1); @@ -253,6 +253,79 @@ my @result = timeout => '60')); is($result[1], 'finished', 'check if checkpoint command is not blocked'); +$node_primary2->stop; +$node_standby->stop; + +# The next test depends on Perl's `kill`, which apparently is not +# portable to Windows. (It would be nice to use Test::More's `subtest`, +# but that's not in the ancient version we require.) +if ($TestLib::windows_os) +{ + done_testing(); + exit; +} + +# Get a slot terminated while the walsender is active +# We do this by sending SIGSTOP to the walreceiver. Skip this on Windows. +my $node_primary3 = get_new_node('primary3'); +$node_primary3->init(allows_streaming => 1, extra => ['--wal-segsize=1']); +$node_primary3->append_conf( + 'postgresql.conf', qq( + min_wal_size = 2MB + max_wal_size = 2MB + log_checkpoints = yes + max_slot_wal_keep_size = 1MB + )); +$node_primary3->start; +$node_primary3->safe_psql('postgres', + "SELECT pg_create_physical_replication_slot('rep3')"); +# Take backup +$backup_name = 'my_backup'; +$node_primary3->backup($backup_name); +# Create standby +my $node_standby3 = get_new_node('standby_3'); +$node_standby3->init_from_backup($node_primary3, $backup_name, + has_streaming => 1); +$node_standby3->append_conf('postgresql.conf', "primary_slot_name = 'rep3'"); +$node_standby3->start; +$node_primary3->wait_for_catchup($node_standby3->name, 'replay'); +my
Re: Transactions involving multiple postgres foreign servers, take 2
On 2021/05/11 13:37, Masahiko Sawada wrote: I've attached the updated patches that incorporated comments from Zhihong and Ikeda-san. Thanks for updating the patches! I'm still reading these patches, but I'd like to share some review comments that I found so far. (1) +/* Remove the foreign transaction from FdwXactParticipants */ +void +FdwXactUnregisterXact(UserMapping *usermapping) +{ + Assert(IsTransactionState()); + RemoveFdwXactEntry(usermapping->umid); +} Currently there is no user of FdwXactUnregisterXact(). This function should be removed? (2) When I ran the regression test, I got the following failure. = Contents of ./src/test/modules/test_fdwxact/regression.diffs diff -U3 /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out --- /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out 2021-06-10 02:19:43.808622747 + +++ /home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out 2021-06-10 02:29:53.452410462 + @@ -174,7 +174,7 @@ SELECT count(*) FROM pg_foreign_xacts; count --- - 1 + 4 (1 row) (3) +errmsg("could not read foreign transaction state from xlog at %X/%X", + (uint32) (lsn >> 32), + (uint32) lsn))); LSN_FORMAT_ARGS() should be used? (4) +extern void RecreateFdwXactFile(TransactionId xid, Oid umid, void *content, + int len); Since RecreateFdwXactFile() is used only in fdwxact.c, the above "extern" is not necessary? (5) +2. Pre-Commit phase (1st phase of two-phase commit) +we record the corresponding WAL indicating that the foreign server is involved +with the current transaction before doing PREPARE all foreign transactions. 
+Thus, in case we lose connectivity to the foreign server or crash ourselves, +we will remember that we might have prepared transaction on the foreign +server, and try to resolve it when connectivity is restored or after crash +recovery. So currently FdwXactInsertEntry() calls XLogInsert() and XLogFlush() for XLOG_FDWXACT_INSERT WAL record. Additionally we should also wait there for WAL record to be replicated to the standby if sync replication is enabled? Otherwise, when the failover happens, new primary (past-standby) might not have enough XLOG_FDWXACT_INSERT WAL records and might fail to find some in-doubt foreign transactions. (6) XLogFlush() is called for each foreign transaction. So if there are many foreign transactions, XLogFlush() is called too frequently. Which might cause unnecessary performance overhead? Instead, for example, we should call XLogFlush() only once in FdwXactPrepareForeignTransactions() after inserting all WAL records for all foreign transactions? (7) /* Open connection; report that we'll create a prepared statement. */ fmstate->conn = GetConnection(user, true, &fmstate->conn_state); + MarkConnectionModified(user); MarkConnectionModified() should be called also when TRUNCATE on a foreign table is executed? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: [PATCH] Fix select from wrong table array_op_test
On 11/06/2021 18:31, Tom Lane wrote:
> Jason Kim writes:
>> In the middle of GIN index testing, there are some selects that are on
>> a different table array_op_test that doesn't even have an index. They
>> probably were supposed to be selects to table array_index_op_test like
>> the other ones around the area.
> I think it's probably intentional, else why have two tables at all?
> I suppose the point of these test cases is to confirm that you get the
> same results with or without use of an index.

We already have these same queries in the 'arrays' test against the
'array_op_test' table, though. It sure looks like a copy-paste error to
me as well.

- Heikki
Re: [PATCH] Fix select from wrong table array_op_test
Jason Kim writes:
> In the middle of GIN index testing, there are some selects that are on
> a different table array_op_test that doesn't even have an index. They
> probably were supposed to be selects to table array_index_op_test like
> the other ones around the area.

I think it's probably intentional, else why have two tables at all?
I suppose the point of these test cases is to confirm that you get the
same results with or without use of an index.

Certainly, there's more than one way to do that. Perhaps we should have
only one table and perform the variant tests by manipulating
enable_indexscan et al. But I think what you did here is defeating the
intent.

regards, tom lane
Re: logical replication of truncate command with trigger causes Assert
Amit Kapila writes: > On Fri, Jun 11, 2021 at 12:20 AM Tom Lane wrote: >> Another thing >> I'm wondering is how many of these messages really need to be >> translated. We could use errmsg_internal and avoid burdening the >> translators, perhaps. > Not sure but I see all existing similar ereport calls don't use > errmsg_internal. I was thinking maybe we could mark all these replication protocol violation errors non-translatable. While we don't want to crash on a protocol violation, it shouldn't really be a user-facing case either. Thoughts? regards, tom lane
Re: Race condition in recovery?
Kyotaro Horiguchi writes: >> ==~_~===-=-===~_~== >> pgsql.build/src/bin/pg_verifybackup/tmp_check/log/003_corruption_primary.log >> ==~_~===-=-===~_~== >> ... >> 2021-06-08 16:17:41.706 CEST [51792:9] 003_corruption.pl LOG: received >> replication command: START_REPLICATION SLOT "pg_basebackup_51792" 0/B00 >> TIMELINE 1 >> 2021-06-08 16:17:41.706 CEST [51792:10] 003_corruption.pl STATEMENT: >> START_REPLICATION SLOT "pg_basebackup_51792" 0/B00 TIMELINE 1 >> (log ends here) > There seems like some hardware failure? conchuela has definitely evinced flakiness before. Not sure what's up with it, but I have no problem with writing off non-repeatable failures from that machine. In any case, it's now passed half a dozen times in a row on HEAD, so I think we can say that it's okay with this test. That leaves jacana, which I'm betting has a Windows portability issue with the new test. regards, tom lane
Re: pgbench bug candidate: negative "initial connection time"
Hello Hayato-san,

> I played pgbench with wrong parameters,

That's good :-)

> and I found bug-candidate.
>
> 1. Do initdb and start.
> 2. Initialize schema and data with "scale factor" = 1.
> 3. execute following command many times:
>    $ pgbench -c 101 -j 10 postgres
>
> Then, sometimes the negative "initial connection time" was returned.
> Latency average is also strange.
>
> $ pgbench -c 101 -j 10 postgres
> starting vacuum...end.
> pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed:
> FATAL: sorry, too many clients already

Hmmm. AFAICR there was a decision to generate a report even if something
went very wrong, in this case some client could not connect, so some
values are not initialized, hence the absurd figures, as you show below.
Maybe we should revisit this decision.

> initial connection time = -372896921.586 ms

> I sought pgbench.c and found a reason. When a thread failed to get some
> connections, they do not fill any values to thread->bench_start in
> threadRun(). And if the failure is caused in the final thread (this
> means threads[nthreads - 1]->bench_start is zero), the following
> if-statement sets bench_start to zero.
>
> I cannot distinguish whether we have to fix it, but I attach the patch.
> This simply ignores a result when thread->bench_start is zero. How do
> you think?

Hmmm. Possibly. Another option could be not to report anything after some
errors. I'm not sure, because it would depend on the use case. I guess
the command returned an error status as well.

I'm going to give it some thoughts.

-- 
Fabien.
Re: Race condition in recovery?
Dilip Kumar writes: > On Fri, Jun 11, 2021 at 11:45 AM Kyotaro Horiguchi > wrote: >>> ==~_~===-=-===~_~== >>> pgsql.build/src/test/recovery/tmp_check/log/025_stuck_on_old_timeline_primary.log >>> ==~_~===-=-===~_~== >>> ... >>> The system cannot find the path specified. >>> 2021-06-10 22:56:17.754 EDT [60c2d0cf.54c:1] LOG: archive command failed >>> with exit code 1 >>> 2021-06-10 22:56:17.754 EDT [60c2d0cf.54c:2] DETAIL: The failed archive >>> command was: /usr/bin/perl >>> "/home/pgrunner/bf/root/HEAD/pgsql/src/test/recovery/t/cp_history_files" >>> "pg_wal\\00010001" >>> "/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_025_stuck_on_old_timeline_primary_data/archives/00010001" > Wal file copying will not create a problem for us, but I noticed that > it is failing in copying the history files as well and that is > creating a problem. I think jacana uses msys[2?], so this likely indicates a problem in path sanitization for the archive command. Andrew, any advice? regards, tom lane
Re: recovery test failures on hoverfly
Michael Paquier writes: > On Fri, Jun 11, 2021 at 05:38:34PM +0530, Amit Kapila wrote: >> It seems the error happens in both the tests when after issuing a >> KILL, we are trying to reconnect. Can we do anything for this? > This is the same problem as c757a3da and 6d41dd0, where we write a > query to a pipe but the kill, causing a failure, makes the test fail > with a SIGPIPE in IPC::Run as a query is sent down to a pipe. Indeed. > I think that using SELECT 1 to test if the server has been restarted > is a bit crazy. I would suggest to use instead a loop based on > pg_isready. The precedent of the previous fixes would seem to suggest seeing if we can replace 'SELECT 1' with "undef". Not sure if that'll work without annoying changes to poll_query_until, though. regards, tom lane
Re: Error on pgbench logs
Bonjour Michaël,

> +	/* flush remaining stats */
> +	if (!logged && latency == 0.0)
> +		logAgg(logfile, agg);
>
> You are right, this is missing the final stats. Why the choice of
> latency here for the check?

For me zero latency really says that there is no actual transaction to
count, so it is a good trigger for the closing call. I did not wish to
add a new "flush" parameter, or a specific function. I agree that it
looks strange, though.

> That's just to make the difference between the case where doLog() is
> called while processing the benchmark for the end of the transaction
> and the case where doLog() is called once a thread ends, no?

Yes.

> Wouldn't it be better to do a final push of the states once a thread
> reaches CSTATE_FINISHED or CSTATE_ABORTED instead?

The call was already in place at the end of threadRun and had just become
ineffective. I did not wish to revisit its place and change the overall
structure, it is just a bug fix. I agree that it could be done
differently with the added logAgg function which could be called
directly.

Attached another version which does that.

-- 
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index dc84b7b9b7..62f7994363 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3765,6 +3765,30 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
 	return CSTATE_END_COMMAND;
 }
 
+/* print aggregated report to logfile */
+static void
+logAgg(FILE *logfile, StatsData *agg)
+{
+	fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f",
+			agg->start_time,
+			agg->cnt,
+			agg->latency.sum,
+			agg->latency.sum2,
+			agg->latency.min,
+			agg->latency.max);
+	if (throttle_delay)
+	{
+		fprintf(logfile, " %.0f %.0f %.0f %.0f",
+				agg->lag.sum,
+				agg->lag.sum2,
+				agg->lag.min,
+				agg->lag.max);
+		if (latency_limit)
+			fprintf(logfile, " " INT64_FORMAT, agg->skipped);
+	}
+	fputc('\n', logfile);
+}
+
 /*
  * Print log entry after completing one transaction.
 *
@@ -3793,36 +3817,19 @@ doLog(TState *thread, CState *st,
 	/* should we aggregate the results or not? */
 	if (agg_interval > 0)
 	{
+		pg_time_usec_t	next;
+
 		/*
 		 * Loop until we reach the interval of the current moment, and print
 		 * any empty intervals in between (this may happen with very low tps,
 		 * e.g. --rate=0.1).
 		 */
-
-		while (agg->start_time + agg_interval <= now)
+		while ((next = agg->start_time + agg_interval * INT64CONST(100)) <= now)
 		{
-			/* print aggregated report to logfile */
-			fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f",
-					agg->start_time,
-					agg->cnt,
-					agg->latency.sum,
-					agg->latency.sum2,
-					agg->latency.min,
-					agg->latency.max);
-			if (throttle_delay)
-			{
-				fprintf(logfile, " %.0f %.0f %.0f %.0f",
-						agg->lag.sum,
-						agg->lag.sum2,
-						agg->lag.min,
-						agg->lag.max);
-				if (latency_limit)
-					fprintf(logfile, " " INT64_FORMAT, agg->skipped);
-			}
-			fputc('\n', logfile);
+			logAgg(logfile, agg);
 
 			/* reset data and move to next interval */
-			initStats(agg, agg->start_time + agg_interval);
+			initStats(agg, next);
 		}
 
 		/* accumulate the current transaction */
@@ -6795,7 +6802,9 @@ done:
 	{
 		/* log aggregated but not yet reported transactions */
 		doLog(thread, state, , false, 0, 0);
+		logAgg(thread->logfile, );
 	}
+
 	fclose(thread->logfile);
 	thread->logfile = NULL;
 }
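The aggregation logic under discussion — emit one row per interval, including empty intervals at low tps, and flush the final, partially filled interval when the thread ends — can be sketched outside pgbench. `aggregate_log` is a hypothetical helper that only counts transactions per bucket (the real code also accumulates latency and lag statistics); the trailing rows are the data that would be silently dropped without the final flush:

```python
def aggregate_log(txn_times_us, start_us, end_us, interval_us):
    """Bucket transaction completion times into fixed-size intervals and
    return one (interval_start, count) row per interval, including empty
    intervals in between.  The rows emitted after the main loop cover the
    final, possibly partial interval at thread end."""
    rows = []
    bucket_start, count = start_us, 0
    for t in sorted(txn_times_us):
        # close and emit every interval that ends at or before this txn
        while bucket_start + interval_us <= t:
            rows.append((bucket_start, count))
            bucket_start += interval_us
            count = 0
        count += 1
    # final flush at thread end: emit what the benchmark loop never printed
    while bucket_start + interval_us <= end_us:
        rows.append((bucket_start, count))
        bucket_start += interval_us
        count = 0
    if count > 0 or bucket_start < end_us:
        rows.append((bucket_start, count))
    return rows
```

With transactions at 100µs, 150µs and 2500µs over a 3000µs run and 1000µs intervals, this yields three rows — two transactions, then an empty interval, then one transaction — where dropping the final-flush part would lose the last row entirely.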
Re: pg_regress.c also sensitive to various PG* environment variables
On 2021-Jun-11, Michael Paquier wrote:

> Following up with the recent thread that dealt with the same $subject
> for the TAP tests, I have gone through pg_regress.c:
> https://www.postgresql.org/message-id/ylbjjrpuciez7...@paquier.xyz

Good idea.

> The list of environment variables that had better be reset when using
> a temporary instance is very close to TestLib.pm, leading to the
> attached. Please note that that the list of unsetted parameters has
> been reorganized to be consistent with the TAP tests, and that I have
> added comments referring one and the other.
>
> Thoughts?

I think if they're to be kept in sync, then the exceptions should be
noted. I mean, where PGCLIENTENCODING would otherwise be, I'd add

	/* PGCLIENTENCODING set above */
	/* See below for PGHOSTADDR */

and so on (PGHOST and PGPORT probably don't need this because they're
immediately below; not sure; but I would put them in alphabetical order
in both lists for sure and then that wouldn't apply). Otherwise I would
think that it's an omission and would set out to fix it.

-- 
Álvaro Herrera		39°49'30"S 73°17'W
Re: Decoding speculative insert with toast leaks memory
On Fri, Jun 11, 2021 at 11:37 AM Dilip Kumar wrote: > > On Thu, Jun 10, 2021 at 7:15 PM Amit Kapila wrote: > > > > > > > Please find the patch for HEAD attached. Can you please prepare the > > patch for back-branches by doing all the changes I have done in the > > patch for HEAD? > > Done > Thanks, the patch looks good to me. I'll push these early next week (Tuesday) unless someone has any other comments or suggestions. -- With Regards, Amit Kapila.
Re: Add PortalDrop in exec_execute_message
Yura Sokolov writes: > This makes me think, Close message were intended to be used > but simply forgotten when libpq patch were made. > Tom, could I be right? You could argue all day about what the intentions were nearly twenty years ago. But the facts on the ground are that we don't issue Close in those places, and changing it now would be a de facto protocol change for applications. So I'm a hard -1 on these proposals. (Alvaro's proposed change isn't a protocol break, since pipeline mode hasn't shipped yet. It's trying to make some brand new code act more like old code, which seems like a fine idea.) I think that the actual problem here has been resolved in commit bb4aed46a. Perhaps we should reconsider my decision not to back-patch that. Unlike a protocol change, that one could possibly be sane to back-patch. I didn't think it was worth the trouble and risk, but maybe there's a case for it. regards, tom lane
Re: Added schema level support for publication.
On Fri, Jun 11, 2021, 6:22 PM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Sat, Jun 5, 2021 at 7:02 PM vignesh C wrote: > > Thanks for identifying and reporting this issue. I have \dn with the > > equivalent query to display only the publication name. The updated > > patch has the fix for the same. > > Currently, FOR ALL TABLES is there to add all the tables(existing and > future) in the current database in which the publication is created. I > wonder before providing FOR SCHEMA capability, we better target FOR > DATABASE first, something like CREATE PUBLICATION ... FOR DATABASE > foo, bar, baz, qux; Of course users with the proper permissions on the > specified databases can add them to the publication. This can help to > add all the tables in other databases as well. Then, the CREATE > PUBLICATION ... FOR SCHEMA foo, bar, baz, qux; makes more sense. > Because, my understanding is that: database is a collection of tables, > schema is a collection of databases. I may be wrong here, but it's > just a thought. What do you think? > Please ignore above comment. I was confused about what a database and schema is in postgres. I'm sorry for the noise. https://www.postgresql.org/docs/devel/ddl-schemas.html Regards, Bharath Rupireddy.
Re: Added schema level support for publication.
On Sat, Jun 5, 2021 at 7:02 PM vignesh C wrote: > Thanks for identifying and reporting this issue. I have \dn with the > equivalent query to display only the publication name. The updated > patch has the fix for the same. Currently, FOR ALL TABLES is there to add all the tables(existing and future) in the current database in which the publication is created. I wonder before providing FOR SCHEMA capability, we better target FOR DATABASE first, something like CREATE PUBLICATION ... FOR DATABASE foo, bar, baz, qux; Of course users with the proper permissions on the specified databases can add them to the publication. This can help to add all the tables in other databases as well. Then, the CREATE PUBLICATION ... FOR SCHEMA foo, bar, baz, qux; makes more sense. Because, my understanding is that: database is a collection of tables, schema is a collection of databases. I may be wrong here, but it's just a thought. What do you think? With Regards, Bharath Rupireddy.
Re: Transactions involving multiple postgres foreign servers, take 2
On Thu, Jun 10, 2021 at 9:58 PM tsunakawa.ta...@fujitsu.com wrote: > I understand that. As I cited yesterday and possibly before, that's why > xa_commit() returns various return codes. So, I have never suggested that > FDWs should not report an error and always report success for the commit > request. They should be allowed to report an error. In the text to which I was responding it seemed like you were saying the opposite. Perhaps I misunderstood. > The question I have been asking is how. With that said, we should only have > two options; one is the return value of the FDW commit routine, and the other > is via ereport(ERROR). I suggested the possibility of the former, because if > the FDW does ereport(ERROR), Postgres core (transaction manager) may have > difficulty in handling the rest of the participants. I don't think that is going to work. It is very difficult to write code that doesn't ever ERROR in PostgreSQL. It is not impossible if the operation is trivial enough, but I think you're greatly underestimating the complexity of committing the remote transaction. If somebody had designed PostgreSQL so that every function returns a return code and every time you call some other function you check that return code and pass any error up to your own caller, then there would be no problem here. But in fact the design was that at the first sign of trouble you throw an ERROR. It's not easy to depart from that programming model in just one place. > > Also, leaving aside theoretical arguments, I think it's not > > realistically possible for an FDW author to write code to commit a > > prepared transaction that will be safe in the context of running late > > in PrepareTransaction(), after we've already done > > RecordTransactionCommit(). Such code can't avoid throwing errors > > because it can't avoid performing operations and allocating memory. > > I'm not completely sure about this. 
I thought (and said) that the only thing > the FDW does would be to send a commit request through an existing > connection. So, I think it's not a severe restriction to require FDWs to do > ereport(ERROR) during commits (of the second phase of 2PC.) To send a commit request through an existing connection, you have to send some bytes over the network using a send() or write() system call. That can fail. Then you have to read the response back over the network using recv() or read(). That can also fail. You also need to parse the result that you get from the remote side, which can also fail, because you could get back garbage for some reason. And depending on the details, you might first need to construct the message you're going to send, which might be able to fail too. Also, the data might be encrypted using SSL, so you might have to decrypt it, which can also fail, and you might need to encrypt data before sending it, which can fail. In fact, if you're using the OpenSSL, trying to call SSL_read() or SSL_write() can both read and write data from the socket, even multiple times, so you have extra opportunities to fail. > (I took "abort" as the same as "rollback" here.) Once we've sent commit > requests to some participants, we can't abort the transaction. If one FDW > returned an error halfway, we need to send commit requests to the rest of > participants. I understand that it's not possible to abort the local transaction to abort after it's been committed, but that doesn't mean that we're going to be able to send the commit requests to the rest of the participants. We want to be able to do that, certainly, but there's no guarantee that it's actually possible. Again, the remote servers may be dropped into a volcano, or less seriously, we may not be able to access them. Also, someone may kill off our session. > It's a design question, as I repeatedly said, whether and how we should > report the error of some participants to the client. 
For instance, how > should we report the errors of multiple participants? Concatenate those > error messages? Sure, I agree that there are some questions about how to report errors. > Anyway, we should design the interface first, giving much thought and > respecting the ideas of predecessors (TX/XA, MS DTC, JTA/JTS). Otherwise, we > may end up like "We implemented like this, so the interface is like this and > it can only behave like this, although you may find it strange..." That > might be a situation similar to what your comment "the design of PostgreSQL, > in all circumstances, the way you recover from an error is to abort the > transaction" suggests -- Postgres doesn't have statement-level rollback. I think that's a valid concern, but we also have to have a plan that is realistic. Some things are indeed not possible in PostgreSQL's design. Also, some of these problems are things everyone has to somehow confront. There's no database doing 2PC that can't have a situation where one of the machines disappears unexpectedly due to some natural disaster or
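The claim that every network step of a remote commit can fail can be made concrete with a sketch. This uses a toy line protocol over a plain socket, not PostgreSQL's actual frontend/backend protocol, and `send_commit_prepared` is a hypothetical helper; the point is that send, receive, and parse can each fail, so the coordinator must have durably recorded the foreign transaction beforehand and treat any failure as "in doubt, retry later" rather than as a reason to error out after the local commit:

```python
import socket


def send_commit_prepared(conn, gid):
    """Try to commit a prepared transaction over an existing connection.
    Returns True on a confirmed commit, False when the outcome is unknown
    (the transaction stays in doubt and a resolver must retry later)."""
    msg = ("COMMIT PREPARED '%s'\n" % gid).encode()
    try:
        conn.sendall(msg)        # the write() can fail (EPIPE, reset, ...)
        reply = conn.recv(1024)  # the read() can fail or block forever
        if not reply:            # peer closed mid-exchange: outcome unknown
            return False
        return reply.strip() == b"OK"  # parsing can reject garbage replies
    except OSError:
        # A network error leaves the remote state unknown; after the local
        # commit we must not raise here -- just report "in doubt".
        return False
```

A caller would log the False result against its durable list of prepared foreign transactions and hand it to a background resolver, which is essentially the design being discussed in this thread.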
Re: Fix dropped object handling in pg_event_trigger_ddl_commands
On Fri, Jun 11, 2021 at 11:00:40AM +0300, Aleksander Alekseev wrote:
> The last argument should be `false` then.

Hm, nope. I think that we had better pass true as the argument here.
First, this is more consistent with the identity lookup (OK, it does not
matter as we would have discarded the object after the identity lookup
anyway, but any future shuffling of this code may not be that wise).
Second, now that I look at it, getObjectTypeDescription() can never
return NULL, as we have fallback names for relations, routines and
constraints for all object types, so the buffer will be filled with some
data. Let's replace the bottom of getObjectTypeDescription(), which now
returns NULL, with Assert(buffer.len > 0). This code is new as of v14,
so it is better to adjust that sooner rather than later.

-- 
Michael
Re: recovery test failures on hoverfly
On Fri, Jun 11, 2021 at 05:38:34PM +0530, Amit Kapila wrote:
> It seems the error happens in both the tests when after issuing a
> KILL, we are trying to reconnect. Can we do anything for this?

This is the same problem as c757a3da and 6d41dd0, where we write a query
to a pipe but the kill, causing a failure, makes the test fail with a
SIGPIPE in IPC::Run as the query is sent down the pipe.

I think that using SELECT 1 to test if the server has been restarted is
a bit crazy. I would suggest using instead a loop based on pg_isready.

-- 
Michael
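The "loop based on pg_isready" idea amounts to a generic poll-until-ready helper, sketched below. `wait_until_ready` and `pg_isready_probe` are hypothetical names; the one assumption about the real tool is that pg_isready exits 0 once the server accepts connections, so no query (and no pipe that a kill could break) is involved:

```python
import subprocess
import time


def wait_until_ready(probe, timeout=30.0, interval=0.1):
    """Poll probe() until it returns True or the timeout expires.
    Returns True if the server became ready in time."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def pg_isready_probe(host="localhost", port=5432):
    # pg_isready exits 0 only when the server is accepting connections.
    result = subprocess.run(["pg_isready", "-h", host, "-p", str(port)],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

In a TAP test the equivalent loop would call pg_isready via IPC::Run until it succeeds, then issue the real query on a fresh connection.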
recovery test failures on hoverfly
I noticed that we are getting random failures [1][2][3] in the recovery test on hoverfly. The failures are in 022_crash_temp_files and 013_crash_restart. Both the tests failed due to same reason: ack Broken pipe: write( 13, 'SELECT 1' ) at /home/nm/src/build/IPC-Run-0.94/lib/IPC/Run/IO.pm line 558. It seems the error happens in both the tests when after issuing a KILL, we are trying to reconnect. Can we do anything for this? [1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly=2021-06-11%2006%3A59%3A59 [2] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly=2021-06-06%2007%3A09%3A53 [3] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly=2021-06-05%2008%3A40%3A49 -- With Regards, Amit Kapila.
pg_regress.c also sensitive to various PG* environment variables
Hi all,

Following up with the recent thread that dealt with the same $subject
for the TAP tests, I have gone through pg_regress.c:
https://www.postgresql.org/message-id/ylbjjrpuciez7...@paquier.xyz

The list of environment variables that had better be reset when using a
temporary instance is very close to TestLib.pm, leading to the attached.
Please note that the list of unset parameters has been reorganized to be
consistent with the TAP tests, and that I have added comments referring
each list to the other.

Thoughts?

-- 
Michael

diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index 47d7f31e94..26fbe08d4b 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -104,6 +104,7 @@ BEGIN
 	delete $ENV{LC_ALL};
 	$ENV{LC_MESSAGES} = 'C';
 
+	# This list should be kept in sync with pg_regress.c.
 	my @envkeys = qw (
 		PGCHANNELBINDING
 		PGCLIENTENCODING
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index e04d365258..89db370184 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -819,14 +819,33 @@ initialize_environment(void)
 	 * we also use psql's -X switch consistently, so that ~/.psqlrc files
 	 * won't mess things up.)  Also, set PGPORT to the temp port, and set
 	 * PGHOST depending on whether we are using TCP or Unix sockets.
+	 *
+	 * This list should be kept in sync with TestLib.pm.
 	 */
-	unsetenv("PGDATABASE");
-	unsetenv("PGUSER");
-	unsetenv("PGSERVICE");
-	unsetenv("PGSSLMODE");
-	unsetenv("PGREQUIRESSL");
+	unsetenv("PGCHANNELBINDING");
 	unsetenv("PGCONNECT_TIMEOUT");
 	unsetenv("PGDATA");
+	unsetenv("PGDATABASE");
+	unsetenv("PGGSSENCMODE");
+	unsetenv("PGGSSLIB");
+	unsetenv("PGKRBSRVNAME");
+	unsetenv("PGPASSFILE");
+	unsetenv("PGREQUIREPEER");
+	unsetenv("PGREQUIRESSL");
+	unsetenv("PGSERVICE");
+	unsetenv("PGSERVICEFILE");
+	unsetenv("PGSSLCERT");
+	unsetenv("PGSSLCRL");
+	unsetenv("PGSSLCRLDIR");
+	unsetenv("PGSSLKEY");
+	unsetenv("PGSSLMAXPROTOCOLVERSION");
+	unsetenv("PGSSLMINPROTOCOLVERSION");
+	unsetenv("PGSSLMODE");
+	unsetenv("PGSSLROOTCERT");
+	unsetenv("PGSSLSNI");
+	unsetenv("PGTARGETSESSIONATTRS");
+	unsetenv("PGUSER");
+
 #ifdef HAVE_UNIX_SOCKETS
 	if (hostname != NULL)
 		setenv("PGHOST", hostname, 1);
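The same defense applies to any test driver that spawns database clients: strip the connection-steering variables from the child environment and pin the target explicitly. A sketch, where `sanitized_env` is a hypothetical helper and the variable set shown is only an illustrative subset of the full lists in TestLib.pm/pg_regress.c:

```python
import os

# Illustrative subset; the real TestLib.pm / pg_regress.c lists enumerate
# every libpq environment variable that could redirect a connection.
UNSAFE_PG_VARS = (
    "PGDATABASE", "PGUSER", "PGSERVICE", "PGSSLMODE", "PGHOSTADDR",
    "PGCONNECT_TIMEOUT", "PGTARGETSESSIONATTRS",
)


def sanitized_env(base=None, host="127.0.0.1", port=65432):
    """Return a copy of the environment safe for a temporary instance:
    connection-steering variables removed, PGHOST/PGPORT pinned."""
    env = dict(os.environ if base is None else base)
    for var in UNSAFE_PG_VARS:
        env.pop(var, None)
    # Pin the connection target to the temporary instance.
    env["PGHOST"] = host
    env["PGPORT"] = str(port)
    return env
```

Working on a copy keeps the parent process's environment untouched, mirroring how pg_regress only adjusts the environment it hands to the programs it runs.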
Re: PoC/WIP: Extended statistics on expressions
On 6/11/21 6:55 AM, Noah Misch wrote: On Sun, Jun 06, 2021 at 09:13:17PM +0200, Tomas Vondra wrote: On 6/6/21 7:37 AM, Noah Misch wrote: On Sat, Mar 27, 2021 at 01:17:14AM +0100, Tomas Vondra wrote: OK, pushed after a bit more polishing and testing. This added a "transformed" field to CreateStatsStmt, but it didn't mention that field in src/backend/nodes. Should those functions handle the field? Yup, that's a mistake - it should do whatever CREATE INDEX is doing. Not sure if it can result in error/failure or just inefficiency (due to transforming the expressions repeatedly), but it should do whatever CREATE INDEX is doing. Thanks for noticing! Fixed by d57ecebd12. Great. For future reference, this didn't need a catversion bump. readfuncs.c changes need a catversion bump, since the catalogs might contain input for each read function. Other src/backend/nodes functions don't face that. Also, src/backend/nodes generally process fields in the order that they appear in the struct. The order you used in d57ecebd12 is nicer, being more like IndexStmt, so I'm pushing an order change to the struct. OK, makes sense. Thanks! -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command
On Fri, Jun 11, 2021 at 8:14 PM Amit Kapila wrote: > > On Thu, Jun 10, 2021 at 2:04 PM Ajin Cherian wrote: > > > > The new patches look mostly good apart from the below cosmetic issues. > I think the question is whether we want to do these for PG-14 or > postpone them till PG-15. I think these don't appear to be risky > changes so we can get them in PG-14 as that might help some outside > core solutions as appears to be the case for Jeff. The changes related > to start_replication are too specific to the subscriber-side solution > so we can postpone those along with the subscriber-side 2PC work. > Jeff, Ajin, what do you think? > Since we are exposing two-phase decoding using the pg_create_replication_slot API, I think it is reasonable to expose CREATE_REPLICATION_SLOT as well. We can leave the subscriber side changes for PG-15. > Also, I can take care of the below cosmetic issues before committing > if we decide to do this for PG-14. Thanks, Regards, Ajin Cherian Fujitsu Australia
Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command
On Thu, Jun 10, 2021 at 2:04 PM Ajin Cherian wrote:
>

The new patches look mostly good apart from the below cosmetic issues.
I think the question is whether we want to do these for PG-14 or
postpone them till PG-15. I think these don't appear to be risky
changes, so we can get them in PG-14 as that might help some outside
core solutions, as appears to be the case for Jeff. The changes related
to start_replication are too specific to the subscriber-side solution,
so we can postpone those along with the subscriber-side 2PC work. Jeff,
Ajin, what do you think?

Also, I can take care of the below cosmetic issues before committing if
we decide to do this for PG-14.

Few cosmetic issues:
==
1.
git diff --check shows
src/bin/pg_basebackup/t/030_pg_recvlogical.pl:109: new blank line at EOF.

2.
+ The following example shows SQL interface that can be used to decode
prepared transactions. Before you use two-phase commit commands, you must set

Spurious line addition.

3.
 /* Build query */
 appendPQExpBuffer(query, "CREATE_REPLICATION_SLOT \"%s\"", slot_name);
 if (is_temporary)
 	appendPQExpBufferStr(query, " TEMPORARY");
+
 if (is_physical)

Spurious line addition.

4.
 appendPQExpBuffer(query, " LOGICAL \"%s\"", plugin);
+ if (two_phase && PQserverVersion(conn) >= 14)
+ 	appendPQExpBufferStr(query, " TWO_PHASE");
+
 if (PQserverVersion(conn) >= 10)
 	/* pg_recvlogical doesn't use an exported snapshot, so suppress */
 	appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");

I think it might be better to append TWO_PHASE after NOEXPORT_SNAPSHOT
but it doesn't matter much.

5.
+$node->safe_psql('postgres',
+	"BEGIN;INSERT INTO test_table values (11); PREPARE TRANSACTION 'test'");

There is no space after BEGIN but there is a space after INSERT. For
consistency's sake, I will have a space after BEGIN as well.

-- 
With Regards,
Amit Kapila.
Re: Race condition in recovery?
On Fri, Jun 11, 2021 at 11:45 AM Kyotaro Horiguchi wrote: > > At Thu, 10 Jun 2021 21:53:18 -0400, Tom Lane wrote in > tgl> Please note that conchuela and jacana are still failing ... > > I forgot jacana's case.. > > It is failing for the issue the first patch should have fixed. > > > ==~_~===-=-===~_~== > > pgsql.build/src/test/recovery/tmp_check/log/025_stuck_on_old_timeline_primary.log > > ==~_~===-=-===~_~== > ... > > The system cannot find the path specified. > > 2021-06-10 22:56:17.754 EDT [60c2d0cf.54c:1] LOG: archive command failed > > with exit code 1 > > 2021-06-10 22:56:17.754 EDT [60c2d0cf.54c:2] DETAIL: The failed archive > > command was: /usr/bin/perl > > "/home/pgrunner/bf/root/HEAD/pgsql/src/test/recovery/t/cp_history_files" > > "pg_wal\\00010001" > > "/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_025_stuck_on_old_timeline_primary_data/archives/00010001" Wal file copying will not create a problem for us, but I noticed that it is failing in copying the history files as well and that is creating a problem. 2021-06-10 22:56:28.940 EDT [60c2d0db.1208:1] LOG: archive command failed with exit code 1 2021-06-10 22:56:28.940 EDT [60c2d0db.1208:2] DETAIL: The failed archive command was: /usr/bin/perl "/home/pgrunner/bf/root/HEAD/pgsql/src/test/recovery/t/cp_history_files" "pg_wal\\0002.history" "/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_025_stuck_on_old_timeline_primary_data/archives/0002.history" I have noticed that the archive command is failing in some other test case too (002_archiving_standby2.log), see below logs. ==~_~===-=-===~_~== pgsql.build/src/test/recovery/tmp_check/log/002_archiving_standby2.log ==~_~===-=-===~_~== ... 0 file(s) copied. 
2021-06-10 22:44:34.467 EDT [60c2ce10.1270:1] LOG: archive command failed with exit code 1 2021-06-10 22:44:34.467 EDT [60c2ce10.1270:2] DETAIL: The failed archive command was: copy "pg_wal\\0003.history" "c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_002_archiving_primary_data/archives\\0003.history" The system cannot find the path specified. 0 file(s) copied. 2021-06-10 22:44:35.478 EDT [60c2ce10.1270:3] LOG: archive command failed with exit code 1 2021-06-10 22:44:35.478 EDT [60c2ce10.1270:4] DETAIL: The failed archive command was: copy "pg_wal\\0003.history" "c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_002_archiving_primary_data/archives\\0003.history" 2021-06-10 22:44:36.113 EDT [60c2ce0c.283c:5] LOG: received immediate shutdown request 2021-06-10 22:44:36.129 EDT [60c2ce0c.283c:6] LOG: database system is shut down I am not able to figure out why the archive command is failing. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
[PATCH] Fix select from wrong table array_op_test
In the middle of GIN index testing, there are some selects that are on a different table array_op_test that doesn't even have an index. They probably were supposed to be selects to table array_index_op_test like the other ones around the area. Fix that. The expected output should stay the same because both tables use the same array.data. --- src/test/regress/expected/create_index.out | 12 ++-- src/test/regress/sql/create_index.sql | 12 ++-- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out index 49f2a158c1..cfdf73179f 100644 --- a/src/test/regress/expected/create_index.out +++ b/src/test/regress/expected/create_index.out @@ -904,23 +904,23 @@ SELECT * FROM array_index_op_test WHERE i <@ '{}' ORDER BY seqno; 101 | {} | {} (1 row) -SELECT * FROM array_op_test WHERE i = '{NULL}' ORDER BY seqno; +SELECT * FROM array_index_op_test WHERE i = '{NULL}' ORDER BY seqno; seqno | i| t ---++ 102 | {NULL} | {NULL} (1 row) -SELECT * FROM array_op_test WHERE i @> '{NULL}' ORDER BY seqno; +SELECT * FROM array_index_op_test WHERE i @> '{NULL}' ORDER BY seqno; seqno | i | t ---+---+--- (0 rows) -SELECT * FROM array_op_test WHERE i && '{NULL}' ORDER BY seqno; +SELECT * FROM array_index_op_test WHERE i && '{NULL}' ORDER BY seqno; seqno | i | t ---+---+--- (0 rows) -SELECT * FROM array_op_test WHERE i <@ '{NULL}' ORDER BY seqno; +SELECT * FROM array_index_op_test WHERE i <@ '{NULL}' ORDER BY seqno; seqno | i | t ---++ 101 | {} | {} @@ -1195,13 +1195,13 @@ SELECT * FROM array_index_op_test WHERE t = '{}' ORDER BY seqno; 101 | {} | {} (1 row) -SELECT * FROM array_op_test WHERE i = '{NULL}' ORDER BY seqno; +SELECT * FROM array_index_op_test WHERE i = '{NULL}' ORDER BY seqno; seqno | i| t ---++ 102 | {NULL} | {NULL} (1 row) -SELECT * FROM array_op_test WHERE i <@ '{NULL}' ORDER BY seqno; +SELECT * FROM array_index_op_test WHERE i <@ '{NULL}' ORDER BY seqno; seqno | i | t ---++ 101 | {} | {} 
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql index 8bc76f7c6f..9474dabf9e 100644 --- a/src/test/regress/sql/create_index.sql +++ b/src/test/regress/sql/create_index.sql @@ -295,10 +295,10 @@ SELECT * FROM array_index_op_test WHERE i = '{}' ORDER BY seqno; SELECT * FROM array_index_op_test WHERE i @> '{}' ORDER BY seqno; SELECT * FROM array_index_op_test WHERE i && '{}' ORDER BY seqno; SELECT * FROM array_index_op_test WHERE i <@ '{}' ORDER BY seqno; -SELECT * FROM array_op_test WHERE i = '{NULL}' ORDER BY seqno; -SELECT * FROM array_op_test WHERE i @> '{NULL}' ORDER BY seqno; -SELECT * FROM array_op_test WHERE i && '{NULL}' ORDER BY seqno; -SELECT * FROM array_op_test WHERE i <@ '{NULL}' ORDER BY seqno; +SELECT * FROM array_index_op_test WHERE i = '{NULL}' ORDER BY seqno; +SELECT * FROM array_index_op_test WHERE i @> '{NULL}' ORDER BY seqno; +SELECT * FROM array_index_op_test WHERE i && '{NULL}' ORDER BY seqno; +SELECT * FROM array_index_op_test WHERE i <@ '{NULL}' ORDER BY seqno; CREATE INDEX textarrayidx ON array_index_op_test USING gin (t); @@ -331,8 +331,8 @@ SELECT * FROM array_index_op_test WHERE t && '{AAA80240}' ORDER BY seqno; SELECT * FROM array_index_op_test WHERE i @> '{32}' AND t && '{AAA80240}' ORDER BY seqno; SELECT * FROM array_index_op_test WHERE i && '{32}' AND t @> '{AAA80240}' ORDER BY seqno; SELECT * FROM array_index_op_test WHERE t = '{}' ORDER BY seqno; -SELECT * FROM array_op_test WHERE i = '{NULL}' ORDER BY seqno; -SELECT * FROM array_op_test WHERE i <@ '{NULL}' ORDER BY seqno; +SELECT * FROM array_index_op_test WHERE i = '{NULL}' ORDER BY seqno; +SELECT * FROM array_index_op_test WHERE i <@ '{NULL}' ORDER BY seqno; RESET enable_seqscan; RESET enable_indexscan; -- 2.24.1
pgbench bug candidate: negative "initial connection time"
Hi Hackers,

I played pgbench with wrong parameters, and I found a bug candidate. The latest commit in my source is 3a09d75.

1. Do initdb and start.
2. Initialize schema and data with "scale factor" = 1.
3. Execute the following command many times: $ pgbench -c 101 -j 10 postgres

Then, sometimes a negative "initial connection time" was returned. The latency average is also strange.

```
$ pgbench -c 101 -j 10 postgres
starting vacuum...end.
pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: sorry, too many clients already
pgbench (PostgreSQL) 14.0
transaction type: 
scaling factor: 1
query mode: simple
number of clients: 101
number of threads: 10
number of transactions per client: 10
number of transactions actually processed: 910/1010
latency average = 41387686.662 ms
initial connection time = -372896921.586 ms
tps = 0.002440 (without initial connection time)
```

I looked through pgbench.c and found the reason. When a thread fails to get some of its connections, it does not fill in thread->bench_start in threadRun(). And if the failure occurs in the final thread (meaning threads[nthreads - 1].bench_start is zero), the following if-statement sets bench_start to zero.

```
6494     /* first recorded benchmarking start time */
6495     if (bench_start == 0 || thread->bench_start < bench_start)
6496         bench_start = thread->bench_start;
```

The wrong bench_start propagates to printResults() and then some invalid values appear.

```
6509     printResults(, pg_time_now() - bench_start, conn_total_duration,
6510                  bench_start - start_time, latency_late);
```

I cannot judge whether we have to fix it, but I have attached a patch. It simply ignores a result when thread->bench_start is zero. What do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

ignore_failed_thread.patch Description: ignore_failed_thread.patch
Re: Fix dropped object handling in pg_event_trigger_ddl_commands
Hi Michael,

> /* The type can never be NULL */
> type = getObjectTypeDescription(, true);

The last argument should be `false` then.

-- 
Best regards,
Aleksander Alekseev

v4-0001-pg_event_trigger_ddl_commands.patch Description: Binary data
Re: Duplicate history file?
On Fri, Jun 11, 2021 at 03:32:28PM +0900, Kyotaro Horiguchi wrote:
> At Fri, 11 Jun 2021 10:48:32 +0800, Julien Rouhaud wrote in
> > "far from perfect" is a strong understatement for "appears to work but will
> > randomly and silently break everything without a simple way to detect it".
>
> I think it's overstating. It sounds like a story of a mission-critical
> world. How perfect archive_command should be depends on the
> requirements of every system. Simple cp is actually sufficient in a
> certain range of usages, maybe.

I disagree, cp is probably the worst command that can be used for this purpose. On top of the problems already mentioned, the copy isn't atomic. It means that any concurrent restore_command (or anything that consumes the archived files) will happily process a half-copied WAL file, and in case of any error during the copy you end up with a file for which you don't know whether it contains valid data or not. I don't see any case where you would actually want to use that, unless maybe you want to benchmark how long it takes before you lose some data.

> > We should document a minimum workable setup, but cp isn't an example of that,
> > and I don't think that there will be a simple example unless we provide a
> > dedicated utility.
>
> It looks somewhat strange, like "Well, you need a special track to
> drive your car; however, we don't have one. It's your responsibility
> to construct a track that protects it from accidents perfectly.".
>
> "Yeah, I'm not going to push it so hard and don't care if it gets some
> small scratches; couldn't I drive it on a public road?"
>
> (Sorry for the bad analogy.)

I think that a better analogy would be "I don't need working brakes on my car since I only drive on the highway and there aren't any red lights there".

> Isn't it almost saying that anything less than pgBackRest isn't
> qualified as archive_program?
I don't know; I'm assuming that barman also provides one, as do wal-e and wal-g (assuming that the distant providers do their part of the job correctly). Maybe there are other tools too. But as long as we don't document exactly what the requirements are, it's not really a surprise that most people don't implement them.
Re: Duplicate history file?
On Fri, Jun 11, 2021 at 03:32:28PM +0900, Kyotaro Horiguchi wrote:
> At Fri, 11 Jun 2021 10:48:32 +0800, Julien Rouhaud wrote in
>> "far from perfect" is a strong understatement for "appears to work but will
>> randomly and silently break everything without a simple way to detect it".

Yeah. Users like unplugging their hosts, because that's *fast* and easy to do.

> I think it's overstating. It sounds like a story of a mission-critical
> world. How perfect archive_command should be depends on the
> requirements of every system. Simple cp is actually sufficient in a
> certain range of usages, maybe.
>
>> We should document a minimum workable setup, but cp isn't an example of that,
>> and I don't think that there will be a simple example unless we provide a
>> dedicated utility.
>
> I think cp can be an example as far as we explain the limitations. (On
> the other hand "test ! -f" cannot, since it actually prevents the server
> from working correctly.)

Disagreed. I think that we should not try to change this area until we can document a reliable solution, and a simple "cp" is not that. Hmm. A simple command that could be used as a reference is for example "dd", which flushes the file by itself; or we could revisit the discussions about having a pg_copy command; or we could document a small utility in Perl that does the job.
-- 
Michael
signature.asc Description: PGP signature
Re: Error on pgbench logs
At Fri, 11 Jun 2021 15:56:55 +0900 (JST), Kyotaro Horiguchi wrote in
> Doesn't threadRun already doing that?

(s/Does/Is)

That's once per thread, sorry for the noise.

regards.
-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Added missing tab completion for alter subscription set option
On Sun, May 23, 2021 at 04:24:59PM +0530, vignesh C wrote:
> /* Complete "CREATE SUBSCRIPTION ... WITH ( " */
> else if (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", "("))
> - COMPLETE_WITH("copy_data", "connect", "create_slot", "enabled",
> -               "slot_name", "synchronous_commit");
> + COMPLETE_WITH("binary", "copy_data", "connect", "create_slot",
> +               "enabled", "slot_name", "streaming",
> +               "synchronous_commit");

"copy_data" and "connect" need to be reversed. Applied.
-- 
Michael
signature.asc Description: PGP signature
Re: Error on pgbench logs
At Fri, 11 Jun 2021 15:23:41 +0900, Michael Paquier wrote in
> On Thu, Jun 10, 2021 at 11:29:30PM +0200, Fabien COELHO wrote:
> > + /* flush remaining stats */
> > + if (!logged && latency == 0.0)
> > +     logAgg(logfile, agg);
>
> You are right, this is missing the final stats. Why the choice of
> latency here for the check? That's just to make the difference
> between the case where doLog() is called while processing the
> benchmark for the end of the transaction and the case where doLog() is
> called once a thread ends, no? Wouldn't it be better to do a final
> push of the stats once a thread reaches CSTATE_FINISHED or
> CSTATE_ABORTED instead?

Doesn't threadRun already doing that?
-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Duplicate history file?
At Fri, 11 Jun 2021 10:48:32 +0800, Julien Rouhaud wrote in
> On Fri, Jun 11, 2021 at 11:25:51AM +0900, Kyotaro Horiguchi wrote:
> >
> > Nevertheless I agree to it; still, don't we need a minimum workable
> > setup as the first step? Something like below.
> >
> > ===
> > The following is an example of the minimal archive_command.
> >
> > Example: cp %p /blah/%f
> >
> > However, it is far from perfect. The following is the discussion about
> > what is needed for archive_command to be more reliable.
>
> "far from perfect" is a strong understatement for "appears to work but will
> randomly and silently break everything without a simple way to detect it".

I think it's overstating. It sounds like a story of a mission-critical world. How perfect archive_command should be depends on the requirements of every system. Simple cp is actually sufficient in a certain range of usages, maybe.

> We should document a minimum workable setup, but cp isn't an example of that,
> and I don't think that there will be a simple example unless we provide a
> dedicated utility.

It looks somewhat strange, like "Well, you need a special track to drive your car; however, we don't have one. It's your responsibility to construct a track that protects it from accidents perfectly.".

"Yeah, I'm not going to push it so hard and don't care if it gets some small scratches; couldn't I drive it on a public road?"

(Sorry for the bad analogy.)

I think cp can be an example as far as we explain the limitations. (On the other hand "test ! -f" cannot, since it actually prevents the server from working correctly.)

> It could however be something along those lines:
>
> Example: archive_program %p /path/to/%d
>
> archive_program being a script ensuring that all those requirements are met:

Isn't it almost saying that anything less than pgBackRest isn't qualified as archive_program?

regards.
-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Multi-Column List Partitioning
On Fri, Jun 11, 2021 at 12:37 PM Amit Langote wrote:
> I will look at other parts of the patch next week hopefully. For
> now, attached is a delta patch that applies on top of your v1, which
> does:
>
> * Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
> * Make qsort_partition_list_value_cmp simply call
>   partition_lbound_datum_cmp() instead of having its own logic to
>   compare input bounds
> * Move partition_lbound_datum_cmp() into partbounds.c as a static
>   function (export seems unnecessary)
> * Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

One more:

* Add all columns of the newly added test query in insert.sql to the ORDER BY clause to get predictably ordered output

-- 
Amit Langote
EDB: http://www.enterprisedb.com
Re: Error on pgbench logs
On Thu, Jun 10, 2021 at 11:29:30PM +0200, Fabien COELHO wrote:
> + /* flush remaining stats */
> + if (!logged && latency == 0.0)
> +     logAgg(logfile, agg);

You are right, this is missing the final stats. Why the choice of latency here for the check? That's just to make the difference between the case where doLog() is called while processing the benchmark for the end of the transaction and the case where doLog() is called once a thread ends, no? Wouldn't it be better to do a final push of the stats once a thread reaches CSTATE_FINISHED or CSTATE_ABORTED instead?
-- 
Michael
signature.asc Description: PGP signature
Re: Question about StartLogicalReplication() error path
On Fri, 2021-06-11 at 10:13 +0530, Amit Kapila wrote:
> Because sometimes clients don't have to do anything for xlog records.
> One example is WAL for DDL where logical decoding didn't produce
> anything for the client but later with keepalive we send the LSN of
> WAL where DDL has finished and the client just responds with the
> position sent by the server as it doesn't have any other pending
> transactions.

If I understand correctly, in this situation it avoids the cost of a write on the client just to update its stored LSN progress value when there's no real data to be written. In that case the client would need to rely on the server's confirmed_flush_lsn instead of its own stored LSN progress value. That's a reasonable thing for the *client* to do explicitly, e.g. by reading the slot's confirmed_flush_lsn and comparing it to its own stored LSN. But I don't think it's reasonable for the server to just skip over data requested by the client because it thinks it knows best.

> I think because there is no need to process the WAL that has been
> confirmed by the client. Do you see any problems with this scheme?

Several:

* Replication setups are complex, and it can be easy to misconfigure something or have a bug in some control code. An error is valuable to detect the problem closer to the source.

* There are plausible configurations where things could go badly wrong. For instance, if you are storing the decoded data in another postgres server with synchronous_commit=off, and acknowledging LSNs before they are durable. A crash of the destination system would be consistent, but it would be missing some data earlier than the confirmed_flush_lsn. The client would then request the data starting at its stored LSN progress value, but the server would skip ahead to the confirmed_flush_lsn, silently missing data.

* It's contradicted by the docs: "Instructs server to start streaming WAL for logical replication, starting at WAL location XXX/XXX."
* The comment acknowledges that a user might expect an error in that case, but it doesn't really address why the user would expect an error, or why it's OK to violate that expectation.

Regards, Jeff Davis
Re: Race condition in recovery?
At Thu, 10 Jun 2021 21:53:18 -0400, Tom Lane wrote in
tgl> Please note that conchuela and jacana are still failing ...

I forgot jacana's case. It is failing due to the issue the first patch should have fixed.

> ==~_~===-=-===~_~== pgsql.build/src/test/recovery/tmp_check/log/025_stuck_on_old_timeline_primary.log ==~_~===-=-===~_~==
...
> The system cannot find the path specified.
> 2021-06-10 22:56:17.754 EDT [60c2d0cf.54c:1] LOG: archive command failed with exit code 1
> 2021-06-10 22:56:17.754 EDT [60c2d0cf.54c:2] DETAIL: The failed archive command was: /usr/bin/perl "/home/pgrunner/bf/root/HEAD/pgsql/src/test/recovery/t/cp_history_files" "pg_wal\\00010001" "/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_025_stuck_on_old_timeline_primary_data/archives/00010001"

cp_history_files exits with a bare "exit" for files with that name, which should set the exit status to 0. ActivePerl does so. If I specify a nonexistent command like /hoge/perl, %ERRORLEVEL% is set to 3, not 1. So far I haven't figured out what is happening there.

regards.
-- 
Kyotaro Horiguchi
NTT Open Source Software Center