Re: A new function to wait for the backend exit after termination

2021-06-11 Thread Justin Pryzby
On Fri, Jun 11, 2021 at 09:37:50PM -0700, Noah Misch wrote:
> On Fri, Jun 11, 2021 at 08:54:08PM -0500, Justin Pryzby wrote:
> > On Sat, Jun 05, 2021 at 12:08:01PM -0700, Noah Misch wrote:
> > > > > My preference is to remove pg_wait_for_backend_termination().  The use case
> > > > > that prompted this thread used pg_terminate_backend(pid, 18); it doesn't
> > > > > need pg_wait_for_backend_termination().
> > 
> > Is this an Opened Issue ?
> 
> An Open Item?  Not really, since there's no objective defect.  Nonetheless,
> the attached is what I'd like to use.

I think of this as a list of things that need to be addressed or settled
before the release, kept so they aren't forgotten.

If the value of the new function is marginal, it may be good to remove it;
otherwise we're committed to supporting it.

Even if it's not removed, the descriptions should be cleaned up.

| src/include/catalog/pg_proc.dat-  descr => 'terminate a backend process and if timeout is specified, wait for its exit or until timeout occurs',
=> I think this doesn't need to change, or to mention the optional timeout at all

| src/include/catalog/pg_proc.dat-{ oid => '2137', descr => 'wait for a backend process exit or timeout occurs',
=> should just say "wait for a backend process to exit".  The timeout has a default.




Re: use `proc->pgxactoff` as the value of `index` in `ProcArrayRemove()`

2021-06-11 Thread 盏一
Ok! Thanks
--




-- Original Message --
From: "Andres Freund "

Re: use `proc->pgxactoff` as the value of `index` in `ProcArrayRemove()`

2021-06-11 Thread Andres Freund
Hi,

On 2021-05-07 04:36:25 +0800, 盏一 wrote:
> Sounds like a plan! Do you want to write a patch?
> Add the patch.

Thanks for the patch. I finally pushed an edited version of it. There
were other loops over ->pgprocnos, so I put assertions in those - that
gains us a good bit more checking than we had before...
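A standalone sketch of the invariant such assertions check (simplified names; `FakeProc` and the layout here are illustrative stand-ins, not the actual PostgreSQL structures):

```c
#include <assert.h>

/* Illustrative stand-in for PGPROC: only the back-pointer matters here. */
typedef struct
{
	int			pgxactoff;		/* index of this proc in the dense arrays */
} FakeProc;

/*
 * The procarray keeps a dense array of procno values; each referenced
 * proc's pgxactoff must point back at its own slot in that dense array.
 */
static void
check_pgxactoff_invariant(const int *pgprocnos, int numProcs,
						  const FakeProc *allProcs)
{
	for (int i = 0; i < numProcs; i++)
		assert(allProcs[pgprocnos[i]].pgxactoff == i);
}
```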

I also couldn't resist doing some small formatting cleanups - I found
the memmove calls just too hard to read.

I took the authorship information as you had it in the diff you attached
- I hope that's OK?

Greetings,

Andres Freund




Re: A new function to wait for the backend exit after termination

2021-06-11 Thread Noah Misch
On Fri, Jun 11, 2021 at 08:54:08PM -0500, Justin Pryzby wrote:
> On Sat, Jun 05, 2021 at 12:08:01PM -0700, Noah Misch wrote:
> > > > My preference is to remove pg_wait_for_backend_termination().  The use case
> > > > that prompted this thread used pg_terminate_backend(pid, 18); it doesn't
> > > > need pg_wait_for_backend_termination().
> 
> Is this an Opened Issue ?

An Open Item?  Not really, since there's no objective defect.  Nonetheless,
the attached is what I'd like to use.
Author: Noah Misch 
Commit: Noah Misch 

Remove pg_wait_for_backend_termination().

It was unable to wait on a backend that had already left the procarray.
Users tolerant of that limitation can poll pg_stat_activity.  Other
users can employ the "timeout" argument of pg_terminate_backend().

Reviewed by FIXME.

Discussion: https://postgr.es/m/20210605013236.ga208...@rfd.leadboat.com

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fbc80c1..d387a32 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25002,23 +25002,6 @@ SELECT collation for ('foo' COLLATE "de_DE");
 false is returned.

   
-
-  
-   
-
- pg_wait_for_backend_termination
-
-pg_wait_for_backend_termination ( pid integer, timeout bigint DEFAULT 5000 )
-boolean
-   
-   
-Waits for the backend process with the specified Process ID to
-terminate.  If the process terminates before
-the timeout (in milliseconds)
-expires, true is returned.  On timeout, a warning
-is emitted and false is returned.
-   
-  
  
 

diff --git a/doc/src/sgml/release-14.sgml b/doc/src/sgml/release-14.sgml
index a2ad120..045f2b0 100644
--- a/doc/src/sgml/release-14.sgml
+++ b/doc/src/sgml/release-14.sgml
@@ -611,13 +611,7 @@ Author: Magnus Hagander 
 -->
 
   
-   Add function pg_wait_for_backend_termination()
-   that waits for session exit (Bharath Rupireddy)
-  
-
-  
-   Also add a similar optional wait parameter to pg_terminate_backend()
   
  
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index a4373b1..a416e94 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -397,11 +397,6 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_terminate_backend'
   PARALLEL SAFE;
 
-CREATE OR REPLACE FUNCTION
-  pg_wait_for_backend_termination(pid integer, timeout int8 DEFAULT 5000)
-  RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_wait_for_backend_termination'
-  PARALLEL SAFE;
-
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
  json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index 8376994..5ed8b2e 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -231,42 +231,6 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 }
 
 /*
- * Wait for a backend process with the given PID to exit or until the given
- * timeout milliseconds occurs. Returns true if the backend has exited. On
- * timeout a warning is emitted and false is returned.
- *
- * We allow any user to call this function, consistent with any user being
- * able to view the pid of the process in pg_stat_activity etc.
- */
-Datum
-pg_wait_for_backend_termination(PG_FUNCTION_ARGS)
-{
-   int pid;
-   int64   timeout;
-   PGPROC *proc = NULL;
-
-   pid = PG_GETARG_INT32(0);
-   timeout = PG_GETARG_INT64(1);
-
-   if (timeout <= 0)
-   ereport(ERROR,
-   (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-errmsg("\"timeout\" must not be negative or zero")));
-
-   proc = BackendPidGetProc(pid);
-
-   if (proc == NULL)
-   {
-   ereport(WARNING,
-   (errmsg("PID %d is not a PostgreSQL server process", pid)));
-
-   PG_RETURN_BOOL(false);
-   }
-
-   PG_RETURN_BOOL(pg_wait_until_termination(pid, timeout));
-}
-
-/*
  * Signal to reload the database configuration
  *
  * Permission checking for this function is managed through the normal
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index acbcae4..2d45765 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6191,10 +6191,6 @@
   proname => 'pg_terminate_backend', provolatile => 'v', prorettype => 'bool',
   proargtypes => 'int4 int8', proargnames => '{pid,timeout}',
   prosrc => 'pg_terminate_backend' },
-{ oid => '2137', descr => 'wait for a backend process exit or timeout occurs',
-  proname => 'pg_wait_for_backend_termination', 

Re: Add client connection check during the execution of the query

2021-06-11 Thread Thomas Munro
On Fri, Apr 30, 2021 at 2:23 PM Thomas Munro  wrote:
> Here's something I wanted to park here to look into for the next
> cycle:  it turns out that kqueue's EV_EOF flag also has the right
> semantics for this.  That leads to the idea of exposing the event via
> the WaitEventSet API, and would then bring
> client_connection_check_interval feature to 6/10 of our OSes, up from
> 2/10.  Maybe Windows' FD_CLOSE event could get us up to 7/10, not
> sure.

Rebased.  Added documentation tweak and a check to reject the GUC on
unsupported OSes.
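As a standalone illustration of the kernel mechanism (not PostgreSQL code, and Linux-specific since POLLRDHUP is non-standard): a plain poll() reports the peer's shutdown even while unread data is still buffered.

```c
#define _GNU_SOURCE				/* for POLLRDHUP */
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if poll() reports POLLRDHUP after the peer shuts down its
 * sending side of a socketpair, even though one byte remains buffered
 * and readable; -1 on unexpected failure.
 */
static int
peer_closed_demo(void)
{
	int			sv[2];
	struct pollfd pfd;
	int			result;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;

	(void) write(sv[1], "x", 1);	/* leave unread data in the buffer */
	shutdown(sv[1], SHUT_WR);		/* peer closes its sending side */

	pfd.fd = sv[0];
	pfd.events = POLLIN | POLLRDHUP;
	pfd.revents = 0;
	if (poll(&pfd, 1, 1000) < 0)
		return -1;

	/* POLLIN is also set (buffered byte), but POLLRDHUP is the news. */
	result = (pfd.revents & POLLRDHUP) ? 1 : 0;
	close(sv[0]);
	close(sv[1]);
	return result;
}
```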
From 356b0dcbc15353b9bd349972c80a7f2e5c516a0e Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 30 Apr 2021 10:38:40 +1200
Subject: [PATCH v2 1/2] Add WL_SOCKET_CLOSED for socket shutdown events.

Provide a way for WaitEventSet to report that the remote peer has shut
down its socket, independently of whether there is any buffered data
remaining to be read.  This works only on systems where the kernel
exposes that information, namely:

* WAIT_USE_POLL builds, using (non-standard) POLLRDHUP
* WAIT_USE_EPOLL builds, using EPOLLRDHUP
* WAIT_USE_KQUEUE builds, using EV_EOF

Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
---
 src/backend/storage/ipc/latch.c | 79 +
 src/include/storage/latch.h |  6 ++-
 2 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 1d893cf863..54e928c564 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -841,6 +841,7 @@ FreeWaitEventSet(WaitEventSet *set)
  * - WL_SOCKET_CONNECTED: Wait for socket connection to be established,
  *	 can be combined with other WL_SOCKET_* events (on non-Windows
  *	 platforms, this is the same as WL_SOCKET_WRITEABLE)
+ * - WL_SOCKET_CLOSED: Wait for socket to be closed by remote peer.
  * - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies
  *
  * Returns the offset in WaitEventSet->events (starting from 0), which can be
@@ -1043,12 +1044,16 @@ WaitEventAdjustEpoll(WaitEventSet *set, WaitEvent *event, int action)
 	else
 	{
 		Assert(event->fd != PGINVALID_SOCKET);
-		Assert(event->events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE));
+		Assert(event->events & (WL_SOCKET_READABLE |
+WL_SOCKET_WRITEABLE |
+WL_SOCKET_CLOSED));
 
 		if (event->events & WL_SOCKET_READABLE)
 			epoll_ev.events |= EPOLLIN;
 		if (event->events & WL_SOCKET_WRITEABLE)
 			epoll_ev.events |= EPOLLOUT;
+		if (event->events & WL_SOCKET_CLOSED)
+			epoll_ev.events |= EPOLLRDHUP;
 	}
 
 	/*
@@ -1087,12 +1092,18 @@ WaitEventAdjustPoll(WaitEventSet *set, WaitEvent *event)
 	}
 	else
 	{
-		Assert(event->events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE));
+		Assert(event->events & (WL_SOCKET_READABLE |
+WL_SOCKET_WRITEABLE |
+WL_SOCKET_CLOSED));
 		pollfd->events = 0;
 		if (event->events & WL_SOCKET_READABLE)
 			pollfd->events |= POLLIN;
 		if (event->events & WL_SOCKET_WRITEABLE)
 			pollfd->events |= POLLOUT;
+#ifdef POLLRDHUP
+		if (event->events & WL_SOCKET_CLOSED)
+			pollfd->events |= POLLRDHUP;
+#endif
 	}
 
 	Assert(event->fd != PGINVALID_SOCKET);
@@ -1165,7 +1176,9 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
 	Assert(event->events != WL_LATCH_SET || set->latch != NULL);
 	Assert(event->events == WL_LATCH_SET ||
 		   event->events == WL_POSTMASTER_DEATH ||
-		   (event->events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)));
+		   (event->events & (WL_SOCKET_READABLE |
+			 WL_SOCKET_WRITEABLE |
+			 WL_SOCKET_CLOSED)));
 
 	if (event->events == WL_POSTMASTER_DEATH)
 	{
@@ -1188,9 +1201,9 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
 		 * old event mask to the new event mask, since kevent treats readable
 		 * and writable as separate events.
 		 */
-		if (old_events & WL_SOCKET_READABLE)
+		if (old_events & (WL_SOCKET_READABLE | WL_SOCKET_CLOSED))
 			old_filt_read = true;
-		if (event->events & WL_SOCKET_READABLE)
+		if (event->events & (WL_SOCKET_READABLE | WL_SOCKET_CLOSED))
 			new_filt_read = true;
 		if (old_events & WL_SOCKET_WRITEABLE)
 			old_filt_write = true;
@@ -1210,7 +1223,10 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
 	 event);
 	}
 
-	Assert(count > 0);
+	/* For WL_SOCKET_READ -> WL_SOCKET_CLOSED, no change needed. */
+	if (count == 0)
+		return;
+
 	Assert(count <= 2);
 
  rc = kevent(set->kqueue_fd, &k_ev[0], count, NULL, 0, NULL);
@@ -1525,7 +1541,9 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 returned_events++;
 			}
 		}
-		else if (cur_event->events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE))
+		else if (cur_event->events & (WL_SOCKET_READABLE |
+	  WL_SOCKET_WRITEABLE |
+	  WL_SOCKET_CLOSED))
 		{
 			Assert(cur_event->fd != PGINVALID_SOCKET);
 
@@ -1543,6 +1561,13 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 

Re: pg_filenode_relation(0,0) elog

2021-06-11 Thread Tom Lane
Justin Pryzby  writes:
> Per sqlsmith.
> postgres=# SELECT pg_filenode_relation(0,0);
> ERROR:  unexpected duplicate for tablespace 0, relfilenode 0

Ugh.

> The usual expectation is that sql callable functions should return null rather
> than hitting elog().

Agreed, but you should put the short-circuit into the SQL-callable
function, ie pg_filenode_relation.  Lower-level callers ought not be
passing junk data.

Likely it should check the reltablespace, too.

regards, tom lane




Re: PG 14 release notes, first draft

2021-06-11 Thread Justin Pryzby
| Add Set Server Name Indication (SNI) for SSL connection packets (Peter Eisentraut)
Remove "Set"

| Reduce the default value of vacuum_cost_page_miss from 10 milliseconds to 2 (Peter Geoghegan)
Peter mentioned that this should not say "milliseconds" (but maybe the page I'm looking at is old).

| Cause vacuum operations to be aggressive if the table is near xid or multixact wraparound (Masahiko Sawada, Peter Geoghegan)
Say "become aggressive" ?

|  Allow the arbitrary collations of partition boundary values (Tom Lane) 
Remove "the"

| Generate WAL invalidations message during command completion when using logical replication (Dilip Kumar, Tomas Vondra, Amit Kapila)
invalidation messages

|  Add support for infinity and -infinity values to the numeric data type (Tom Lane)
"-infinity" has markup but not "infinity" ?

| Allow vacuum to deallocate space reserved by trailing unused heap line pointers (Matthias van de Meent, Peter Geoghegan)
say "reclaim space" ?




Re: [PATCH] Fix buffer not null terminated on (ecpg lib)

2021-06-11 Thread Tom Lane
Andres Freund  writes:
> On 2021-06-11 19:08:57 -0400, Tom Lane wrote:
>> Anyway, I agree that disabling that was a bit of a stopgap hack.  This
>> 'nonstring' attribute seems like it would help for ECPG's usage, at
>> least.

> nonstring is supported since gcc 8, which also brought the warnings that
> e71658523 is concerned about. Which makes me think that we should be
> able to get away without a configure test. The one complication is that
> the relevant ecpg code doesn't include c.h.

Ugh.  And we *can't* include that there.

> But I think we can just do something like:

> -    char        sqlstate[5];
> +    char        sqlstate[5]
> +#if defined(__has_attribute) && __has_attribute(nonstring)
> +__attribute__((nonstring))
> +#endif
> +;
>  };

Hmm.  Worth a try, anyway.

> Should we also include a pg_attribute_nonstring definition in c.h?
> Probably not, given that we right now don't have another user?

Yeah, no point till there's another use-case.  (I'm not sure
there ever will be, so I'm not excited about adding more
infrastructure than we have to.)

regards, tom lane




Re: [PATCH] Fix buffer not null terminated on (ecpg lib)

2021-06-11 Thread Andres Freund
Hi,

On 2021-06-11 19:08:57 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > It might be worth doing something about this, for other reasons. We have
> > disabled -Wstringop-truncation in 716585235b1. But I've enabled it in my
> > debug build, because I find it useful.
> 
> ITYM e71658523 ?  I can't find that hash in my repo.

Oops, yes.


> Anyway, I agree that disabling that was a bit of a stopgap hack.  This
> 'nonstring' attribute seems like it would help for ECPG's usage, at
> least.
> 
> > I've not looked at how much work it'd be to make a recent-ish gcc not to
> > produce lots of false positives in optimized builds.
> 
> The discussion that led up to e71658523 seemed to conclude that the
> only reasonable way to suppress the majority of those warnings was
> to get rid of the fixed-length MAXPGPATH buffers we use everywhere.
> Now that we have psprintf(), that might be more workable than before,
> but the effort-to-reward ratio still doesn't seem promising.

Hm - the MAXPGPATH stuff is about -Wno-format-truncation though, right?

I now tried building with optimizations and -Wstringop-truncation, and
while it does result in a higher number of warnings, those are all in
ecpg and fixed with one __attribute__((nonstring)).

nonstring is supported since gcc 8, which also brought the warnings that
e71658523 is concerned about. Which makes me think that we should be
able to get away without a configure test. The one complication is that
the relevant ecpg code doesn't include c.h. But I think we can just do
something like:

diff --git i/src/interfaces/ecpg/include/sqlca.h w/src/interfaces/ecpg/include/sqlca.h
index c5f107dd33c..d909f5ba2de 100644
--- i/src/interfaces/ecpg/include/sqlca.h
+++ w/src/interfaces/ecpg/include/sqlca.h
@@ -50,7 +50,11 @@ struct sqlca_t
 /* 6: empty */
 /* 7: empty */
 
-    char        sqlstate[5];
+    char        sqlstate[5]
+#if defined(__has_attribute) && __has_attribute(nonstring)
+__attribute__((nonstring))
+#endif
+;
 };
 
 struct sqlca_t *ECPGget_sqlca(void);

Not pretty, but I don't immediately see a really better solution?

Should we also include a pg_attribute_nonstring definition in c.h?
Probably not, given that we right now don't have another user?

Greetings,

Andres Freund




Re: An out-of-date comment in nodeIndexonlyscan.c

2021-06-11 Thread Thomas Munro
On Fri, Jun 28, 2019 at 12:55 PM Ashwin Agrawal  wrote:
> On Thu, Jun 27, 2019 at 4:33 PM Thomas Munro  wrote:
>>
>> Here's a patch I'd like to commit to fix the comment.  We could look
>> at improving the actual code after
>> https://commitfest.postgresql.org/23/2169/ is done.
>
> Change looks good to me.

... and here is the corresponding code change, with a test to
demonstrate the change.

I'm working on a proof-of-concept to skip the btree page lock
sometimes too, which seems promising, but it requires some planner
work which has temporarily pretzeled my brain.
From 1039ec4ffbb9da15812a51f4eb9e8be32520fa63 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 10 Jun 2021 23:35:16 +1200
Subject: [PATCH] Use tuple-level SIREAD locks for index-only scans.

When index-only scans manage to avoid fetching heap tuples,
previously we'd predicate-lock the whole heap page (since commit
cdf91edb).  Commits c01262a8 and 6f38d4da made it possible to lock the
tuple instead with only a TID, to match the behavior of regular index
scans and avoid some false conflicts.

Discussion: https://postgr.es/m/CAEepm%3D2GK3FVdnt5V3d%2Bh9njWipCv_fNL%3DwjxyUhzsF%3D0PcbNg%40mail.gmail.com
---
 src/backend/executor/nodeIndexonlyscan.c  | 12 --
 .../isolation/expected/index-only-scan-2.out  | 15 +++
 .../isolation/expected/index-only-scan-3.out  | 16 
 src/test/isolation/isolation_schedule |  2 +
 .../isolation/specs/index-only-scan-2.spec| 39 +++
 .../isolation/specs/index-only-scan-3.spec| 33 
 6 files changed, 113 insertions(+), 4 deletions(-)
 create mode 100644 src/test/isolation/expected/index-only-scan-2.out
 create mode 100644 src/test/isolation/expected/index-only-scan-3.out
 create mode 100644 src/test/isolation/specs/index-only-scan-2.spec
 create mode 100644 src/test/isolation/specs/index-only-scan-3.spec

diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 0754e28a9a..f7185b4519 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -243,12 +243,16 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 		/*
 		 * If we didn't access the heap, then we'll need to take a predicate
-		 * lock explicitly, as if we had.  For now we do that at page level.
+		 * lock explicitly, as if we had.  We don't have the inserter's xid,
+		 * but that's only used by PredicateLockTID to check if it matches the
+		 * caller's top level xid, and it can't possibly have been inserted by
+		 * us or the page wouldn't be all visible.
 		 */
 		if (!tuple_from_heap)
-			PredicateLockPage(scandesc->heapRelation,
-			  ItemPointerGetBlockNumber(tid),
-			  estate->es_snapshot);
+			PredicateLockTID(scandesc->heapRelation,
+			 tid,
+			 estate->es_snapshot,
+			 InvalidTransactionId);
 
 		return slot;
 	}
diff --git a/src/test/isolation/expected/index-only-scan-2.out b/src/test/isolation/expected/index-only-scan-2.out
new file mode 100644
index 00..fef5b8d398
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-2.out
@@ -0,0 +1,15 @@
+Parsed test spec with 2 sessions
+
+starting permutation: r1 r2 w1 w2 c1 c2
+step r1: SELECT id1 FROM account WHERE id1 = 1;
+id1
+
+1  
+step r2: SELECT id2 FROM account WHERE id2 = 2;
+id2
+
+2  
+step w1: UPDATE account SET balance = 200 WHERE id1 = 1;
+step w2: UPDATE account SET balance = 200 WHERE id2 = 2;
+step c1: COMMIT;
+step c2: COMMIT;
diff --git a/src/test/isolation/expected/index-only-scan-3.out b/src/test/isolation/expected/index-only-scan-3.out
new file mode 100644
index 00..efef162779
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-3.out
@@ -0,0 +1,16 @@
+Parsed test spec with 2 sessions
+
+starting permutation: r1 r2 w1 w2 c1 c2
+step r1: SELECT id1 FROM account WHERE id1 = 2;
+id1
+
+2  
+step r2: SELECT id2 FROM account WHERE id2 = 1;
+id2
+
+1  
+step w1: UPDATE account SET balance = 200 WHERE id1 = 1;
+step w2: UPDATE account SET balance = 200 WHERE id2 = 2;
+step c1: COMMIT;
+step c2: COMMIT;
+ERROR:  could not serialize access due to read/write dependencies among transactions
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index f4c01006fc..570aedb900 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -17,6 +17,8 @@ test: partial-index
 test: two-ids
 test: multiple-row-versions
 test: index-only-scan
+test: index-only-scan-2
+test: index-only-scan-3
 test: predicate-lock-hot-tuple
 test: update-conflict-out
 test: deadlock-simple
diff --git a/src/test/isolation/specs/index-only-scan-2.spec b/src/test/isolation/specs/index-only-scan-2.spec
new file mode 100644
index 00..cd3c599af8
--- /dev/null
+++ b/src/test/isolation/specs/index-only-scan-2.spec
@@ -0,0 +1,39 @@
+# Test the 

pg_filenode_relation(0,0) elog

2021-06-11 Thread Justin Pryzby
Per sqlsmith.

postgres=# SELECT pg_filenode_relation(0,0);
ERROR:  unexpected duplicate for tablespace 0, relfilenode 0

postgres=# \errverbose 
ERROR:  XX000: unexpected duplicate for tablespace 0, relfilenode 0
LOCATION:  RelidByRelfilenode, relfilenodemap.c:220

The usual expectation is that sql callable functions should return null rather
than hitting elog().  This also means that sqlsmith has one fewer
false-positive error.

I think it should return NULL if passed invalid relfilenode, rather than
searching pg_class and then writing a pretty scary message about duplicates.

diff --git a/src/backend/utils/cache/relfilenodemap.c b/src/backend/utils/cache/relfilenodemap.c
index 56d7c73d33..5a5cf853bd 100644
--- a/src/backend/utils/cache/relfilenodemap.c
+++ b/src/backend/utils/cache/relfilenodemap.c
@@ -146,6 +146,9 @@ RelidByRelfilenode(Oid reltablespace, Oid relfilenode)
ScanKeyData skey[2];
Oid relid;
 
+   if (!OidIsValid(relfilenode))
+   return InvalidOid;
+
if (RelfilenodeMapHash == NULL)
InitializeRelfilenodeMap();
 




Signed vs. Unsigned (some)

2021-06-11 Thread Ranier Vilela
Hi,

Can removing legitimate warnings be worth it?

A -1 cast can be wrong when there is an invalid value defined
(InvalidBucket, InvalidBlockNumber).  I think that, depending on the
compiler, a -1 cast may differ from InvalidBucket or InvalidBlockNumber.

pg_rewind is one special case.  All other XLogSegNo (uint64) variables are
initialized to zero, but in pg_rewind -1 was used.  I did not find an
InvalidXLogSegNo.  Not tested.

Trivial patch attached.

best regards,
Ranier Vilela


fix_signed_vs_unsigned.patch
Description: Binary data


Re: A new function to wait for the backend exit after termination

2021-06-11 Thread Justin Pryzby
On Sat, Jun 05, 2021 at 12:08:01PM -0700, Noah Misch wrote:
> > > My preference is to remove pg_wait_for_backend_termination().  The use case
> > > that prompted this thread used pg_terminate_backend(pid, 18); it doesn't
> > > need pg_wait_for_backend_termination().

Is this an Opened Issue ?

-- 
Justin




ProcArrayStruct->pgprocnos vs ->maxProcs vs PGPROC ordering

2021-06-11 Thread Andres Freund
Hi,

I just tried to add a, seemingly innocuous, assertion to
ProcArrayAdd()/Remove() that proc->pgprocno is < arrayP->maxProcs. That
quickly fails.

The reason for that is that PGPROC are created in the following order
1) MaxBackends normal (*[1]) backends
2) NUM_AUXILIARY_PROCS auxiliary processes
3) max_prepared_xacts prepared transactions.

and pgprocnos are assigned sequentially - they are needed to index into
ProcGlobal->allProcs.

In contrast to that procarray.c initializes maxProcs to
#define PROCARRAY_MAXPROCS  (MaxBackends + max_prepared_xacts)
i.e. without the aux processes.

Which means that some of the prepared transactions have pgprocnos that are
bigger than ProcArrayStruct->maxProcs. Hence my assertion failure.
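The arithmetic behind the failure can be sketched as follows (the counts used in the check are hypothetical settings, not PostgreSQL defaults, and NUM_AUXILIARY_PROCS varies across versions):

```c
/*
 * pgprocnos are handed out sequentially across all three groups of
 * PGPROCs, so the prepared-transaction procs get the highest numbers...
 */
static int
highest_pgprocno(int max_backends, int num_aux_procs, int max_prepared_xacts)
{
	return max_backends + num_aux_procs + max_prepared_xacts - 1;
}

/*
 * ...while PROCARRAY_MAXPROCS is computed without the aux processes,
 * so some valid pgprocnos are >= the procarray's idea of maxProcs.
 */
static int
procarray_maxprocs(int max_backends, int max_prepared_xacts)
{
	return max_backends + max_prepared_xacts;
}
```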


This is obviously not a bug, but is quite hard to understand / confusing. I
think I made a similar mistake before.

I'd at least like to add a comment with a warning somewhere in ProcArrayStruct
or such.

An alternative approach would be to change the PGPROC order to instead be 1)
aux processes, 2) normal backends, 3) prepared xacts, and give aux processes a
negative or invalid pgprocno.

One small advantage of that would be that we'd not need to "skip" over the
"aux process hole" between normal and prepared xact PGPROCs in various
procarray.c routines that iterate over procs.

Greetings,

Andres Freund

[1] well, kinda. It's user backends followed by autovacuum worker, launcher,
worker processes and wal senders.




Re: array_cat anycompatible change is breaking xversion upgrade tests (v14 release notes)

2021-06-11 Thread Bruce Momjian
On Fri, Jun 11, 2021 at 08:19:48PM -0500, Justin Pryzby wrote:
> On Fri, Jun 11, 2021 at 09:12:55PM -0400, Bruce Momjian wrote:
> > OK, I used some of your ideas and tried for something more general; 
> > patch attached.
> 
> This is good.
> 
> But I wonder if "dropped before upgrading" is too specific to pg_upgrade?
> 
> Dropping the aggregate before starting a backup to be restored into a new
> version seems like a bad way to do it.  More likely, I would restore whatever
> backup I had, get errors, and then eventually recreate the aggregates.

I am actually unclear on that.  Do people really restore a dump and just
ignore errors, or somehow track them and go back and try to fix them?
Isn't there a cascading effect if other things depend on it?  How do
they get the object definitions from a huge dump file?  What process
should we recommend?  I have just never seen good documentation on how
this handled.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: array_cat anycompatible change is breaking xversion upgrade tests (v14 release notes)

2021-06-11 Thread Bruce Momjian
On Fri, Jun 11, 2021 at 09:17:46PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > OK, I used some of your ideas and tried for something more general; 
> > patch attached.
> 
> I think it's a good idea to mention custom aggregates and operators
> specifically, as otherwise people will look at this and have little
> idea what you're on about.  I just want wording like "such as custom
> aggregates and operators", in case somebody has done some other creative
> thing that breaks.

Agreed, updated patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

diff --git a/doc/src/sgml/release-14.sgml b/doc/src/sgml/release-14.sgml
index 058ba7cd4e..c2d8941206 100644
--- a/doc/src/sgml/release-14.sgml
+++ b/doc/src/sgml/release-14.sgml
@@ -291,6 +291,35 @@ Author: Tom Lane 
 
 
+
+ 
+  User-defined objects that reference some built-in array functions
+  along with their argument types must be recreated (Tom Lane)
+ 
+
+ 
+  Specifically, array_append(),
+  array_prepend(),
+  array_cat(),
+  array_position(),
+  array_positions(),
+  array_remove(),
+  array_replace(), or width_bucket()
+  used to take anyarray arguments but now take
+  anycompatiblearray.  Therefore, user-defined objects
+  like aggregates and operators that reference old array function
+  signatures must be dropped before upgrading and recreated once the
+  upgrade completes.
+ 
+
+
+
+
 


Re: array_cat anycompatible change is breaking xversion upgrade tests (v14 release notes)

2021-06-11 Thread Justin Pryzby
On Fri, Jun 11, 2021 at 09:12:55PM -0400, Bruce Momjian wrote:
> OK, I used some of your ideas and tried for something more general; 
> patch attached.

This is good.

But I wonder if "dropped before upgrading" is too specific to pg_upgrade?

Dropping the aggregate before starting a backup to be restored into a new
version seems like a bad way to do it.  More likely, I would restore whatever
backup I had, get errors, and then eventually recreate the aggregates.

> + 
> +  User-defined objects that reference some built-in array functions
> +  along with their argument types must be recreated (Tom Lane)
> + 
> +
> + 
> +  Specifically, array_append(),
...
> +  used to take anyarray arguments but now take
> +  anycompatiblearray.  Therefore, user-defined objects
> +  that reference the old array function signature must be dropped
> +  before upgrading and recreated once the upgrade completes.
> + 
> +




Re: array_cat anycompatible change is breaking xversion upgrade tests (v14 release notes)

2021-06-11 Thread Tom Lane
Bruce Momjian  writes:
> OK, I used some of your ideas and tried for something more general; 
> patch attached.

I think it's a good idea to mention custom aggregates and operators
specifically, as otherwise people will look at this and have little
idea what you're on about.  I just want wording like "such as custom
aggregates and operators", in case somebody has done some other creative
thing that breaks.

regards, tom lane




Re: array_cat anycompatible change is breaking xversion upgrade tests (v14 release notes)

2021-06-11 Thread Bruce Momjian
On Fri, Jun 11, 2021 at 08:56:19PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > OK, I came up with the attached patch.  This is one of the few cases
> > where the incompatibility is not clearly related to the feature, so I
> > left the existing item alone and just created a new one with the same
> > commit message in the incompatibilities section.
> 
> I think phrasing this as though user-defined aggregates are the only
> pain point is incorrect.  For example, a custom operator based on
> array_cat would have issues too.
> 
> I suggest a treatment more like
> 
> Some built-in array-related functions changed signature (Tom Lane)
> 
> Certain functions were redefined to take anycompatiblearray instead
> of anyarray.  While this does not affect ordinary calls, it does
> affect code that directly names these functions along with their
> argument types; for example, custom aggregates and operators based
> on these functions.  The affected functions are [ blah, blah ]

OK, I used some of your ideas and tried for something more general; 
patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

diff --git a/doc/src/sgml/release-14.sgml b/doc/src/sgml/release-14.sgml
index 058ba7cd4e..925690 100644
--- a/doc/src/sgml/release-14.sgml
+++ b/doc/src/sgml/release-14.sgml
@@ -291,6 +291,34 @@ Author: Tom Lane 
 
 
+
+ 
+  User-defined objects that reference some built-in array functions
+  along with their argument types must be recreated (Tom Lane)
+ 
+
+ 
+  Specifically, array_append(),
+  array_prepend(),
+  array_cat(),
+  array_position(),
+  array_positions(),
+  array_remove(),
+  array_replace(), or width_bucket()
+  used to take anyarray arguments but now take
+  anycompatiblearray.  Therefore, user-defined objects
+  that reference the old array function signature must be dropped
+  before upgrading and recreated once the upgrade completes.
+ 
+
+
+
+
 


Re: array_cat anycompatible change is breaking xversion upgrade tests (v14 release notes)

2021-06-11 Thread Tom Lane
Bruce Momjian  writes:
> OK, I came up with the attached patch.  This is one of the few cases
> where the incompatibility is not clearly related to the feature, so I
> left the existing item alone and just created a new one with the same
> commit message in the incompatibilities section.

I think phrasing this as though user-defined aggregates are the only
pain point is incorrect.  For example, a custom operator based on
array_cat would have issues too.

I suggest a treatment more like

Some built-in array-related functions changed signature (Tom Lane)

Certain functions were redefined to take anycompatiblearray instead
of anyarray.  While this does not affect ordinary calls, it does
affect code that directly names these functions along with their
argument types; for example, custom aggregates and operators based
on these functions.  The affected functions are [ blah, blah ]


regards, tom lane




Re: cleanup temporary files after crash

2021-06-11 Thread Justin Pryzby
On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:
> > > The current behavior is only useful for debugging purposes.

On Wed, 28 Oct 2020 at 15:42, Tomas Vondra  wrote:
> > One thing I'm not sure about is whether we should have the GUC as
> > proposed, or have a negative "keep_temp_files_after_restart" defaulting
> > to false. But I don't have a very good justification for the alternative
> > other than vague personal preference.

On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:
> I thought about not providing a GUC at all or provide it in the developer
> section. I've never heard someone saying that they use those temporary
> files to investigate an issue. Regarding a crash, all information is already
> available and temporary files don't provide extra details. This new GUC is
> just to keep the previous behavior. I'm fine without the GUC, though.

Should this GUC be classified as a developer option, and removed from
postgresql.sample.conf ?

That was discussed initially in October but not since.

On Wed, Oct 28, 2020 at 11:16:26AM -0300, Euler Taveira wrote:
> It also has an undesirable behavior: you have to restart the service to
> reclaim the space.

BTW, that's not really true - you can remove the tempfiles while the server is
running.  The complainant in bug#16427 was being careful - which is good.
I watch for and remove tempfiles older than 2 days.  The worst consequence of
removing a tempfile would be a failed query.  

-- 
Justin




Re: array_cat anycompatible change is breaking xversion upgrade tests (v14 release notes)

2021-06-11 Thread Bruce Momjian
On Tue, Jun  8, 2021 at 07:10:00PM -0500, Justin Pryzby wrote:
> On Tue, Jun 08, 2021 at 08:02:46PM -0400, Bruce Momjian wrote:
> > This involves creating an aggregate that _uses_ these array functions as
> > their state transition function?
> 
> Yes

OK, I came up with the attached patch.  This is one of the few cases
where the incompatibility is not clearly related to the feature, so I
left the existing item alone and just created a new one with the same
commit message in the incompatibilities section.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

diff --git a/doc/src/sgml/release-14.sgml b/doc/src/sgml/release-14.sgml
index 21659bd184..426d502a63 100644
--- a/doc/src/sgml/release-14.sgml
+++ b/doc/src/sgml/release-14.sgml
@@ -291,6 +291,32 @@ Author: Tom Lane 
 
 
+
+ 
+  User-defined aggregates that reference some built-in array functions
+  will need to be recreated (Tom Lane)
+ 
+
+ 
+  Specifically, user-defined aggregates that use the functions array_append(),
+  array_prepend(),
+  array_cat(),
+  array_position(),
+  array_positions(),
+  array_remove(),
+  array_replace(), or width_bucket()
+  as their state transition function must be dropped before
+  upgrading and recreated once the upgrade is complete.
+ 
+
+
+
+
 


Re: pg_regress.c also sensitive to various PG* environment variables

2021-06-11 Thread Michael Paquier
On Fri, Jun 11, 2021 at 10:08:20AM -0400, Alvaro Herrera wrote:
> I think if they're to be kept in sync, then the exceptions should be
> noted.  I mean, where PGCLIENTENCODING would otherwise be, I'd add
> /* PGCLIENTENCODING set above */
> /* See below for PGHOSTADDR */
> and so on (PGHOST and PGPORT probably don't need this because they're
> immediately below; not sure; but I would put them in alphabetical order
> in both lists for sure and then that wouldn't apply).  Otherwise I would
> think that it's an omission and would set to fix it.

Good idea, thanks.  I'll add comments for each one that cannot be
unset.
--
Michael


signature.asc
Description: PGP signature


Re: Table AM modifications to accept column projection lists

2021-06-11 Thread Jacob Champion
On Sat, 2021-06-05 at 09:47 -0700, Zhihong Yu wrote:
> On Fri, Jun 4, 2021 at 4:14 PM Jacob Champion  wrote:
> > Agreed. I'm going to double-check with Deep that the new calls
> > to table_tuple_fetch_row_version() should be projecting the full row,
> > then post an updated patch some time next week.

(The discussions over the inheritance_planner fallout are still ongoing,
but in the meantime here's an updated v4 that builds and passes
`make check`.)

> +   return relation->rd_tableam->scan_begin_with_column_projection(relation, snapshot, 0, NULL,
> +   parallel_scan, flags, proj);
> 
> scan_begin_with_column_projection() adds a parameter to scan_begin().
> Can scan_begin() be enhanced with this projection parameter ?
> Otherwise in the future we may have 
> scan_begin_with_column_projection_with_x_y ...

Maybe; I agree that would match the current "extension" APIs a little
better. I'll let Deep and/or Ashwin chime in on why this design was
chosen.

> +   /* Make sure the the new slot is not dependent on the original tuple */
> 
> Double 'the' in the comment. More than one place with duplicate 'the'
> in the patch.

Fixed.

> +typedef struct neededColumnContext
> +{
> +   Bitmapset **mask;
> +   int n;
> 
> Should field n be named ncol ? 'n' seems too general.

Agreed; changed to ncol.

> +* TODO: Remove this hack!! This should be done once at the start of 
> the tid scan.
> 
> Would the above be addressed in the next patch ?

I have not had time to get to this in v4, sorry.

> Toward the end of extract_scan_columns():
> 
> +   bms_free(rte->scanCols);
> +   rte->scanCols = bms_make_singleton(0);
> +   break;
> 
> Should 'goto outer;' be in place of 'break;' (since rte->scanCols has
> been assigned for whole-row) ?

Agreed and fixed. Thank you!

--Jacob
From 7ac00847f17e2a0448ae06370598db0f034fcc7c Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Tue, 2 Mar 2021 08:59:45 -0800
Subject: [PATCH v4] tableam: accept column projection list

TODO: check the additional bms_make_singleton(0) calls in
  src/backend/executor/nodeModifyTable.c

Co-authored-by: Soumyadeep Chakraborty 
Co-authored-by: Melanie Plageman 
Co-authored-by: Ashwin Agrawal 
Co-authored-by: Jacob Champion 
---
 src/backend/access/heap/heapam_handler.c |   5 +-
 src/backend/access/nbtree/nbtsort.c  |   3 +-
 src/backend/access/table/tableam.c   |   5 +-
 src/backend/commands/trigger.c   |  19 +++-
 src/backend/executor/execMain.c  |   3 +-
 src/backend/executor/execPartition.c |   2 +
 src/backend/executor/execReplication.c   |   6 +-
 src/backend/executor/execScan.c  | 108 +
 src/backend/executor/nodeIndexscan.c |  10 ++
 src/backend/executor/nodeLockRows.c  |   7 +-
 src/backend/executor/nodeModifyTable.c   |  47 +++--
 src/backend/executor/nodeSeqscan.c   |  43 -
 src/backend/executor/nodeTidscan.c   |  11 ++-
 src/backend/nodes/copyfuncs.c|   2 +
 src/backend/nodes/equalfuncs.c   |   2 +
 src/backend/nodes/outfuncs.c |   2 +
 src/backend/nodes/readfuncs.c|   2 +
 src/backend/optimizer/path/allpaths.c|  85 +++-
 src/backend/optimizer/prep/preptlist.c   |  13 ++-
 src/backend/optimizer/util/inherit.c |  37 ++-
 src/backend/parser/analyze.c |  40 ++--
 src/backend/parser/parse_relation.c  |  18 
 src/backend/partitioning/partbounds.c|  14 ++-
 src/backend/rewrite/rewriteHandler.c |   8 ++
 src/include/access/tableam.h | 118 +--
 src/include/executor/executor.h  |   6 ++
 src/include/nodes/execnodes.h|   1 +
 src/include/nodes/parsenodes.h   |  14 +++
 src/include/utils/rel.h  |  14 +++
 29 files changed, 598 insertions(+), 47 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index e2cd79ec54..e440ae4a69 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -182,7 +182,8 @@ static bool
 heapam_fetch_row_version(Relation relation,
 		 ItemPointer tid,
 		 Snapshot snapshot,
-		 TupleTableSlot *slot)
+		 TupleTableSlot *slot,
+		 Bitmapset *project_cols)
 {
 	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
 	Buffer		buffer;
@@ -350,7 +351,7 @@ static TM_Result
 heapam_tuple_lock(Relation relation, ItemPointer tid, Snapshot snapshot,
   TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
   LockWaitPolicy wait_policy, uint8 flags,
-  TM_FailureData *tmfd)
+  TM_FailureData *tmfd, Bitmapset *project_cols)
 {
 	BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
 	TM_Result	result;
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c

Re: unnesting multirange data types

2021-06-11 Thread Alexander Korotkov
On Sat, Jun 12, 2021 at 2:30 AM Justin Pryzby  wrote:
> On Fri, Jun 11, 2021 at 11:37:58PM +0300, Alexander Korotkov wrote:
> > On Fri, Jun 11, 2021 at 1:04 AM Justin Pryzby  wrote:
> > >
> > > +{ oid => '1293', descr => 'expand mutlirange to set of ranges',
> > >
> > > typo: mutlirange
> >
> > Fixed, thanks.
> >
> > The patch with the implementation of both unnest() and cast to array
> > is attached.  It contains both tests and docs.
>
> |+   The multirange could be explicitly cast to the array of corresponding
> should say: "can be cast to an array of corresponding.."
>
> |+ * Cast multirange to the array of ranges.
> I think should be: *an array of ranges

Thank you for catching this.

> Per sqlsmith, this is causing consistent crashes.
> I took one of its less appalling queries and simplified it to this:
>
> select
> pg_catalog.multirange_to_array(
> cast(pg_catalog.int8multirange() as int8multirange)) as c2
> from (select 1)x;

It seems that multirange_to_array() doesn't handle empty multiranges.
I'll post an updated version of the patch tomorrow.

--
Regards,
Alexander Korotkov




Re: unnesting multirange data types

2021-06-11 Thread Justin Pryzby
On Fri, Jun 11, 2021 at 11:37:58PM +0300, Alexander Korotkov wrote:
> On Fri, Jun 11, 2021 at 1:04 AM Justin Pryzby  wrote:
> >
> > +{ oid => '1293', descr => 'expand mutlirange to set of ranges',
> >
> > typo: mutlirange
> 
> Fixed, thanks.
> 
> The patch with the implementation of both unnest() and cast to array
> is attached.  It contains both tests and docs.

|+   The multirange could be explicitly cast to the array of corresponding
should say: "can be cast to an array of corresponding.."

|+ * Cast multirange to the array of ranges.
I think should be: *an array of ranges

Per sqlsmith, this is causing consistent crashes.
I took one of its less appalling queries and simplified it to this:

select
pg_catalog.multirange_to_array(
cast(pg_catalog.int8multirange() as int8multirange)) as c2
from (select 1)x;

-- 
Justin




Re: [PATCH] Fix buffer not null terminated on (ecpg lib)

2021-06-11 Thread Tom Lane
Andres Freund  writes:
> It might be worth doing something about this, for other reasons. We have
> disabled -Wstringop-truncation in 716585235b1. But I've enabled it in my
> debug build, because I find it useful.

ITYM e71658523 ?  I can't find that hash in my repo.  Anyway, I agree
that disabling that was a bit of a stopgap hack.  This 'nonstring'
attribute seems like it would help for ECPG's usage, at least.

> I've not looked at how much work it'd be to make a recent-ish gcc not to
> produce lots of false positives in optimized builds.

The discussion that led up to e71658523 seemed to conclude that the
only reasonable way to suppress the majority of those warnings was
to get rid of the fixed-length MAXPGPATH buffers we use everywhere.
Now that we have psprintf(), that might be more workable than before,
but the effort-to-reward ratio still doesn't seem promising.

regards, tom lane




Re: [PATCH] Fix buffer not null terminated on (ecpg lib)

2021-06-11 Thread Andres Freund
Hi,

On 2020-04-23 14:36:15 +0900, Kyotaro Horiguchi wrote:
> At Thu, 23 Apr 2020 01:21:21 -0300, Ranier Vilela  wrote 
> in 
> > Em qua., 22 de abr. de 2020 às 23:27, Kyotaro Horiguchi <
> > horikyota@gmail.com> escreveu:
> > >
> > > -   strncpy(sqlca->sqlerrm.sqlerrmc, message,
> > > sizeof(sqlca->sqlerrm.sqlerrmc));
> > > -   sqlca->sqlerrm.sqlerrmc[sizeof(sqlca->sqlerrm.sqlerrmc) - 1] = 0;
> > > +   sqlca->sqlerrm.sqlerrmc[sizeof(sqlca->sqlerrm.sqlerrmc) - 1] =
> > > '\0';
> > > +   strncpy(sqlca->sqlerrm.sqlerrmc, message,
> > > sizeof(sqlca->sqlerrm.sqlerrmc) - 1);
> > >
> > > The existing strncpy then terminating by NUL works fine. I don't think
> > > there's any point in doing the reverse way.  Actually
> > > sizeof(sqlca->sqlerrm.sqlerrmc) - 1 is enough for the length but the
> > > existing code is not necessarily a bug.
> > >
> > Without understanding then, why Coverity claims a bug here.
> 
> Well, handling non-terminated strings with str* functions are a sign
> of bug in most cases.  Coverity is very useful but false positives are
> annoying.  I wonder what if we attach Coverity annotations to such
> codes.

It might be worth doing something about this, for other reasons. We have
disabled -Wstringop-truncation in 716585235b1. But I've enabled it in my
debug build, because I find it useful. The only warning we're getting
in non-optimized builds is

/home/andres/src/postgresql/src/interfaces/ecpg/ecpglib/misc.c: In function ‘ECPGset_var’:
/home/andres/src/postgresql/src/interfaces/ecpg/ecpglib/misc.c:565:17: warning: ‘strncpy’ output truncated before terminating nul copying 5 bytes from a string of the same length [-Wstringop-truncation]
  565 | strncpy(sqlca->sqlstate, "YE001", sizeof(sqlca->sqlstate));
      | ^~

One way we could address this is to use the 'nonstring' attribute gcc
has introduced, signalling that sqlca_t->sqlstate isn't zero
terminated. That removes the above warning.

https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#Common-Variable-Attributes

"The nonstring variable attribute specifies that an object or member
declaration with type array of char, signed char, or unsigned char, or
pointer to such a type is intended to store character arrays that do not
necessarily contain a terminating NUL. This is useful in detecting uses of
such arrays or pointers with functions that expect NUL-terminated strings,
and to avoid warnings when such an array or pointer is used as an argument
to a bounded string manipulation function such as strncpy. For example,
without the attribute, GCC will issue a warning for the strncpy call below
because it may truncate the copy without appending the terminating NUL
character. Using the attribute makes it possible to suppress the warning.
However, when the array is declared with the attribute the call to strlen
is diagnosed because when the array doesn't contain a NUL-terminated string
the call is undefined. To copy, compare, or search non-string character
arrays use the memcpy, memcmp, memchr, and other functions that operate on
arrays of bytes. In addition, calling strnlen and strndup with such arrays
is safe provided a suitable bound is specified, and not diagnosed."

I've not looked at how much work it'd be to make a recent-ish gcc not to
produce lots of false positives in optimized builds.

Greetings,

Andres Freund




Re: Fdw batch insert error out when set batch_size > 65535

2021-06-11 Thread Tom Lane
Tomas Vondra  writes:
> There's one caveat, though - for regular builds the slowdown is pretty
> much eliminated. But with valgrind it's still considerably slower. For
> postgres_fdw the "make check" used to take ~5 minutes for me, now it
> takes >1h. And yes, this is entirely due to the new test case which is
> generating / inserting 70k rows. So maybe the test case is not worth it
> after all, and we should get rid of it.

I bet the CLOBBER_CACHE animals won't like it much either.

I suggest what we do is leave it in place for long enough to get
a round of reports from those slow animals, and then (assuming
those reports are positive) drop the test.

regards, tom lane




Re: Fdw batch insert error out when set batch_size > 65535

2021-06-11 Thread Tomas Vondra



On 6/9/21 4:05 PM, Tomas Vondra wrote:
> On 6/9/21 3:28 PM, Tom Lane wrote:
>> Tomas Vondra  writes:
>>> Note that the problem here is [1] - we're creating a lot of slots
>>> referencing the same tuple descriptor, which inflates the duration.
>>> There's a fix in the other thread, which eliminates ~99% of the
>>> overhead. I plan to push that fix soon (a day or two).
>>
>> Oh, okay, as long as there's a plan to bring the time back down.
>>
> 
> Yeah. Sorry for not mentioning this in the original message about the
> new regression test.
> 

I've pushed a fix addressing the performance issue.

There's one caveat, though - for regular builds the slowdown is pretty
much eliminated. But with valgrind it's still considerably slower. For
postgres_fdw the "make check" used to take ~5 minutes for me, now it
takes >1h. And yes, this is entirely due to the new test case which is
generating / inserting 70k rows. So maybe the test case is not worth it
after all, and we should get rid of it.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: recovery test failures on hoverfly

2021-06-11 Thread Tom Lane
I wrote:
> Michael Paquier  writes:
>> This is the same problem as c757a3da and 6d41dd0, where we write a
>> query to a pipe but the kill, causing a failure, makes the test fail
>> with a SIGPIPE in IPC::Run as a query is sent down to a pipe.

> The precedent of the previous fixes would seem to suggest seeing if
> we can replace 'SELECT 1' with "undef".  Not sure if that'll work
> without annoying changes to poll_query_until, though.

I noticed that elver failed this same way today, so that got me
annoyed enough to pursue a fix.  Using "undef" as poll_query_until's
input almost works, except it turns out that it fails to notice psql
connection failures in that case!  It is *only* looking at psql's
stdout, not at either stderr or the exit status, which seems seriously
bogus in its own right; not least because poll_query_until's own
documentation claims it will continue waiting after an error, which
is exactly what it's not doing.  So I propose the attached.

(I first tried to make it check $result == 0, but it seems there are a
lot of cases where psql returns status 1 in these tests.  That seems
pretty bogus too, but probably beta is no time to change that
behavior.)

regards, tom lane

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 45d1636128..2027cbf43d 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2140,8 +2140,10 @@ sub poll_query_until
 
 		$stdout =~ s/\r\n/\n/g if $Config{osname} eq 'msys';
 		chomp($stdout);
+		$stderr =~ s/\r\n/\n/g if $Config{osname} eq 'msys';
+		chomp($stderr);
 
-		if ($stdout eq $expected)
+		if ($stdout eq $expected && $stderr eq '')
 		{
 			return 1;
 		}
@@ -2154,8 +2156,6 @@ sub poll_query_until
 
 	# The query result didn't change in 180 seconds. Give up. Print the
 	# output from the last attempt, hopefully that's useful for debugging.
-	$stderr =~ s/\r\n/\n/g if $Config{osname} eq 'msys';
-	chomp($stderr);
 	diag qq(poll_query_until timed out executing this query:
 $query
 expecting this output:
diff --git a/src/test/recovery/t/013_crash_restart.pl b/src/test/recovery/t/013_crash_restart.pl
index e1c36abe97..868a50b33d 100644
--- a/src/test/recovery/t/013_crash_restart.pl
+++ b/src/test/recovery/t/013_crash_restart.pl
@@ -136,12 +136,8 @@ ok( pump_until(
 $monitor->finish;
 
 # Wait till server restarts
-is( $node->poll_query_until(
-		'postgres',
-		'SELECT $$restarted after sigquit$$;',
-		'restarted after sigquit'),
-	"1",
-	"reconnected after SIGQUIT");
+is($node->poll_query_until('postgres', undef, ''),
+	"1", "reconnected after SIGQUIT");
 
 
 # restart psql processes, now that the crash cycle finished
@@ -216,7 +212,7 @@ ok( pump_until(
 $monitor->finish;
 
 # Wait till server restarts
-is($node->poll_query_until('postgres', 'SELECT 1', '1'),
+is($node->poll_query_until('postgres', undef, ''),
 	"1", "reconnected after SIGKILL");
 
 # Make sure the committed rows survived, in-progress ones not
diff --git a/src/test/recovery/t/022_crash_temp_files.pl b/src/test/recovery/t/022_crash_temp_files.pl
index b912f4b232..157ddba8cf 100644
--- a/src/test/recovery/t/022_crash_temp_files.pl
+++ b/src/test/recovery/t/022_crash_temp_files.pl
@@ -139,7 +139,7 @@ $killme->finish;
 $killme2->finish;
 
 # Wait till server restarts
-$node->poll_query_until('postgres', 'SELECT 1', '1');
+$node->poll_query_until('postgres', undef, '');
 
 # Check for temporary files
 is( $node->safe_psql(
@@ -228,7 +228,7 @@ $killme->finish;
 $killme2->finish;
 
 # Wait till server restarts
-$node->poll_query_until('postgres', 'SELECT 1', '1');
+$node->poll_query_until('postgres', undef, '');
 
 # Check for temporary files -- should be there
 is( $node->safe_psql(


Re: Teaching users how they can get the most out of HOT in Postgres 14

2021-06-11 Thread Peter Geoghegan
On Sun, May 30, 2021 at 6:30 PM Masahiko Sawada  wrote:
> > Another concern with this approach is what it
> > means for the VACUUM command itself. I haven't added an 'auto'
> > spelling that is accepted by the VACUUM command in this POC version.
> > But do I need to at all? Can that just be implied by not having any
> > INDEX_CLEANUP option?
>
> It seems to me that it's better to have INDEX_CLEANUP option of VACUUM
> command support AUTO for consistency. Do you have any concerns about
> supporting it?

I suppose we should have it. But now we have to teach vacuumdb about
this new boolean-like enum too. It's a lot more new code than I would
have preferred, but I suppose that it makes sense.


-- 
Peter Geoghegan




Re: Replication protocol doc fix

2021-06-11 Thread Jeff Davis
On Fri, 2021-06-11 at 16:57 -0400, Robert Haas wrote:
> My impression was that CopyBoth can be initiated either way, 

The docs say: "In either physical replication or logical replication
walsender mode, only the simple query protocol can be used." Is there
some way to initiate CopyBoth outside of walsender?

> but if
> you use the extended query protocol, then the result is a hopeless
> mess, because the protocol is badly designed:
> 
> 
https://www.postgresql.org/message-id/ca+tgmoa4ea+cpxaigqmebp9xisvd3ze9dbvnbzevx9ucmiw...@mail.gmail.com

It seems like you're saying that CopyIn and CopyBoth are both equally
broken in extended query mode. Is that right?

> But I think you're correct in saying that the discard-until-Sync
> behavior only happens if the extended query protocol is used, so I
> agree that the current text is wrong.

Should we just document how CopyBoth works with the simple query
protocol, or should we make it match the CopyIn docs?

Regards,
Jeff Davis






Re: Teaching users how they can get the most out of HOT in Postgres 14

2021-06-11 Thread Peter Geoghegan
On Thu, Jun 3, 2021 at 11:15 PM Michael Paquier  wrote:
> I have read through the patch, and I am surprised to see that this
> only makes possible to control the optimization at relation level.
> The origin of the complaints is that this index cleanup optimization
> has been introduced as a new rule that gets enforced at *system*
> level, so I think that we should have an equivalent with a GUC to
> control the behavior for the whole system.

*Why* does it have to work at the system level? I don't understand
what you mean about the system level.

As Masahiko pointed out, adding a GUC isn't what we've done in other
similar cases -- that's how DISABLE_PAGE_SKIPPING works, which was a
defensive option that seems similar enough to what we want to add now.
To give another example, the TRUNCATE VACUUM option (or the related
reloption) can be used to disable relation truncation, a behavior that
sometimes causes *big* issues in production. The truncate behavior is
determined dynamically in most situations -- which is another
similarity to the optimization we've added here.

Why is this fundamentally different to those two things?

> With what you are
> presenting here, one could only disable the optimization for each
> relation, one-by-one.  If this optimization proves to be a problem,
> it's just going to be harder to users to go through all the relations
> and re-tune autovacuum.  Am I missing something?

Why would you expect autovacuum to run even when the optimization is
unavailable (e.g. with Postgres 13)? After all, the specifics of when
the bypass optimization kicks in make it very unlikely that ANALYZE
will ever be able to notice enough dead tuples to trigger an
autovacuum (barring antiwraparound and insert-driven autovacuums).
There will probably be very few LP_DEAD items remaining. Occasionally
there will be somewhat more LP_DEAD items, that happen to be
concentrated in less than 2% of the table's blocks -- but block-based
sampling by ANALYZE is likely to miss most of them and underestimate
the total number. The sampling approach taken by acquire_sample_rows()
ensures this with larger tables. With small tables the chances of the
optimization kicking in are very low, unless perhaps fillfactor has
been tuned very aggressively.

There has never been a guarantee that autovacuum will be triggered
(and do index vacuuming) in cases that have very few LP_DEAD items, no
matter how the system has been tuned. The main reason why making the
optimization behavior controllable is for the VACUUM command.
Principally for hackers. I can imagine myself using the VACUUM option
to disable the optimization when I was interested in testing VACUUM or
space utilization in some specific, narrow way.

Of course it's true that there is still some uncertainty about the
optimization harming production workloads -- that is to be expected
with an enhancement like this one. But there is still no actual
example or test case that shows the optimization doing the wrong
thing, or anything like it. Anything is possible, but I am not
expecting there to be even one user complaint about the feature.
Naturally I don't want to add something as heavyweight as a GUC, given
all that.

-- 
Peter Geoghegan




Re: pg_stat_progress_create_index vs. parallel index builds

2021-06-11 Thread Alvaro Herrera
On 2021-Jun-04, Greg Nancarrow wrote:

> On Thu, Jun 3, 2021 at 1:49 AM Matthias van de Meent
>  wrote:
> >
> > On Wed, 2 Jun 2021 at 17:42, Tomas Vondra  
> > wrote:
> > >
> > > Nice. I gave it a try on the database I'm experimenting with, and it
> > > seems to be working fine. Please add it to the next CF.
> >
> > Thanks, cf available here: https://commitfest.postgresql.org/33/3149/
> 
> The patch looks OK to me. It seems apparent that the lines added by
> the patch are missing from the current source in the parallel case.
> 
> I tested with and without the patch, using the latest PG14 source as
> of today, and can confirm that without the patch applied, the "sorting
> live tuples" phase is not reported in the parallel-case, but with the
> patch applied it then does get reported in that case. I also confirmed
> that, as you said, the patch only addresses the usual case where the
> parallel leader participates in the parallel operation.

So, with Matthias' patch applied and some instrumentation to log (some)
parameter updates, this is what I get on building an index in parallel.
The "subphase" is parameter 10:

2021-06-09 17:04:30.692 -04 19194 WARNING:  updating param 0 to 1
2021-06-09 17:04:30.692 -04 19194 WARNING:  updating param 6 to 0
2021-06-09 17:04:30.692 -04 19194 WARNING:  updating param 8 to 403
2021-06-09 17:04:30.696 -04 19194 WARNING:  updating param 9 to 2
2021-06-09 17:04:30.696 -04 19194 WARNING:  updating param 10 to 1
2021-06-09 17:04:30.696 -04 19194 WARNING:  updating param 11 to 0
2021-06-09 17:04:30.696 -04 19194 WARNING:  updating param 15 to 0
2021-06-09 17:04:30.696 -04 19194 WARNING:  updating param 10 to 2
2021-06-09 17:04:30.696 -04 19194 WARNING:  updating param 15 to 486726
2021-06-09 17:04:37.418 -04 19194 WARNING:  updating param 10 to 3  <-- this one is new
2021-06-09 17:04:42.215 -04 19194 WARNING:  updating param 11 to 11000
2021-06-09 17:04:42.215 -04 19194 WARNING:  updating param 15 to 0
2021-06-09 17:04:42.215 -04 19194 WARNING:  updating param 10 to 3
2021-06-09 17:04:42.237 -04 19194 WARNING:  updating param 10 to 5

The thing to note is that we set subphase to 3 twice.  The first of
those is added by the patch to _bt_parallel_scan_and_sort.  The second
is in _bt_leafbuild, just before setting the subphase to LEAF_LOAD.  So
the change is that we set the subphase to "sorting live tuples" five
seconds ahead of what we were doing previously.  Seems ok.  (We could
alternatively skip the progress update call in _bt_leafbuild; but those
calls are so cheap that adding a conditional jump is almost as
expensive.)

(The other potential problem might be to pointlessly invoke the progress
update calls when in a worker.  But that's already covered because only
the leader passes progress=true to _bt_parallel_scan_and_sort.)

I'll push now.

-- 
Álvaro Herrera   Valdivia, Chile




Re: postgres_fdw batching vs. (re)creating the tuple slots

2021-06-11 Thread Tomas Vondra
On 6/9/21 1:08 PM, Tomas Vondra wrote:
> 
> 
> On 6/9/21 12:50 PM, Bharath Rupireddy wrote:
>> On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra
>>  wrote:
>>>
>>> Hi,
>>>
>>> Here's a v2 fixing a silly bug with reusing the same variable in two
>>> nested loops (worked for simple postgres_fdw cases, but "make check"
>>> failed).
>>
>> I applied these patches and ran make check in postgres_fdw contrib
>> module, I saw a server crash. Is it the same failure you were saying
>> above?
>>
> 
> Nope, that was causing infinite loop. This is jut a silly mistake on my
> side - I forgot to replace the i/j variable inside the loop. Here's v3.
> 
> regards
> 

FWIW I've pushed this, after improving the comments a little bit.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Replication protocol doc fix

2021-06-11 Thread Robert Haas
On Thu, Jun 10, 2021 at 9:26 PM Jeff Davis  wrote:
> The docs currently say (introduced in commit 91fa853):
>
> "In the event of a backend-detected error during copy-both mode, the
> backend will issue an ErrorResponse message, discard frontend messages
> until a Sync message is received, and then issue ReadyForQuery and
> return to normal processing."
>
> But that doesn't seem to be correct: Sync is only used for the extended
> query protocol, and CopyBoth can only be initiated with the simple
> query protocol.

My impression was that CopyBoth can be initiated either way, but if
you use the extended query protocol, then the result is a hopeless
mess, because the protocol is badly designed:

https://www.postgresql.org/message-id/ca+tgmoa4ea+cpxaigqmebp9xisvd3ze9dbvnbzevx9ucmiw...@mail.gmail.com

But I think you're correct in saying that the discard-until-Sync
behavior only happens if the extended query protocol is used, so I
agree that the current text is wrong.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Question about StartLogicalReplication() error path

2021-06-11 Thread Andres Freund
On 2021-06-11 12:07:24 -0700, Jeff Davis wrote:
> On Fri, 2021-06-11 at 11:56 -0700, Andres Freund wrote:
> > That doesn't work as easily in older versions because there was no
> > SQL
> > support in replication connections until PG 10...
> 
> 9.6 will be EOL this year. I don't really see why such old versions are
> relevant to this discussion.

It's relevant to understand how we ended up here.




Re: Question about StartLogicalReplication() error path

2021-06-11 Thread Andres Freund
Hi,

On 2021-06-11 16:05:10 -0400, Robert Haas wrote:
> You seem to see this as some kind of major problem and I guess I don't
> agree. I think it's pretty clear what the motivation was for the
> current behavior, because I believe it's well-explained by the comment
> and the three people who have tried to answer your question. I also
> think it's pretty clear why somebody might find it surprising: someone
> might think that fast-forwarding is harmful and risky rather than a
> useful convenience. As evidence for the fact that someone might think
> that, I offer the fact that you seem to think exactly that thing. I
> also think that there's pretty good evidence that the behavior as it
> exists is not really so bad. As far as I know, and I certainly might
> have missed something, you're the first one to complain about behavior
> that we've had for quite a long time now, and you seem to be saying
> that it might cause problems for somebody, not that you know it
> actually did. So, I don't know, I'm not opposed to talking about
> potential improvements here, but to the extent that you're suggesting
> this is unreasonable behavior, I think that's too harsh.

Yea. I think it'd be a different matter if streaming logical decoding
had been added this cycle and we'd started out with supporting queries
over replication connection - but it's been long enough that it's likely
that people rely on the current behaviour, and I don't see the gain in
reliability outweighing the compat issues.

Your argument that one can just check kinda goes both ways - you can do
that with the current behaviour too...

Greetings,

Andres Freund




Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-11 Thread Andres Freund
On 2021-06-11 15:52:21 -0400, Álvaro Herrera wrote:
> On 2021-Apr-07, Andres Freund wrote:
> 
> > After this I don't see a reason to have SAB_Inquire - as far as I can
> > tell it's practically impossible to use without race conditions? Except
> > for raising an error - which is "builtin"...
> 
> Pushed 0002.
> 
> Thanks!

Thank you for your work on this!




Re: unnesting multirange data types

2021-06-11 Thread Alexander Korotkov
On Fri, Jun 11, 2021 at 1:04 AM Justin Pryzby  wrote:
>
> +{ oid => '1293', descr => 'expand mutlirange to set of ranges',
>
> typo: mutlirange

Fixed, thanks.

The patch with the implementation of both unnest() and cast to array
is attached.  It contains both tests and docs.

--
Regards,
Alexander Korotkov


multirange_unnest_cast_to_array.patch
Description: Binary data


Re: Character expansion with ICU collations

2021-06-11 Thread Peter Eisentraut

On 11.06.21 22:05, Finnerty, Jim wrote:



> >>
> You can have these queries return both rows if you use an
> accent-ignoring collation, like this example in the documentation:
>
> CREATE COLLATION ignore_accents (provider = icu, locale =
> 'und-u-ks-level1-kc-true', deterministic = false);
> <<
>
> Indeed.  Is the dependency between the character expansion capability and
> accent-insensitive collations documented anywhere?


The above is merely a consequence of what the default collation elements 
for 'ß' are.


Expansion isn't really a relevant concept in collation.  Any character 
can map to 1..N collation elements.  The collation algorithm doesn't 
care how many there are.
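A toy model (invented weights, far simpler than ICU's real tables) makes this concrete: map each input symbol to 1..N primary collation elements and compare the flattened sequences. Here '#' stands in for a character like 'ß' that expands to two elements.

```c
/* Toy primary-strength comparison: each symbol expands to 1..N
 * "collation elements"; the comparison only sees the flattened
 * element sequence, so expansion needs no special handling.
 * Weights are invented for illustration. */
static int expand(char c, int *out)
{
    if (c == '#')                      /* stand-in for an expanding character like 'ß' */
    {
        out[0] = 's'; out[1] = 's';    /* maps to two elements */
        return 2;
    }
    out[0] = (unsigned char) c;        /* ordinary symbols: one element */
    return 1;
}

int primary_cmp(const char *a, const char *b)
{
    int ea[64], eb[64], na = 0, nb = 0;
    for (; *a; a++) na += expand(*a, ea + na);
    for (; *b; b++) nb += expand(*b, eb + nb);
    for (int i = 0; i < na && i < nb; i++)
        if (ea[i] != eb[i]) return ea[i] - eb[i];
    return na - nb;
}
```

With these invented weights, "stra#e" and "strasse" flatten to identical element sequences, mirroring the strength-1 behaviour discussed above.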



> Can a CI collation be ordered upper case first, or is this a limitation of ICU?


I don't know the authoritative answer to that, but to me it doesn't make 
sense, since the effect of a case-insensitive collation is to throw away 
the third-level weights, so there is nothing left for "upper case first" 
to operate on.



> More generally, is there any interest in leveraging the full power of ICU
> tailoring rules to get whatever order someone may need, subject to the
> limitations of ICU itself?  What would be required to extend CREATE COLLATION
> to accept an optional sequence of tailoring rules that we would store in the
> pg_collation catalog and apply along with the modifiers in the locale string?


yes




Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options

2021-06-11 Thread Robert Haas
On Tue, May 25, 2021 at 9:38 AM Alvaro Herrera  wrote:
> This should be okay, right?  Well, almost. The problem here is if you
> want to have a variable where you set more than one option, you have to
> use bit-and of the enum values ... and the resulting value is no longer
> part of the enum.  A compiler would be understandably upset if you try
> to pass that value in a variable of the enum datatype.

Yes. I dislike this style for precisely this reason.

I may, however, be in the minority.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-06-11 Thread Robert Haas
On Fri, Jun 11, 2021 at 12:13 AM Amit Kapila  wrote:
> Do we invalidate relcache entry if someone changes say trigger or some
> index AM function property via Alter Function (in our case from safe
> to unsafe or vice-versa)? Tsunakawa-San has mentioned this as the
> reason in his email [1] why we can't rely on caching this property in
> relcache entry. I also don't see anything in AlterFunction which would
> suggest that we invalidate the relation with which the function might
> be associated via trigger.

Hmm. I am not sure that index AM functions really need to be checked,
but triggers certainly do. I think you are correct that an ALTER
FUNCTION wouldn't invalidate the relcache entry, which is I guess
pretty much the same problem Tom was pointing out in the thread to
which you linked.

But ... thinking out of the box as Tom suggests, what if we came up
with some new kind of invalidation message that is only sent when a
function's parallel-safety marking is changed? And every backend in
the same database then needs to re-evaluate the parallel-safety of
every relation for which it has cached a value. Such recomputations
might be expensive, but they would probably also occur very
infrequently. And you might even be able to make it a bit more
fine-grained if it's worth the effort to worry about that: say that in
addition to caching the parallel-safety of the relation, we also cache
a list of the pg_proc OIDs upon which that determination depends. Then
when we hear that the flag's been changed for OID 123456, we only need
to invalidate the cached value for relations that depended on that
pg_proc entry. There are ways that a relation could become
parallel-unsafe without changing the parallel-safety marking of any
function, but perhaps all of the other ways involve a relcache
invalidation?

Just brainstorming here. I might be off-track.
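A minimal sketch of that fine-grained idea (purely illustrative; every name here is invented and nothing like it exists in the tree): cache each relation's parallel-safety verdict together with the pg_proc OIDs it depends on, and on a hypothetical "proc safety changed" message, invalidate only the entries that mention that OID.

```c
typedef unsigned int Oid;

#define MAX_DEPS 8
#define MAX_RELS 16

typedef struct RelSafetyEntry
{
    Oid  relid;
    int  valid;                 /* 0 = must be recomputed */
    int  parallel_safe;
    Oid  dep_procs[MAX_DEPS];   /* pg_proc OIDs this verdict depends on */
    int  ndeps;
} RelSafetyEntry;

static RelSafetyEntry cache[MAX_RELS];
static int ncached = 0;

void cache_rel_safety(Oid relid, int safe, const Oid *deps, int ndeps)
{
    RelSafetyEntry *e = &cache[ncached++];
    e->relid = relid;
    e->valid = 1;
    e->parallel_safe = safe;
    e->ndeps = ndeps;
    for (int i = 0; i < ndeps; i++)
        e->dep_procs[i] = deps[i];
}

/* Called on receipt of the hypothetical "proc safety changed" message:
 * only entries depending on this pg_proc OID lose their cached verdict. */
void invalidate_proc_safety(Oid procid)
{
    for (int i = 0; i < ncached; i++)
        for (int j = 0; j < cache[i].ndeps; j++)
            if (cache[i].dep_procs[j] == procid)
                cache[i].valid = 0;
}

int entry_valid(Oid relid)
{
    for (int i = 0; i < ncached; i++)
        if (cache[i].relid == relid)
            return cache[i].valid;
    return 0;
}
```
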

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add PortalDrop in exec_execute_message

2021-06-11 Thread Alvaro Herrera
On 2021-Jun-09, Tom Lane wrote:

> BTW, to be clear: I think Alvaro's change is also necessary.
> If libpq is going to silently do something different in pipeline
> mode than it does in normal mode, it should strive to minimize
> the semantic difference.  exec_simple_query includes a PortalDrop,
> so we'd best do the same in pipeline mode.

Pushed that patch, thanks.

-- 
Álvaro Herrera   Valdivia, Chile
"At least to kernel hackers, who really are human, despite occasional
rumors to the contrary" (LWN.net)




Re: doc issue missing type name "multirange" in chapter title

2021-06-11 Thread Tom Lane
Alexander Korotkov  writes:
> What about "Range/Multirange Functions and Operators"?

Better than a comma, I guess.  Personally I didn't have a
problem with the form with two "ands".

regards, tom lane




Re: doc issue missing type name "multirange" in chapter title

2021-06-11 Thread Alexander Korotkov
On Fri, Jun 11, 2021 at 3:39 AM Tom Lane  wrote:
> Justin Pryzby  writes:
> > If it's confusing to say "and" twice, then perhaps you'd say:
> > Functions and Operators for Range and Multirange Types
>
> Uh, that's still two "and"s.  In any case, I think it's better
> to keep this section heading aligned with all of its siblings.

What about "Range/Multirange Functions and Operators"?

--
Regards,
Alexander Korotkov




Re: Character expansion with ICU collations

2021-06-11 Thread Finnerty, Jim
>>
You can have these queries return both rows if you use an
accent-ignoring collation, like this example in the documentation:

CREATE COLLATION ignore_accents (provider = icu, locale =
'und-u-ks-level1-kc-true', deterministic = false);
<<

Indeed.  Is the dependency between the character expansion capability and 
accent-insensitive collations documented anywhere?

Another unexpected dependency appears to be @colCaseFirst=upper.  If specified 
in combination with colStrength=secondary, it appears that the upper/lower case 
ordering is random within a group of characters that are secondary equal, e.g. 
'A' < 'a', but 'b' < 'B', 'c' < 'C', ...  , but then 'L' < 'l'.  It is not even 
consistently ordered with respect to case.  If I make it a nondeterministic 
CS_AI collation, then it sorts upper before lower consistently.  The rule seems 
to be that you can't sort by case within a group that is case-insensitive. 

Can a CI collation be ordered upper case first, or is this a limitation of ICU?

For example, this is part of the sort order that I'd like to achieve with ICU, 
with the code point in column 1 and dense_rank() shown in the rightmost column 
indicating that 'b' = 'B', for example:

66  B   B   138 151
98  b   b   138 151   <- so within a group that is CI_AS equal, 
the sort order needs to be upper case first
67  C   C   139 152
99  c   c   139 152
199 Ç   Ç   140 153
231 ç   ç   140 153
68  D   D   141 154
100 d   d   141 154
208 Ð   Ð   142 199
240 ð   ð   142 199
69  E   E   143 155
101 e   e   143 155

Can this sort order be achieved with ICU?

More generally, is there any interest in leveraging the full power of ICU 
tailoring rules to get whatever order someone may need, subject to the 
limitations of ICU itself?  What would be required to extend CREATE COLLATION 
to accept an optional sequence of tailoring rules that we would store in the 
pg_collation catalog and apply along with the modifiers in the locale string?

/Jim





Re: Question about StartLogicalReplication() error path

2021-06-11 Thread Robert Haas
On Fri, Jun 11, 2021 at 2:49 PM Jeff Davis  wrote:
> It doesn't add any overhead.
>
> All the client would have to do is "SELECT confirmed_flush_lsn FROM
> pg_replication_slots WHERE slot_name='myslot'", and compare it to the
> stored value. If the stored value is earlier than the
> confirmed_flush_lsn, the *client* can decide to start replication at
> the confirmed_flush_lsn. That makes sense because the client knows more
> about its behavior and configuration, and whether that's a safe choice
> or not.

True, but it doesn't seem very nice to force the client to depend on SQL
when that wouldn't otherwise be needed. The SQL is a lot more likely
to fail than a replication command, for example due to some
permissions issue. So I think if we want to make this an optional
behavior, it would be better to add a flag to the START_REPLICATION
command to say whether it's OK for the server to fast-forward like this.

You seem to see this as some kind of major problem and I guess I don't
agree. I think it's pretty clear what the motivation was for the
current behavior, because I believe it's well-explained by the comment
and the three people who have tried to answer your question. I also
think it's pretty clear why somebody might find it surprising: someone
might think that fast-forwarding is harmful and risky rather than a
useful convenience. As evidence for the fact that someone might think
that, I offer the fact that you seem to think exactly that thing. I
also think that there's pretty good evidence that the behavior as it
exists is not really so bad. As far as I know, and I certainly might
have missed something, you're the first one to complain about behavior
that we've had for quite a long time now, and you seem to be saying
that it might cause problems for somebody, not that you know it
actually did. So, I don't know, I'm not opposed to talking about
potential improvements here, but to the extent that you're suggesting
this is unreasonable behavior, I think that's too harsh.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-11 Thread Álvaro Herrera
On 2021-Apr-07, Andres Freund wrote:

> After this I don't see a reason to have SAB_Inquire - as far as I can
> tell it's practically impossible to use without race conditions? Except
> for raising an error - which is "builtin"...

Pushed 0002.

Thanks!

-- 
Álvaro Herrera39°49'30"S 73°17'W
"La persona que no quería pecar / estaba obligada a sentarse
 en duras y empinadas sillas/ desprovistas, por cierto
 de blandos atenuantes"  (Patricio Vogel)




Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

2021-06-11 Thread Jeff Davis
On Fri, 2021-06-11 at 15:43 +0530, Amit Kapila wrote:
> The new patches look mostly good apart from the below cosmetic
> issues.
> I think the question is whether we want to do these for PG-14 or
> postpone them till PG-15. I think these don't appear to be risky
> changes so we can get them in PG-14 as that might help some outside
> core solutions as appears to be the case for Jeff. 

My main interest here is that I'm working on replication protocol
support in a rust library. While doing that, I've encountered a lot of
rough edges (as you may have seen in my recent posts), and this patch
fixes one of them.

But at the same time, several small changes to the protocol spread
across several releases introduce more opportunity for confusion.

If we are confident this is the right direction, then v14 makes sense
for consistency. But if waiting for v15 might result in a better
change, then we should probably just get it into the July CF for v15.

(My apologies if my opinion has drifted a bit since this thread began.)

Regards,
Jeff Davis






Re: automatically generating node support functions

2021-06-11 Thread Andres Freund
Hi,

On 2021-06-08 19:45:58 +0200, Peter Eisentraut wrote:
> On 08.06.21 15:40, David Rowley wrote:
> > It's almost 2 years ago now, but I'm wondering if you saw what Andres
> > proposed in [1]?  The idea was basically to make a metadata array of
> > the node structs so that, instead of having to output large amounts of
> > .c code to do read/write/copy/equals, instead just have small
> > functions that loop over the elements in the array for the given
> > struct and perform the required operation based on the type.
> 
> That project was technologically impressive, but it seemed to have
> significant hurdles to overcome before it can be useful.  My proposal is
> usable and useful today.  And it doesn't prevent anyone from working on a
> more sophisticated solution.

I think it's short-sighted to further and further go down the path of
parsing "kind of C" without just using a proper C parser. But leaving
that aside, a big part of the promise of the approach in that thread
isn't actually tied to the specific way the type information is
collected: The perl script could output something like the "node type
metadata" I generated in that patchset, and then we don't need the large
amount of generated code and can much more economically add additional
operations handling node types.
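To illustrate the shape of such "node type metadata" (a toy, far simpler than the patchset referenced above): describe each field's offset and size once, and let a single generic routine implement an operation such as copy by walking the table instead of emitting per-node code.

```c
#include <stddef.h>
#include <string.h>

/* Toy "node type metadata": one table entry per field, giving its
 * offset and size.  Field kinds beyond plain scalars are omitted. */
typedef struct FieldInfo { size_t offset; size_t size; } FieldInfo;

/* An invented node type standing in for a real parse node. */
typedef struct FuncCall { int funcid; int nargs; double cost; } FuncCall;

static const FieldInfo funccall_fields[] = {
    { offsetof(FuncCall, funcid), sizeof(int) },
    { offsetof(FuncCall, nargs),  sizeof(int) },
    { offsetof(FuncCall, cost),   sizeof(double) },
};

/* Generic copy: loop over the metadata rather than generating
 * a dedicated _copyFuncCall() function. */
void copy_node(void *dst, const void *src, const FieldInfo *fields, int nfields)
{
    for (int i = 0; i < nfields; i++)
        memcpy((char *) dst + fields[i].offset,
               (const char *) src + fields[i].offset,
               fields[i].size);
}
```

An equals or serialization routine would be another small loop over the same table, which is where the economy over large generated functions comes from.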

Greetings,

Andres Freund




Re: [Proposal] Add accumulated statistics for wait event

2021-06-11 Thread Andres Freund
Hi,

On 2021-06-05 00:53:44 +0200, Jehan-Guillaume de Rorthais wrote:
> From 88c2779679c5c9625ca5348eec0543daab5ccab4 Mon Sep 17 00:00:00 2001
> From: Jehan-Guillaume de Rorthais 
> Date: Tue, 1 Jun 2021 13:25:57 +0200
> Subject: [PATCH 1/2] Add pg_stat_waitaccum view.
> 
> pg_stat_waitaccum shows counts and duration of each wait event.
> Each backend/background process counts and measures the time of wait events
> in every pgstat_report_wait_start and pgstat_report_wait_end. They
> store that info in their local variables and send it to the Statistics
> Collector, from which we can retrieve it.
> 
> To reduce overhead, I implemented a statistic hash instead of a
> dynamic hash. I also implemented track_wait_timing, which
> determines whether wait event duration is collected or not.

I object to adding this overhead. The whole selling point for wait
events was that they are low overhead. I since spent time reducing the
overhead further, because even just the branches for checking if
track_activity is enabled are measurable (225a22b19ed).


> From ddb1adc5cd9acc9bc9de16d0cf057124b09fe1e3 Mon Sep 17 00:00:00 2001
> From: Jehan-Guillaume de Rorthais 
> Date: Fri, 4 Jun 2021 18:14:51 +0200
> Subject: [PATCH 2/2] [POC] Change measuring method of wait event time from
>  INSTR_TIME to rdtsc.
> 
> This patch changes the method of measuring wait event time from INSTR_TIME (which
> uses gettimeofday or clock_gettime) to rdtsc. This might reduce the measurement
> overhead.
> 
> Support for things like converting clock cycles to actual time, and error
> handling, is not currently implemented.

rdtsc is a serializing (*) instruction - that's the expensive part. On linux
clock_gettime() doesn't actually need a syscall. While the vdso call
implies a bit of overhead over a raw rdtsc, it's a relatively small part
of the overhead.  See
https://www.postgresql.org/message-id/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de

Greetings,

Andres Freund

(*) it's not fully serializing, iirc it allows later instructions to be
started, but it does wait for all earlier in-flight instructions to
finish.




Re: RFC: Table access methods and scans

2021-06-11 Thread Andres Freund
Hi,

On 2021-06-03 17:52:24 -0700, Jeff Davis wrote:
> I agree that would be very conventient for non-heap AMs. There's a very
> old commit[3] that says:
>
> +   /*
> +* Note that unlike IndexScan, SeqScan never use keys
> +* in heap_beginscan (and this is very bad) - so, here
> +* we have not check are keys ok or not.
> +*/
>
> and that language has just been carried forward for decades. I wonder
> if there's any major reason this hasn't been done yet. Does it just not
> improve performance for a heap, or is there some other reason?

It's not actually a good idea in general:

- Without substantial refactoring more work is done while holding the
  content lock on the page. Whereas doing it as part of a seqscan only
  requires a buffer pin (and thus allows for concurrent writes to the
  same page)

- It's hard to avoid repeated work with expressions that can't fully be
  evaluated as part of the ScanKey. Expression evaluation generally can
  be a bit smarter about evaluation, e.g. not deforming the tuple
  one-by-one.

Greetings,

Andres Freund




Re: Question about StartLogicalReplication() error path

2021-06-11 Thread Jeff Davis
On Fri, 2021-06-11 at 11:56 -0700, Andres Freund wrote:
> That doesn't work as easily in older versions because there was no
> SQL
> support in replication connections until PG 10...

9.6 will be EOL this year. I don't really see why such old versions are
relevant to this discussion.

Regards,
Jeff Davis






Re: Question about StartLogicalReplication() error path

2021-06-11 Thread Andres Freund
On 2021-06-11 11:49:19 -0700, Jeff Davis wrote:
> All the client would have to do is "SELECT confirmed_flush_lsn FROM
> pg_replication_slots WHERE slot_name='myslot'", and compare it to the
> stored value.

That doesn't work as easily in older versions because there was no SQL
support in replication connections until PG 10...




Re: Question about StartLogicalReplication() error path

2021-06-11 Thread Jeff Davis
On Fri, 2021-06-11 at 10:40 -0700, Andres Freund wrote:
> Especially because it very well might break existing working
> setups...

Please address my concerns[1].

Regards,
Jeff Davis

[1] 
https://www.postgresql.org/message-id/e22a4606333ce1032e29fe2fb1aa9036e6f0ca98.camel%40j-davis.com





Re: Question about StartLogicalReplication() error path

2021-06-11 Thread Jeff Davis
On Fri, 2021-06-11 at 13:15 -0400, Robert Haas wrote:
> on it, but it seems to me that the comment does explain this, and
> that
> it's basically the same explanation as what Amit said.

It only addresses the "pro" side of the behavior, not the "con". It's a
bit like saying "Given that we are in the U.S., it might seem like we
should be driving on the right side of the road, but that side has
traffic and we are in a hurry."

Why might it seem that we should error out? If we don't error out, what
bad things might happen? How do these "con"s weigh against the "pro"s?

> I'm not sure that it would be a good
> trade-off to have a tighter sanity check at the expense of adding
> that
> overhead.

It doesn't add any overhead.

All the client would have to do is "SELECT confirmed_flush_lsn FROM
pg_replication_slots WHERE slot_name='myslot'", and compare it to the
stored value. If the stored value is earlier than the
confirmed_flush_lsn, the *client* can decide to start replication at
the confirmed_flush_lsn. That makes sense because the client knows more
about its behavior and configuration, and whether that's a safe choice
or not.
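That client-side check only needs an ordering on LSNs. A minimal sketch (helper names invented) that parses pg_lsn's "hi/lo" hex text format, as returned by the query above, and makes the decision:

```c
#include <stdint.h>
#include <stdio.h>

/* Parse PostgreSQL's pg_lsn text format "XXXXXXXX/XXXXXXXX" (high and
 * low 32-bit halves in hex) into a 64-bit value; returns 0 on bad input.
 * Helper names here are invented for this sketch. */
int parse_lsn(const char *s, uint64_t *lsn)
{
    unsigned int hi, lo;
    if (sscanf(s, "%X/%X", &hi, &lo) != 2)
        return 0;
    *lsn = ((uint64_t) hi << 32) | lo;
    return 1;
}

/* The client-side decision: returns 1 if the stored LSN is behind the
 * slot's confirmed_flush_lsn (start at confirmed instead), 0 if the
 * stored LSN is fine, -1 on parse failure. */
int should_fast_forward(const char *stored, const char *confirmed)
{
    uint64_t a, b;
    if (!parse_lsn(stored, &a) || !parse_lsn(confirmed, &b))
        return -1;
    return (a < b) ? 1 : 0;
}
```
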

The only difference is whether the server is safe-by-default with
intuitive semantics that match the documentation, or unsafe-by-default
with unexpected semantics that don't match the documentation.

Regards,
Jeff Davis






Re: Question about StartLogicalReplication() error path

2021-06-11 Thread Andres Freund
Hi,

On 2021-06-11 13:15:11 -0400, Robert Haas wrote:
> On Fri, Jun 11, 2021 at 2:23 AM Jeff Davis  wrote:
> > * The comment acknowledges that a user might expect an error in that
> > case; but doesn't really address why the user would expect an error,
> > and why it's OK to violate that expectation.
> 
> This code was written by Andres, so he'd be the best person to comment
> on it, but it seems to me that the comment does explain this, and that
> it's basically the same explanation as what Amit said. If the client
> doesn't have to do anything for a certain range of WAL and just
> acknowledges it, it would under your proposal have to also durably
> record that it had chosen to do nothing, which might cause extra
> fsyncs, potentially lots of extra fsyncs if this happens frequently
> e.g. because most tables are filtered out and the replicated ones are
> only modified occasionally.

Yes, that's the motivation.


> I'm not sure that it would be a good trade-off to have a tighter
> sanity check at the expense of adding that overhead.

Especially because it very well might break existing working setups...

Greetings,

Andres Freund




Re: Multi-Column List Partitioning

2021-06-11 Thread Zhihong Yu
On Thu, Jun 10, 2021 at 8:38 PM Amit Langote  wrote:

> Hi Nitin,
>
> On Thu, Jun 3, 2021 at 11:45 PM Nitin Jadhav
>  wrote:
> > > I'll wait for you to post a new patch addressing at least the comments
> > > in my earlier email.  Also, please make sure to run `make check`
> > > successfully before posting the patch. :)
> >
> > I have fixed all of the review comments given by you and Jeevan in the
> > attached patch and also the attached patch contains more changes
> > compared to the previous patch. Following are the implementation
> > details.
>
> Thanks for the updated version.
>
> > 1. Regarding syntax, the existing syntax will work fine for the
> > single-column list partitioning. However I have used the new syntax
> > for the multi-column list partitioning as we discussed earlier. I have
> > used a combination of 'AND' and 'OR' logic for the partition
> > constraints as given in the below example.
> >
> > postgres@17503=#create table t(a int, b text) partition by list(a,b);
> > CREATE TABLE
> > postgres@17503=#create table t1 partition of t for values in ((1,'a'),
> > (NULL,'b'));
> > CREATE TABLE
> > postgres@17503=#\d+ t
> >                          Partitioned table "public.t"
> >  Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
> > --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
> >  a      | integer |           |          |         | plain    |             |              |
> >  b      | text    |           |          |         | extended |             |              |
> > Partition key: LIST (a, b)
> > Partitions: t1 FOR VALUES IN ((1, 'a'), (NULL, 'b'))
> >
> > postgres@17503=#\d+ t1
> >                                Table "public.t1"
> >  Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
> > --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
> >  a      | integer |           |          |         | plain    |             |              |
> >  b      | text    |           |          |         | extended |             |              |
> > Partition of: t FOR VALUES IN ((1, 'a'), (NULL, 'b'))
> > Partition constraint: (((a = 1) AND (b = 'a'::text)) OR ((a IS NULL)
> > AND (b = 'b'::text)))
> > Access method: heap
>
> The constraint expressions seem to come out correctly, though I
> haven't checked your implementation closely yet.
>
> > 2. In the existing code, NULL values were handled differently. It was
> > not added to the 'datums' variable, rather used to store the partition
> > index directly in the 'null_index' variable. Now there is a
> > possibility of multiple NULL values, hence introducing  a new member
> > 'isnulls' in the 'PartitionBoundInfoData' struct which indicates
> > whether the corresponding element in the 'datums' is NULL. Now
> > 'null_index' cannot be used directly to store the partition index, so
> > removed it and made the necessary changes in multiple places.
> >
> > 3. I have added test cases for 'create table' and 'insert' statements
> > related to multi-column list partitioning and these are working fine
> > with 'make check'.
> >
> > 4. Handled the partition pruning code to accommodate these changes for
> > single-column list partitioning. However it is pending for
> > multi-column list partitioning.
> >
> > 5. I have done necessary changes in partition wise join related code
> > to accommodate for single-column list partitioning. However it is
> > pending for multi-column list partitioning.
> >
> > Kindly review the patch and let me know if any changes are required.
>
> The new list bound binary search and related comparison support
> function look a bit too verbose to me.  I was expecting
> partition_list_bsearch() to look very much like
> partition_range_datum_bsearch(), but that is not the case.  The
> special case code that you wrote in partition_list_bsearch() seems
> unnecessary, at least in that function.  I'm talking about the code
> fragment starting with this comment:
>
>   /*
>* Once we find the matching for the first column but if it does
> not
>* match for the any of the other columns, then the binary search
>* will not work in all the cases. We should traverse just below
>* and above the mid index until we find the match or we reach
> the
>* first mismatch.
>*/
>
> I guess you're perhaps trying to address the case where the caller
> does not specify the values for all of the partition key columns,
> which can happen when the partition pruning code needs to handle a set
> of clauses matching only some of the partition key columns.  But
> that's a concern of the partition pruning code and so the special case
> should be handled there (if at all), not in the binary search function
> that is shared with other callers.  Regarding that, 

Re: Question about StartLogicalReplication() error path

2021-06-11 Thread Robert Haas
On Fri, Jun 11, 2021 at 2:23 AM Jeff Davis  wrote:
> * The comment acknowledges that a user might expect an error in that
> case; but doesn't really address why the user would expect an error,
> and why it's OK to violate that expectation.

This code was written by Andres, so he'd be the best person to comment
on it, but it seems to me that the comment does explain this, and that
it's basically the same explanation as what Amit said. If the client
doesn't have to do anything for a certain range of WAL and just
acknowledges it, it would under your proposal have to also durably
record that it had chosen to do nothing, which might cause extra
fsyncs, potentially lots of extra fsyncs if this happens frequently
e.g. because most tables are filtered out and the replicated ones are
only modified occasionally. I'm not sure that it would be a good
trade-off to have a tighter sanity check at the expense of adding that
overhead.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: "an SQL" vs. "a SQL"

2021-06-11 Thread Bruce Momjian
On Thu, Jun 10, 2021 at 05:39:00PM -0400, Andrew Dunstan wrote:
> I suspect "an historic" is bordering on archaic even in the UK these days.

Don't trigger me on the difference between "historic" and "historical"!  ;-)

(Hey, not every day I get to trim quoted text to one line --- see recent
pgsql-general discussion of the topic.)

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Race condition in InvalidateObsoleteReplicationSlots()

2021-06-11 Thread Álvaro Herrera
On 2021-Jun-10, Álvaro Herrera wrote:

> Here's a version that I feel is committable (0001).  There was an issue
> when returning from the inner loop, if in a previous iteration we had
> released the lock.  In that case we need to return with the lock not
> held, so that the caller can acquire it again, but weren't.  This seems
> pretty hard to hit in practice (I suppose somebody needs to destroy the
> slot just as checkpointer killed the walsender, but before checkpointer
> marks it as its own) ... but if it did happen, I think checkpointer
> would block with no recourse.  Also added some comments and slightly
> restructured the code.
> 
> There are plenty of conflicts in pg13, but it's all easy to handle.

Pushed, with additional minor changes.

> I wrote a test (0002) to cover the case of signalling a walsender, which
> is currently not covered (we only deal with the case of a standby that's
> not running).  There are some sharp edges in this code -- I had to make
> it use background_psql() to send a CHECKPOINT, which hangs, because I
> previously send a SIGSTOP to the walreceiver.  Maybe there's a better
> way to achieve a walreceiver that remains connected but doesn't consume
> input from the primary, but I don't know what it is.  Anyway, the code
> becomes covered with this.  I would like to at least see it in master,
> to gather some reactions from buildfarm.

I tried hard to make this stable, but it just isn't (it works fine one
thousand runs, then I grab some coffee and run it once more and that one
fails.  Why?  that's not clear to me).  Attached is the last one I have,
in case somebody wants to make it better.  Maybe there's some completely
different approach that works better, but I'm out of ideas for now.

-- 
Álvaro Herrera   Valdivia, Chile
"La experiencia nos dice que el hombre peló millones de veces las patatas,
pero era forzoso admitir la posibilidad de que en un caso entre millones,
las patatas pelarían al hombre" (Ijon Tichy)
From 6b6cb174452c553437ba7949aa25f305f599d6b7 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 11 Jun 2021 12:21:45 -0400
Subject: [PATCH v3] Add test case for invalidating an active replication slot
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The code to signal a running walsender was completely uncovered before.

This test involves sending SIGSTOP to a walreceiver and running a
checkpoint while advancing WAL.  I'm not very certain that this test is
stable, so it's in master only, and separate from the code-fix commit so
that it can be reverted easily if need be.

Author: Álvaro Herrera 
Discussion: https://postgr.es/m/202106102202.mjw4huiix7lo@alvherre.pgsql
---
 src/test/recovery/t/019_replslot_limit.pl | 79 ++-
 1 file changed, 76 insertions(+), 3 deletions(-)

diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 7094aa0704..dcadfe1252 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -11,7 +11,7 @@ use TestLib;
 use PostgresNode;
 
 use File::Path qw(rmtree);
-use Test::More tests => 14;
+use Test::More tests => $TestLib::windows_os ? 14 : 17;
 use Time::HiRes qw(usleep);
 
 $ENV{PGDATABASE} = 'postgres';
@@ -211,8 +211,8 @@ for (my $i = 0; $i < 10000; $i++)
 }
 ok($failed, 'check that replication has been broken');
 
-$node_primary->stop('immediate');
-$node_standby->stop('immediate');
+$node_primary->stop;
+$node_standby->stop;
 
 my $node_primary2 = get_new_node('primary2');
 $node_primary2->init(allows_streaming => 1);
@@ -253,6 +253,79 @@ my @result =
 		timeout => '60'));
 is($result[1], 'finished', 'check if checkpoint command is not blocked');
 
+$node_primary2->stop;
+$node_standby->stop;
+
+# The next test depends on Perl's `kill`, which apparently is not
+# portable to Windows.  (It would be nice to use Test::More's `subtest`,
+# but that's not in the ancient version we require.)
+if ($TestLib::windows_os)
+{
+	done_testing();
+	exit;
+}
+
+# Get a slot terminated while the walsender is active
+# We do this by sending SIGSTOP to the walreceiver.  Skip this on Windows.
+my $node_primary3 = get_new_node('primary3');
+$node_primary3->init(allows_streaming => 1, extra => ['--wal-segsize=1']);
+$node_primary3->append_conf(
+	'postgresql.conf', qq(
+	min_wal_size = 2MB
+	max_wal_size = 2MB
+	log_checkpoints = yes
+	max_slot_wal_keep_size = 1MB
+	));
+$node_primary3->start;
+$node_primary3->safe_psql('postgres',
+	"SELECT pg_create_physical_replication_slot('rep3')");
+# Take backup
+$backup_name = 'my_backup';
+$node_primary3->backup($backup_name);
+# Create standby
+my $node_standby3 = get_new_node('standby_3');
+$node_standby3->init_from_backup($node_primary3, $backup_name,
+	has_streaming => 1);
+$node_standby3->append_conf('postgresql.conf', "primary_slot_name = 'rep3'");
+$node_standby3->start;
+$node_primary3->wait_for_catchup($node_standby3->name, 'replay');
+my 

Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-11 Thread Fujii Masao




On 2021/05/11 13:37, Masahiko Sawada wrote:

I've attached the updated patches that incorporated comments from
Zhihong and Ikeda-san.


Thanks for updating the patches!

I'm still reading these patches, but I'd like to share some review comments
that I found so far.

(1)
+/* Remove the foreign transaction from FdwXactParticipants */
+void
+FdwXactUnregisterXact(UserMapping *usermapping)
+{
+   Assert(IsTransactionState());
+   RemoveFdwXactEntry(usermapping->umid);
+}

Currently there is no user of FdwXactUnregisterXact().
Should this function be removed?


(2)
When I ran the regression test, I got the following failure.

= Contents of ./src/test/modules/test_fdwxact/regression.diffs
diff -U3 
/home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out
 
/home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out
--- 
/home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/expected/test_fdwxact.out
 2021-06-10 02:19:43.808622747 +
+++ 
/home/runner/work/postgresql/postgresql/src/test/modules/test_fdwxact/results/test_fdwxact.out
  2021-06-10 02:29:53.452410462 +
@@ -174,7 +174,7 @@
 SELECT count(*) FROM pg_foreign_xacts;
  count
 ---
- 1
+ 4
 (1 row)


(3)
+errmsg("could not read foreign transaction state 
from xlog at %X/%X",
+   (uint32) (lsn >> 32),
+   (uint32) lsn)));

LSN_FORMAT_ARGS() should be used?


(4)
+extern void RecreateFdwXactFile(TransactionId xid, Oid umid, void *content,
+   int len);

Since RecreateFdwXactFile() is used only in fdwxact.c,
the above "extern" is not necessary?


(5)
+2. Pre-Commit phase (1st phase of two-phase commit)
+we record the corresponding WAL indicating that the foreign server is involved
+with the current transaction before we PREPARE all foreign transactions.
+Thus, in case we lose connectivity to the foreign server or crash ourselves,
+we will remember that we might have a prepared transaction on the foreign
+server, and try to resolve it when connectivity is restored or after crash
+recovery.

So currently FdwXactInsertEntry() calls XLogInsert() and XLogFlush() for
the XLOG_FDWXACT_INSERT WAL record. Additionally, shouldn't we also wait
there for the WAL record to be replicated to the standby if synchronous
replication is enabled? Otherwise, when failover happens, the new
primary (the past standby) might not have enough XLOG_FDWXACT_INSERT WAL
records and might fail to find some in-doubt foreign transactions.


(6)
XLogFlush() is called for each foreign transaction. So if there are many
foreign transactions, XLogFlush() is called too frequently, which might
cause unnecessary performance overhead. Instead, for example, shouldn't
we call XLogFlush() only once in FdwXactPrepareForeignTransactions(),
after inserting all WAL records for all foreign transactions?


(7)
/* Open connection; report that we'll create a prepared statement. */
fmstate->conn = GetConnection(user, true, &fmstate->conn_state);
+   MarkConnectionModified(user);

Shouldn't MarkConnectionModified() also be called when TRUNCATE on
a foreign table is executed?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [PATCH] Fix select from wrong table array_op_test

2021-06-11 Thread Heikki Linnakangas

On 11/06/2021 18:31, Tom Lane wrote:

Jason Kim  writes:

In the middle of GIN index testing, there are some selects that are on
a different table array_op_test that doesn't even have an index.  They
probably were supposed to be selects to table array_index_op_test like
the other ones around the area.


I think it's probably intentional, else why have two tables at all?
I suppose the point of these test cases is to confirm that you get the
same results with or without use of an index.


We already have these same queries in the 'arrays' test against the 
'array_op_test' table, though. It sure looks like a copy-paste error to 
me as well.


- Heikki





Re: [PATCH] Fix select from wrong table array_op_test

2021-06-11 Thread Tom Lane
Jason Kim  writes:
> In the middle of GIN index testing, there are some selects that are on
> a different table array_op_test that doesn't even have an index.  They
> probably were supposed to be selects to table array_index_op_test like
> the other ones around the area.

I think it's probably intentional, else why have two tables at all?
I suppose the point of these test cases is to confirm that you get the
same results with or without use of an index.

Certainly, there's more than one way to do that.  Perhaps we should
have only one table and perform the variant tests by manipulating
enable_indexscan et al.  But I think what you did here is defeating
the intent.

regards, tom lane




Re: logical replication of truncate command with trigger causes Assert

2021-06-11 Thread Tom Lane
Amit Kapila  writes:
> On Fri, Jun 11, 2021 at 12:20 AM Tom Lane  wrote:
>> Another thing
>> I'm wondering is how many of these messages really need to be
>> translated.  We could use errmsg_internal and avoid burdening the
>> translators, perhaps.

> Not sure but I see all existing similar ereport calls don't use 
> errmsg_internal.

I was thinking maybe we could mark all these replication protocol
violation errors non-translatable.  While we don't want to crash on a
protocol violation, it shouldn't really be a user-facing case either.
Thoughts?

regards, tom lane




Re: Race condition in recovery?

2021-06-11 Thread Tom Lane
Kyotaro Horiguchi  writes:
>> ==~_~===-=-===~_~== 
>> pgsql.build/src/bin/pg_verifybackup/tmp_check/log/003_corruption_primary.log 
>> ==~_~===-=-===~_~==
>> ...
>> 2021-06-08 16:17:41.706 CEST [51792:9] 003_corruption.pl LOG:  received 
>> replication command: START_REPLICATION SLOT "pg_basebackup_51792" 0/B00 
>> TIMELINE 1
>> 2021-06-08 16:17:41.706 CEST [51792:10] 003_corruption.pl STATEMENT:  
>> START_REPLICATION SLOT "pg_basebackup_51792" 0/B00 TIMELINE 1
>> (log ends here)

> There seems like some hardware failure?

conchuela has definitely evinced flakiness before.  Not sure what's
up with it, but I have no problem with writing off non-repeatable
failures from that machine.  In any case, it's now passed half a
dozen times in a row on HEAD, so I think we can say that it's okay
with this test.  That leaves jacana, which I'm betting has a
Windows portability issue with the new test.

regards, tom lane




Re: pgbench bug candidate: negative "initial connection time"

2021-06-11 Thread Fabien COELHO




Hello Hayato-san,


I played pgbench with wrong parameters,


That's good:-)


and I found a bug candidate.

1. Do initdb and start.
2. Initialize schema and data with "scale factor" = 1.
3. Execute the following command many times:

$ pgbench -c 101 -j 10 postgres

Then, sometimes the negative "initial connection time" was returned.
Latency average is also strange.

```
$ pgbench -c 101 -j 10 postgres
starting vacuum...end.
pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: 
FATAL:  sorry, too many clients already


Hmmm.

AFAICR there was a decision to generate a report even if something went 
very wrong, in this case some client could not connect, so some values 
are not initialized, hence the absurd figures, as you show below.


Maybe we should revisit this decision.


initial connection time = -372896921.586 ms



I sought pgbench.c and found a reason.



When a thread fails to get some connections, it does not fill in any value for
thread->bench_start in threadRun().
And if the failure occurs in the final thread (meaning
threads[nthreads - 1]->bench_start is zero),
the following if-statement sets bench_start to zero.



I am not sure whether we have to fix it, but I attach the patch.
This simply ignores a result when thread->bench_start is zero.



What do you think?


Hmmm. Possibly. Another option could be not to report anything after some 
errors. I'm not sure, because it would depend on the use case. I guess the 
command returned an error status as well.


I'm going to give it some thoughts.

--
Fabien.




Re: Race condition in recovery?

2021-06-11 Thread Tom Lane
Dilip Kumar  writes:
> On Fri, Jun 11, 2021 at 11:45 AM Kyotaro Horiguchi
>  wrote:
>>> ==~_~===-=-===~_~== 
>>> pgsql.build/src/test/recovery/tmp_check/log/025_stuck_on_old_timeline_primary.log
>>>  ==~_~===-=-===~_~==
>>> ...
>>> The system cannot find the path specified.
>>> 2021-06-10 22:56:17.754 EDT [60c2d0cf.54c:1] LOG:  archive command failed 
>>> with exit code 1
>>> 2021-06-10 22:56:17.754 EDT [60c2d0cf.54c:2] DETAIL:  The failed archive 
>>> command was: /usr/bin/perl 
>>> "/home/pgrunner/bf/root/HEAD/pgsql/src/test/recovery/t/cp_history_files" 
>>> "pg_wal\\00010001" 
>>> "/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_025_stuck_on_old_timeline_primary_data/archives/00010001"

> Wal file copying will not create a problem for us, but I noticed that
> it is failing in copying the history files as well and that is
> creating a problem.

I think jacana uses msys[2?], so this likely indicates a problem
in path sanitization for the archive command.  Andrew, any advice?

regards, tom lane




Re: recovery test failures on hoverfly

2021-06-11 Thread Tom Lane
Michael Paquier  writes:
> On Fri, Jun 11, 2021 at 05:38:34PM +0530, Amit Kapila wrote:
>> It seems the error happens in both tests when, after issuing a
>> KILL, we are trying to reconnect.  Can we do anything about this?

> This is the same problem as c757a3da and 6d41dd0: the kill makes the
> test fail with a SIGPIPE in IPC::Run while a query is being written
> down the pipe.

Indeed.

> I think that using SELECT 1 to test if the server has been restarted
> is a bit crazy.  I would suggest using a loop based on
> pg_isready instead.

The precedent of the previous fixes would seem to suggest seeing if
we can replace 'SELECT 1' with "undef".  Not sure if that'll work
without annoying changes to poll_query_until, though.

regards, tom lane




Re: Error on pgbench logs

2021-06-11 Thread Fabien COELHO


Bonjour Michaël,


+   /* flush remaining stats */
+   if (!logged && latency == 0.0)
+   logAgg(logfile, agg);


You are right, this is missing the final stats.  Why the choice of
latency here for the check?


For me zero latency really says that there is no actual transaction to 
count, so it is a good trigger for the closing call. I did not wish to add 
a new "flush" parameter, or a specific function. I agree that it looks 
strange, though.


That's just to make the difference between the case where doLog() is 
called while processing the benchmark for the end of the transaction and 
the case where doLog() is called once a thread ends, no?


Yes.

Wouldn't it be better to do a final push of the states once a thread 
reaches CSTATE_FINISHED or CSTATE_ABORTED instead?


The call was already in place at the end of threadRun and had just become 
ineffective. I did not wish to revisit its place and change the overall 
structure, it is just a bug fix. I agree that it could be done differently 
with the added logAgg function which could be called directly. Attached 
another version which does that.


--
Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index dc84b7b9b7..62f7994363 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3765,6 +3765,30 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
 	return CSTATE_END_COMMAND;
 }
 
+/* print aggregated report to logfile */
+static void
+logAgg(FILE *logfile, StatsData *agg)
+{
+	fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f",
+			agg->start_time,
+			agg->cnt,
+			agg->latency.sum,
+			agg->latency.sum2,
+			agg->latency.min,
+			agg->latency.max);
+	if (throttle_delay)
+	{
+		fprintf(logfile, " %.0f %.0f %.0f %.0f",
+agg->lag.sum,
+agg->lag.sum2,
+agg->lag.min,
+agg->lag.max);
+		if (latency_limit)
+			fprintf(logfile, " " INT64_FORMAT, agg->skipped);
+	}
+	fputc('\n', logfile);
+}
+
 /*
  * Print log entry after completing one transaction.
  *
@@ -3793,36 +3817,19 @@ doLog(TState *thread, CState *st,
 	/* should we aggregate the results or not? */
 	if (agg_interval > 0)
 	{
+		pg_time_usec_t	next;
+
 		/*
 		 * Loop until we reach the interval of the current moment, and print
 		 * any empty intervals in between (this may happen with very low tps,
 		 * e.g. --rate=0.1).
 		 */
-
-		while (agg->start_time + agg_interval <= now)
+		while ((next = agg->start_time + agg_interval * INT64CONST(1000000)) <= now)
 		{
-			/* print aggregated report to logfile */
-			fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f",
-	agg->start_time,
-	agg->cnt,
-	agg->latency.sum,
-	agg->latency.sum2,
-	agg->latency.min,
-	agg->latency.max);
-			if (throttle_delay)
-			{
-fprintf(logfile, " %.0f %.0f %.0f %.0f",
-		agg->lag.sum,
-		agg->lag.sum2,
-		agg->lag.min,
-		agg->lag.max);
-if (latency_limit)
-	fprintf(logfile, " " INT64_FORMAT, agg->skipped);
-			}
-			fputc('\n', logfile);
+			logAgg(logfile, agg);
 
 			/* reset data and move to next interval */
-			initStats(agg, agg->start_time + agg_interval);
+			initStats(agg, next);
 		}
 
 		/* accumulate the current transaction */
@@ -6795,7 +6802,9 @@ done:
 		{
 			/* log aggregated but not yet reported transactions */
 			doLog(thread, state, &aggs, false, 0, 0);
+			logAgg(thread->logfile, &aggs);
 		}
+
 		fclose(thread->logfile);
 		thread->logfile = NULL;
 	}


Re: pg_regress.c also sensitive to various PG* environment variables

2021-06-11 Thread Alvaro Herrera
On 2021-Jun-11, Michael Paquier wrote:

> Following up with the recent thread that dealt with the same $subject
> for the TAP tests, I have gone through pg_regress.c:
> https://www.postgresql.org/message-id/ylbjjrpuciez7...@paquier.xyz

Good idea.

> The list of environment variables that had better be reset when using
> a temporary instance is very close to TestLib.pm, leading to the
> attached.  Please note that the list of unset parameters has
> been reorganized to be consistent with the TAP tests, and that I have
> added comments referring each list to the other.
> 
> Thoughts?

I think if they're to be kept in sync, then the exceptions should be
noted.  I mean, where PGCLIENTENCODING would otherwise be, I'd add
/* PGCLIENTENCODING set above */
/* See below for PGHOSTADDR */
and so on (PGHOST and PGPORT probably don't need this because they're
immediately below; not sure; but I would put them in alphabetical order
in both lists for sure and then that wouldn't apply).  Otherwise I would
think that it's an omission and would set to fix it.

-- 
Álvaro Herrera         39°49'30"S 73°17'W




Re: Decoding speculative insert with toast leaks memory

2021-06-11 Thread Amit Kapila
On Fri, Jun 11, 2021 at 11:37 AM Dilip Kumar  wrote:
>
> On Thu, Jun 10, 2021 at 7:15 PM Amit Kapila  wrote:
> >
>
> >
> > Please find the patch for HEAD attached. Can you please prepare the
> > patch for back-branches by doing all the changes I have done in the
> > patch for HEAD?
>
> Done
>

Thanks, the patch looks good to me. I'll push these early next week
(Tuesday) unless someone has any other comments or suggestions.

-- 
With Regards,
Amit Kapila.




Re: Add PortalDrop in exec_execute_message

2021-06-11 Thread Tom Lane
Yura Sokolov  writes:
> This makes me think, Close message were intended to be used
> but simply forgotten when libpq patch were made.
> Tom, could I be right?

You could argue all day about what the intentions were nearly twenty
years ago.  But the facts on the ground are that we don't issue Close
in those places, and changing it now would be a de facto protocol
change for applications.  So I'm a hard -1 on these proposals.

(Alvaro's proposed change isn't a protocol break, since pipeline
mode hasn't shipped yet.  It's trying to make some brand new code
act more like old code, which seems like a fine idea.)

I think that the actual problem here has been resolved in
commit bb4aed46a.  Perhaps we should reconsider my decision not to
back-patch that.  Unlike a protocol change, that one could possibly
be sane to back-patch.  I didn't think it was worth the trouble and
risk, but maybe there's a case for it.

regards, tom lane




Re: Added schema level support for publication.

2021-06-11 Thread Bharath Rupireddy
On Fri, Jun 11, 2021, 6:22 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Sat, Jun 5, 2021 at 7:02 PM vignesh C  wrote:
> > Thanks for identifying and reporting this issue. I have \dn with the
> > equivalent query to display only the publication name. The updated
> > patch has the fix for the same.
>
> Currently, FOR ALL TABLES is there to add all the tables(existing and
> future) in the current database in which the publication is created. I
> wonder before providing FOR SCHEMA capability, we better target FOR
> DATABASE first, something like CREATE PUBLICATION ... FOR DATABASE
> foo, bar, baz, qux; Of course users with the proper permissions on the
> specified databases can add them to the publication. This can help to
> add all the tables in other databases as well. Then, the CREATE
> PUBLICATION ... FOR SCHEMA foo, bar, baz, qux; makes more sense.
> Because, my understanding is that: database is a collection of tables,
> schema is a collection of databases. I may be wrong here, but it's
> just a thought. What do you think?
>

Please ignore above comment. I was confused about what a database and
schema is in postgres. I'm sorry for the noise.

https://www.postgresql.org/docs/devel/ddl-schemas.html

Regards,
Bharath Rupireddy.


Re: Added schema level support for publication.

2021-06-11 Thread Bharath Rupireddy
On Sat, Jun 5, 2021 at 7:02 PM vignesh C  wrote:
> Thanks for identifying and reporting this issue. I have \dn with the
> equivalent query to display only the publication name. The updated
> patch has the fix for the same.

Currently, FOR ALL TABLES is there to add all the tables(existing and
future) in the current database in which the publication is created. I
wonder before providing FOR SCHEMA capability, we better target FOR
DATABASE first, something like CREATE PUBLICATION ... FOR DATABASE
foo, bar, baz, qux; Of course users with the proper permissions on the
specified databases can add them to the publication. This can help to
add all the tables in other databases as well. Then, the CREATE
PUBLICATION ... FOR SCHEMA foo, bar, baz, qux; makes more sense.
Because, my understanding is that: database is a collection of tables,
schema is a collection of databases. I may be wrong here, but it's
just a thought. What do you think?

With Regards,
Bharath Rupireddy.




Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-11 Thread Robert Haas
On Thu, Jun 10, 2021 at 9:58 PM tsunakawa.ta...@fujitsu.com
 wrote:
> I understand that.  As I cited yesterday and possibly before, that's why 
> xa_commit() returns various return codes.  So, I have never suggested that 
> FDWs should not report an error and always report success for the commit 
> request.  They should be allowed to report an error.

In the text to which I was responding it seemed like you were saying
the opposite. Perhaps I misunderstood.

> The question I have been asking is how.  With that said, we should only have 
> two options; one is the return value of the FDW commit routine, and the other 
> is via ereport(ERROR).  I suggested the possibility of the former, because if 
> the FDW does ereport(ERROR), Postgres core (transaction manager) may have 
> difficulty in handling the rest of the participants.

I don't think that is going to work. It is very difficult to write
code that doesn't ever ERROR in PostgreSQL. It is not impossible if
the operation is trivial enough, but I think you're greatly
underestimating the complexity of committing the remote transaction.
If somebody had designed PostgreSQL so that every function returns a
return code and every time you call some other function you check that
return code and pass any error up to your own caller, then there would
be no problem here. But in fact the design was that at the first sign
of trouble you throw an ERROR. It's not easy to depart from that
programming model in just one place.

> > Also, leaving aside theoretical arguments, I think it's not
> > realistically possible for an FDW author to write code to commit a
> > prepared transaction that will be safe in the context of running late
> > in PrepareTransaction(), after we've already done
> > RecordTransactionCommit(). Such code can't avoid throwing errors
> > because it can't avoid performing operations and allocating memory.
>
> I'm not completely sure about this.  I thought (and said) that the only thing 
> the FDW does would be to send a commit request through an existing 
> connection.  So, I think it's not a severe restriction to require FDWs to do 
> ereport(ERROR) during commits (of the second phase of 2PC.)

To send a commit request through an existing connection, you have to
send some bytes over the network using a send() or write() system
call. That can fail. Then you have to read the response back over the
network using recv() or read(). That can also fail. You also need to
parse the result that you get from the remote side, which can also
fail, because you could get back garbage for some reason. And
depending on the details, you might first need to construct the
message you're going to send, which might be able to fail too. Also,
the data might be encrypted using SSL, so you might have to decrypt
it, which can also fail, and you might need to encrypt data before
sending it, which can fail. In fact, if you're using the OpenSSL,
trying to call SSL_read() or SSL_write() can both read and write data
from the socket, even multiple times, so you have extra opportunities
to fail.

> (I took "abort" as the same as "rollback" here.)  Once we've sent commit 
> requests to some participants, we can't abort the transaction.  If one FDW 
> returned an error halfway, we need to send commit requests to the rest of 
> participants.

I understand that it's not possible to abort the local transaction to
abort after it's been committed, but that doesn't mean that we're
going to be able to send the commit requests to the rest of the
participants. We want to be able to do that, certainly, but there's no
guarantee that it's actually possible. Again, the remote servers may
be dropped into a volcano, or less seriously, we may not be able to
access them. Also, someone may kill off our session.

> It's a design question, as I repeatedly said, whether and how we should 
> report the error of some participants to the client.  For instance, how 
> should we report the errors of multiple participants?  Concatenate those 
> error messages?

Sure, I agree that there are some questions about how to report errors.

> Anyway, we should design the interface first, giving much thought and 
> respecting the ideas of predecessors (TX/XA, MS DTC, JTA/JTS).  Otherwise, we 
> may end up like "We implemented like this, so the interface is like this and 
> it can only behave like this, although you may find it strange..."  That 
> might be a situation similar to what your comment "the design of PostgreSQL, 
> in all circumstances, the way you recover from an error is to abort the 
> transaction" suggests -- Postgres doesn't have statement-level rollback.

I think that's a valid concern, but we also have to have a plan that
is realistic. Some things are indeed not possible in PostgreSQL's
design. Also, some of these problems are things everyone has to
somehow confront. There's no database doing 2PC that can't have a
situation where one of the machines disappears unexpectedly due to
some natural disaster or 

Re: Fix dropped object handling in pg_event_trigger_ddl_commands

2021-06-11 Thread Michael Paquier
On Fri, Jun 11, 2021 at 11:00:40AM +0300, Aleksander Alekseev wrote:
> The last argument should be `false` then.

Hm, nope.  I think that we had better pass true as argument here.

First, this is more consistent with the identity lookup (OK, it does
not matter as we would have discarded the object after the identity
lookup anyway, but any future shuffling of this code may not be that
wise).  Second, now that I look at it, getObjectTypeDescription() can
never be NULL as we have fallback names for relations, routines and
constraints for all object types so the buffer will be filled with
some data.  Let's replace the bottom of getObjectTypeDescription(),
which currently returns NULL, with Assert(buffer.len > 0).  This code is
new as of v14, so it is better to adjust it sooner rather than later.
--
Michael




Re: recovery test failures on hoverfly

2021-06-11 Thread Michael Paquier
On Fri, Jun 11, 2021 at 05:38:34PM +0530, Amit Kapila wrote:
> It seems the error happens in both tests when, after issuing a
> KILL, we are trying to reconnect.  Can we do anything about this?

This is the same problem as c757a3da and 6d41dd0: the kill makes the
test fail with a SIGPIPE in IPC::Run while a query is being written
down the pipe.

I think that using SELECT 1 to test if the server has been restarted
is a bit crazy.  I would suggest using a loop based on
pg_isready instead.
--
Michael




recovery test failures on hoverfly

2021-06-11 Thread Amit Kapila
I noticed that we are getting random failures [1][2][3] in the
recovery test on hoverfly. The failures are in 022_crash_temp_files
and 013_crash_restart. Both the tests failed due to same reason:

ack Broken pipe: write( 13, 'SELECT 1' ) at
/home/nm/src/build/IPC-Run-0.94/lib/IPC/Run/IO.pm line 558.

It seems the error happens in both tests when, after issuing a
KILL, we are trying to reconnect.  Can we do anything about this?

[1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2021-06-11%2006%3A59%3A59
[2] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2021-06-06%2007%3A09%3A53
[3] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2021-06-05%2008%3A40%3A49

-- 
With Regards,
Amit Kapila.




pg_regress.c also sensitive to various PG* environment variables

2021-06-11 Thread Michael Paquier
Hi all,

Following up with the recent thread that dealt with the same $subject
for the TAP tests, I have gone through pg_regress.c:
https://www.postgresql.org/message-id/ylbjjrpuciez7...@paquier.xyz

The list of environment variables that had better be reset when using
a temporary instance is very close to TestLib.pm, leading to the
attached.  Please note that the list of unset parameters has
been reorganized to be consistent with the TAP tests, and that I have
added comments referring each list to the other.

Thoughts?
--
Michael
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index 47d7f31e94..26fbe08d4b 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -104,6 +104,7 @@ BEGIN
 	delete $ENV{LC_ALL};
 	$ENV{LC_MESSAGES} = 'C';
 
+	# This list should be kept in sync with pg_regress.c.
 	my @envkeys = qw (
 	  PGCHANNELBINDING
 	  PGCLIENTENCODING
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index e04d365258..89db370184 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -819,14 +819,33 @@ initialize_environment(void)
 		 * we also use psql's -X switch consistently, so that ~/.psqlrc files
 		 * won't mess things up.)  Also, set PGPORT to the temp port, and set
 		 * PGHOST depending on whether we are using TCP or Unix sockets.
+		 *
+		 * This list should be kept in sync with TestLib.pm.
 		 */
-		unsetenv("PGDATABASE");
-		unsetenv("PGUSER");
-		unsetenv("PGSERVICE");
-		unsetenv("PGSSLMODE");
-		unsetenv("PGREQUIRESSL");
+		unsetenv("PGCHANNELBINDING");
 		unsetenv("PGCONNECT_TIMEOUT");
 		unsetenv("PGDATA");
+		unsetenv("PGDATABASE");
+		unsetenv("PGGSSENCMODE");
+		unsetenv("PGGSSLIB");
+		unsetenv("PGKRBSRVNAME");
+		unsetenv("PGPASSFILE");
+		unsetenv("PGREQUIREPEER");
+		unsetenv("PGREQUIRESSL");
+		unsetenv("PGSERVICE");
+		unsetenv("PGSERVICEFILE");
+		unsetenv("PGSSLCERT");
+		unsetenv("PGSSLCRL");
+		unsetenv("PGSSLCRLDIR");
+		unsetenv("PGSSLKEY");
+		unsetenv("PGSSLMAXPROTOCOLVERSION");
+		unsetenv("PGSSLMINPROTOCOLVERSION");
+		unsetenv("PGSSLMODE");
+		unsetenv("PGSSLROOTCERT");
+		unsetenv("PGSSLSNI");
+		unsetenv("PGTARGETSESSIONATTRS");
+		unsetenv("PGUSER");
+
 #ifdef HAVE_UNIX_SOCKETS
 		if (hostname != NULL)
 			setenv("PGHOST", hostname, 1);




Re: PoC/WIP: Extended statistics on expressions

2021-06-11 Thread Tomas Vondra




On 6/11/21 6:55 AM, Noah Misch wrote:

On Sun, Jun 06, 2021 at 09:13:17PM +0200, Tomas Vondra wrote:


On 6/6/21 7:37 AM, Noah Misch wrote:

On Sat, Mar 27, 2021 at 01:17:14AM +0100, Tomas Vondra wrote:

OK, pushed after a bit more polishing and testing.


This added a "transformed" field to CreateStatsStmt, but it didn't mention
that field in src/backend/nodes.  Should those functions handle the field?



Yup, that's a mistake - it should do whatever CREATE INDEX is doing. Not
sure if it can result in error/failure or just inefficiency (due to
transforming the expressions repeatedly), but either way it should match
CREATE INDEX.

Thanks for noticing! Fixed by d57ecebd12.


Great.  For future reference, this didn't need a catversion bump.  readfuncs.c
changes need a catversion bump, since the catalogs might contain input for
each read function.  Other src/backend/nodes functions don't face that.  Also,
src/backend/nodes generally process fields in the order that they appear in
the struct.  The order you used in d57ecebd12 is nicer, being more like
IndexStmt, so I'm pushing an order change to the struct.



OK, makes sense. Thanks!

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

2021-06-11 Thread Ajin Cherian
On Fri, Jun 11, 2021 at 8:14 PM Amit Kapila  wrote:
>
> On Thu, Jun 10, 2021 at 2:04 PM Ajin Cherian  wrote:
> >
>
> The new patches look mostly good apart from the below cosmetic issues.
> I think the question is whether we want to do these for PG-14 or
> postpone them till PG-15. I think these don't appear to be risky
> changes so we can get them in PG-14 as that might help some outside
> core solutions as appears to be the case for Jeff. The changes related
> to start_replication are too specific to the subscriber-side solution
> so we can postpone those along with the subscriber-side 2PC work.
> Jeff, Ajin, what do you think?
>

Since we are exposing two-phase decoding using the
pg_create_replication_slot API, I think it is reasonable to expose
CREATE_REPLICATION_SLOT as well. We can leave the subscriber-side
changes for PG-15.

> Also, I can take care of the below cosmetic issues before committing
> if we decide to do this for PG-14.

Thanks,

Regards,
Ajin Cherian
Fujitsu Australia




Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

2021-06-11 Thread Amit Kapila
On Thu, Jun 10, 2021 at 2:04 PM Ajin Cherian  wrote:
>

The new patches look mostly good apart from the below cosmetic issues.
I think the question is whether we want to do these for PG-14 or
postpone them till PG-15. I think these don't appear to be risky
changes so we can get them in PG-14 as that might help some outside
core solutions as appears to be the case for Jeff. The changes related
to start_replication are too specific to the subscriber-side solution
so we can postpone those along with the subscriber-side 2PC work.
Jeff, Ajin, what do you think?

Also, I can take care of the below cosmetic issues before committing
if we decide to do this for PG-14.

A few cosmetic issues:
======================
1. git diff --check shows
src/bin/pg_basebackup/t/030_pg_recvlogical.pl:109: new blank line at EOF.

2.
+
   
   The following example shows SQL interface that can be used to decode prepared
   transactions. Before you use two-phase commit commands, you must set

Spurious line addition.

3.
/* Build query */
  appendPQExpBuffer(query, "CREATE_REPLICATION_SLOT \"%s\"", slot_name);
  if (is_temporary)
  appendPQExpBufferStr(query, " TEMPORARY");
+
  if (is_physical)

Spurious line addition.

4.
  appendPQExpBuffer(query, " LOGICAL \"%s\"", plugin);
+ if (two_phase && PQserverVersion(conn) >= 14)
+ appendPQExpBufferStr(query, " TWO_PHASE");
+
  if (PQserverVersion(conn) >= 10)
  /* pg_recvlogical doesn't use an exported snapshot, so suppress */
  appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");

I think it might be better to append TWO_PHASE after NOEXPORT_SNAPSHOT
but it doesn't matter much.

5.
+$node->safe_psql('postgres',
+ "BEGIN;INSERT INTO test_table values (11); PREPARE TRANSACTION 'test'");

There is no space after BEGIN but there is a space after INSERT. For
consistency's sake, I will add a space after BEGIN as well.



-- 
With Regards,
Amit Kapila.




Re: Race condition in recovery?

2021-06-11 Thread Dilip Kumar
On Fri, Jun 11, 2021 at 11:45 AM Kyotaro Horiguchi
 wrote:
>
> At Thu, 10 Jun 2021 21:53:18 -0400, Tom Lane  wrote in
> tgl> Please note that conchuela and jacana are still failing ...
>
> I forgot jacana's case..
>
> It is failing for the issue the first patch should have fixed.
>
> > ==~_~===-=-===~_~== 
> > pgsql.build/src/test/recovery/tmp_check/log/025_stuck_on_old_timeline_primary.log
> >  ==~_~===-=-===~_~==
> ...
> > The system cannot find the path specified.
> > 2021-06-10 22:56:17.754 EDT [60c2d0cf.54c:1] LOG:  archive command failed 
> > with exit code 1
> > 2021-06-10 22:56:17.754 EDT [60c2d0cf.54c:2] DETAIL:  The failed archive 
> > command was: /usr/bin/perl 
> > "/home/pgrunner/bf/root/HEAD/pgsql/src/test/recovery/t/cp_history_files" 
> > "pg_wal\\00010001" 
> > "/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_025_stuck_on_old_timeline_primary_data/archives/00010001"

WAL file copying will not create a problem for us, but I noticed that
it is failing to copy the history files as well, and that is
creating a problem.

2021-06-10 22:56:28.940 EDT [60c2d0db.1208:1] LOG:  archive command
failed with exit code 1
2021-06-10 22:56:28.940 EDT [60c2d0db.1208:2] DETAIL:  The failed
archive command was: /usr/bin/perl
"/home/pgrunner/bf/root/HEAD/pgsql/src/test/recovery/t/cp_history_files"
"pg_wal\\0002.history"
"/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_025_stuck_on_old_timeline_primary_data/archives/0002.history"

I have noticed that the archive command is failing in some other test
case too (002_archiving_standby2.log), see below logs.

==~_~===-=-===~_~==
pgsql.build/src/test/recovery/tmp_check/log/002_archiving_standby2.log
==~_~===-=-===~_~==
...

0 file(s) copied.
2021-06-10 22:44:34.467 EDT [60c2ce10.1270:1] LOG:  archive command
failed with exit code 1
2021-06-10 22:44:34.467 EDT [60c2ce10.1270:2] DETAIL:  The failed
archive command was: copy "pg_wal\\0003.history"
"c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_002_archiving_primary_data/archives\\0003.history"
The system cannot find the path specified.
0 file(s) copied.
2021-06-10 22:44:35.478 EDT [60c2ce10.1270:3] LOG:  archive command
failed with exit code 1
2021-06-10 22:44:35.478 EDT [60c2ce10.1270:4] DETAIL:  The failed
archive command was: copy "pg_wal\\0003.history"
"c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_002_archiving_primary_data/archives\\0003.history"
2021-06-10 22:44:36.113 EDT [60c2ce0c.283c:5] LOG:  received immediate
shutdown request
2021-06-10 22:44:36.129 EDT [60c2ce0c.283c:6] LOG:  database system is shut down

I am not able to figure out why the archive command is failing.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




[PATCH] Fix select from wrong table array_op_test

2021-06-11 Thread Jason Kim
In the middle of GIN index testing, there are some selects on a
different table, array_op_test, which doesn't even have an index.  They
were probably supposed to be selects from table array_index_op_test like
the other ones around the area.

Fix that.  The expected output should stay the same because both tables
use the same array.data.
---
 src/test/regress/expected/create_index.out | 12 ++--
 src/test/regress/sql/create_index.sql  | 12 ++--
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/src/test/regress/expected/create_index.out 
b/src/test/regress/expected/create_index.out
index 49f2a158c1..cfdf73179f 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -904,23 +904,23 @@ SELECT * FROM array_index_op_test WHERE i <@ '{}' ORDER 
BY seqno;
101 | {} | {}
 (1 row)
 
-SELECT * FROM array_op_test WHERE i = '{NULL}' ORDER BY seqno;
+SELECT * FROM array_index_op_test WHERE i = '{NULL}' ORDER BY seqno;
  seqno |   i|   t
 ---++
102 | {NULL} | {NULL}
 (1 row)
 
-SELECT * FROM array_op_test WHERE i @> '{NULL}' ORDER BY seqno;
+SELECT * FROM array_index_op_test WHERE i @> '{NULL}' ORDER BY seqno;
  seqno | i | t 
 ---+---+---
 (0 rows)
 
-SELECT * FROM array_op_test WHERE i && '{NULL}' ORDER BY seqno;
+SELECT * FROM array_index_op_test WHERE i && '{NULL}' ORDER BY seqno;
  seqno | i | t 
 ---+---+---
 (0 rows)
 
-SELECT * FROM array_op_test WHERE i <@ '{NULL}' ORDER BY seqno;
+SELECT * FROM array_index_op_test WHERE i <@ '{NULL}' ORDER BY seqno;
  seqno | i  | t  
 ---++
101 | {} | {}
@@ -1195,13 +1195,13 @@ SELECT * FROM array_index_op_test WHERE t = '{}' ORDER 
BY seqno;
101 | {} | {}
 (1 row)
 
-SELECT * FROM array_op_test WHERE i = '{NULL}' ORDER BY seqno;
+SELECT * FROM array_index_op_test WHERE i = '{NULL}' ORDER BY seqno;
  seqno |   i|   t
 ---++
102 | {NULL} | {NULL}
 (1 row)
 
-SELECT * FROM array_op_test WHERE i <@ '{NULL}' ORDER BY seqno;
+SELECT * FROM array_index_op_test WHERE i <@ '{NULL}' ORDER BY seqno;
  seqno | i  | t  
 ---++
101 | {} | {}
diff --git a/src/test/regress/sql/create_index.sql 
b/src/test/regress/sql/create_index.sql
index 8bc76f7c6f..9474dabf9e 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -295,10 +295,10 @@ SELECT * FROM array_index_op_test WHERE i = '{}' ORDER BY 
seqno;
 SELECT * FROM array_index_op_test WHERE i @> '{}' ORDER BY seqno;
 SELECT * FROM array_index_op_test WHERE i && '{}' ORDER BY seqno;
 SELECT * FROM array_index_op_test WHERE i <@ '{}' ORDER BY seqno;
-SELECT * FROM array_op_test WHERE i = '{NULL}' ORDER BY seqno;
-SELECT * FROM array_op_test WHERE i @> '{NULL}' ORDER BY seqno;
-SELECT * FROM array_op_test WHERE i && '{NULL}' ORDER BY seqno;
-SELECT * FROM array_op_test WHERE i <@ '{NULL}' ORDER BY seqno;
+SELECT * FROM array_index_op_test WHERE i = '{NULL}' ORDER BY seqno;
+SELECT * FROM array_index_op_test WHERE i @> '{NULL}' ORDER BY seqno;
+SELECT * FROM array_index_op_test WHERE i && '{NULL}' ORDER BY seqno;
+SELECT * FROM array_index_op_test WHERE i <@ '{NULL}' ORDER BY seqno;
 
 CREATE INDEX textarrayidx ON array_index_op_test USING gin (t);
 
@@ -331,8 +331,8 @@ SELECT * FROM array_index_op_test WHERE t && 
'{AAA80240}' ORDER BY seqno;
 SELECT * FROM array_index_op_test WHERE i @> '{32}' AND t && '{AAA80240}' 
ORDER BY seqno;
 SELECT * FROM array_index_op_test WHERE i && '{32}' AND t @> '{AAA80240}' 
ORDER BY seqno;
 SELECT * FROM array_index_op_test WHERE t = '{}' ORDER BY seqno;
-SELECT * FROM array_op_test WHERE i = '{NULL}' ORDER BY seqno;
-SELECT * FROM array_op_test WHERE i <@ '{NULL}' ORDER BY seqno;
+SELECT * FROM array_index_op_test WHERE i = '{NULL}' ORDER BY seqno;
+SELECT * FROM array_index_op_test WHERE i <@ '{NULL}' ORDER BY seqno;
 
 RESET enable_seqscan;
 RESET enable_indexscan;
-- 
2.24.1





pgbench bug candidate: negative "initial connection time"

2021-06-11 Thread kuroda.hay...@fujitsu.com
Hi Hackers,

I played pgbench with wrong parameters, and I found bug-candidate.
The latest commit in my source is 3a09d75.

1. Do initdb and start.
2. Initialize schema and data with "scale factor" = 1.
3. execute following command many times:

$ pgbench -c 101 -j 10 postgres

Then, sometimes a negative "initial connection time" was returned.
The latency average is also strange.

```
$ pgbench -c 101 -j 10 postgres
starting vacuum...end.
pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: 
FATAL:  sorry, too many clients already
pgbench (PostgreSQL) 14.0
transaction type: 
scaling factor: 1
query mode: simple
number of clients: 101
number of threads: 10
number of transactions per client: 10
number of transactions actually processed: 910/1010
latency average = 41387686.662 ms
initial connection time = -372896921.586 ms
tps = 0.002440 (without initial connection time)
```

I looked through pgbench.c and found the reason.
When a thread fails to establish some of its connections, it does not fill in
thread->bench_start in threadRun().
And if the failure happens in the final thread (which means threads[nthreads -
1].bench_start is zero),
the following if-statement sets bench_start to zero.

```
   6494 /* first recorded benchmarking start time */
   6495 if (bench_start == 0 || thread->bench_start < 
bench_start)
   6496 bench_start = thread->bench_start;
```

The wrong bench_start propagates to printResults() and then some invalid values
appear.

```
   6509 printResults(&stats, pg_time_now() - bench_start, conn_total_duration,
   6510  bench_start - start_time, latency_late);
```

I cannot tell whether we have to fix it, but I attach a patch.
It simply ignores a thread's result when thread->bench_start is zero.

How do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



ignore_failed_thread.patch
Description: ignore_failed_thread.patch


Re: Fix dropped object handling in pg_event_trigger_ddl_commands

2021-06-11 Thread Aleksander Alekseev
Hi Michael,

> /* The type can never be NULL */
> type = getObjectTypeDescription(, true);

The last argument should be `false` then.


--
Best regards,
Aleksander Alekseev


v4-0001-pg_event_trigger_ddl_commands.patch
Description: Binary data


Re: Duplicate history file?

2021-06-11 Thread Julien Rouhaud
On Fri, Jun 11, 2021 at 03:32:28PM +0900, Kyotaro Horiguchi wrote:
> At Fri, 11 Jun 2021 10:48:32 +0800, Julien Rouhaud  wrote 
> in 
> > 
> > "far from perfect" is a strong understatement for "appears to work but will
> > randomly and silently breaks everything without a simple way to detect it".
> 
> I think it's overstating. It sounds like a story of a mission critical
> world.  How perfect archive_command should be depends on the
> requirements of every system.  Simple cp is actually sufficient in
> a certain range of usages, maybe.

I disagree, cp is probably the worst command that can be used for this purpose.
On top on the previous problems already mentioned, you also have the fact that
the copy isn't atomic.  It means that any concurrent restore_command (or
anything that would consume the archived files) will happily process a half
copied WAL file, and in case of any error during the copy you end up with a
file for which you don't know if it contains valid data or not.  I don't see
any case where you would actually want to use that, unless maybe you want to
benchmark how long it takes before you lose some data.

> > We should document a minimum workable setup, but cp isn't an example of 
> > that,
> > and I don't think that there will be a simple example unless we provide a
> > dedicated utility.
> 
> It looks somewhat strange like "Well, you need a special track to
> drive your car, however, we don't have one. It's your responsibility
> to construct a track that protects it from accidents perfectly.".
> 
> "Yeah, I'm not going to push it so hard and don't care it gets some
> small scratches, couldn't I drive it on a public road?"
> 
> (Sorry for the bad analogy).

I think that a better analogy would be "I don't need working brakes on my car
since I only drive on highway and there aren't any red light there".

> Isn't it almost saying that anything less than pgBackRest isn't
> qualified as archive_program?

I don't know, I'm assuming that barman also provides one, such as wal-e and
wal-g (assuming that the distant providers do their part of the job correctly).
Maybe there are other tools too.  But as long as we don't document what exactly
are the requirements, it's not really a surprise that most people don't
implement them.




Re: Duplicate history file?

2021-06-11 Thread Michael Paquier
On Fri, Jun 11, 2021 at 03:32:28PM +0900, Kyotaro Horiguchi wrote:
> At Fri, 11 Jun 2021 10:48:32 +0800, Julien Rouhaud  wrote 
> in 
>> "far from perfect" is a strong understatement for "appears to work but will
>> randomly and silently breaks everything without a simple way to detect it".

Yeah.  Users like unplugging their hosts, because that's *fast* and
easy to do.

> I think it's overstating. It sounds like a story of a mission critical
> world.  How perfect archive_command should be depends on the
> requirements of every system.  Simple cp is actually sufficient in
> a certain range of usages, maybe.
> 
>> We should document a minimum workable setup, but cp isn't an example of that,
>> and I don't think that there will be a simple example unless we provide a
>> dedicated utility.
> 
> I think cp can be an example as far as we explain the limitations. (On
> the other hand "test ! -f" cannot since it actually prevents the server
> from working correctly.)

Disagreed.  I think that we should not try to change this area until
we can document a reliable solution, and a simple "cp" is not that.
Hmm.  A simple command that could be used as reference is for example
"dd" that flushes the file by itself, or we could just revisit the
discussions about having a pg_copy command, or we could document a
small utility in perl that does the job.
--
Michael




Re: Error on pgbench logs

2021-06-11 Thread Kyotaro Horiguchi
At Fri, 11 Jun 2021 15:56:55 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Doesn't threadRun already doing that?

(s/Does/Is)

That's once per thread, sorry for the noise.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Added missing tab completion for alter subscription set option

2021-06-11 Thread Michael Paquier
On Sun, May 23, 2021 at 04:24:59PM +0530, vignesh C wrote:
>   /* Complete "CREATE SUBSCRIPTION  ...  WITH ( " */
>   else if (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", 
> "("))
> - COMPLETE_WITH("copy_data", "connect", "create_slot", "enabled",
> -   "slot_name", "synchronous_commit");
> + COMPLETE_WITH("binary", "copy_data", "connect", "create_slot",
> +   "enabled", "slot_name", "streaming",
> +   "synchronous_commit");

"copy_data" and "connect" need to be reversed.  Applied.
--
Michael




Re: Error on pgbench logs

2021-06-11 Thread Kyotaro Horiguchi
At Fri, 11 Jun 2021 15:23:41 +0900, Michael Paquier  wrote 
in 
> On Thu, Jun 10, 2021 at 11:29:30PM +0200, Fabien COELHO wrote:
> > +   /* flush remaining stats */
> > +   if (!logged && latency == 0.0)
> > +   logAgg(logfile, agg);
> 
> You are right, this is missing the final stats.  Why the choice of
> latency here for the check?  That's just to make the difference
> between the case where doLog() is called while processing the
> benchmark for the end of the transaction and the case where doLog() is
> called once a thread ends, no?  Wouldn't it be better to do a final
> push of the states once a thread reaches CSTATE_FINISHED or
> CSTATE_ABORTED instead?

Doesn't threadRun already doing that?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Duplicate history file?

2021-06-11 Thread Kyotaro Horiguchi
At Fri, 11 Jun 2021 10:48:32 +0800, Julien Rouhaud  wrote 
in 
> On Fri, Jun 11, 2021 at 11:25:51AM +0900, Kyotaro Horiguchi wrote:
> > 
> > Nevertheless I agree to it, still don't we need a minimum workable
> > setup as the first step? Something like below.
> > 
> > ===
> > The following is an example of the minimal archive_command.
> > 
> > Example: cp %p /blah/%f
> > 
> > However, it is far from perfect. The following is the discussion about
> > what is needed for archive_command to be more reliable.
> > 
> > 
> > 
> 
> "far from perfect" is a strong understatement for "appears to work but will
> randomly and silently breaks everything without a simple way to detect it".

I think it's overstating. It sounds like a story of a mission critical
world.  How perfect archive_command should be depends on the
requirements of every system.  Simple cp is actually sufficient in
a certain range of usages, maybe.

> We should document a minimum workable setup, but cp isn't an example of that,
> and I don't think that there will be a simple example unless we provide a
> dedicated utility.

It looks somewhat strange like "Well, you need a special track to
drive your car, however, we don't have one. It's your responsibility
to construct a track that protects it from accidents perfectly.".

"Yeah, I'm not going to push it so hard and don't care it gets some
small scratches, couldn't I drive it on a public road?"

(Sorry for the bad analogy).

I think cp can be an example as far as we explain the limitations. (On
the other hand "test ! -f" cannot since it actually prevents the server
from working correctly.)

> It could however be something along those lines:
> 
> Example: archive_program %p /path/to/%d
> 
> archive_program being a script ensuring that all those requirements are met:
> 

Isn't it almost saying that anything less than pgBackRest isn't
qualified as archive_program?


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Multi-Column List Partitioning

2021-06-11 Thread Amit Langote
On Fri, Jun 11, 2021 at 12:37 PM Amit Langote  wrote:
> I will look at other parts of the patch next week hopefully.   For
> now, attached is a delta patch that applies on top of your v1, which
> does:
>
> * Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
> * Make qsort_partition_list_value_cmp simply call
> partition_lbound_datum_cmp() instead of having its own logic to
> compare input bounds
> * Move partition_lbound_datum_cmp() into partbounds.c as a static
> function (export seems unnecessary)
> * Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

One more:

* Add all columns of newly added test query in insert.sql to the order
by clause to get predictably ordered output

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Error on pgbench logs

2021-06-11 Thread Michael Paquier
On Thu, Jun 10, 2021 at 11:29:30PM +0200, Fabien COELHO wrote:
> + /* flush remaining stats */
> + if (!logged && latency == 0.0)
> + logAgg(logfile, agg);

You are right, this is missing the final stats.  Why the choice of
latency here for the check?  That's just to make the difference
between the case where doLog() is called while processing the
benchmark for the end of the transaction and the case where doLog() is
called once a thread ends, no?  Wouldn't it be better to do a final
push of the states once a thread reaches CSTATE_FINISHED or
CSTATE_ABORTED instead?
--
Michael




Re: Question about StartLogicalReplication() error path

2021-06-11 Thread Jeff Davis
On Fri, 2021-06-11 at 10:13 +0530, Amit Kapila wrote:
> Because sometimes clients don't have to do anything for xlog records.
> One example is WAL for DDL where logical decoding didn't produce
> anything for the client but later with keepalive we send the LSN of
> WAL where DDL has finished and the client just responds with the
> position sent by the server as it doesn't have any other pending
> transactions.

If I understand correctly, in this situation it avoids the cost of a
write on the client just to update its stored LSN progress value when
there's no real data to be written. In that case the client would need
to rely on the server's confirmed_flush_lsn instead of its own stored
LSN progress value.

That's a reasonable thing for the *client* to do explicitly, e.g. by
just reading the slot's confirmed_flush_lsn and comparing to its own
stored lsn. But I don't think it's reasonable for the server to just
skip over data requested by the client because it thinks it knows best.

> I think because there is no need to process the WAL that has been
> confirmed by the client. Do you see any problems with this scheme?

Several:

* Replication setups are complex, and it can be easy to misconfigure
something or have a bug in some control code. An error is valuable to
detect the problem closer to the source.

* There are plausible configurations where things could go badly wrong.
For instance, if you are storing the decoded data in another postgres
server with synchronous_commit=off, and acknowledging LSNs before they
are durable. A crash of the destination system would be consistent, but
it would be missing some data earlier than the confirmed_flush_lsn. The
client would then request the data starting at its stored lsn progress
value, but the server would skip ahead to the confirmed_flush_lsn;
silently missing data.

* It's contradicted by the docs: "Instructs server to start streaming
WAL for logical replication, starting at WAL location XXX/XXX."

* The comment acknowledges that a user might expect an error in that
case; but doesn't really address why the user would expect an error,
and why it's OK to violate that expectation.

Regards,
Jeff Davis






Re: Race condition in recovery?

2021-06-11 Thread Kyotaro Horiguchi
At Thu, 10 Jun 2021 21:53:18 -0400, Tom Lane  wrote in 
tgl> Please note that conchuela and jacana are still failing ...

I forgot jacana's case..

It is failing for the issue the first patch should have fixed.

> ==~_~===-=-===~_~== 
> pgsql.build/src/test/recovery/tmp_check/log/025_stuck_on_old_timeline_primary.log
>  ==~_~===-=-===~_~==
...
> The system cannot find the path specified.
> 2021-06-10 22:56:17.754 EDT [60c2d0cf.54c:1] LOG:  archive command failed 
> with exit code 1
> 2021-06-10 22:56:17.754 EDT [60c2d0cf.54c:2] DETAIL:  The failed archive 
> command was: /usr/bin/perl 
> "/home/pgrunner/bf/root/HEAD/pgsql/src/test/recovery/t/cp_history_files" 
> "pg_wal\\00010001" 
> "/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_025_stuck_on_old_timeline_primary_data/archives/00010001"

cp_history_files exits with a bare "exit" for files with that
name, which should set the exit status to 0. ActivePerl did so.

If I specify a nonexistent command like /hoge/perl, %ERRORLEVEL% is
set to 3, not 1.

I haven't found what is happening there so far.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center