Re: MDAM techniques and Index Skip Scan patch

2022-01-13 Thread Dmitry Dolgov
> On Thu, Jan 13, 2022 at 03:27:08PM +, Floris Van Nee wrote:
> >
> > Could you send a rebased version?  In the meantime I will change the status
> > on the cf app to Waiting on Author.
>
> Attached a rebased version.

FYI, I've attached this thread to the CF item as an informational one,
but as there are some patches posted here, folks may get confused. For
those who have landed here with no context, I feel obliged to mention
that now there are two alternative patch series posted under the same
CF item:

* the original one lives in [1], waiting for reviews since the last May
* an alternative one posted here from Floris

[1]: 
https://www.postgresql.org/message-id/flat/20200609102247.jdlatmfyeecg52fi@localhost




Re: Add sub-transaction overflow status in pg_stat_activity

2022-01-13 Thread Julien Rouhaud
On Thu, Jan 13, 2022 at 10:27:31PM +, Bossart, Nathan wrote:
> Thanks for the new patch!
> 
> +   
> +Returns a record of information about the backend's subtransactions.
> +The fields returned are subxact_count 
> identifies
> +number of active subtransaction and subxact_overflow
> + shows whether the backend's subtransaction cache is
> +overflowed or not.
> +   
> +   
> 
> nitpick: There is an extra "" here.

Also the sentence looks a bit weird.  I think something like that would be
better:

> +Returns a record of information about the backend's subtransactions.
> +The fields returned are subxact_count, which
> +identifies the number of active subtransaction and 
> subxact_overflow
> +, which shows whether the backend's subtransaction cache 
> is
> +overflowed or not.

While on the sub-transaction overflow topic, I'm wondering if we should also
raise a warning (maybe optionally) immediately when a backend overflows (so in
GetNewTransactionId()).

Like many I previously had to investigate a slowdown due to sub-transaction
overflow, and even with the information available in a monitoring view (I had
to rely on a quick hackish extension as I couldn't patch postgres) it's not
terribly fun to do this way.  On top of that log analyzers like pgBadger could
help to highlight such a problem.

I don't want to derail this thread so let me know if I should start a distinct
discussion for that.




Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-13 Thread Thomas Munro
On Thu, Jan 13, 2022 at 7:36 PM Thomas Munro  wrote:
> ... First I tried to use recv(fd, , 1,
> MSG_PEEK) == 0 to detect EOF, which seemed to me to be a reasonable
> enough candidate, but somehow it corrupts the stream (!?),

Ahh, that'd be because recv() and friends are redirected to our
wrappers in socket.c, where we use the overlapped Winsock API (that
is, async network IO), which is documented as not supporting MSG_PEEK.
OK then.

Andres and I chatted about this stuff off list and he pointed out
something else about the wrappers in socket.c: there are more paths in
there that work with socket events, which means more ways to lose the
precious FD_CLOSE event.  I don't know if any of those paths are
reachable in the relevant cases, but it does look a little bit like
the lack of graceful shutdown might have been hiding a whole class of
event tracking bug.




Re: RFC: Logging plan of the running query

2022-01-13 Thread Julien Rouhaud
Hi,

On Fri, Jan 07, 2022 at 09:54:31PM +0900, Fujii Masao wrote:
> 
> On 2022/01/07 20:58, torikoshia wrote:
> > On 2022-01-07 14:30, torikoshia wrote:
> > 
> > > Updated the patch for fixing compiler warning about the format on windows.
> > 
> > I got another compiler warning, updated the patch again.
> 
> Thanks for updating the patch!
> 
> I ran the following query every 0.1s by using \watch psql command from three 
> different sessions while make installcheck test was running.
> 
> SELECT pg_log_query_plan(pid) FROM pg_stat_activity;
> 
> 
> And then I got the segmentation fault as follows.
> 
> 2022-01-07 21:40:32 JST [postmaster] LOG:  server process (PID 51017) was 
> terminated by signal 11: Segmentation fault: 11
> 2022-01-07 21:40:32 JST [postmaster] DETAIL:  Failed process was running: 
> select description, (test_conv(inbytes, 'utf8', 'utf8')).* from 
> utf8_verification_inputs;
> 2022-01-07 21:40:32 JST [postmaster] LOG:  terminating any other active 
> server processes
> 2022-01-07 21:40:32 JST [postmaster] LOG:  all server processes terminated; 
> reinitializing
> 
> 
> The backtrace I got from the core file was:
> 
> (lldb) target create --core "/cores/core.51017"
> Core file '/cores/core.51017' (x86_64) was loaded.
> (lldb) bt
> * thread #1, stop reason = signal SIGSTOP
>   * frame #0: 0x7fff20484552 libsystem_platform.dylib`_platform_strlen + 
> 18
> frame #1: 0x00011076e36c postgres`dopr(target=0x7ffedfd1b1d8, 
> format="\n", args=0x7ffedfd1b450) at snprintf.c:444:20
> frame #2: 0x00011076e133 postgres`pg_vsnprintf(str="Query Text: 
> 

Re: relcache not invalidated when ADD PRIMARY KEY USING INDEX

2022-01-13 Thread Julien Rouhaud
Hi,

On Mon, Dec 20, 2021 at 06:45:33AM +, houzj.f...@fujitsu.com wrote:
> 
> I tried to write a patch to fix this by invalidating the relcache after 
> marking
> primary key in index_constraint_create().

The cfbot reports that you have a typo in your regression tests:

https://api.cirrus-ci.com/v1/artifact/task/4806573636714496/regress_diffs/contrib/test_decoding/regression.diffs
diff -U3 /tmp/cirrus-ci-build/contrib/test_decoding/expected/ddl.out 
/tmp/cirrus-ci-build/contrib/test_decoding/results/ddl.out
--- /tmp/cirrus-ci-build/contrib/test_decoding/expected/ddl.out 2022-01-11 
16:46:51.684727957 +
+++ /tmp/cirrus-ci-build/contrib/test_decoding/results/ddl.out  2022-01-11 
16:50:35.584089784 +
@@ -594,7 +594,7 @@
 DELETE FROM table_dropped_index_no_pk WHERE b = 1;
 DELETE FROM table_dropped_index_no_pk WHERE a = 3;
 DROP TABLE table_dropped_index_no_pk;
--- check table with newly added primary key
+-- check tables with newly added primary key

Could you send an updated patch?  As far as I can see the rest of the
regression tests succeed so the patch can probably still be reviewed.  That
being said, there are pending comments by Euler so I think it's appropriate to
zsh:1: command not found: q




Re: Add client connection check during the execution of the query

2022-01-13 Thread Thomas Munro
On Fri, Jan 14, 2022 at 4:35 PM Andres Freund  wrote:
> The more I think about it, the less I see why we *ever* need to re-arm the
> latch in pq_check_connection() in this approach. pq_check_connection() is only
> used from from ProcessInterrupts(), and there's plenty things inside
> ProcessInterrupts() that can cause latches to be reset (e.g. parallel message
> processing causing log messages to be sent to the client, causing network IO,
> which obviously can do a latch reset).

Thanks for the detailed explanation.  I guess I was being overly
cautious and a little myopic, "leave things exactly the way you found
them", so I didn't have to think about any of that.  I see now that
the scenario I was worrying about would be a bug in whatever
latch-wait loop happens to reach this code.  Alright then, here is
just... one... more... patch, this time consuming any latch that gets
in the way and retrying, with no restore.
From d0399282e73ebd47dbead19b56586fff8ba3e9d2 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 30 Apr 2021 10:38:40 +1200
Subject: [PATCH v7 1/2] Add WL_SOCKET_CLOSED for socket shutdown events.

Provide a way for WaitEventSet to report that the remote peer has shut
down its socket, independently of whether there is any buffered data
remaining to be read.  This works only on systems where the kernel
exposes that information, namely:

* WAIT_USE_POLL builds using POLLRDHUP, if available
* WAIT_USE_EPOLL builds using EPOLLRDHUP
* WAIT_USE_KQUEUE builds using EV_EOF

Reviewed-by: Zhihong Yu 
Reviewed-by: Maksim Milyutin 
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
---
 src/backend/storage/ipc/latch.c | 79 +
 src/include/storage/latch.h |  6 ++-
 2 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 61c876beff..9a498b0f12 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -841,6 +841,7 @@ FreeWaitEventSet(WaitEventSet *set)
  * - WL_SOCKET_CONNECTED: Wait for socket connection to be established,
  *	 can be combined with other WL_SOCKET_* events (on non-Windows
  *	 platforms, this is the same as WL_SOCKET_WRITEABLE)
+ * - WL_SOCKET_CLOSED: Wait for socket to be closed by remote peer.
  * - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies
  *
  * Returns the offset in WaitEventSet->events (starting from 0), which can be
@@ -1043,12 +1044,16 @@ WaitEventAdjustEpoll(WaitEventSet *set, WaitEvent *event, int action)
 	else
 	{
 		Assert(event->fd != PGINVALID_SOCKET);
-		Assert(event->events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE));
+		Assert(event->events & (WL_SOCKET_READABLE |
+WL_SOCKET_WRITEABLE |
+WL_SOCKET_CLOSED));
 
 		if (event->events & WL_SOCKET_READABLE)
 			epoll_ev.events |= EPOLLIN;
 		if (event->events & WL_SOCKET_WRITEABLE)
 			epoll_ev.events |= EPOLLOUT;
+		if (event->events & WL_SOCKET_CLOSED)
+			epoll_ev.events |= EPOLLRDHUP;
 	}
 
 	/*
@@ -1087,12 +1092,18 @@ WaitEventAdjustPoll(WaitEventSet *set, WaitEvent *event)
 	}
 	else
 	{
-		Assert(event->events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE));
+		Assert(event->events & (WL_SOCKET_READABLE |
+WL_SOCKET_WRITEABLE |
+WL_SOCKET_CLOSED));
 		pollfd->events = 0;
 		if (event->events & WL_SOCKET_READABLE)
 			pollfd->events |= POLLIN;
 		if (event->events & WL_SOCKET_WRITEABLE)
 			pollfd->events |= POLLOUT;
+#ifdef POLLRDHUP
+		if (event->events & WL_SOCKET_CLOSED)
+			pollfd->events |= POLLRDHUP;
+#endif
 	}
 
 	Assert(event->fd != PGINVALID_SOCKET);
@@ -1165,7 +1176,9 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
 	Assert(event->events != WL_LATCH_SET || set->latch != NULL);
 	Assert(event->events == WL_LATCH_SET ||
 		   event->events == WL_POSTMASTER_DEATH ||
-		   (event->events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)));
+		   (event->events & (WL_SOCKET_READABLE |
+			 WL_SOCKET_WRITEABLE |
+			 WL_SOCKET_CLOSED)));
 
 	if (event->events == WL_POSTMASTER_DEATH)
 	{
@@ -1188,9 +1201,9 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
 		 * old event mask to the new event mask, since kevent treats readable
 		 * and writable as separate events.
 		 */
-		if (old_events & WL_SOCKET_READABLE)
+		if (old_events & (WL_SOCKET_READABLE | WL_SOCKET_CLOSED))
 			old_filt_read = true;
-		if (event->events & WL_SOCKET_READABLE)
+		if (event->events & (WL_SOCKET_READABLE | WL_SOCKET_CLOSED))
 			new_filt_read = true;
 		if (old_events & WL_SOCKET_WRITEABLE)
 			old_filt_write = true;
@@ -1210,7 +1223,10 @@ WaitEventAdjustKqueue(WaitEventSet *set, WaitEvent *event, int old_events)
 	 event);
 	}
 
-	Assert(count > 0);
+	/* For WL_SOCKET_READ -> WL_SOCKET_CLOSED, no change needed. */
+	if (count == 0)
+		return;
+
 	Assert(count <= 2);
 
 	rc = kevent(set->kqueue_fd, _ev[0], count, NULL, 0, NULL);
@@ -1525,7 +1541,9 

Re: fix crash with Python 3.11

2022-01-13 Thread kato-...@fujitsu.com
Hello,

I run vcregeress plcheck on Windows Server 2016 with python 3.11.0a3 and python 
3.9.7 which are installed using installer.
All tests are passed. I'm not familiar with PL/Python but I think it's good.

regards, sho kato


Re: row filtering for logical replication

2022-01-13 Thread Peter Smith
Here are my review comments for v64-0001 (review of updates since v63-0001)

~~~

1. doc/src/sgml/ref/create_publication.sgml - typo?

+   The WHERE clause allows simple expressions that
don't have
+   user-defined functions, operators, non-immutable built-in functions.
+  
+

I think there is a missing "or" after that Oxford comma.

e.g.
BEFORE
"... operators, non-immutable built-in functions."
AFTER
"... operators, or non-immutable built-in functions."

~~

2. commit message - typo

You said that the above text (review comment 1) came from the 0001
commit message, so please make the same fix to the commit message.

~~

3. src/backend/replication/logical/tablesync.c - redundant trailing ";"

+ /* Check for row filters. */
+ resetStringInfo();
+ appendStringInfo(,
+ "SELECT DISTINCT pg_get_expr(pr.prqual, pr.prrelid)"
+ "  FROM pg_publication p"
+ "  LEFT OUTER JOIN pg_publication_rel pr"
+ "   ON (p.oid = pr.prpubid AND pr.prrelid = %u),"
+ "  LATERAL pg_get_publication_tables(p.pubname) GPT"
+ " WHERE GPT.relid = %u"
+ "   AND p.pubname IN ( %s );",
+ lrel->remoteid,
+ lrel->remoteid,
+ pub_names.data);

I think that trailing ";" of the SQL is not needed, and nearby SQL
execution code does not include one so maybe better to remove it for
consistency.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: null iv parameter passed to combo_init()

2022-01-13 Thread Zhihong Yu
On Thu, Jan 13, 2022 at 9:09 PM Zhihong Yu  wrote:

>
>
> On Thu, Jan 13, 2022 at 8:09 PM Noah Misch  wrote:
>
>> On Sun, Jan 09, 2022 at 06:45:09PM -0800, Zhihong Yu wrote:
>> > On Sun, Jan 9, 2022 at 1:27 PM Zhihong Yu  wrote:
>> > > After installing gcc-11, ./configure passed (with
>> 0003-memcpy-null.patch).
>> > > In the output of `make check-world`, I don't see `runtime error`.
>>
>> That's expected.  With -fsanitize-undefined-trap-on-error, the program
>> will
>> generate SIGILL when UBSan detects undefined behavior.  To get "runtime
>> error"
>> messages in the postmaster log, drop -fsanitize-undefined-trap-on-error.
>> Both
>> ways of running the tests have uses.  -fsanitize-undefined-trap-on-error
>> is
>> better when you think the code is clean, because a zero "make check-world"
>> exit status confirms the code is clean.  Once you know the code is
>> unclean in
>> some way, -fsanitize-undefined-trap-on-error is better for getting
>> details.
>>
>> > > Though there was a crash (maybe specific to my machine):
>> > >
>> > > Core was generated by
>> > >
>> `/nfusr/dev-server/zyu/postgres/tmp_install/usr/local/pgsql/bin/postgres
>> > > --singl'.
>> > > Program terminated with signal SIGILL, Illegal instruction.
>> > > #0  0x0050642d in write_item.cold ()
>> > > Missing separate debuginfos, use: debuginfo-install
>> > > glibc-2.17-325.el7_9.x86_64 nss-pam-ldapd-0.8.13-25.el7.x86_64
>> > > sssd-client-1.16.5-10.el7_9.10.x86_64
>> > > (gdb) bt
>> > > #0  0x0050642d in write_item.cold ()
>> > > #1  0x00ba9d1b in write_relcache_init_file ()
>> > > #2  0x00bb58f7 in RelationCacheInitializePhase3 ()
>> > > #3  0x00bd5cb5 in InitPostgres ()
>> > > #4  0x00a0a9ea in PostgresMain ()
>>
>> That is UBSan detecting undefined behavior.  A successful patch version
>> will
>> fix write_item(), among many other places that are currently making
>> check-world fail.  I get the same when testing your v5 under "gcc (Debian
>> 11.2.0-13) 11.2.0".  I used the same host as buildfarm member thorntail,
>> and I
>> configured like this:
>>
>>   ./configure -C --with-lz4 --prefix=$HOME/sw/nopath/pghead
>> --enable-tap-tests --enable-debug --enable-depend --enable-cassert
>> CC='ccache gcc-11 -fsanitize=undefined -fsanitize-undefined-trap-on-error'
>> CFLAGS='-O2 -funwind-tables'
>>
>> > Earlier I was using devtoolset-11 which had an `Illegal instruction`
>> error.
>> >
>> > I compiled / installed gcc-11 from source (which took whole afternoon).
>> > `make check-world` passed with patch v3.
>> > In tmp_install/log/install.log, I saw:
>> >
>> > gcc -Wall -Wmissing-prototypes -Wpointer-arith
>> > -Wdeclaration-after-statement -Werror=vla -Wendif-labels
>> > -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
>> > -Wformat-security -fno-strict-aliasing -fwrapv
>> -fexcess-precision=standard
>> > -Wno-format-truncation -Wno-stringop-truncation -fsanitize=undefined
>> > -fsanitize-undefined-trap-on-error -I../../src/port -DFRONTEND
>> > -I../../src/include  -D_GNU_SOURCE   -c -o path.o path.c
>> > rm -f libpgport.a
>>
>> Perhaps this self-compiled gcc-11 is defective, being unable to detect the
>> instances of undefined behavior that other builds detect.  If so, use the
>> "devtoolset-11" gcc instead.  You're also building without optimization;
>> that
>> might be the problem.
>>
>
> I tried both locally built gcc-11 and devtoolset-11 with configure command
> copied from above.
> `make world` failed in both cases with:
>
> performing post-bootstrap initialization ... sh: line 1: 24714 Illegal
> instruction (core dumped)
> ".../postgres/tmp_install/.../postgres/bin/postgres" --single -F -O -j -c
> search_path=pg_catalog -c exit_on_error=true -c log_checkpoints=false
> template1 > /dev/null
> child process exited with exit code 132
>
> #0  0x0050a8d6 in write_item (data=, len= out>, fp=) at relcache.c:6471
> #1  0x00c33273 in write_relcache_init_file (shared=true) at
> relcache.c:6368
> #2  0x00c33c50 in RelationCacheInitializePhase3 () at
> relcache.c:4220
> #3  0x00c55825 in InitPostgres (in_dbname=,
> dboid=3105442800, username=, useroid=,
> out_dbname=0x0, override_allow_connections=) at
> postinit.c:1014
>
> FYI
>
Hi,
I forgot to mention that patch v5 was included during the experiment.

Cheers


Re: null iv parameter passed to combo_init()

2022-01-13 Thread Zhihong Yu
On Thu, Jan 13, 2022 at 8:09 PM Noah Misch  wrote:

> On Sun, Jan 09, 2022 at 06:45:09PM -0800, Zhihong Yu wrote:
> > On Sun, Jan 9, 2022 at 1:27 PM Zhihong Yu  wrote:
> > > After installing gcc-11, ./configure passed (with
> 0003-memcpy-null.patch).
> > > In the output of `make check-world`, I don't see `runtime error`.
>
> That's expected.  With -fsanitize-undefined-trap-on-error, the program will
> generate SIGILL when UBSan detects undefined behavior.  To get "runtime
> error"
> messages in the postmaster log, drop -fsanitize-undefined-trap-on-error.
> Both
> ways of running the tests have uses.  -fsanitize-undefined-trap-on-error is
> better when you think the code is clean, because a zero "make check-world"
> exit status confirms the code is clean.  Once you know the code is unclean
> in
> some way, -fsanitize-undefined-trap-on-error is better for getting details.
>
> > > Though there was a crash (maybe specific to my machine):
> > >
> > > Core was generated by
> > >
> `/nfusr/dev-server/zyu/postgres/tmp_install/usr/local/pgsql/bin/postgres
> > > --singl'.
> > > Program terminated with signal SIGILL, Illegal instruction.
> > > #0  0x0050642d in write_item.cold ()
> > > Missing separate debuginfos, use: debuginfo-install
> > > glibc-2.17-325.el7_9.x86_64 nss-pam-ldapd-0.8.13-25.el7.x86_64
> > > sssd-client-1.16.5-10.el7_9.10.x86_64
> > > (gdb) bt
> > > #0  0x0050642d in write_item.cold ()
> > > #1  0x00ba9d1b in write_relcache_init_file ()
> > > #2  0x00bb58f7 in RelationCacheInitializePhase3 ()
> > > #3  0x00bd5cb5 in InitPostgres ()
> > > #4  0x00a0a9ea in PostgresMain ()
>
> That is UBSan detecting undefined behavior.  A successful patch version
> will
> fix write_item(), among many other places that are currently making
> check-world fail.  I get the same when testing your v5 under "gcc (Debian
> 11.2.0-13) 11.2.0".  I used the same host as buildfarm member thorntail,
> and I
> configured like this:
>
>   ./configure -C --with-lz4 --prefix=$HOME/sw/nopath/pghead
> --enable-tap-tests --enable-debug --enable-depend --enable-cassert
> CC='ccache gcc-11 -fsanitize=undefined -fsanitize-undefined-trap-on-error'
> CFLAGS='-O2 -funwind-tables'
>
> > Earlier I was using devtoolset-11 which had an `Illegal instruction`
> error.
> >
> > I compiled / installed gcc-11 from source (which took whole afternoon).
> > `make check-world` passed with patch v3.
> > In tmp_install/log/install.log, I saw:
> >
> > gcc -Wall -Wmissing-prototypes -Wpointer-arith
> > -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> > -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
> > -Wformat-security -fno-strict-aliasing -fwrapv
> -fexcess-precision=standard
> > -Wno-format-truncation -Wno-stringop-truncation -fsanitize=undefined
> > -fsanitize-undefined-trap-on-error -I../../src/port -DFRONTEND
> > -I../../src/include  -D_GNU_SOURCE   -c -o path.o path.c
> > rm -f libpgport.a
>
> Perhaps this self-compiled gcc-11 is defective, being unable to detect the
> instances of undefined behavior that other builds detect.  If so, use the
> "devtoolset-11" gcc instead.  You're also building without optimization;
> that
> might be the problem.
>

I tried both locally built gcc-11 and devtoolset-11 with configure command
copied from above.
`make world` failed in both cases with:

performing post-bootstrap initialization ... sh: line 1: 24714 Illegal
instruction (core dumped)
".../postgres/tmp_install/.../postgres/bin/postgres" --single -F -O -j -c
search_path=pg_catalog -c exit_on_error=true -c log_checkpoints=false
template1 > /dev/null
child process exited with exit code 132

#0  0x0050a8d6 in write_item (data=, len=, fp=) at relcache.c:6471
#1  0x00c33273 in write_relcache_init_file (shared=true) at
relcache.c:6368
#2  0x00c33c50 in RelationCacheInitializePhase3 () at
relcache.c:4220
#3  0x00c55825 in InitPostgres (in_dbname=,
dboid=3105442800, username=, useroid=,
out_dbname=0x0, override_allow_connections=) at
postinit.c:1014

FYI


Re: Consistently use the function name CreateCheckPoint instead of CreateCheckpoint in code comments

2022-01-13 Thread Robert Treat
On Thu, Jan 13, 2022 at 10:25 PM Bharath Rupireddy
 wrote:
>
> Hi,
>
> The function CreateCheckPoint is specified as CreateCheckpoint in some
> of the code comments whereas in other places it is correctly
> mentioned. Attaching a tiny patch to use CreateCheckPoint consistently
> across code comments.
>

Heh, that's interesting, as I would have said that CreateCheckpoint is
the right casing vs CreateCheckPoint, but it looks like it has always
been the other way (according to
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f0e37a85319e6c113ecd3303cddeb6edd5a6ac44).


Robert Treat
https://xzilla.net




Re: null iv parameter passed to combo_init()

2022-01-13 Thread Noah Misch
On Sun, Jan 09, 2022 at 06:45:09PM -0800, Zhihong Yu wrote:
> On Sun, Jan 9, 2022 at 1:27 PM Zhihong Yu  wrote:
> > After installing gcc-11, ./configure passed (with 0003-memcpy-null.patch).
> > In the output of `make check-world`, I don't see `runtime error`.

That's expected.  With -fsanitize-undefined-trap-on-error, the program will
generate SIGILL when UBSan detects undefined behavior.  To get "runtime error"
messages in the postmaster log, drop -fsanitize-undefined-trap-on-error.  Both
ways of running the tests have uses.  -fsanitize-undefined-trap-on-error is
better when you think the code is clean, because a zero "make check-world"
exit status confirms the code is clean.  Once you know the code is unclean in
some way, -fsanitize-undefined-trap-on-error is better for getting details.

> > Though there was a crash (maybe specific to my machine):
> >
> > Core was generated by
> > `/nfusr/dev-server/zyu/postgres/tmp_install/usr/local/pgsql/bin/postgres
> > --singl'.
> > Program terminated with signal SIGILL, Illegal instruction.
> > #0  0x0050642d in write_item.cold ()
> > Missing separate debuginfos, use: debuginfo-install
> > glibc-2.17-325.el7_9.x86_64 nss-pam-ldapd-0.8.13-25.el7.x86_64
> > sssd-client-1.16.5-10.el7_9.10.x86_64
> > (gdb) bt
> > #0  0x0050642d in write_item.cold ()
> > #1  0x00ba9d1b in write_relcache_init_file ()
> > #2  0x00bb58f7 in RelationCacheInitializePhase3 ()
> > #3  0x00bd5cb5 in InitPostgres ()
> > #4  0x00a0a9ea in PostgresMain ()

That is UBSan detecting undefined behavior.  A successful patch version will
fix write_item(), among many other places that are currently making
check-world fail.  I get the same when testing your v5 under "gcc (Debian
11.2.0-13) 11.2.0".  I used the same host as buildfarm member thorntail, and I
configured like this:

  ./configure -C --with-lz4 --prefix=$HOME/sw/nopath/pghead --enable-tap-tests 
--enable-debug --enable-depend --enable-cassert CC='ccache gcc-11 
-fsanitize=undefined -fsanitize-undefined-trap-on-error' CFLAGS='-O2 
-funwind-tables'

> Earlier I was using devtoolset-11 which had an `Illegal instruction` error.
> 
> I compiled / installed gcc-11 from source (which took whole afternoon).
> `make check-world` passed with patch v3.
> In tmp_install/log/install.log, I saw:
> 
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
> -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
> -Wno-format-truncation -Wno-stringop-truncation -fsanitize=undefined
> -fsanitize-undefined-trap-on-error -I../../src/port -DFRONTEND
> -I../../src/include  -D_GNU_SOURCE   -c -o path.o path.c
> rm -f libpgport.a

Perhaps this self-compiled gcc-11 is defective, being unable to detect the
instances of undefined behavior that other builds detect.  If so, use the
"devtoolset-11" gcc instead.  You're also building without optimization; that
might be the problem.




Re: Add client connection check during the execution of the query

2022-01-13 Thread Andres Freund
Hi,

On 2022-01-11 22:59:13 +1300, Thomas Munro wrote:
> I considered another idea we discussed: if we see a latch event, clear
> it and try again so that other events can be revealed (rince and
> repeat), but remember if that happens and set the latch at the end.  I
> think that still requires PG_FINALLY() if you want to guarantee not to
> eat a latch event if WaitEventSetWait() throws.  This may be a
> theoretical point because things must be pretty broken if
> WaitEventSetWait() is throwing, but I don't like an egregious lack of
> exception safety on principle.

I don't think this is a problem. Not because of WaitEventSetWait() never
throwing, but because it's "just fine" to have reset the latch in that case.

The error will cause control flow to transfer to the next PG_CATCH site. The
point of latches is to avoid racy checks for events (or sleeps with short
timeouts to handle the races). The documented pattern is:

 * for (;;)
 * {
 * ResetLatch();
 * if (work to do)
 * Do Stuff();
 * WaitLatch();
 * }


Latches only work if there's a very limited amount of things happening between
the if (work_to_do) and the WaitLatch().

It definitely is *NOT* be ok to do something like:

for (;;)
{
ResetLatch()
if (work_to_do)
  DoStuff();

PG_TRY():
   something_that_may_throw();
PG_CATCH():
   something_not_throwing();

WaitLatch();
}

For one, elog.c related code might have actually done network IO! During which
the latch very well might be reset. So there's just no way any remotely
reasonable code can rely on preserving latch state across errors.



> I considered another idea we discussed: if we see a latch event, clear
> it and try again so that other events can be revealed (rince and
> repeat), but remember if that happens and set the latch at the end.

The more I think about it, the less I see why we *ever* need to re-arm the
latch in pq_check_connection() in this approach. pq_check_connection() is only
used from from ProcessInterrupts(), and there's plenty things inside
ProcessInterrupts() that can cause latches to be reset (e.g. parallel message
processing causing log messages to be sent to the client, causing network IO,
which obviously can do a latch reset).

It makes sense to use CFI() in a latch loop, but it would have to be at the
bottom or top, not between if (work_to_do) and WaitLatch().

Greetings,

Andres Freund




Consistently use the function name CreateCheckPoint instead of CreateCheckpoint in code comments

2022-01-13 Thread Bharath Rupireddy
Hi,

The function CreateCheckPoint is specified as CreateCheckpoint in some
of the code comments whereas in other places it is correctly
mentioned. Attaching a tiny patch to use CreateCheckPoint consistently
across code comments.

Regards,
Bharath Rupireddy.


v1-0001-Consistently-use-the-function-name-CreateCheckPoi.patch
Description: Binary data


Re: small development tip: Consider using the gold linker

2022-01-13 Thread Peter Geoghegan
On Fri, Jul 20, 2018 at 8:16 PM Peter Geoghegan  wrote:
> On Mon, Jul 9, 2018 at 4:40 PM, Andres Freund  wrote:
> > FWIW, I've lately noticed that I spend a fair time waiting for the
> > linker during edit-compile-test cycles.  Due to an independent issue I
> > just used the gold linker, and the speedup is quite noticable.
> > For me just adding '-fuse-ld=gold' to CFLAGS works.
>
> I tried this out today. It makes quite a noticeable difference for me.
> Thanks for the tip.

The 2022 version of the same tip: switching over to LLVM lld seems to
offer a further significant speedup over gold. Not surprising, since
lld is specifically promoted as a linker that is faster than gold [1].
Again, this appears to just be a case of installing lld, and adding
'-fuse-ld=lld' to CFLAGS -- a very easy switch to make. Link time is
still a noticeable contributor to overall build time for me, even with
gold (I use ccache, of course).

There is another linker called mold [2]. It claims to be faster than
lld (which was faster than gold, which was faster than ld). I didn't
bother trying it out.

[1] https://llvm.org/devmtg/2017-10/slides/Ueyama-lld.pdf - slide 14
[2] https://github.com/rui314/mold
-- 
Peter Geoghegan




Re: Refactoring of compression options in pg_basebackup

2022-01-13 Thread Michael Paquier
On Thu, Jan 13, 2022 at 01:36:00PM -0500, Robert Haas wrote:
> 1. If, as you propose, we add a new flag --compression-method=METHOD
> then how will the user specify server-side compression?

This would require a completely different option switch, which is
basically the same thing as what you are suggesting with
--server-compress.

> 2. If, as we seem to agree, the compression method is more important
> than the compression level, then why is the option to set the
> less-important thing called just --compress, and the option to set the
> more important thing has a longer name?

I agree that the method is more important than the level for most
users, and I would not mind dropping completely --compress in favor of
something else, which is something I implied upthread.

> I proposed to solve both of these problems by using
> --compression-level=NUMBER to set the compression level and
> --compress=METHOD or --server-compress=METHOD to set the algorithm and
> specify on which side it is to be applied. If, instead of doing that,
> we go with what you have proposed here, then I don't understand how to
> fit server-side compression into the framework in a reasonably concise
> way. I think we would end up with something like pg_basebackup
> --compression-method=lz4 --compress-on-server, which seems rather long
> and awkward. Do you have a better idea?

Using --compression-level=NUMBER and --server-compress=METHOD to
specify a server-side compression method with a level is fine by me,
but I find the reuse of --compress to specify a compression method 
confusing as it maps with the past option we have kept in
pg_basebackup for a couple of years now.  Based on your suggested set
of options, we could then have a --client-compress=METHOD and
--compression-level=NUMBER to specify a client-side compression method
with a level.  If we do that, I guess that we should then:
1) Block the combination of --server-compress and --client-compress.
2) Remove the existing -Z/--compress and -z/--gzip.

You have implied 1) upthread as far as I recall, 2) is something I am
adding on top of it.

> I think I understand what the patch is doing. I just think it creates
> a problem for my patch. And I'd like to know whether you have an idea
> how to solve that problem. And if not, then I'd like you to consider
> the solution that I am proposing rather than the patch you've already
> got.

I am fine to drop this thread's patch with its set of options and work
on top of your proposal, aka what's drafted two paragraphs above.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2022-01-13 Thread Julien Rouhaud
Hi,

On Thu, Jan 13, 2022 at 06:44:12PM -0800, Andres Freund wrote:
> 
> The point is that we need the check for WAL writes / xid assignments / etc
> *either* way. There are ways extensions could trigger problems like e.g. xid
> assigned, besides bgworker doing stuff. Or postgres components doing so
> unintentionally.
> 
> Erroring out in situations where we *know* that there were concurrent changes
> unacceptable during pg_upgrade is always the right call. Such errors can be
> debugged and then addressed (removing the extension from s_p_l, fixing the
> extension, etc).
> 
> In contrast to that, preventing upgrades from succeeding because an extension
> has a dependency on bgworkers working, just because the bgworker *could* be
> doing something bad is different. The bgworker might never write, have a check
> for binary upgrade mode, etc. It may not be realistic to fix an extension to
> work without the bgworkers.
> 
> Imagine something like an table access method that has IO workers or such.

IIUC if a table access method has IO workers that start doing writes quickly
(or any similar extension that *is* required to be present during upgrade but
that should be partially disabled), the only way to do a pg_upgrade would be to
make sure that the extension explicitly checks for the binary-upgrade mode and
doesn't do any writes, or provide a GUC for the same, since it should still be
preloaded?  I'm fine with that, but that should probably be documented.
> 
> 
> > Andres, do you still have an objection with either preventing bgworker
> > registration/launch or WAL-write during the impacted pg_upgrade steps, or a
> > better alternative to fix the problem?
> 
> I still object to the approach of preventing bgworker registration. It doesn't
> provide much protection and might cause hard to address problems for some
> extensions.
> 
> I don't think I ever objected to preventing WAL-writes, I even proposed that
> upthread? Unless you suggest doing it specifically in bgworkers, rather than
> preventing similar problems outside bgworkers as well.

Sorry I missed that when re-reading the thread.  And no I'm not suggesting
preventing WAL writes in bgworkers only.

Since there's a clear consensus on how to fix the problem, I'm switching the
patch as Waiting on Author.




Re: [PATCH] Disable bgworkers during servers start in pg_upgrade

2022-01-13 Thread Andres Freund
Hi,

On 2022-01-12 17:54:31 +0800, Julien Rouhaud wrote:
> On Sat, Aug 28, 2021 at 10:40:42AM +0900, Michael Paquier wrote:
> > On Fri, Aug 27, 2021 at 12:28:42PM -0700, Andres Freund wrote:
> > > Isn't that just going to end up with extension code erroring out and/or
> > > blocking waiting for a bgworker to start?
> >
> > Well, that's the point to block things during an upgrade.  Do you have
> > a list of requirements you'd like to see satisfied here?  POWA would
> > be fine with blocking the execution of bgworkers AFAIK (Julien feel
> > free to correct me here if necessary).  It could be possible that
> > preventing extension code to execute this way could prevent hazards as
> > well.  The idea from upthread to prevent any writes and/or WAL
> > activity is not really different as extensions may still generate an
> > error because of pg_upgrade's safety measures we'd put in, no?

The point is that we need the check for WAL writes / xid assignments / etc
*either* way. There are ways extensions could trigger problems like e.g. xid
assigned, besides bgworker doing stuff. Or postgres components doing so
unintentionally.

Erroring out in situations where we *know* that there were concurrent changes
unacceptable during pg_upgrade is always the right call. Such errors can be
debugged and then addressed (removing the extension from s_p_l, fixing the
extension, etc).

In contrast to that, preventing upgrades from succeeding because an extension
has a dependency on bgworkers working, just because the bgworker *could* be
doing something bad is different. The bgworker might never write, have a check
for binary upgrade mode, etc. It may not be realistic to fix an extension to
work without the bgworkers.

Imagine something like a table access method that has IO workers or such.


> Andres, do you still have an objection with either preventing bgworker
> registration/launch or WAL-write during the impacted pg_upgrade steps, or a
> better alternative to fix the problem?

I still object to the approach of preventing bgworker registration. It doesn't
provide much protection and might cause hard to address problems for some
extensions.

I don't think I ever objected to preventing WAL-writes, I even proposed that
upthread? Unless you suggest doing it specifically in bgworkers, rather than
preventing similar problems outside bgworkers as well.

Greetings,

Andres Freund




Re: Schema variables - new implementation for Postgres 15

2022-01-13 Thread Julien Rouhaud
Hi,

On Thu, Jan 13, 2022 at 07:32:26PM +0100, Pavel Stehule wrote:
> čt 13. 1. 2022 v 19:23 odesílatel Dean Rasheed 
> napsal:
> 
> > On Thu, 13 Jan 2022 at 17:42, Pavel Stehule 
> > wrote:
> > >
> > > I like the idea of prioritizing tables over variables with warnings when a
> > collision is detected. It cannot break anything, and it allows using
> > short identifiers when there is no collision.
> >
> > Yeah, that seems OK, as long as it's clearly documented. I don't think
> > a warning is necessary.

What should be the behavior for a cached plan that uses a variable when a
conflicting relation is later created?  I think that it should be the same as a
search_path change and the plan should be discarded.

> The warning can be disabled by default, but I think it should be there.
> This is a signal that something in the database schema should be renamed. Maybe -
> session_variables_ambiguity_warning.

I agree that having a way to know that a variable has been bypassed can be
useful.

> > (FWIW, testing with dbfiddle, that appears to match Db2's behaviour).
> >
> 
> Thank you for checking

Do you know what Oracle's behavior is on that?


I've been looking at the various dependency handling, and I noticed that
collations are ignored, while they're accepted syntax-wise:

=# create collation mycollation (locale = 'fr-FR', provider = 'icu');
CREATE COLLATION

=# create variable myvariable text collate mycollation;
CREATE VARIABLE

=# select classid::regclass, objid, objsubid, refclassid::regclass, refobjid, 
refobjsubid from pg_depend where classid::regclass::text = 'pg_variable' or 
refclassid::regclass::text = 'pg_variable';
   classid   | objid | objsubid |  refclassid  | refobjid | refobjsubid
-+---+--+--+--+-
 pg_variable | 16407 |0 | pg_namespace | 2200 |   0
(1 row)

=# let myvariable = 'AA';
LET

=# select 'AA' collate "en-x-icu" < myvariable;
 ?column?
--
 f
(1 row)

=# select 'AA' collate "en-x-icu" < myvariable collate mycollation;
ERROR:  42P21: collation mismatch between explicit collations "en-x-icu" and 
"mycollation"
LINE 1: select 'AA' collate "en-x-icu" < myvariable collate mycollat...

So it's missing both dependency recording for a variable's collation and also
teaching various parts of the code that variables can have a collation.

It's also missing some invalidation detection.  For instance:

=# create variable myval text;
CREATE VARIABLE

=# let myval = 'pg_class';
LET

=# prepare s(text) as select relname from pg_class where relname = $1 or 
relname = myval;
PREPARE

=# set plan_cache_mode = force_generic_plan ;
SET

=# execute s ('');
 relname
--
 pg_class
(1 row)

=# drop variable myval ;
DROP VARIABLE

=# create variable myval int;
CREATE VARIABLE

=# execute s ('');
ERROR:  XX000: cache lookup failed for session variable 16408

The plan should have been discarded and the new plan should fail due to the
type mismatch.

Strangely, subsequent calls don't error out:

=# execute s('');
 relname
-
(0 rows)

But doing an explain shows that there's a problem:

=# explain execute s('');
ERROR:  XX000: cache lookup failed for variable 16408




Re: In-placre persistance change of a relation

2022-01-13 Thread Kyotaro Horiguchi
I found a bug.

mdmarkexists() didn't close the tentatively opened fd. This is a silent
leak on Linux and similar systems, and it causes a delete failure on Windows.
It was the reason for the CI failure.

027_persistence_change.pl uses interactive_psql(), which doesn't work on
the Windows VM on the CI.

In this version the following changes have been made in 0001.

- Properly close file descriptor in mdmarkexists.

- Skip some tests when IO::Pty is not available.
  It might be better to separate that part.

Looking again at the ALTER TABLE ALL IN TABLESPACE SET LOGGED patch, I
noticed that it doesn't implement the OWNED BY part and doesn't have tests
or documentation (it was a PoC). Added all of them to 0002.

At Tue, 11 Jan 2022 09:33:55 +, Jakub Wartak  
wrote in 
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:not tested
> 
> I've retested v15 of the patch with everything that came to my mind. The 
> patch passes all my tests (well, there's this just windows / cfbot issue). 
> Patch looks good to me. I haven't looked in-depth at the code, so patch might 
> still need review.

Thanks for checking.

> FYI, about potential usage of this patch: the most advanced test that I did 
> was continually bouncing wal_level - it works. So the chain is:
> 1. wal_level=replica->minimal
> 2. alter table set unlogged and load a lot of data, set logged
> 3. wal_level=minimal->replica
> 4. barman incremental backup # rsync(1) just backups changed files in steps 2 
> and 3 (not whole DB)
> 5. some other (logged) work
> The idea is that when performing mass-alterations to the DB (think nightly 
> ETL/ELT on TB-sized DBs), one could skip backing up most of DB and then just 
> quickly backup only the changed files - during the maintenance window - e.g. 
> thanks to local-rsync barman mode. This is the output of barman show-backups 
> after loading data to unlogged table each such cycle:
> mydb 20220110T100236 - Mon Jan 10 10:05:14 2022 - Size: 144.1 GiB - WAL Size: 
> 16.0 KiB
> mydb 20220110T094905 - Mon Jan 10 09:50:12 2022 - Size: 128.5 GiB - WAL Size: 
> 80.2 KiB
> mydb 20220110T094016 - Mon Jan 10 09:40:17 2022 - Size: 109.1 GiB - WAL Size: 
> 496.3 KiB
> And dedupe ratio of the last one: Backup size: 144.1 GiB. Actual size on 
> disk: 36.1 GiB (-74.96% deduplication ratio).  

Ah, the patch skips duplicating relation files. This is advantageous in
that it not only eliminates the I/O activity the duplication causes but also
reduces the size of incremental backups.  I hadn't noticed the latter
advantage.

> The only thing I've found is that bouncing wal_level also forces
> max_wal_senders=X -> 0 -> X which in turn requires dropping replication slot 
> for pg_receievewal (e.g. barman receive-wal 
> --create-slot/--drop-slot/--reset). I have tested the restore using barman 
> recover afterwards to backup 20220110T094905  and indeed it worked OK using 
> this patch too.

Yeah, it is irrelevant to this patch, but I'm annoyed by the
restriction.  I think it would be okay if max_wal_senders were
forcibly set to 0 while wal_level=minimal.

> The new status of this patch is: Needs review

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From d6bf0bd0d60391b24d5be7942b546acfffa3d7b1 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 11 Nov 2020 21:51:11 +0900
Subject: [PATCH v16 1/2] In-place table persistence change

Even though ALTER TABLE SET LOGGED/UNLOGGED does not require data
rewriting, currently it runs a heap rewrite, which causes a large amount
of file I/O.  This patch makes the command run without a heap rewrite.
In addition to that, SET LOGGED while wal_level > minimal emits WAL using
XLOG_FPI instead of a massive number of HEAP_INSERT records, which should
be smaller.

Also this allows for the cleanup of files left behind by a crash of
the transaction that created them.
---
 src/backend/access/rmgrdesc/smgrdesc.c|  52 ++
 src/backend/access/transam/README |   8 +
 src/backend/access/transam/xact.c |   7 +
 src/backend/access/transam/xlog.c |  17 +
 src/backend/catalog/storage.c | 545 +-
 src/backend/commands/tablecmds.c  | 266 +++--
 src/backend/replication/basebackup.c  |   3 +-
 src/backend/storage/buffer/bufmgr.c   |  88 +++
 src/backend/storage/file/fd.c |   4 +-
 src/backend/storage/file/reinit.c | 344 +++
 src/backend/storage/smgr/md.c |  94 ++-
 src/backend/storage/smgr/smgr.c   |  32 +
 src/backend/storage/sync/sync.c   |  20 +-
 src/bin/pg_rewind/parsexlog.c |  24 +
 src/common/relpath.c  |  47 +-
 src/include/catalog/storage.h |   3 +
 src/include/catalog/storage_xlog.h|  42 +-
 

Re: Skipping logical replication transactions on subscriber side

2022-01-13 Thread Masahiko Sawada
On Thu, Jan 13, 2022 at 10:07 AM tanghy.f...@fujitsu.com
 wrote:
>
> On Wed, Jan 12, 2022 2:02 PM Masahiko Sawada  wrote:
> >
> > I've attached an updated patch that incorporated all comments I got so far.
> >
>
> Thanks for updating the patch. Here are some comments:

Thank you for the comments!

>
> 1)
> +  Skip applying changes of the particular transaction.  If incoming data
>
> Should "Skip" be "Skips" ?
>
> 2)
> +  prepared by enabling two_phase on susbscriber.  
> After h
> +  the logical replication successfully skips the transaction, the 
> transaction
>
> The "h" after the word "After" seems redundant.
>
> 3)
> +   Skipping the whole transaction includes skipping the cahnge that may not 
> violate
>
> "cahnge" should be "changes" I think.
>
> 4)
> +/*
> + * True if we are skipping all data modification changes (INSERT, UPDATE, 
> etc.) of
> + * the specified transaction at MySubscription->skipxid.  Once we start 
> skipping
> ...
> + */
> +static TransactionId skipping_xid = InvalidTransactionId;
> +#define is_skipping_changes() (TransactionIdIsValid(skipping_xid))
>
> Maybe we should modify this comment. Something like:
> skipping_xid is valid if we are skipping all data modification changes ...
>
> 5)
> +   if (!superuser())
> +   ereport(ERROR,
> +   
> (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> +errmsg("must 
> be superuser to set %s", "skip_xid")));
>
> Should we change the message to "must be superuser to skip xid"?
> Because the SQL stmt is "ALTER SUBSCRIPTION ... SKIP (xid = XXX)".

I agree with all the comments above. These are incorporated into the
latest v4 patch I've just submitted[1].

Regards,

[1] 
postgresql.org/message-id/CAD21AoBZC87nY1pCaexk1uBA68JSBmy2-UqLGirT9g-RVMhjKw%40mail.gmail.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2022-01-13 Thread Masahiko Sawada
On Wed, Jan 12, 2022 at 11:10 PM vignesh C  wrote:
>
> On Wed, Jan 12, 2022 at 11:32 AM Masahiko Sawada  
> wrote:
> >
> > On Wed, Jan 12, 2022 at 12:21 PM Amit Kapila  
> > wrote:
> > >
> > > On Wed, Jan 12, 2022 at 5:49 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Tue, Jan 11, 2022 at 7:08 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Tue, Jan 11, 2022 at 1:51 PM Masahiko Sawada 
> > > > >  wrote:
> > > > > >
> > > > > > On second thought, the same is true for other cases, for example,
> > > > > > preparing the transaction and clearing skip_xid while handling a
> > > > > > prepare message. That is, currently we don't clear skip_xid while
> > > > > > handling a prepare message but do that while handling 
> > > > > > commit/rollback
> > > > > > prepared message, in order to avoid the worst case. If we do both
> > > > > > while handling a prepare message and the server crashes between 
> > > > > > them,
> > > > > > it ends up that skip_xid is cleared and the transaction will be
> > > > > > resent, which is identical to the worst-case above.
> > > > > >
> > > > >
> > > > > How are you thinking to update the skip xid before prepare? If we do
> > > > > it in the same transaction then the changes in the catalog will be
> > > > > part of the prepared xact but won't be committed. Now, say if we do it
> > > > > after prepare, then the situation won't be the same because after
> > > > > restart the same xact won't appear again.
> > > >
> > > > I was thinking to commit the catalog change first in a separate
> > > > transaction while not updating origin LSN and then prepare an empty
> > > > transaction while updating origin LSN.
> > > >
> > >
> > > But, won't it complicate the handling if in the future we try to
> > > enhance this API such that it skips partial changes like skipping only
> > > for particular relation(s) or particular operations as discussed
> > > previously in this thread?
> >
> > Right. I was thinking that if we accept the situation that the user
> > has to set skip_xid again in case of the server crashes, we might be
> > able to accept also the situation that the user has to clear skip_xid
> > in a case of the server crashes. But it seems the former is less
> > problematic.
> >
> > I've attached an updated patch that incorporated all comments I got so far.
>
> Thanks for the updated patch, few comments:

Thank you for the comments!

> 1) Currently skip xid is not displayed in the describe subscription
> output; can we include it too:
> \dRs+  sub1
> List of subscriptions
>  Name |  Owner  | Enabled | Publication | Binary | Streaming | Two
> phase commit | Synchronous commit |Conninfo
> --+-+-+-++---+--++
>  sub1 | vignesh | t   | {pub1}  | f  | f | e
>  | off| dbname=postgres host=localhost
> (1 row)
>
> 2) This import "use PostgreSQL::Test::Utils;" is not required:
> +# Tests for skipping logical replication transactions.
> +use strict;
> +use warnings;
> +use PostgreSQL::Test::Cluster;
> +use PostgreSQL::Test::Utils;
> +use Test::More tests => 6;
>
> 3) Some of the comments use a punctuation mark and some of them do
> not; should we keep it consistent:
> +# Wait for worker error
> +$node_subscriber->poll_query_until(
> +   'postgres',
>
> +# Set skip xid
> +$node_subscriber->safe_psql(
> +   'postgres',
>
> +# Create publisher node.
> +my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
> +$node_publisher->init(allows_streaming => 'logical');
>
>
> +# Create subscriber node.
> +my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
>
> 4) Should this be changed:
> + * True if we are skipping all data modification changes (INSERT,
> UPDATE, etc.) of
> + * the specified transaction at MySubscription->skipxid.  Once we
> start skipping
> + * changes, we don't stop it until the we skip all changes of the
> transaction even
> + * if pg_subscription is updated that and MySubscription->skipxid
> gets changed or
> to:
> + * True if we are skipping all data modification changes (INSERT,
> UPDATE, etc.) of
> + * the specified transaction at MySubscription->skipxid.  Once we
> start skipping
> + * changes, we don't stop it until we skip all changes of the transaction 
> even
> + * if pg_subscription is updated that and MySubscription->skipxid
> gets changed or
>
> In "stop it until the we skip all changes", here the is not required.
>

I agree with all the comments above. I've attached an updated patch.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v4-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transac.patch
Description: Binary data


Re: row filtering for logical replication

2022-01-13 Thread Amit Kapila
On Fri, Jan 14, 2022 at 5:48 AM Peter Smith  wrote:
>
> On Thu, Jan 13, 2022 at 5:49 PM Peter Smith  wrote:
> >
> > 4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter 
> > comments
> >
> > +
> > +/*
> > + * Change is checked against the row filter, if any.
> > + *
> > + * If it returns true, the change is replicated, otherwise, it is not.
> > + *
> > + * FOR INSERT: evaluates the row filter for new tuple.
> > + * FOR DELETE: evaluates the row filter for old tuple.
> > + * For UPDATE: evaluates the row filter for old and new tuple. If both
> > + * evaluations are true, it sends the UPDATE. If both evaluations are 
> > false, it
> > + * doesn't send the UPDATE. If only one of the tuples matches the row 
> > filter
> > + * expression, there is a data consistency issue. Fixing this issue 
> > requires a
> > + * transformation.
> > + *
> > + * Transformations:
> > + * Updates are transformed to inserts and deletes based on the
> > + * old tuple and new tuple. The new action is updated in the
> > + * action parameter. If not updated, action remains as update.
> > + *
> > + * Case 1: old-row (no match)new-row (no match)  -> (drop change)
> > + * Case 2: old-row (no match)new row (match) -> INSERT
> > + * Case 3: old-row (match)   new-row (no match)  -> DELETE
> > + * Case 4: old-row (match)   new row (match) -> UPDATE
> > + *
> > + * If the change is to be replicated this function returns true, else 
> > false.
> > + *
> > + * Examples:
> >
> > The function header comment says the same thing 2x about the return values.
> >
> > The 1st text "If it returns true, the change is replicated, otherwise,
> > it is not." should be replaced by the better wording of the 2nd text
> > ("If the change is to be replicated this function returns true, else
> > false."). Then, that 2nd text can be removed (from where it is later
> > in this same comment).
>
> Hi Hou-san, thanks for all the v64 updates!
>
> I think the above comment was only partly fixed.
>
> The v64-0001 comment still says:
> + * If it returns true, the change is replicated, otherwise, it is not.
>
...
...
>
> But maybe it is best to rearrange the whole thing like:
> "Returns true if the change is to be replicated, else false."
>

+1 to change as per this suggestion.

-- 
With Regards,
Amit Kapila.




minor bug in sort_inner_and_outer()

2022-01-13 Thread Tomas Vondra
Hi,

While looking at sort_inner_and_outer() I was rather confused what is
stored in all_pathkeys, because the code does this:

  List *all_pathkeys;

  ...

  all_pathkeys = select_outer_pathkeys_for_merge(root,
 extra->mergeclause_list,
 joinrel);

  foreach(l, all_pathkeys)
  {
  List *front_pathkey = (List *) lfirst(l);
  ...

  /* Make a pathkey list with this guy first */
  if (l != list_head(all_pathkeys))
  outerkeys = lcons(front_pathkey,
...);
  else
  ...

which seems to suggest all_pathkeys is a list of lists, because why else
would front_pathkey be a (List *). But that doesn't seem to be the case,
front_pathkey is actually a PathKey, not a List, as demonstrated by gdb:

(gdb) p *front_pathkey
$2 = {type = T_PathKey, length = 0, ...}

Maybe it's some clever list-fu that I can't comprehend, but I guess it's
a bug present since ~2004. It's benign because we only ever pass the
front_pathkey to lcons(), which does not really care.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Companydiff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index f96fc9fd282..81e04b6ac9d 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -1258,7 +1258,7 @@ sort_inner_and_outer(PlannerInfo *root,
 
 	foreach(l, all_pathkeys)
 	{
-		List	   *front_pathkey = (List *) lfirst(l);
+		PathKey	   *front_pathkey = lfirst_node(PathKey, l);
 		List	   *cur_mergeclauses;
 		List	   *outerkeys;
 		List	   *innerkeys;


Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

2022-01-13 Thread Tomas Vondra
On 1/13/22 21:12, Arne Roland wrote:
>  Hi!
> 
>> From: Tomas Vondra 
>> Subject: Re: PATCH: generate fractional cheapest paths in
> generate_orderedappend_path
>>  
>> test-# SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2)
>> ORDER BY id1 ASC, id2 ASC LIMIT 10;
>>    QUERY PLAN
>>
>>
> --
>>   Limit
>> ->  Merge Left Join
>>   Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2))
>>   ->  Append
>> ->  Index Only Scan using fract_t0_id1_id2_idx on
>>   fract_t0 x_1
>> ->  Incremental Sort
>>   Sort Key: x_2.id1, x_2.id2
>>   Presorted Key: x_2.id1
>>   ->  Index Scan using fract_t1_pkey on fract_t1 x_2
>>   ->  Materialize
>> ->  Append
>>   ->  Incremental Sort
>> Sort Key: y_1.id1, y_1.id2
>> Presorted Key: y_1.id1
>> ->  Index Scan using fract_t0_pkey on
>>  fract_t0 y_1
>>   Index Cond: (id1 = id1)
>>   Filter: (id2 = id2)
>>   ->  Incremental Sort
>> Sort Key: y_2.id1, y_2.id2
>> Presorted Key: y_2.id1
>> ->  Index Scan using fract_t1_pkey on
>>  fract_t1 y_2
>>   Index Cond: (id1 = id1)
>>   Filter: (id2 = id2)
>> (23 rows)
>> [...]
>> So that seems reasonable
> 
> Maybe I'm just slow, but that doesn't seem reasonable to me. It doesn't
> look like a valid plan to me. Sure all the nodes are arranged like I'd
> like them to be. But what are the id1/id2 bounds we have in the index and
> filter conditions?
> 

I'm not claiming the plan is 100% correct, and you may have a point
about the index condition / filter in the materialize branch.

But the overall plan shape (with incremental sort nodes on both sides)
seems reasonable to me. The materialize node is expected (incremental
sort does not support rescans cheaply).

Maybe that's not any cheaper than just doing merge join on the first
column, and filter on the second. But we should be able to decide that
based on cost, I think.

>> [...]but there's a couple issues too:
>>
>> 1) Planning works (hence we can run explain), but execution fails
>> because of segfault in CheckVarSlotCompatibility - it gets NULL slot for
>> some reason. I haven't looked into the details, but I'd guess we need to
>> pass a different rel to create_incrementalsort_path, not childrel.
> 
> In case my above concern is valid, maybe the executor is just as
> confused as I am. Such conditions should generate VarSlots, no? If so,
> where should they come from?
> 

Yeah, something like that.

> Sadly I don't have time to debug that in depth today.
> 
>> 2) enable_partitionwisejoin=on seems to cause some confusion, because it
>> results in picking a different plan with higher cost. But that's clearly
>> not because of this patch.
> 
> Short version: I'd neither conceptually expect costs to be lower in any
> case, nor would that be desirable, because our estimates aren't perfect.
> 
> Long version: What do you mean by confusion? The plan I get with the
> patch doesn't seem too confusing to me. Generally something like that is
> to be expected. enable_partitionwisejoin changes the way this planning
> works by effectively rewriting the entire query rule-based. So we end up
> with a completely different query. I'd in particular expect slightly
> different startup costs.
> So if we activate this we consider completely different plans; I
> struggle to come up with a meaningful example where there is any overlap
> at all. Thus it doesn't surprise me conceptually.
> From practical experience I'd say: if they are about the same plan, the
> cost estimates work somewhat OK-ish.
> If we change the way we compose the nodes together, we sometimes end up
> with radically different costs for doing the same thing. While I suspect
> there are a lot of corner cases causing this, I've seen quite a few where
> this was due to the fact that our planner often has insufficient
> statistics and takes a guess. This has gotten way better in recent
> years, but in particular for non-trivial joins it's still a problem in
> practice.
> 

By confusion I meant that if you plan the query with partitionwise join
enabled, you get a plan with cost X, and if you disable it you get a
different plan with cost Y, where (Y < X). Which is somewhat unexpected,
because that seems to simply reduce the set of plans we explore.

But as you say, enable_partitionwise_join kinda "switches" between two
planning modes. 

Re: row filtering for logical replication

2022-01-13 Thread Peter Smith
On Thu, Jan 13, 2022 at 5:49 PM Peter Smith  wrote:
>
> Thanks for posting the merged v63.
>
> Here are my review comments for the v63-0001 changes.
>
...
> ~~~
>
> 4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter comments
>
> +
> +/*
> + * Change is checked against the row filter, if any.
> + *
> + * If it returns true, the change is replicated, otherwise, it is not.
> + *
> + * FOR INSERT: evaluates the row filter for new tuple.
> + * FOR DELETE: evaluates the row filter for old tuple.
> + * For UPDATE: evaluates the row filter for old and new tuple. If both
> + * evaluations are true, it sends the UPDATE. If both evaluations are false, 
> it
> + * doesn't send the UPDATE. If only one of the tuples matches the row filter
> + * expression, there is a data consistency issue. Fixing this issue requires 
> a
> + * transformation.
> + *
> + * Transformations:
> + * Updates are transformed to inserts and deletes based on the
> + * old tuple and new tuple. The new action is updated in the
> + * action parameter. If not updated, action remains as update.
> + *
> + * Case 1: old-row (no match)new-row (no match)  -> (drop change)
> + * Case 2: old-row (no match)new row (match) -> INSERT
> + * Case 3: old-row (match)   new-row (no match)  -> DELETE
> + * Case 4: old-row (match)   new row (match) -> UPDATE
> + *
> + * If the change is to be replicated this function returns true, else false.
> + *
> + * Examples:
>
> The function header comment says the same thing 2x about the return values.
>
> The 1st text "If it returns true, the change is replicated, otherwise,
> it is not." should be replaced by the better wording of the 2nd text
> ("If the change is to be replicated this function returns true, else
> false."). Then, that 2nd text can be removed (from where it is later
> in this same comment).

Hi Hou-san, thanks for all the v64 updates!

I think the above comment was only partly fixed.

The v64-0001 comment still says:
+ * If it returns true, the change is replicated, otherwise, it is not.

I thought the 2nd text is better:
"If the change is to be replicated this function returns true, else false."

But maybe it is best to rearrange the whole thing like:
"Returns true if the change is to be replicated, else false."

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: support for MERGE

2022-01-13 Thread Zhihong Yu
On Thu, Jan 13, 2022 at 4:43 AM Alvaro Herrera 
wrote:

> Apologies, there was a merge failure and I failed to notice.  Here's the
> correct patch.
>
> (This is also in github.com/alvherre/postgres/tree/merge-15)
>
Hi,
For v6-0001-MERGE-SQL-Command-following-SQL-2016.patch :

+* Either we were dealing with a NOT MATCHED tuple or
+* ExecMergeNotMatched() returned "false", indicating the previously

I think there was a typo above: ExecMergeMatched() returned "false"
because ExecMergeMatched() is called above the comment.

Cheers


Re: libpq compression (part 2)

2022-01-13 Thread Justin Pryzby
On Fri, Jan 14, 2022 at 02:12:17AM +0500, Daniil Zakhlystov wrote:
> Hi, Justin!
> 
> First of all, thanks for the detailed review. I’ve applied your patches to 
> the current version.

Note that my message had other comments that weren't addressed in this patch.

Your 0003 patch has a couple "noise" hunks that get rid of ^M characters added
in previous patches.  The ^M shouldn't be added in the first place.  Did you
apply my fixes using git-am or something else?

On Fri, Jan 14, 2022 at 02:12:17AM +0500, Daniil Zakhlystov wrote:
> > On 12 Jan 2022, at 19:15, Justin Pryzby  wrote:
> > 
> > zlib still causes check-world to get stuck.  I first mentioned this last 
> > March:
> > 20210319062800.gi11...@telsasoft.com
> > 
...
> > I removed the thread from the CFBOT until that's resolved.
> 
> I’ve fixed the failing tests, now they should pass.

macos: passed
linux: timed out after 1hr
freebsd: failed in pg_rewind: ... <= replay_lsn AND state = 'streaming' FROM ...
windows: "Failed test 'data_checksums=on is reported on an offline cluster 
stdout /(?^:^on$)/'" / WARNING:  01000: algorithm zlib is not supported

Note that it's possible and easy to kick off a CI run using any github account:
see ./src/tools/ci/README

For me, it's faster than running check-world -j4 locally, and runs tests on 4
OSes.

I re-ran your branch under my own account and linux didn't get stuck (and the
compiler warnings tests passed).  But on a third attempt, macos failed the
pg_rewind test, and bsd failed the subscription test:
| SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('s', 
'r');

-- 
Justin




Re: Consider parallel for lateral subqueries with limit

2022-01-13 Thread James Coleman
On Tue, Jan 4, 2022 at 9:59 PM James Coleman  wrote:
>
> On Tue, Jan 4, 2022 at 5:31 PM Tom Lane  wrote:
> >
> > Greg Nancarrow  writes:
> > > The patch LGTM.
> > > I have set the status to "Ready for Committer".
> >
> > I don't really see why this patch is even a little bit safe.
> > The argument for it seems to be that a lateral subquery will
> > necessarily be executed in such a way that each complete iteration
> > of the subquery, plus joining to its outer rel, happens within a
> > single worker ... but where is the guarantee of that?  Once
> > you've marked the rel as parallel-safe, the planner is free to
> > consider all sorts of parallel join structures.  I'm afraid this
> > would be easily broken as soon as you look at cases with three or
> > more rels.  Or maybe even just two.  The reason for the existing
> > restriction boils down to this structure being unsafe:
> >
> > Gather
> > NestLoop
> > Scan ...
> > Limit
> > Scan ...
> >
> > and I don't see how the existence of a lateral reference
> > makes it any safer.
>
> Thanks for taking a look. I'm not following how the structure you
> posited is inherently unsafe when it's a lateral reference. That
> limit/scan (if lateral) has to be being executed per tuple in the
> outer scan, right? And if it's a unique execution per tuple, then
> consistency across tuples (that are in different workers) can't be a
> concern.
>
> Is there a scenario I'm missing where lateral can currently be
> executed in that way in that structure (or a different one)?

To expand on this:

Suppose lateral is not in play. Then if we have a plan like:

Gather
NestLoop
Scan X
Limit
Scan Y

Because we have the result "X join Limit(Y)"  we need "Limit(Y)" to be
consistent across all of the possible executions of "Limit(Y)" (i.e.,
in each worker it executes in). That means (absent infrastructure for
guaranteeing a unique ordering) we obviously can't parallelize the
inner side of the join as the limit may be applied in different ways
in each worker's execution.

Now suppose lateral is in play. Then (given the same plan) instead of
our result being "X join Limit(Y)" the result is "X join Limit(Y sub
X)", that is, each row in X is joined to a unique invocation of
"Limit(Y)". In this case we are already conceivably getting different
results for each execution of the subquery "Limit(Y)" even if we're
not running those executions across multiple workers. Whether we've
optimized such a subquery into running a single execution per row in X
or have managed to optimize it in such a way that a single execution
of "Limit(Y)" may be shared by multiple rows in X makes no difference
because there are no relational guarantees that that is the case
(conceptually each row in X gets its own results for "Limit(Y)").

I've not been able to come up with a scenario where this doesn't hold
-- even if part of the join or the subquery execution happens in a
different worker. I believe that for there to be a parallel safety
problem here you'd need to have a given subquery execution for a row
in X be executed multiple times. Alternatively I've been trying to
reason about whether there might be a scenario where a 3rd rel is
involved (i.e., the inner rel is itself a join rel), but as far as I
can tell the moment we end up with a join structure such that the
lateral rel is the one with the limit we'd be back to being safe again
for the reasons claimed earlier.

If there's something obvious (or not so obvious) I'm missing, I'd
appreciate a counterexample, but I'm currently unable to falsify my
original claim that the lateral reference is a fundamental difference
that allows us to consider this parallel safe.
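
To make the two shapes concrete, here is a hedged sketch (table and column
names are illustrative assumptions, not taken from the thread):

```sql
-- Non-lateral: a single "Limit(Y)" subquery joined to X.  Every
-- execution of the subquery must yield the same rows, so (absent a
-- unique ordering) the inner side cannot safely run in multiple workers.
SELECT *
FROM x
JOIN (SELECT * FROM y LIMIT 10) y_lim USING (a);

-- Lateral: the subquery references X, so it is logically re-evaluated
-- for each outer row.  Each row of X already gets its own invocation of
-- "Limit(Y)", so there is no cross-worker consistency requirement.
SELECT *
FROM x,
     LATERAL (SELECT * FROM y WHERE y.a = x.a LIMIT 10) y_lim;
```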

Thanks,
James Coleman




Re: null iv parameter passed to combo_init()

2022-01-13 Thread Zhihong Yu
On Wed, Jan 12, 2022 at 7:08 PM Zhihong Yu  wrote:

>
>
> On Wed, Jan 12, 2022 at 6:49 PM Noah Misch  wrote:
>
>> On Mon, Jan 10, 2022 at 03:34:27PM -0800, Zhihong Yu wrote:
>> > On Sun, Jan 9, 2022 at 6:45 PM Zhihong Yu  wrote:
>> > > gcc -Wall -Wmissing-prototypes -Wpointer-arith
>> > > -Wdeclaration-after-statement -Werror=vla -Wendif-labels
>> > > -Wmissing-format-attribute -Wimplicit-fallthrough=3
>> -Wcast-function-type
>> > > -Wformat-security -fno-strict-aliasing -fwrapv
>> -fexcess-precision=standard
>> > > -Wno-format-truncation -Wno-stringop-truncation -fsanitize=undefined
>> > > -fsanitize-undefined-trap-on-error -I../../src/port -DFRONTEND
>> > > -I../../src/include  -D_GNU_SOURCE   -c -o path.o path.c
>> >
>> > Patch v3 passes `make check-world`
>>
>> The patch uses the "LENGTH_VAR != 0" style in px.c, but it uses
>> "POINTER_VAR
>> != NULL" style in the other files.  Please use "LENGTH_VAR != 0" style in
>> each
>> place you're changing.
>>
>> Assuming the next version looks good, I'll likely back-patch it to v10.
>> Would
>> anyone like to argue for a back-patch all the way to 9.2?
>>
> Hi,
> Please take a look at patch v5.
>
> Cheers
>
Noah:
Do you have any more review comments ?

Thanks


Re: Add spin_delay() implementation for Arm in s_lock.h

2022-01-13 Thread Blake, Geoff
As promised, here is the remaining data:

1 worker, w/o patch: 5236 ms +/- 252ms
1 worker, w/   patch: 5529 ms +/- 168ms

2 worker, w/o patch: 4917 ms +/- 180ms
2 worker, w/   patch: 4745 ms +/- 169ms

4 worker, w/o patch: 6564 ms +/- 336ms
4 worker, w/   patch: 6105 ms +/- 177ms

8 worker, w/o patch: 9575 ms +/- 2375ms
8 worker, w/   patch: 8115 ms +/- 391ms

16 worker, w/o patch: 19367 ms +/- 3543ms
16 worker, w/   patch: 18004 ms +/- 3701ms

32 worker, w/o patch: 101509 ms +/- 22651ms
32 worker, w/   patch: 104234 ms +/- 26821ms

48 worker, w/o patch: 243329 ms +/- 70037ms
48 worker, w/   patch: 189965 ms +/- 79459ms

64 worker, w/o patch: 552443 ms +/- 22841ms
64 worker, w/   patch: 502727 ms +/- 45253ms

From this data, on average the patch is beneficial at the high worker (CPU) 
counts tested: 48 and 64.  At 32 workers and below, the performance of the two 
is relatively close.

Thanks,
Geoff



Re: Add sub-transaction overflow status in pg_stat_activity

2022-01-13 Thread Bossart, Nathan
Thanks for the new patch!

+   
+Returns a record of information about the backend's subtransactions.
+The fields returned are subxact_count identifies
+number of active subtransaction and subxact_overflow
+ shows whether the backend's subtransaction cache is
+overflowed or not.
+   
+   

nitpick: There is an extra "" here.

Would it be more accurate to say that subxact_count is the number of
subxids that are cached?  It can only ever go up to
PGPROC_MAX_CACHED_SUBXIDS.
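
For readers without context, a minimal sketch of how the cache overflows,
assuming the stock PGPROC_MAX_CACHED_SUBXIDS of 64 (the DO block is an
illustration, not part of the patch):

```sql
-- Each plpgsql BEGIN/EXCEPTION block is a subtransaction, and each one
-- that writes is assigned a subxid that stays cached in the backend's
-- PGPROC until top-level commit.  Once more than 64 accumulate, the
-- cache is marked overflowed and visibility checks get more expensive.
CREATE TABLE t (i int);
DO $$
BEGIN
  FOR n IN 1..65 LOOP
    BEGIN
      INSERT INTO t VALUES (n);
    EXCEPTION WHEN OTHERS THEN
      NULL;
    END;
  END LOOP;
END $$;
```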

Nathan



Re: do only critical work during single-user vacuum?

2022-01-13 Thread Bossart, Nathan
On 1/13/22, 4:58 AM, "John Naylor"  wrote:
> On Wed, Jan 12, 2022 at 12:26 PM Bossart, Nathan  wrote:
>> As I've stated upthread, Sawada-san's suggested approach was my
>> initial reaction to this thread.  I'm not wedded to the idea of adding
>> new options, but I think there are a couple of advantages.  For both
>> single-user mode and normal operation (which may be in imminent
>> wraparound danger), you could use the same command:
>>
>> VACUUM (MIN_XID_AGE 16, ...);
>
> My proposed top-level statement can also be used in normal operation,
> so the only possible advantage is configurability. But I don't really
> see any advantage in that -- I don't think we should be moving in the
> direction of adding more-intricate ways to paper over the deficiencies
> in autovacuum scheduling. (It could be argued that I'm doing exactly
> that in this whole thread, but [imminent] shutdown situations have
> other causes besides deficient scheduling.)

The new top-level command would be configurable, right?  Your patch
uses autovacuum_freeze_max_age/autovacuum_multixact_freeze_max_age, so
the behavior of this new command now depends on the values of
parameters that won't obviously be related to it.  If these parameters
are set very low (e.g., the default values), then this command will
end up doing far more work than is probably necessary.

If we did go the route of using a parameter to determine which tables
to vacuum, I think vacuum_failsafe_age is a much better candidate, as
it defaults to a much higher value that is more likely to prevent
doing extra work.  That being said, I don't know if overloading
parameters is the right way to go.

>> (As an aside, we'd need to figure out how XID and MXID options would
>> work together.  Presumably most users would want to OR them.)
>>
>> This doesn't really tie in super nicely with the failsafe mechanism,
>> but adding something like a FAILSAFE option doesn't seem right to me,
>
> I agree -- it would be awkward and messy as an option. However, I see
> the same problem with xid/mxid -- I would actually argue they are not
> even proper options; they are "selectors". Your comments above about
> 1) needing to OR them and 2) emitting a message when a VACUUM command
> doesn't actually do anything are evidence of that fact.

That's a fair point.  But I don't think these problems are totally
intractable.  We already emit "skipping" messages from VACUUM
sometimes, and interactions between VACUUM options exist today, too.
For example, FREEZE is redundant when FULL is specified, and
INDEX_CLEANUP is totally ignored when FULL is used.

>> The other advantage I see with age-related options is that it can be
>> useful for non-imminent-wraparound situations as well.  For example,
>> maybe a user just wants to manually vacuum everything (including
>> indexes) with an age above 500M on the weekends.
>
> There is already vaccumdb for that, and I think it's method of
> selecting tables is sound -- I'm not convinced that pushing table
> selection to the server command as "options" is an improvement.

I guess I'm ultimately imagining the new options as replacing the
vacuumdb implementation.  IOW vacuumdb would just use MIN_(M)XID_AGE
behind the scenes (as would a new top-level command).
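
To make the equivalence concrete (a sketch; the MIN_XID_AGE syntax is the
proposal under discussion here, not anything committed):

```sql
-- Existing, client-side: vacuumdb queries the catalogs and builds the
-- table list itself:
--   vacuumdb --min-xid-age=500000000 --all
-- Proposed, server-side: the same selection done by VACUUM itself,
-- usable from any client and from single-user mode:
VACUUM (VERBOSE, MIN_XID_AGE 500000000);
```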

Nathan



Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-01-13 Thread Peter Geoghegan
On Thu, Jan 13, 2022 at 12:19 PM Robert Haas  wrote:
> I can't follow this. If the idea is that we're going to
> opportunistically freeze a page whenever that allows us to mark it
> all-visible, then the remaining question is what XID age we should use
> to force freezing when that rule doesn't apply.

That is the idea, yes.

> It seems to me that
> there is a rebuttable presumption that that case ought to work just as
> it does today - and I think I hear you saying that it should NOT work
> as it does today, but should use some other threshold. Yet I can't
> understand why you think that.

Cases where we can not get a cleanup lock fall into 2 sharply distinct
categories in my mind:

1. Cases where our inability to get a cleanup lock signifies nothing
at all about the page in question, or any page in the same table, with
the same workload.

2. Pathological cases. Cases where we're at least at the mercy of the
application to do something about an idle cursor, where the situation
may be entirely hopeless on a long enough timeline. (Whether or not it
actually happens in the end is less significant.)
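
To illustrate, a sketch of such a category 2 case (hypothetical session,
not from the thread):

```sql
-- The application opens a cursor, fetches a few rows, and then sits
-- idle in transaction.  The suspended scan keeps its current heap page
-- pinned, so VACUUM can wait arbitrarily long for a cleanup lock on
-- that buffer -- until the application moves or closes the cursor.
BEGIN;
DECLARE c CURSOR FOR SELECT * FROM accounts;
FETCH 10 FROM c;
-- ... session goes idle here, holding the pin indefinitely ...
```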

As far as I can tell, based on testing, category 1 cases are fixed by
the patch series: while a small number of pages from tables in
category 1 cannot be cleanup-locked during each VACUUM, even with the
patch series, it happens at random, with no discernable pattern. The
overall result is that our ability to advance relfrozenxid is really
not impacted *over time*. It's reasonable to suppose that lightning
will not strike in the same place twice -- and it would really have to
strike several times to invalidate this assumption. It's not
impossible, but the chances over time are infinitesimal -- and the
aggregate effect over time (not any one VACUUM operation) is what
matters.

There are seldom more than 5 or so of these pages, even on large
tables. What are the chances that some random not-yet-all-frozen block
(that we cannot freeze tuples on) will also have the oldest
couldn't-be-frozen XID, even once? And when it is the oldest, why
should it be the oldest by very many XIDs? And what are the chances
that the same page has the same problem, again and again, without that
being due to some pathological workload thing?

Admittedly you may see a blip from this -- you might notice that the
final relfrozenxid value for that one single VACUUM isn't quite as new
as you'd like. But then the next VACUUM should catch up with the
stable long term average again. It's hard to describe exactly why this
effect is robust, but as I said, empirically, in practice, it appears
to be robust. That might not be good enough as an explanation that
justifies committing the patch series, but that's what I see. And I
think I will be able to nail it down.

AFAICT that just leaves concern for cases in category 2. More on that below.

> Even if you eventually get there, it may take
> multiple days before you find a time when a table is immediately
> available, whereas if you had just gone over there and stood in line,
> you likely would have been seated in under an hour and savoring the
> goodness of quality deep-dish pizza not too long thereafter. The same
> principle applies here.

I think that you're focussing on individual VACUUM operations, whereas
I'm more concerned about the aggregate effect of a particular policy
over time.

Let's assume for a moment that the only thing that we really care
about is reliably keeping relfrozenxid reasonably recent. Even then,
waiting for a cleanup lock (to freeze some tuples) might be the wrong
thing to do. Waiting in line means that we're not freezing other
tuples (nobody else can either). So we're allowing ourselves to fall
behind on necessary, routine maintenance work that allows us to
advance relfrozenxid, in order to advance relfrozenxid.

> I do think that waiting for a cleanup lock when the age of the page is
> only vacuum_freeze_min_age seems like it might be too aggressive, but
> I don't think that's how it works. AFAICS, it's based on whether the
> vacuum is marked as aggressive, which has to do with
> vacuum_freeze_table_age, not vacuum_freeze_min_age. Let's turn the
> question around: if the age of the oldest XID on the page is >150
> million transactions and the buffer cleanup lock is not available now,
> what makes you think that it's any more likely to be available when
> the XID age reaches 200 million or 300 million or 700 million?

This is my concern -- what I've called category 2 cases have this
exact quality. So given that, why not freeze what you can, elsewhere,
on other pages that don't have the same issue (presumably the vast
vast majority in the table)? That way you have the best possible
chance of recovering once the DBA gets a clue and fixes the issue.

> There
> is perhaps an argument for some kind of tunable that eventually shoots
> the other session in the head (if we can identify it, anyway) but it
> seems to me that regardless of what threshold we pick, polling is
> 

Re: Adding CI to our tree

2022-01-13 Thread Andrew Dunstan


On 1/13/22 13:55, Andres Freund wrote:
>> It needs to avoid running on the buildfarm, right ?
> I guess so. It currently appears to have its own logic for finding contrib
> (and other) tap tests:
>
> foreach my $testdir (glob("$pgsql/contrib/*"))
> {
> next unless -d "$testdir/t";
> my $testname = basename($testdir);
> next unless step_wanted("contrib-$testname");
> print time_str(), "running contrib test $testname ...\n" if 
> $verbose;
> run_tap_test("$testdir", "contrib-$testname", undef);
> }
>
> but does run vcregress contribcheck, modulescheck.
>
>
> Andrew, do you see a better way than what Justin is proposing here?
>

I can probably adjust to whatever we decide to do. But I think we're
really just tinkering at the edges here. What I think we really need is
the moral equivalent of `make check-world` in one invocation of
vcregress.pl.

While we're on the subject of vcregress.pl, there's this recent
discussion, which is on my list of things to return to:



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Patch: Code comments: why some text-handling functions are leakproof

2022-01-13 Thread Robert Haas
On Tue, Jan 11, 2022 at 2:07 AM Gurjeet Singh  wrote:
> Please see attached a small patch to document why some text-processing 
> functions are marked as leakproof, while some others are not.
>
> This is more or less a verbatim copy of Tom's comment in email thread at [1].
>
> I could not find an appropriate spot to place these comments, so I placed 
> them on bttextcmp() function, The only other place that I could see we can 
> place these comments is in the file src/backend/optimizer/README, because 
> there is some consideration given to leakproof functions in optimizer docs. 
> But these comments seem quite out of place in optimizer docs.

It doesn't seem particularly likely that someone who is thinking about
changing this in the future would notice the comment in the place
where you propose to put it, nor that they would read the optimizer
README.

Furthermore, I don't know that everyone agrees with Tom about this. I
do agree that it's more important to mark relational operators
leakproof than other things, and I also agree that conservatism is
warranted. But that does not mean that someone could not make a
compelling argument for marking other functions leakproof.

I think we will be better off leaving this alone.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-01-13 Thread Robert Haas
On Fri, Jan 7, 2022 at 5:20 PM Peter Geoghegan  wrote:
> I thought I was being conservative by suggesting
> autovacuum_freeze_max_age/2. My first thought was to teach VACUUM to
> make its FreezeLimit "OldestXmin - autovacuum_freeze_max_age". To me
> these two concepts really *are* the same thing: vacrel->FreezeLimit
> becomes a backstop, just as anti-wraparound autovacuum (the
> autovacuum_freeze_max_age cutoff) becomes a backstop.

I can't follow this. If the idea is that we're going to
opportunistically freeze a page whenever that allows us to mark it
all-visible, then the remaining question is what XID age we should use
to force freezing when that rule doesn't apply. It seems to me that
there is a rebuttable presumption that that case ought to work just as
it does today - and I think I hear you saying that it should NOT work
as it does today, but should use some other threshold. Yet I can't
understand why you think that.

> I couldn't agree more. In fact, I was mostly thinking about how to
> *help* these users. Insisting on waiting for a cleanup lock before it
> becomes strictly necessary (when the table age is only 50
> million/vacuum_freeze_min_age) is actually a big part of the problem
> for these users. vacuum_freeze_min_age enforces a false dichotomy on
> aggressive VACUUMs that just isn't helpful. Why should waiting on a
> cleanup lock fix anything?

Because waiting on a lock means that we'll acquire it as soon as it's
available. If you repeatedly call your local Pizzeria Uno's and ask
whether there is a wait, and head to the restaurant only when the
answer is in the negative, you may never get there, because they may
be busy every time you call - especially if you always call around
lunch or dinner time. Even if you eventually get there, it may take
multiple days before you find a time when a table is immediately
available, whereas if you had just gone over there and stood in line,
you likely would have been seated in under an hour and savoring the
goodness of quality deep-dish pizza not too long thereafter. The same
principle applies here.

I do think that waiting for a cleanup lock when the age of the page is
only vacuum_freeze_min_age seems like it might be too aggressive, but
I don't think that's how it works. AFAICS, it's based on whether the
vacuum is marked as aggressive, which has to do with
vacuum_freeze_table_age, not vacuum_freeze_min_age. Let's turn the
question around: if the age of the oldest XID on the page is >150
million transactions and the buffer cleanup lock is not available now,
what makes you think that it's any more likely to be available when
the XID age reaches 200 million or 300 million or 700 million? There
is perhaps an argument for some kind of tunable that eventually shoots
the other session in the head (if we can identify it, anyway) but it
seems to me that regardless of what threshold we pick, polling is
strictly less likely to find a time when the page is available than
waiting for the cleanup lock. It has the counterbalancing advantage of
allowing the autovacuum worker to do other useful work in the meantime
and that is indeed a significant upside, but at some point you're
going to have to give up and admit that polling is a failed strategy,
and it's unclear why 150 million XIDs - or probably even 50 million
XIDs - isn't long enough to say that we're not getting the job done
with half measures.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

2022-01-13 Thread Arne Roland
 Hi!

> From: Tomas Vondra 
> Subject: Re: PATCH: generate fractional cheapest paths in 
> generate_orderedappend_path
>
> test-# SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2)
> ORDER BY id1 ASC, id2 ASC LIMIT 10;
>QUERY PLAN
>
> --
>   Limit
> ->  Merge Left Join
>   Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2))
>   ->  Append
> ->  Index Only Scan using fract_t0_id1_id2_idx on
>   fract_t0 x_1
> ->  Incremental Sort
>   Sort Key: x_2.id1, x_2.id2
>   Presorted Key: x_2.id1
>   ->  Index Scan using fract_t1_pkey on fract_t1 x_2
>   ->  Materialize
> ->  Append
>   ->  Incremental Sort
> Sort Key: y_1.id1, y_1.id2
> Presorted Key: y_1.id1
> ->  Index Scan using fract_t0_pkey on
>  fract_t0 y_1
>   Index Cond: (id1 = id1)
>   Filter: (id2 = id2)
>   ->  Incremental Sort
> Sort Key: y_2.id1, y_2.id2
> Presorted Key: y_2.id1
> ->  Index Scan using fract_t1_pkey on
>  fract_t1 y_2
>   Index Cond: (id1 = id1)
>   Filter: (id2 = id2)
> (23 rows)
> [...]
> So that seems reasonable

Maybe I'm just slow, but that doesn't seem reasonable to me. It doesn't look 
like a valid plan to me. Sure, all the nodes are arranged like I'd like them to 
be. But what are the id1/id2 bounds we have in the index and filter conditions?

> [...]but there's a couple issues too:
>
> 1) Planning works (hence we can run explain), but execution fails
> because of segfault in CheckVarSlotCompatibility - it gets NULL slot for
> some reason. I haven't looked into the details, but I'd guess we need to
> pass a different rel to create_incrementalsort_path, not childrel.

In case my above concern is valid, maybe the executor is just as confused as I 
am. Such conditions should generate VarSlots, no? If so, where should they come 
from?

Sadly I don't have time to debug that in depth today.

> 2) enable_partitionwisejoin=on seems to cause some confusion, because it
> results in picking a different plan with higher cost. But that's clearly
> not because of this patch.

Short version: I'd neither conceptually expect costs to be lower in any case, 
nor would that be desirable, because our estimates aren't perfect.

Long version: What do you mean by confusion? The plan I get with the patch 
doesn't seem too confusing to me. Generally something like that is to be 
expected. enable_partitionwisejoin changes the way this planning works by 
effectively rewriting the entire query rule-based, so we end up with a 
completely different query. In particular I'd expect slightly different startup 
costs.
So if we activate this we consider completely different plans; I struggle to 
come up with a meaningful example where there is any overlap at all. Thus it 
doesn't surprise me conceptually.
From practical experience I'd say: if they are about the same plan, the cost 
estimates work somewhat ok-ish.
If we change the way we compose the nodes together, we sometimes end up with 
radically different costs for doing the same thing. While I suspect there are a 
lot of corner cases causing this, I've seen quite a few where this was due to 
the fact that our planner often has insufficient statistics and has to take a 
guess. This has gotten way better in recent years, but in particular for 
non-trivial joins it is still a problem in practice.

> 3) I'm still a bit skeptical about the cost of this implementation - it
> builds the incrementalsort path, just to throw most of the paths away.
> It'd be much better to just calculate the cost, which should be much
> cheaper, and only do the heavy work for the one "best" path.

Maybe we should profile this to get a rough estimate of how much time we spend 
building these incremental paths. From a code perspective it's not obvious to 
me where the time is lost.

> 4) One thing I did not realize before is what pathkeys we even consider
> here. Imagine you have two tables:
>
> CREATE TABLE t1 (a int, b int, primary key (a));
> CREATE TABLE t2 (a int, b int, primary key (a));
>
> and query
>
> SELECT * FROM t1 JOIN t2 USING (a,b);
>
> It seems reasonable to also consider pathkeys (a,b) because that's make
> e.g. mergejoin much cheaper, right? But no, we'll not do that - we only
> consider pathkeys that at least one child relation has, so in this case
> it's just (a). Which means we'll never 

Re: tab completion of enum values is broken

2022-01-13 Thread Tom Lane
Peter Eisentraut  writes:
> This doesn't work anymore:
> create type e2 as enum ('foo', 'bar');
> alter type e2 rename value 'b
> This now results in
> alter type e2 rename value 'b'

The main issue here is that Query_for_list_of_enum_values[_with_schema]
is designed to match against a pre-quoted list of enum values,
which was appropriate when it was written because we hadn't configured
Readline to do anything special with quotes.  Now that we have, the
string that is being supplied to match against lacks the leading quote,
so that we need to remove the quote_literal() calls from those queries.

A secondary problem is that if we fail to identify any match, Readline
nonetheless decides to append a trailing quote.  That's why you get the
added quote above, and it still happens with the query fix.  I'm not
sure why Readline thinks it should do that.  I worked around it in the
attached draft patch by setting rl_completion_suppress_quote = 1 in our
failure-to-match case, but that feels like using a big hammer rather
than a proper solution.

I'm not totally satisfied with this patch for a couple of reasons:

1. It'll allow the user to enter a non-quoted enum value,
for example alter type e2 rename value b
produces
alter type e2 rename value bar 
It's not clear to me that there's any way around that, though.
I tried returning pre-quoted values as we did before (ie,
changing only the WHERE clauses in the queries) but then
Readline fails to match anything.  We do have code to force
quoting of actual filenames, but I think that's dependent on
going through rl_filename_completion_function(), which of course
we can't do here.

2. It doesn't seem like there's any nice way to deal with enum
values that contain single quotes (which need to be doubled).
Admittedly the use-case for that is probably epsilon, but
it annoys me that it doesn't work.
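
An example of the problem case in point 2 (hypothetical type, for
illustration only):

```sql
-- An enum label containing a single quote must be written with the
-- quote doubled inside the literal; the completion machinery would have
-- to apply the same doubling when emitting the candidate.
CREATE TYPE quoted AS ENUM ('plain', 'o''clock');
ALTER TYPE quoted RENAME VALUE 'o''clock' TO 'oclock';
```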

In the end, it seems like the value of this specific completion
rule is not large enough to justify doing a ton of work to
eliminate #1 or #2.  So I propose doing the attached and calling
it good.  Maybe we could add a test case.

Oh ... experimenting on macOS (with the system-provided libedit)
shows no bug here.  So I guess we'll need to make this conditional
somehow, perhaps on USE_FILENAME_QUOTING_FUNCTIONS.  That's another
reason for not going overboard.

regards, tom lane

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 39be6f556a..e856d918ae 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -679,20 +679,20 @@ static const SchemaQuery Query_for_list_of_collations = {
 "OR '\"' || nspname || '\"' ='%s') "
 
 #define Query_for_list_of_enum_values \
-"SELECT pg_catalog.quote_literal(enumlabel) "\
+"SELECT enumlabel "\
 "  FROM pg_catalog.pg_enum e, pg_catalog.pg_type t "\
 " WHERE t.oid = e.enumtypid "\
-"   AND substring(pg_catalog.quote_literal(enumlabel),1,%d)='%s' "\
+"   AND substring(enumlabel,1,%d)='%s' "\
 "   AND (pg_catalog.quote_ident(typname)='%s' "\
 "OR '\"' || typname || '\"'='%s') "\
 "   AND pg_catalog.pg_type_is_visible(t.oid)"
 
 #define Query_for_list_of_enum_values_with_schema \
-"SELECT pg_catalog.quote_literal(enumlabel) "\
+"SELECT enumlabel "\
 "  FROM pg_catalog.pg_enum e, pg_catalog.pg_type t, pg_catalog.pg_namespace n "\
 " WHERE t.oid = e.enumtypid "\
 "   AND n.oid = t.typnamespace "\
-"   AND substring(pg_catalog.quote_literal(enumlabel),1,%d)='%s' "\
+"   AND substring(enumlabel,1,%d)='%s' "\
 "   AND (pg_catalog.quote_ident(typname)='%s' "\
 "OR '\"' || typname || '\"'='%s') "\
 "   AND (pg_catalog.quote_ident(nspname)='%s' "\
@@ -4413,8 +4413,12 @@ psql_completion(const char *text, int start, int end)
 	if (matches == NULL)
 	{
 		COMPLETE_WITH_CONST(true, "");
+		/* Also, prevent Readline from appending stuff to the non-match */
 #ifdef HAVE_RL_COMPLETION_APPEND_CHARACTER
 		rl_completion_append_character = '\0';
+#endif
+#ifdef HAVE_RL_COMPLETION_SUPPRESS_QUOTE
+		rl_completion_suppress_quote = 1;
 #endif
 	}
 


Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-13 Thread Andres Freund
Hi,

On 2021-12-31 18:12:37 +0530, Bharath Rupireddy wrote:
> Currently the server is erroring out when unable to remove/parse a
> logical rewrite file in CheckPointLogicalRewriteHeap wasting the
> amount of work the checkpoint has done and preventing the checkpoint
> from finishing.

This seems like it'd make failures to remove the files practically
invisible. Which'd have its own set of problems?

What motivated proposing this change?

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-01-13 Thread Andres Freund
Hi,

On 2022-01-13 13:06:42 -0600, Justin Pryzby wrote:
> Question: are data-checksums tested at all ?  The only thing I can find is 
> that
> some buildfarm members *might* exercise it during installcheck.

There's some coverage via src/bin/pg_basebackup/t/010_pg_basebackup.pl and
src/bin/pg_checksums/t/002_actions.pl - but that's not a whole lot.

Might be worth converting one of the "additional" pg_regress runs to use
data-checksums? E.g. pg_upgrade's, or the one being introduced in the "replay"
test?
https://postgr.es/m/CA%2BhUKGK-%2Bmg6RWiDu0JudF6jWeL5%2BgPmi8EKUm1eAzmdbwiE_A%40mail.gmail.com


> From b67cd05895c8fa42e13e6743db36412a68292607 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Sun, 9 Jan 2022 22:54:32 -0600
> Subject: [PATCH 2/7] CI: run initdb with --no-sync for windows

Applied this already.



> From 885becd19f630a69ab1de44cefcdda21ca8146d6 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Tue, 11 Jan 2022 01:30:37 -0600
> Subject: [PATCH 4/7] cirrus/linux: script test.log..
> 
> For consistency, and because otherwise errors can be lost (?) or hard to find.

> -  make -s ${CHECK} ${CHECKFLAGS} -j${TEST_JOBS}
> +  script --command "make -s ${CHECK} ${CHECKFLAGS} -j${TEST_JOBS}" test.log
>  EOF

I'm not following this one? All the output is in the CI run already, and you
can download it as well?

The only reason to use script as a wrapper is that otherwise make on
freebsd/macos warns about fcntl failures?


>only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || 
> $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*linux.*'
> @@ -183,7 +178,7 @@ task:
>  mkdir -p ${CCACHE_DIR}
>  chown -R postgres:postgres ${CCACHE_DIR}
>  echo '* - memlock 134217728' > /etc/security/limits.d/postgres.conf
> -su postgres -c "ulimit -l -H && ulimit -l -S"
> +su postgres -c "ulimit -l -H && ulimit -l -S" # XXX

Hm?


Greetings,

Andres Freund




Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-13 Thread Bossart, Nathan
On 1/12/22, 10:03 PM, "Bharath Rupireddy" 
 wrote:
> On Thu, Jan 13, 2022 at 3:47 AM Bossart, Nathan  wrote:
>> The only feedback I have for the patch is that I don't think the new
>> comments are necessary.
>
> I borrowed the comments as-is from the CheckPointSnapBuild introduced
> by the commit b89e15105. IMO, let the comments be there as they
> explain why we are not emitting ERRORs, however I will leave it to the
> committer to decide on that.

Huh, somehow I missed that when I looked at it yesterday.  I'm going
to bump this one to ready-for-committer, then.

Nathan



Re: Adding CI to our tree

2022-01-13 Thread Justin Pryzby
On Thu, Jan 13, 2022 at 10:55:27AM -0800, Andres Freund wrote:
> I'll probably apply that part and 0002 separately.

I've hacked on it a bit more now..

Question: are data-checksums tested at all ?  The only thing I can find is that
some buildfarm members *might* exercise it during installcheck.

I added pg_regress --initdb-opts since that seems to be a deficiency.

-- 
Justin
From 10121f588d0b3cab67ee810a718511879d6f24a8 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 9 Jan 2022 18:25:02 -0600
Subject: [PATCH 1/7] vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1

---
 .cirrus.yml|  4 +-
 contrib/pg_stat_statements/Makefile|  2 +-
 contrib/test_decoding/Makefile |  2 +-
 src/test/modules/snapshot_too_old/Makefile |  2 +-
 src/test/modules/worker_spi/Makefile   |  2 +-
 src/tools/msvc/vcregress.pl| 46 +++---
 6 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 19b3737fa11..02ea7e67189 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -398,9 +398,9 @@ task:
   test_isolation_script:
 - perl src/tools/msvc/vcregress.pl isolationcheck
   test_modules_script:
-- perl src/tools/msvc/vcregress.pl modulescheck
+- perl src/tools/msvc/vcregress.pl modulescheck install
   test_contrib_script:
-- perl src/tools/msvc/vcregress.pl contribcheck
+- perl src/tools/msvc/vcregress.pl contribcheck install
   stop_script:
 - tmp_install\bin\pg_ctl.exe stop -D tmp_check/db -l tmp_check/postmaster.log
   test_ssl_script:
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 7fabd96f38d..d732e1ade73 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -15,7 +15,7 @@ PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements"
 
 LDFLAGS_SL += $(filter -lm, $(LIBS))
 
-REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
+REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
 REGRESS = pg_stat_statements oldextversions
 # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a31e0b8795..14fd847ba7f 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
 	oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
 	twophase_snapshot
 
-REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
+REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 
 # Disabled because these tests require "wal_level=logical", which
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index dfb4537f63c..752a0039fdc 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -5,7 +5,7 @@
 EXTRA_CLEAN = $(pg_regress_clean_files)
 
 ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
-ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
+ISOLATION_OPTS = --temp-config=$(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
 
 # Disabled because these tests require "old_snapshot_threshold" >= 0, which
 # typical installcheck users do not have (e.g. buildfarm clients).
diff --git a/src/test/modules/worker_spi/Makefile b/src/test/modules/worker_spi/Makefile
index cbf9b2e37fd..d9f7d9bab6d 100644
--- a/src/test/modules/worker_spi/Makefile
+++ b/src/test/modules/worker_spi/Makefile
@@ -9,7 +9,7 @@ PGFILEDESC = "worker_spi - background worker example"
 REGRESS = worker_spi
 
 # enable our module in shared_preload_libraries for dynamic bgworkers
-REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/worker_spi/dynamic.conf
+REGRESS_OPTS = --temp-config=$(top_srcdir)/src/test/modules/worker_spi/dynamic.conf
 
 # Disable installcheck to ensure we cover dynamic bgworkers.
 NO_INSTALLCHECK = 1
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index 7575acdfdf5..b24ea11aa4a 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -443,6 +443,7 @@ sub plcheck
 sub subdircheck
 {
 	my $module = shift;
+	my $installcheck = shift || 1;
 
 	if (   !-d "$module/sql"
 		|| !-d "$module/expected"
@@ -452,7 +453,7 @@ sub subdircheck
 	}
 
 	chdir $module;
-	my @tests = fetchTests();
+	my @tests = fetchTests($installcheck);
 
 	# Leave if no tests are listed in the module.
 	if (scalar @tests == 0)
@@ -462,6 +463,7 @@ sub subdircheck
 	}
 
 	my @opts = fetchRegressOpts();
+	push @opts, "--temp-instance=tmp_check" 

Re: UNIQUE null treatment option

2022-01-13 Thread Pavel Borisov
>
> I wonder if the logic for setting BTScanInsertData.anynullkeys inside
> _bt_mkscankey() is the place to put your test for
> rel->rd_index->indnullsnotdistinct -- not inside _bt_doinsert(). That
> would probably necessitate renaming anynullkeys, but that's okay. This
> feels more natural to me because a NULL key column in a NULLS NOT
> DISTINCT unique constraint is very similar to a NULL non-key column in
> an INCLUDE index, as far as our requirements go -- and so both cases
> should probably be dealt with at the same point.
>

A good point, indeed!

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Adding CI to our tree

2022-01-13 Thread Andres Freund
Hi,

On 2022-01-10 16:07:48 -0600, Justin Pryzby wrote:
> On Sun, Jan 09, 2022 at 11:57:44AM -0800, Andres Freund wrote:
> > On 2022-01-09 13:16:50 -0600, Justin Pryzby wrote:
> > > diff --git a/contrib/test_decoding/Makefile 
> > > b/contrib/test_decoding/Makefile
> > > index 9a31e0b8795..14fd847ba7f 100644
> > > --- a/contrib/test_decoding/Makefile
> > > +++ b/contrib/test_decoding/Makefile
> > > @@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup 
> > > concurrent_ddl_dml \
> > >   oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
> > >   twophase_snapshot
> > >
> > > -REGRESS_OPTS = --temp-config 
> > > $(top_srcdir)/contrib/test_decoding/logical.conf
> > > +REGRESS_OPTS = 
> > > --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf
> > >  ISOLATION_OPTS = --temp-config 
> > > $(top_srcdir)/contrib/test_decoding/logical.conf
> >
> > Not sure why these are part of the diff?
>
> Because otherwise vcregress runs pg_regress --temp-config test1 test2 [...]
> ..which means test1 gets eaten as the argument to --temp-config

Ah. I see you changed that globally, good...

I'll probably apply that part and 0002 separately.


> > This doesn't really seem like a scalable path forward - duplicating
> > configuration in more places doesn't seem sane. It seems it'd make more 
> > sense
> > to teach vcregress.pl to run NO_INSTALLCHECK targets properly? ISTM that
> > changing the options passed to pg_regress based on fetchTests() return value
> > wouldn't be too hard?
>
> It needs to run the tests with separate instance.  Maybe you're suggesting to
> use --temp-instance.

Yes.


> It needs to avoid running on the buildfarm, right ?

I guess so. It currently appears to have its own logic for finding contrib
(and other) tap tests:

foreach my $testdir (glob("$pgsql/contrib/*"))
{
next unless -d "$testdir/t";
my $testname = basename($testdir);
next unless step_wanted("contrib-$testname");
print time_str(), "running contrib test $testname ...\n" if 
$verbose;
run_tap_test("$testdir", "contrib-$testname", undef);
}

but does run vcregress contribcheck, modulescheck.


Andrew, do you see a better way than what Justin is proposing here?

Greetings,

Andres Freund




Re: UNIQUE null treatment option

2022-01-13 Thread Peter Geoghegan
On Thu, Jan 13, 2022 at 10:36 AM Peter Geoghegan  wrote:
> I wonder if the logic for setting BTScanInsertData.anynullkeys inside
> _bt_mkscankey() is the place to put your test for
> rel->rd_index->indnullsnotdistinct -- not inside _bt_doinsert(). That
> would probably necessitate renaming anynullkeys, but that's okay. This
> feels more natural to me because a NULL key column in a NULLS NOT
> DISTINCT unique constraint is very similar to a NULL non-key column in
> an INCLUDE index, as far as our requirements go -- and so both cases
> should probably be dealt with at the same point.

Correction: I meant to write "...a NULL key column in a NULLS DISTINCT
unique constraint is very similar...".

-- 
Peter Geoghegan




Re: UNIQUE null treatment option

2022-01-13 Thread Peter Geoghegan
On Wed, Dec 29, 2021 at 2:06 AM Peter Eisentraut
 wrote:
> Here is a rebased version of this patch.

BTScanInsertData.anynullkeys already effectively means "if the index
is a unique index, then we don't actually need to go through
_bt_check_unique(), or perform any other checkingunique steps". This
is really an instruction about what to do (or not do), based on the
specifics of the values for the insertion scan key plus the index
definition. In other words, the code in _bt_mkscankey() that sets up
BTScanInsertData (an insertion scankey) was written with the exact
requirements of btinsert() in mind -- nothing more.

I wonder if the logic for setting BTScanInsertData.anynullkeys inside
_bt_mkscankey() is the place to put your test for
rel->rd_index->indnullsnotdistinct -- not inside _bt_doinsert(). That
would probably necessitate renaming anynullkeys, but that's okay. This
feels more natural to me because a NULL key column in a NULLS NOT
DISTINCT unique constraint is very similar to a NULL non-key column in
an INCLUDE index, as far as our requirements go -- and so both cases
should probably be dealt with at the same point.
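
The proposed placement can be sketched roughly as follows (illustrative
Python, not the actual nbtree C code; the function and parameter names here
are simplifications of the real identifiers, and the renamed-anynullkeys
semantics are an assumption based on this thread):

```python
def compute_anynullkeys(key_values, nkeyatts, nulls_not_distinct):
    """Sketch of the idea for _bt_mkscankey(): decide, once, whether the
    uniqueness check (_bt_check_unique) can be skipped for this insertion.
    Only the first nkeyatts columns are key columns; anything beyond that
    is INCLUDE payload and never participates.  A NULL key column only
    permits skipping when NULLs are treated as distinct (the SQL default)."""
    if nulls_not_distinct:
        # NULLs compare equal under NULLS NOT DISTINCT, so a NULL key
        # column can still collide and checking remains required.
        return False
    return any(v is None for v in key_values[:nkeyatts])

# Plain unique index: a NULL key column lets us skip the unique check.
assert compute_anynullkeys([None, 1], nkeyatts=2, nulls_not_distinct=False)
# A NULL in an INCLUDE (non-key) column does not count.
assert not compute_anynullkeys([1, None], nkeyatts=1, nulls_not_distinct=False)
# NULLS NOT DISTINCT: the check must still happen.
assert not compute_anynullkeys([None], nkeyatts=1, nulls_not_distinct=True)
```

With the flag folded in here, the caller (_bt_doinsert in the real code)
would only need to test one boolean, which is the symmetry being argued for.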

-- 
Peter Geoghegan




Re: Refactoring of compression options in pg_basebackup

2022-01-13 Thread Robert Haas
On Fri, Jan 7, 2022 at 1:43 AM Michael Paquier  wrote:
> Which is why things should be checked once all the options are
> processed.  I'd recommend that you read the option patch a bit more,
> that may help.

I don't think that the problem here is my lack of understanding. I
have two basic concerns about your proposed patch:

1. If, as you propose, we add a new flag --compression-method=METHOD
then how will the user specify server-side compression?
2. If, as we seem to agree, the compression method is more important
than the compression level, then why is the option to set the
less-important thing called just --compress, and the option to set the
more important thing has a longer name?

I proposed to solve both of these problems by using
--compression-level=NUMBER to set the compression level and
--compress=METHOD or --server-compress=METHOD to set the algorithm and
specify on which side it is to be applied. If, instead of doing that,
we go with what you have proposed here, then I don't understand how to
fit server-side compression into the framework in a reasonably concise
way. I think we would end up with something like pg_basebackup
--compression-method=lz4 --compress-on-server, which seems rather long
and awkward. Do you have a better idea?
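
The two competing option shapes can be made concrete with a small
option-parsing sketch (Python argparse for illustration only; pg_basebackup
itself parses options in C, and these flag names are just the ones floated
in this thread, not a committed interface):

```python
import argparse

def build_parser():
    """Sketch of the proposed shape: the algorithm (the important part)
    gets the short flag, and the side it runs on is encoded in the flag
    name rather than in a separate --compress-on-server switch."""
    p = argparse.ArgumentParser(prog="pg_basebackup", add_help=False)
    p.add_argument("--compress", metavar="METHOD")         # client-side method
    p.add_argument("--server-compress", metavar="METHOD")  # server-side method
    p.add_argument("--compression-level", type=int)        # less-important knob
    return p

# "compress with lz4 on the server, level 3" stays short:
args = build_parser().parse_args(
    ["--server-compress", "lz4", "--compression-level", "3"])
```

Under the alternative (--compression-method plus a separate on-server flag),
the same request needs two longer options, which is the conciseness concern
raised above.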

I think I understand what the patch is doing. I just think it creates
a problem for my patch. And I'd like to know whether you have an idea
how to solve that problem. And if not, then I'd like you to consider
the solution that I am proposing rather than the patch you've already
got.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Schema variables - new implementation for Postgres 15

2022-01-13 Thread Pavel Stehule
On Thu, Jan 13, 2022 at 19:23, Dean Rasheed wrote:

> On Thu, 13 Jan 2022 at 17:42, Pavel Stehule 
> wrote:
> >
> > I like the idea of prioritizing tables over variables with warnings when
> collision is detected. It cannot break anything. And it allows to using
> short identifiers when there is not collision.
>
> Yeah, that seems OK, as long as it's clearly documented. I don't think
> a warning is necessary.
>

The warning can be disabled by default, but I think it should be there.
It is a signal that something in the database schema should be renamed.
Maybe call it session_variables_ambiguity_warning.


> (FWIW, testing with dbfiddle, that appears to match Db2's behaviour).
>

Thank you for checking.

Regards

Pavel


> Regards,
> Dean
>


Re: Schema variables - new implementation for Postgres 15

2022-01-13 Thread Dean Rasheed
On Thu, 13 Jan 2022 at 17:42, Pavel Stehule  wrote:
>
> I like the idea of prioritizing tables over variables with warnings when 
> collision is detected. It cannot break anything. And it allows to using short 
> identifiers when there is not collision.

Yeah, that seems OK, as long as it's clearly documented. I don't think
a warning is necessary.

(FWIW, testing with dbfiddle, that appears to match Db2's behaviour).

Regards,
Dean




Re: Adding CI to our tree

2022-01-13 Thread Andres Freund
Hi,

On 2021-12-30 17:46:52 -0800, Andres Freund wrote:
> I plan to push this after another cycle of tests passing (and driving
> over the bay bridge...) unless I hear protests.

I noticed that it's harder to see compile warnings on msvc in CI than it was
at an earlier point.  There used to be a summary of errors at the end.

That turns out to be an unintended consequence of the option to reduce msbuild
verbosity.


> +# Use parallelism, disable file tracker, we're never going to rebuild...
> +MSBFLAGS: -m -verbosity:minimal /p:TrackFileAccess=false

Unless somebody protests quickly, I'm going to add
  "-consoleLoggerParameters:Summary;ForceNoAlign"
to that.

Greetings,

Andres Freund




Re: disfavoring unparameterized nested loops

2022-01-13 Thread Robert Haas
On Wed, Jan 12, 2022 at 7:20 PM Peter Geoghegan  wrote:
> Should I take it that you've dropped this project? I was rather hoping
> that you'd get back to it, FWIW.

Honestly, I'd forgotten all about it.  While in theory I'd like to do
something about this, I just have too many other things going on.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Time to drop plpython2?

2022-01-13 Thread Andres Freund
Hi,


On 2022-01-11 13:59:32 +0100, Peter Eisentraut wrote:
> I suspect at this point, the Meson move isn't going to happen for PostgreSQL
> 15.

Yes - IIRC I even noted early in the thread that I don't think it's realistic
to finish it in time for 15. There's just a good amount of portability hacking
left to be done, and a good chunk of cleanup. I have a new colleague working
on automating testing it on more platforms.

Getting a few of the prerequisite patches in would make it more likely to
succeed in 16 though.


> Even if the code were ready, which it is not, then this kind of thing
> would surely be something more sensible to put into PG16 early rather than
> PG15 late.

Right. There's also a lot of discussion around transition paths etc. that
needs to happen first. A lot of other projects that moved ran with multiple
build systems for ~1 release, so things could be mopped up without blocking
everyone.


> I think what I wrote in the message I'm responding to is the best way
> forward, although maybe with less than 60 days now.

Already happened, but I'm a belated +1 ;)

Greetings,

Andres Freund




Re: SLRUs in the main buffer pool, redux

2022-01-13 Thread Robert Haas
On Thu, Jan 13, 2022 at 9:00 AM Thomas Munro  wrote:
> I was re-reviewing the proposed batch of GUCs for controlling the SLRU
> cache sizes[1], and I couldn't resist sketching out $SUBJECT as an
> obvious alternative.  This patch is highly experimental and full of
> unresolved bits and pieces (see below for some), but it passes basic
> tests and is enough to start trying the idea out and figuring out
> where the real problems lie.  The hypothesis here is that CLOG,
> multixact, etc data should compete for space with relation data in one
> unified buffer pool so you don't have to tune them, and they can
> benefit from the better common implementation (mapping, locking,
> replacement, bgwriter, checksums, etc and eventually new things like
> AIO, TDE, ...).

I endorse this hypothesis. The performance cliff when the XID range
you're regularly querying exceeds the hardcoded constant is quite
steep, and yet we can't just keep pushing that constant up. Linear
search does not scale well to infinitely large arrays.[citation
needed]

> [ long list of dumpster-fire level problems with the patch ]

Honestly, none of this sounds that bad. I mean, it sounds bad in the
sense that you're going to have to fix all of this somehow and I'm
going to unhelpfully give you no advice whatsoever about how to do
that, but my guess is that a moderate amount of persistence will be
sufficient for you to get the job done. None of it sounds hopeless.

Before fixing all of that, one thing you might want to consider is
whether it uh, works. And by "work" I don't mean "get the right
answer" even though I agree with my esteemed fellow hacker that this
is an important thing to do.[1] What I mean is that it would be good
to see some evidence that the number of buffers that end up being used
to cache any particular SLRU is somewhat sensible, and varies by
workload. For example, consider a pgbench workload. As you increase
the scale factor, the age of the oldest XIDs that you regularly
encounter will also increase, because on the average, the row you're
now updating will not have been updated for a larger number of
transactions. So it would be interesting to know whether all of the
CLOG buffers that are regularly being accessed do in fact remain in
cache - and maybe even whether buffers that stop being regularly
accessed get evicted in the face of cache pressure.

Also, the existing clog code contains a guard that absolutely prevents
the latest CLOG buffer from being evicted. Because - at least in a
pgbench test like the one postulated above, and probably in general -
the frequency of access to older CLOG buffers decays exponentially,
evicting the newest or even the second or third newest CLOG buffer is
really bad. At present, there's a hard-coded guard to prevent the
newest buffer from being evicted, which is a band-aid, but an
effective one. Even with that band-aid, evicting any of the most
recent few can produce a system-wide stall, where every backend ends
up waiting for the evicted buffer to be retrieved. It would be
interesting to know whether the same problem can be recreated with
your patch, because the buffer eviction algorithm for shared buffers
is only a little bit less dumb than the one for SLRUs, and can pretty
commonly devolve into little more than evict-at-random.
Evict-at-random is very bad here, because evicting a hot CLOG page is
probably even worse than evicting, say, a btree root page.
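
The point about exponentially decaying access frequency can be illustrated
with a toy cache simulation (Python, purely illustrative; the decay constant,
cache sizes, and access counts are made-up values, not measured from
Postgres):

```python
import random

def simulate(nbuffers, npages, naccesses, protect_newest, seed=0):
    """Toy CLOG-like cache: page 0 is the newest, and access probability
    falls off geometrically with page age.  On a miss with a full cache,
    a victim is chosen at random, optionally never evicting page 0
    (mimicking the hard-coded guard).  Returns the miss rate."""
    rng = random.Random(seed)
    cache = set()
    misses = 0
    for _ in range(naccesses):
        # exponentially decaying access pattern over page ages
        page = min(int(rng.expovariate(0.7)), npages - 1)
        if page not in cache:
            misses += 1
            if len(cache) >= nbuffers:
                candidates = [p for p in cache
                              if not (protect_newest and p == 0)]
                cache.discard(rng.choice(candidates))
            cache.add(page)
    return misses / naccesses

rand_evict = simulate(4, 64, 20000, protect_newest=False)
guarded = simulate(4, 64, 20000, protect_newest=True)
```

In this toy model, pinning the hottest (newest) page lowers the miss rate
noticeably compared to pure evict-at-random, which is the effect the
band-aid guard in the existing clog code is after.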

Another interesting test might be one that puts pressure on some other
SLRU, like pg_multixact or pg_subtrans. In general SLRU pages that are
actually being used are hot enough that we should keep them in cache
almost no matter what else is competing for cache space ... but the
number of such pages can be different from one SLRU to another, and
can change over time.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

[1] http://postgr.es/m/3151122.1642086...@sss.pgh.pa.us




Re: Schema variables - new implementation for Postgres 15

2022-01-13 Thread Pavel Stehule
On Thu, Jan 13, 2022 at 18:01, Joel Jacobson wrote:

> On Thu, Jan 13, 2022, at 20:12, Pavel Stehule wrote:
> >I cannot imagine how the "window" keyword can work in SQL context. In
> Javascript "window" is an object - it is not a keyword, and it makes sense
> in usual Javascript context inside HTML browsers.
>
> I was thinking since Javascript is by far the most known programming
> language, the "window" word would be familiar and easy to remember, but I
> agree, it's not perfect.
>

Mainly, "window" is just a global variable. It is not a special keyword,
so the object.property syntax is the usual one.


> Hm, "values" would be nice, it's reserved in SQL:2016 [1] and in
> DB2/Mimer/MySQL/Oracle/SQL Server/Teradata [2], but unfortunately not in
> PostgreSQL [1], so perhaps not doable.
>
> Syntax:
>
> values.[schema name].[variable name]
>

This doesn't help much; the syntax is too long. It can solve the
described issue, but only if all three parts are required, and writing
VALUES.schemaname.variablename every time is not very practical. And if
we require this three-part identifier every time, then the already
supported dbname.schemaname.varname can be used instead. Almost all
collisions can be fixed by using a three-part identifier, but it doesn't
look very handy.

I like the idea of prioritizing tables over variables, with a warning
when a collision is detected. It cannot break anything, and it allows
using short identifiers when there is no collision. If somebody doesn't
want any collisions, they can use a schema named "vars", "values", or
whatever they like. It is close to your proposal: people do not often
use a table alias like "values" (although in the EAV case it is
possible).
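
The resolution order being discussed can be sketched like this (Python,
illustrative only; the real patch resolves identifiers in the parser, and
the warning text is hypothetical):

```python
import warnings

def resolve(name, columns, variables):
    """Resolve an unqualified identifier: table columns win over session
    variables, and a warning is raised when the name exists in both, so
    the user knows something in the schema may deserve a rename."""
    if name in columns:
        if name in variables:
            warnings.warn('identifier "%s" is ambiguous; using the column'
                          % name)
        return ("column", columns[name])
    if name in variables:
        return ("variable", variables[name])
    raise LookupError(name)

# A colliding name resolves to the column (with a warning); a name that
# exists only as a variable resolves to the variable.
assert resolve("x", {"x": 1}, {"x": 99}) == ("column", 1)
assert resolve("v", {"x": 1}, {"v": 42}) == ("variable", 42)
```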




> [1] https://www.postgresql.org/docs/current/sql-keywords-appendix.html
> [2] https://en.wikipedia.org/wiki/SQL_reserved_words
>
>


Re: Custom Operator for citext LIKE predicates question

2022-01-13 Thread Efrain J. Berdecia
Good points. At least in the limited testing we did, we were able to get the
same answer back with both executions, at least for the use cases we tested.
We are still doing more thorough testing.

This is an application that is being ported from MS SQL Server to Postgres, and
apparently the migration DBA team determined citext was the way to go to
maintain MSSQL's existing usage of the data in the columns.

Thanks,
Efrain J. Berdecia

On Thursday, January 13, 2022, 10:10:38 AM EST, Tom Lane 
 wrote:  
 
 "Efrain J. Berdecia"  writes:
> In our setup it has actually worked per the explains provided making the 
> query run in milliseconds instead of seconds.

To me, "work" includes "get the right answer".  I do not think you
are getting the same answers that citext would normally provide.
If you don't care about case-insensitivity, why don't you just
use plain text?

            regards, tom lane
  

Re: Avoiding smgrimmedsync() during nbtree index builds

2022-01-13 Thread Robert Haas
On Wed, Sep 29, 2021 at 2:36 PM Melanie Plageman
 wrote:
> unbuffered_write() and unbuffered_extend() might be able to be used even
> if unbuffered_prep() and unbuffered_finish() are not used -- for example
> hash indexes do something I don't entirely understand in which they call
> smgrextend() directly when allocating buckets but then initialize the
> new bucket pages using the bufmgr machinery.

My first thought was that someone might do this to make sure that we
don't run out of disk space after initializing some but not all of the
buckets. Someone might have some reason for wanting to avoid that
corner case. However, in _hash_init() that explanation doesn't make
any sense, because an abort would destroy the entire relation. And in
_hash_alloc_buckets() the variable "zerobuf" points to a buffer that
is not, in fact, all zeroes. So my guess is this is just old, crufty
code - either whatever reasons somebody had for doing it that way are
no longer valid, or there wasn't any good reason even at the time.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Schema variables - new implementation for Postgres 15

2022-01-13 Thread Joel Jacobson
On Thu, Jan 13, 2022, at 20:12, Pavel Stehule wrote:
>I cannot imagine how the "window" keyword can work in SQL context. In 
>Javascript "window" is an object - it is not a keyword, and it makes sense in 
>usual Javascript context inside HTML browsers.

I was thinking since Javascript is by far the most known programming language, 
the "window" word would be familiar and easy to remember, but I agree, it's not 
perfect.

Hm, "values" would be nice, it's reserved in SQL:2016 [1] and in 
DB2/Mimer/MySQL/Oracle/SQL Server/Teradata [2], but unfortunately not in 
PostgreSQL [1], so perhaps not doable.

Syntax:

values.[schema name].[variable name]

[1] https://www.postgresql.org/docs/current/sql-keywords-appendix.html
[2] https://en.wikipedia.org/wiki/SQL_reserved_words


Re: Time to drop plpython2?

2022-01-13 Thread Tom Lane
Peter Eisentraut  writes:
> On 12.01.22 19:49, Tom Lane wrote:
>> Anyway, getting back to the point: I think we should notify the
>> owners ASAP and set a 30-day deadline.

> Sure, let's do that.  I don't have a buildfarm animal these days, so I'm 
> not on that list, so it would be great if you could do that

Done.  I told them "mid February", so we can plan on say the 15th
as the target date for pushing a patch.

I realized BTW that the meson business is not relevant for prairiedog
or gaur.  Those animals will die regardless of python version because
they can't build ninja (for lack of ).  So I think maybe
I'll install python 3.1 and see if that compatibility claim is really
true ;-)

regards, tom lane




Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

2022-01-13 Thread Tomas Vondra
FWIW this is now marked as committed. I've created a separate entry in 
the next CF for the incremental sort part.



https://commitfest.postgresql.org/37/3513/

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Avoiding smgrimmedsync() during nbtree index builds

2022-01-13 Thread Justin Pryzby
On Tue, Jan 11, 2022 at 12:10:54PM -0500, Melanie Plageman wrote:
> On Mon, Jan 10, 2022 at 5:50 PM Melanie Plageman  
> wrote:
> >
> > I have attached a v3 which includes two commits -- one of which
> > implements the directmgr API and uses it and the other which adds
> > functionality to use either directmgr or bufmgr API during index build.
> >
> > Also registering for march commitfest.
> 
> Forgot directmgr.h. Attached v4

Thanks - I had reconstructed it first ;)

I think the ifndef should be outside the includes:

> +++ b/src/include/storage/directmgr.h
..
> +#include "access/xlogdefs.h"
..
> +#ifndef DIRECTMGR_H
> +#define DIRECTMGR_H

This is failing on windows CI when I use initdb --data-checksums, as attached.

https://cirrus-ci.com/task/5612464120266752
https://api.cirrus-ci.com/v1/artifact/task/5612464120266752/regress_diffs/src/test/regress/regression.diffs

+++ c:/cirrus/src/test/regress/results/bitmapops.out	2022-01-13 00:47:46.704621200 +
..
+ERROR:  could not read block 0 in file "base/16384/30310": read only 0 of 8192 bytes

-- 
Justin
From e1a88c25725bc5b34ca9deb1a0b785048c0c5c28 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 12 Jan 2022 22:31:09 -0600
Subject: [PATCH 3/3] cirrus: run initdb with --data-checksums for windows

---
 .cirrus.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 19b3737fa11..efedd1f0873 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -390,7 +390,7 @@ task:
 - perl src/tools/msvc/vcregress.pl check parallel
   startcreate_script:
 # paths to binaries need backslashes
-- tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l tmp_check/initdb.log
+- tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l tmp_check/initdb.log --options=--data-checksums
 - echo include '%TEMP_CONFIG%' >> tmp_check/db/postgresql.conf
 - tmp_install\bin\pg_ctl.exe start -D tmp_check/db -l tmp_check/postmaster.log
   test_pl_script:
-- 
2.17.1



Re: UNIQUE null treatment option

2022-01-13 Thread Maxim Orlov
I find this patch useful. It includes changes in documentation and tests.
The code itself looks reasonable to me. Since the unique constraint check is
done by the corresponding btree index, the implementation of this feature is
elegant and lightweight.

In my view, it is worth noting that a heap relation can have different nulls
treatment in unique constraints for different unique columns. For example:
CREATE TABLE t (i INT UNIQUE NULLS DISTINCT, a INT UNIQUE NULLS NOT
DISTINCT);
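
For readers unfamiliar with the two treatments, here is a small sketch of
the semantics (Python standing in for SQL; this illustrates the rule the
feature implements, not the patch's btree code, and None plays the role of
NULL):

```python
def conflicts(existing, newkey, nulls_distinct):
    """Would inserting newkey violate a unique constraint over existing
    keys?  Keys are tuples of column values; None stands for NULL."""
    def equal(a, b):
        if a is None or b is None:
            # SQL: NULL = NULL is unknown, so under NULLS DISTINCT a NULL
            # column can never make two rows equal; under NULLS NOT
            # DISTINCT two NULLs are treated as equal.
            return a is None and b is None and not nulls_distinct
        return a == b
    return any(all(equal(a, b) for a, b in zip(key, newkey))
               for key in existing)

# NULLS DISTINCT (the SQL default): any number of NULLs may coexist.
assert not conflicts([(None,)], (None,), nulls_distinct=True)
# NULLS NOT DISTINCT: a second NULL violates uniqueness.
assert conflicts([(None,)], (None,), nulls_distinct=False)
```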

All the tests are running ok on Linux and MacOS X.

However, the patch doesn't apply with default git apply options, only with
the "three-way merge" option (-3). Please consider rebasing it. Then, in my
view, it can be "Ready for committer".
-- 
Best regards,
Maxim Orlov.


Re: tab completion of enum values is broken

2022-01-13 Thread Tom Lane
Peter Eisentraut  writes:
> This doesn't work anymore:
> create type e2 as enum ('foo', 'bar');
> alter type e2 rename value 'b
> This now results in
> alter type e2 rename value 'b'

Ugh.  I'll take a look.

regards, tom lane




Re: Custom Operator for citext LIKE predicates question

2022-01-13 Thread Tom Lane
"Efrain J. Berdecia"  writes:
> In our setup it has actually worked per the explains provided making the 
> query run in milliseconds instead of seconds.

To me, "work" includes "get the right answer".  I do not think you
are getting the same answers that citext would normally provide.
If you don't care about case-insensitivity, why don't you just
use plain text?

regards, tom lane




Re: 2022-01 Commitfest

2022-01-13 Thread Dmitry Dolgov
> On Wed, Jan 12, 2022 at 01:41:42PM +0800, Julien Rouhaud wrote:
> Hi,
>
> The January commitfest should have started almost two weeks ago, but given 
> that
> nothing happened until now I think that it's safe to assume that either
> everyone forgot or no one wanted to volunteer.
>
> I'm therfore volunteering to manage this commitfest, although since it's
> already quite late it's probably going to be a bit chaotic and a best effort,
> but it's better than nothing.

Much appreciated, thanks!




Re: Time to drop plpython2?

2022-01-13 Thread Peter Eisentraut

On 12.01.22 19:49, Tom Lane wrote:

Anyway, getting back to the point: I think we should notify the
owners ASAP and set a 30-day deadline.  We should try to get this
done before the March CF starts, so it's too late for a 60-day
grace period.  In any case, the worst-case scenario for an owner
is to disable --with-python until they have time to do an upgrade,
so it doesn't seem like a month is a big problem.


Sure, let's do that.  I don't have a buildfarm animal these days, so I'm 
not on that list, so it would be great if you could do that





Re: Built-in connection pooler

2022-01-13 Thread Li Japin


On Mar 22, 2021, at 4:59 AM, Zhihong Yu <z...@yugabyte.com> wrote:

Hi,

+  With load-balancing policy postmaster choose 
proxy with lowest load average.
+  Load average of proxy is estimated by number of clients connection 
assigned to this proxy with extra weight for SSL connections.

I think 'load-balanced' may be better than 'load-balancing'.
postmaster choose proxy -> postmaster chooses proxy

+  Load average of proxy is estimated by number of clients connection 
assigned to this proxy with extra weight for SSL connections.

I wonder if there would be a mixture of connections with and without SSL.

+ Terminate an idle connection pool worker after the specified number 
of milliseconds.

Should the time unit be seconds ? It seems a worker would exist for at least a 
second.

+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group

It would be better to update the year in the header.

+* Use then for launching pooler worker backends and report error

Not sure I understand the above sentence. Did you mean 'them' instead of 'then' 
?

Cheers

On Sun, Mar 21, 2021 at 11:32 AM Konstantin Knizhnik  wrote:
People asked me to resubmit built-in connection pooler patch to commitfest.
Rebased version of connection pooler is attached.


Hi, hackers

Is the PostgreSQL core not interested in a built-in connection pooler? If
so, could somebody tell me why we do not need it? If not, what can we do to
get it into core?

Thanks in advance!

[Sorry if you already received this email; I typoed an invalid pgsql list
email address in the previous email.]

--
Best regards
Japin Li





Re: Add non-blocking version of PQcancel

2022-01-13 Thread Jelte Fennema
Attached is an updated patch which I believe fixes windows and the other test 
failures.
At least on my machine make check-world passes now when compiled with 
--enable-tap-tests

I also included a second patch which adds some basic documentation for the 
libpq tests.


0001-Add-non-blocking-version-of-PQcancel.patch
Description: 0001-Add-non-blocking-version-of-PQcancel.patch


0002-Add-documentation-for-libpq_pipeline-tests.patch
Description: 0002-Add-documentation-for-libpq_pipeline-tests.patch


Re: Isolation levels on primary and standby

2022-01-13 Thread Bharath Rupireddy
On Thu, Jan 13, 2022 at 4:47 PM RKN Sai Krishna
 wrote:
>
> Hello All,
>
> It looks like we could have different isolation levels on primary and
> standby servers in the context of replication. If the primary crashes
> and a standby server is made as primary, there could be change in
> query results because of isolation levels. Is that expected?

I think it is possible because the standbys are free to use their own
isolation levels for different purposes. During the failover onto the
standby, the code/tool that's triggering the failover will have to
take care of resetting the isolation level back to that of the crashed
primary. Presently, the standby requires the max_connections,
max_worker_processes, max_wal_senders, max_prepared_transactions and
max_locks_per_transaction parameters (see the code in
CheckRequiredParameterValues) to be the same as on the primary,
otherwise the standby doesn't start. Postgres doesn't enforce that the
standby's isolation level match the primary's, though.

IIUC, the WAL that gets generated on the primary doesn't depend on its
isolation level, in other words, the WAL records have no information
of the isolation level. It is the MVCC snapshot, taken at the start of
the transaction, that does the trick for different isolation levels.

Regards,
Bharath Rupireddy.




Re: Schema variables - new implementation for Postgres 15

2022-01-13 Thread Pavel Stehule
čt 13. 1. 2022 v 15:29 odesílatel Joel Jacobson  napsal:

> On Thu, Jan 13, 2022, at 18:24, Dean Rasheed wrote:
> > Those are examples that a malicious user might use, but even without
> > such examples, I think it would be far too easy to inadvertently break
> > a large application by defining a variable that conflicted with a
> > column name you didn't know about.
>
> I think there is also a readability problem with the non-locality of this
> feature.
>
> I think it would be better to have an explicit namespace for these global
> variables, so that when reading code, they would stand-out.
> As a bonus, that would also solve the risk of breaking code, as you
> pointed out.
>
> Most code should never need any global variables at all, so in the rare
> occasions when they are needed, I think it's perfectly fine if some more
> verbose fully-qualified syntax was needed to use them, rather than to
> pollute the namespace and risk breaking code.
>

There are a few absolutely valid use cases

1. scripting - GUCs, currently used in place of session variables, are slow
and untyped

2. RLS

3. Migration from Oracle - although I agree that package variables are often
used badly, they are used there. And only rarely is it possible to refactor
the code during a migration from Oracle to Postgres, so there is a necessity
to have session variables.


> I want to bring up an idea presented earlier in a different thread:
>
> How about exploiting reserved SQL keywords followed by a dot, as special
> labels?
>
> This could solve the problem with this patch, as well as the other root
> label patch to access function parameters.
>
> It's an unorthodox idea, but due to legacy, I think we need to be
> creative, if we want a safe solution with no risk of breaking any code,
> which I think should be a requirement.
>
> Taking inspiration from Javascript, how about using the SQL reserved
> keyword "window"?
> In Javascript, "window.variableName" means that the variable variableName
> is declared at the global scope.
>

I cannot imagine how the "window" keyword can work in SQL context. In
Javascript "window" is an object - it is not a keyword, and it makes sense
in usual Javascript context inside HTML browsers.

Regards

Pavel



>
> Furthermore:
>
> "from" could be used to access function/procedure IN parameters.
> "to" could be used to access function OUT parameters.
> "from" or "to" could be used to access function INOUT parameters.
>
> Examples:
>
> SELECT u.user_id
> INTO to.user_id
> FROM users u
> WHERE u.username = from.username;
>
> -- After authentication, the authenticated user_id could be stored as a
> global variable:
> window.user_id := to.user_id;
>
> -- The authenticated user_id could then be used in queries that should
> filter on user_id:
> SELECT o.order_id
> FROM orders o
> WHERE o.user_id = window.user_id;
>
> This would require endorsement from the SQL committee of course, otherwise
> we would face problems if they suddenly would introduce syntax where a
> reserved keyword could be followed by a dot.
>
> I think from a readability perspective, it works, since the different
> meanings can be distinguished by writing one in UPPERCASE and the other in
> lowercase.
>
> /Joel
>


Re: Schema variables - new implementation for Postgres 15

2022-01-13 Thread Joel Jacobson
On Thu, Jan 13, 2022, at 18:24, Dean Rasheed wrote:
> Those are examples that a malicious user might use, but even without
> such examples, I think it would be far too easy to inadvertently break
> a large application by defining a variable that conflicted with a
> column name you didn't know about.

I think there is also a readability problem with the non-locality of this 
feature.

I think it would be better to have an explicit namespace for these global 
variables, so that when reading code, they would stand-out.
As a bonus, that would also solve the risk of breaking code, as you pointed out.

Most code should never need any global variables at all, so in the rare 
occasions when they are needed, I think it's perfectly fine if some more 
verbose fully-qualified syntax was needed to use them, rather than to pollute 
the namespace and risk breaking code.

I want to bring up an idea presented earlier in a different thread:

How about exploiting reserved SQL keywords followed by a dot, as special labels?

This could solve the problem with this patch, as well as the other root label 
patch to access function parameters.

It's an unorthodox idea, but due to legacy, I think we need to be creative, if 
we want a safe solution with no risk of breaking any code, which I think should 
be a requirement.

Taking inspiration from Javascript, how about using the SQL reserved keyword 
"window"?
In Javascript, "window.variableName" means that the variable variableName 
is declared at the global scope.

Furthermore:

"from" could be used to access function/procedure IN parameters.
"to" could be used to access function OUT parameters.
"from" or "to" could be used to access function INOUT parameters.

Examples:

SELECT u.user_id
INTO to.user_id
FROM users u
WHERE u.username = from.username;

-- After authentication, the authenticated user_id could be stored as a global 
variable:
window.user_id := to.user_id;

-- The authenticated user_id could then be used in queries that should filter 
on user_id:
SELECT o.order_id
FROM orders o
WHERE o.user_id = window.user_id;

This would require endorsement from the SQL committee of course, otherwise we 
would face problems if they suddenly would introduce syntax where a reserved 
keyword could be followed by a dot.

I think from a readability perspective, it works, since the different meanings 
can be distinguished by writing one in UPPERCASE and the other in lowercase.

/Joel

Re: Schema variables - new implementation for Postgres 15

2022-01-13 Thread Pavel Stehule
čt 13. 1. 2022 v 13:54 odesílatel Dean Rasheed 
napsal:

> On Wed, 3 Nov 2021 at 13:05, Tomas Vondra 
> wrote:
> >
> > 2) I find this a bit confusing:
> >
> > SELECT non_existent_variable;
> > test=# select s;
> > ERROR:  column "non_existent_variable" does not exist
> > LINE 1: select non_existent_variable;
> >
> > I wonder if this means using SELECT to read variables is a bad idea, and
> > we should have a separate command, just like we have LET (instead of
> > just using UPDATE in some way).
> >
>
> Hmm. This way of reading variables worries me for a different reason
> -- I think it makes it all too easy to break existing applications by
> inadvertently (or deliberately) defining variables that conflict with
> column names referred to in existing queries.
>
> For example, if I define a variable called "relkind", then psql's \sv
> meta-command is broken because the query it performs can't distinguish
> between the column and the variable.
>
> Similarly, there's ambiguity between alias.colname and
> schema.variablename. So, for example, if I do the following:
>
> CREATE SCHEMA n;
> CREATE VARIABLE n.nspname AS int;
>
> then lots of things are broken, including pg_dump and a number of psql
> meta-commands. I don't think it's acceptable to make it so easy for a
> user to break the system in this way.
>
> Those are examples that a malicious user might use, but even without
> such examples, I think it would be far too easy to inadvertently break
> a large application by defining a variable that conflicted with a
> column name you didn't know about.
>

This is a valid issue, and it should be solved, or at least the risk reduced

I see two possibilities

a) an easy solution can be the implementation of another conflict strategy -
variables have lower priority than tables, with the possibility of raising
warnings if some identifiers are ambiguous. This is easy to implement, and
with a warning I think there should be no unwanted surprises for
developers. This is safe in the sense that no variable can break any query.

b) a harder implementation (but one I have long thought about) can be
schema-scoped access. It can be used for the implementation of schema-private
objects. It doesn't solve the described issue, but it can reduce the risk
of collision to just one schema.

Both possibilities can be implemented together - but the @b solution would
have to be implemented from zero - and it is a more generic concept, so I
prefer @a

Dean, can @a work for you?

Regards

Pavel



> Regards,
> Dean
>


SLRUs in the main buffer pool, redux

2022-01-13 Thread Thomas Munro
Hi,

I was re-reviewing the proposed batch of GUCs for controlling the SLRU
cache sizes[1], and I couldn't resist sketching out $SUBJECT as an
obvious alternative.  This patch is highly experimental and full of
unresolved bits and pieces (see below for some), but it passes basic
tests and is enough to start trying the idea out and figuring out
where the real problems lie.  The hypothesis here is that CLOG,
multixact, etc data should compete for space with relation data in one
unified buffer pool so you don't have to tune them, and they can
benefit from the better common implementation (mapping, locking,
replacement, bgwriter, checksums, etc and eventually new things like
AIO, TDE, ...).

I know that many people have talked about doing this and maybe they
already have patches along these lines too; I'd love to know what
others imagined differently/better.

In the attached sketch, the SLRU caches are pseudo-relations in
pseudo-database 9.  Yeah.  That's a straw-man idea stolen from the
Zheap/undo project[2] (I also stole DiscardBuffer() from there);
better ideas for identifying these buffers without making BufferTag
bigger are very welcome.  You can list SLRU buffers with:

  WITH slru(relfilenode, path) AS (VALUES (0, 'pg_xact'),
  (1, 'pg_multixact/offsets'),
  (2, 'pg_multixact/members'),
  (3, 'pg_subtrans'),
  (4, 'pg_serial'),
  (5, 'pg_commit_ts'),
  (6, 'pg_notify'))
  SELECT bufferid, path, relblocknumber, isdirty, usagecount, pinning_backends
FROM pg_buffercache NATURAL JOIN slru
   WHERE reldatabase = 9
   ORDER BY path, relblocknumber;

Here are some per-cache starter hypotheses about locking that might be
completely wrong and obviously need real analysis and testing.

pg_xact:

I couldn't easily get rid of XactSLRULock, because it didn't just
protect buffers, it's also used to negotiate "group CLOG updates".  (I
think it'd be nice to replace that system with an atomic page update
scheme so that concurrent committers stay on CPU, something like [3],
but that's another topic.)  I decided to try a model where readers
only have to pin the page (the reads are sub-byte values that we can
read atomically, and you'll see a value as least as fresh as the time
you took the pin, right?), but writers have to take an exclusive
content lock because otherwise they'd clobber each other at byte
level, and because they need to maintain the page LSN consistently.
Writing back is done with a share lock as usual and log flushing can
be done consistently.  I also wanted to try avoiding the extra cost of
locking and accessing the buffer mapping table in common cases, so I
use ReadRecentBuffer() for repeat access to the same page (this
applies to the other SLRUs too).

pg_subtrans:

I got rid of SubtransSLRULock because it only protected page contents.
Can be read with only a pin.  Exclusive page content lock to write.

pg_multixact:

I got rid of the MultiXact{Offset,Members}SLRULock locks.  Can be read
with only a pin.  Writers take exclusive page content lock.  The
multixact.c module still has its own MultiXactGenLock.

pg_commit_ts:

I got rid of CommitTsSLRULock since it only protected buffers, but
here I had to take shared content locks to read pages, since the
values can't be read atomically.   Exclusive content lock to write.

pg_serial:

I could not easily get rid of SerialSLRULock, because it protects the
SLRU + also some variables in serialControl.  Shared and exclusive
page content locks.

pg_notify:

I got rid of NotifySLRULock.  Shared and exclusive page content locks
are used for reading and writing.  The module still has a separate
lock NotifyQueueLock to coordinate queue positions.

Some problems tackled incompletely:

* I needed to disable checksums and in-page LSNs, since SLRU pages
hold raw data with no header.  We'd probably eventually want regular
(standard? formatted?) pages (the real work here may be implementing
FPI for SLRUs so that checksums don't break your database on torn
writes).  In the meantime, suppressing those things is done by the
kludge of recognising database 9 as raw data, but there should be
something better than this.  A separate array of size NBuffer holds
"external" page LSNs, to drive WAL flushing.

* The CLOG SLRU also tracks groups of async commit LSNs in a fixed
sized array.  The obvious translation would be very wasteful (an array
big enough for NBuffers * groups per page), but I hope that there is a
better way to do this... in the sketch patch I changed it to use the
single per-page LSN for simplicity (basically group size is 32k
instead of 32...), which is certainly not good enough.

Some stupid problems not tackled yet:

* It holds onto the virtual file descriptor for the last segment
accessed, but there is no invalidation for 

Re: Logical replication timeout problem

2022-01-13 Thread Amit Kapila
On Thu, Jan 13, 2022 at 3:43 PM Fabrice Chapuis  wrote:
>
> first phase: postgres reads WAL files and generates 1420 snap files.
> second phase: I guess, but on this point maybe you can clarify, postgres has 
> to decode the snap files and remove them if no statement must be applied on a 
> replicated table.
> It is from this point that the worker process exits after the 1 minute timeout.
>

Okay, I think the problem could be that because we are skipping all
the changes of the transaction, no communication is sent to the
subscriber and it eventually times out. Actually, we try to send
keep-alive at transaction boundaries like when we call
pgoutput_commit_txn. The pgoutput_commit_txn will call
OutputPluginWrite->WalSndWriteData. I think to tackle the problem we
need to try to send such keepalives via WalSndUpdateProgress and
invoke that in pgoutput_change when we skip sending the change.

-- 
With Regards,
Amit Kapila.




Re: Non-decimal integer literals

2022-01-13 Thread Peter Eisentraut
Another modest update, because the copyright year update prevented the 
previous patches from applying cleanly.


I also did a bit of work on the ecpg scanner so that it also handles 
some errors on par with the main scanner.


There is still no automated testing of this in ecpg, but I have a bunch 
of single-line test files that can provoke various errors.  I will keep 
these around and maybe put them into something more formal in the future.



On 30.12.21 10:43, Peter Eisentraut wrote:
There has been some other refactoring going on, which made this patch 
set out of date.  So here is an update.


The old pg_strtouint64() has been removed, so there is no longer a 
naming concern with patch 0001.  That one should be good to go.


I also found that yet another way to parse integers in pg_atoi() has 
mostly faded away in utility, so I removed the last two callers and 
removed the function in 0002 and 0003.


The remaining patches are as before, with some of the review comments 
applied.  I still need to write some lexing unit tests for ecpg, which I 
haven't gotten to yet.  This affects patches 0004 and 0005.


As mentioned before, patches 0006 and 0007 are more feature previews at 
this point.



On 01.12.21 16:47, Peter Eisentraut wrote:

On 25.11.21 18:51, John Naylor wrote:
If we're going to change the comment anyway, "the parser" sounds more 
natural. Aside from that, 0001 and 0002 can probably be pushed now, 
if you like.


done


--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -365,6 +365,10 @@ real ({integer}|{decimal})[Ee][-+]?{digit}+
  realfail1 ({integer}|{decimal})[Ee]
  realfail2 ({integer}|{decimal})[Ee][-+]

+integer_junk {integer}{ident_start}
+decimal_junk {decimal}{ident_start}
+real_junk {real}{ident_start}

A comment might be good here to explain these are only in ECPG for 
consistency with the other scanners. Not really important, though.


Yeah, it's a bit weird that not all the symbols are used in ecpg.  
I'll look into explaining this better.



0006

+{hexfail} {
+ yyerror("invalid hexadecimal integer");
+ }
+{octfail} {
+ yyerror("invalid octal integer");
   }
-{decimal} {
+{binfail} {
+ yyerror("invalid binary integer");
+ }

It seems these could use SET_YYLLOC(), since the error cursor doesn't 
match other failure states:


ok

We might consider some tests for ECPG since lack of coverage has been 
a problem.


right

Also, I'm curious: how does the spec work as far as deciding the year 
of release, or feature-freezing of new items?


The schedule has recently been extended again, so the current plan is 
for SQL:202x with x=3, with feature freeze in mid-2022.


So the feature patches in this thread are in my mind now targeting 
PG15+1.  But the preparation work (up to v5-0005, and some other 
number parsing refactoring that I'm seeing) could be considered for PG15.


I'll move this to the next CF and come back with an updated patch set 
in a little while.



From e7aad2b81e9be2b53dad73c66e692a80fc2f81e1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 30 Dec 2021 10:26:37 +0100
Subject: [PATCH v7 1/7] Move scanint8() to numutils.c

Move scanint8() to numutils.c and rename to pg_strtoint64().  We
already have a "16" and "32" version of that, and the code inside the
functions was aligned, so this move makes all three versions
consistent.  The API is also changed to no longer provide the errorOK
case.  Users that need the error checking can use strtoi64().

Discussion: 
https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com
---
 src/backend/parser/parse_node.c | 12 ++-
 src/backend/replication/pgoutput/pgoutput.c |  9 ++-
 src/backend/utils/adt/int8.c| 90 +
 src/backend/utils/adt/numutils.c| 84 +++
 src/bin/pgbench/pgbench.c   |  4 +-
 src/include/utils/builtins.h|  1 +
 src/include/utils/int8.h| 25 --
 7 files changed, 103 insertions(+), 122 deletions(-)
 delete mode 100644 src/include/utils/int8.h

diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index ba9baf140c..8dd821b761 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -26,7 +26,6 @@
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
-#include "utils/int8.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 #include "utils/varbit.h"
@@ -353,7 +352,6 @@ make_const(ParseState *pstate, A_Const *aconst)
 {
Const  *con;
Datum   val;
-   int64   val64;
Oid typeid;
int typelen;
booltypebyval;
@@ -384,8 +382,15 @@ make_const(ParseState *pstate, A_Const *aconst)
break;
 
case T_Float:
+   {
/* could be an 

RE: row filtering for logical replication

2022-01-13 Thread houzj.f...@fujitsu.com
On Thursday, January 13, 2022 2:49 PM Peter Smith 
> Thanks for posting the merged v63.
> 
> Here are my review comments for the v63-0001 changes.
> 
> ~~~

Thanks for the comments!

> 1. src/backend/replication/logical/proto.c - logicalrep_write_tuple
> 
>   TupleDesc desc;
> - Datum values[MaxTupleAttributeNumber];
> - bool isnull[MaxTupleAttributeNumber];
> + Datum*values;
> + bool*isnull;
>   int i;
>   uint16 nliveatts = 0;
> 
> Those separate declarations for values / isnull are not strictly
> needed anymore, so those vars could be deleted. IIRC those were only
> added before when there were both slots and tuples. OTOH, maybe you
> prefer to keep it this way just for code readability?

Yes, I prefer the current style for code readability.

> 
> 2. src/backend/replication/pgoutput/pgoutput.c - typedef
> 
> +typedef enum RowFilterPubAction
> +{
> + PUBACTION_INSERT,
> + PUBACTION_UPDATE,
> + PUBACTION_DELETE,
> + NUM_ROWFILTER_PUBACTIONS  /* must be last */
> +} RowFilterPubAction;
> 
> This typedef is not currently used by any of the code.
> 
> So I think choices are:
> 
> - Option 1: remove the typedef, because nobody is using it.
> 
> - Option 2: keep the typedef, but use it! e.g. everywhere there is an
> exprstate array index variable probably it should be declared as a
> 'RowFilterPubAction idx' instead of just 'int idx'.

Thanks, I used the option 1.

> 
> 3. src/backend/replication/pgoutput/pgoutput.c - map_changetype_pubaction
> 
> After this recent v63 refactoring and merging of some APIs it seems
> that the map_changetype_pubaction is now ONLY used by
> pgoutput_row_filter function. So this can now be a static member of
> pgoutput_row_filter function instead of being declared at file scope.
> 

Changed.

> 
> 4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
> comments
> The function header comment says the same thing 2x about the return values.
> 

Changed.

> 
> ~~~
> 
> 5. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
> 
> + ExprContext*ecxt;
> + int filter_index = map_changetype_pubaction[*action];
> +
> + /* *action is already assigned default by caller */
> + Assert(*action == REORDER_BUFFER_CHANGE_INSERT ||
> +*action == REORDER_BUFFER_CHANGE_UPDATE ||
> +*action == REORDER_BUFFER_CHANGE_DELETE);
> +
> 
> Accessing the map_changetype_pubaction array should be done *after* the
> Assert.
> 
> ~~~

Changed.

> 
> 6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
> 
> ExprState *filter_exprstate;
> ...
> filter_exprstate = entry->exprstate[map_changetype_pubaction[*action]];
> 
> Please consider what way you think is best.

Changed as suggested.

> 
> 7. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
> There are several things not quite right with that comment:
> a. I thought now it should refer to "slots" instead of "tuples"
> 
> Suggested replacement comment:

Changed, but I prefer "tuple", which is easier to understand.

> ~~~
> 
> 8. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
> 
> + if (!new_slot || !old_slot)
> + {
> + ecxt->ecxt_scantuple = new_slot ? new_slot : old_slot;
> + result = pgoutput_row_filter_exec_expr(entry->exprstate[filter_index],
> +ecxt);
> +
> + FreeExecutorState(estate);
> + PopActiveSnapshot();
> +
> + return result;
> + }
> +
> + tmp_new_slot = new_slot;
> + slot_getallattrs(new_slot);
> + slot_getallattrs(old_slot);
> 
> I think after this "if" condition then the INSERT, DELETE and simple
> UPDATE are already handled. So, the remainder of the code is for
> deciding what update transformation is needed etc.
> 
> I think there should be some block comment somewhere here to make that
> more obvious.

Changed.
> ~~
> 
> 9. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
> 
> Many of the comments in this function are still referring to old/new
> "tuple". Now that all the params are slots instead of tuples maybe now
> all the comments should also refer to "slots" instead of "tuples".
> Please search all the comments - e.g. including all the "Case 1:" ...
> "Case 4:" comments.

I also think tuple still makes sense, so I didn’t change this.

> 
> 10. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init
> 
> + int idx_ins = PUBACTION_INSERT;
> + int idx_upd = PUBACTION_UPDATE;
> + int idx_del = PUBACTION_DELETE;
> 
> These variables are unnecessary now... They previously were added only
> as short synonyms because the other enum names were too verbose (e.g.
> REORDER_BUFFER_CHANGE_INSERT) but now that we have shorter enum
> names
> like PUBACTION_INSERT we can just use those names directly
> 
Changed.

> 
> 11. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init
> 
> I felt that the code would seem more natural if the
> pgoutput_row_filter_init function came *before* the
> pgoutput_row_filter function in the source code.
> 
Changed.

> 
> 12. src/backend/replication/pgoutput/pgoutput.c - 

Re: UNIQUE null treatment option

2022-01-13 Thread Pavel Borisov
+1 for committing this feature. I consider this useful.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


RE: row filtering for logical replication

2022-01-13 Thread houzj.f...@fujitsu.com
On Thu, Jan 13, 2022 5:22 PM Alvaro Herrera  wrote:
> >  /*
> > + * Only 3 publication actions are used for row filtering ("insert",
> > +"update",
> > + * "delete"). See RelationSyncEntry.exprstate[].
> > + */
> > +typedef enum RowFilterPubAction
> > +{
> > +   PUBACTION_INSERT,
> > +   PUBACTION_UPDATE,
> > +   PUBACTION_DELETE,
> > +   NUM_ROWFILTER_PUBACTIONS  /* must be last */ }
> RowFilterPubAction;
> 
> Please do not add NUM_ROWFILTER_PUBACTIONS as an enum value.  It's a bit
> of a lie and confuses things, because your enum now has 4 possible values, not
> 3.  I suggest to #define NUM_ROWFILTER_PUBACTIONS
> (PUBACTION_DELETE+1) instead.
> 
> > +   int idx_ins = PUBACTION_INSERT;
> > +   int idx_upd = PUBACTION_UPDATE;
> > +   int idx_del = PUBACTION_DELETE;
> 
> I don't understand the purpose of these variables; can't you just use the
> constants?

Thanks for the comments !
Changed the code as suggested.

Best regards,
Hou zj


Re: do only critical work during single-user vacuum?

2022-01-13 Thread John Naylor
On Wed, Jan 12, 2022 at 12:26 PM Bossart, Nathan  wrote:
>
> > - For the general case, we would now have the ability to vacuum a
> > table, and possibly have no effect at all. That seems out of place
> > with the other options.
>
> Perhaps a message would be emitted when tables are specified but
> skipped due to the min-xid-age option.
>
> As I've stated upthread, Sawada-san's suggested approach was my
> initial reaction to this thread.  I'm not wedded to the idea of adding
> new options, but I think there are a couple of advantages.  For both
> single-user mode and normal operation (which may be in imminent
> wraparound danger), you could use the same command:
>
> VACUUM (MIN_XID_AGE 16, ...);

My proposed top-level statement can also be used in normal operation,
so the only possible advantage is configurability. But I don't really
see any advantage in that -- I don't think we should be moving in the
direction of adding more-intricate ways to paper over the deficiencies
in autovacuum scheduling. (It could be argued that I'm doing exactly
that in this whole thread, but [imminent] shutdown situations have
other causes besides deficient scheduling.)

> (As an aside, we'd need to figure out how XID and MXID options would
> work together.  Presumably most users would want to OR them.)
>
> This doesn't really tie in super nicely with the failsafe mechanism,
> but adding something like a FAILSAFE option doesn't seem right to me,

I agree -- it would be awkward and messy as an option. However, I see
the same problem with xid/mxid -- I would actually argue they are not
even proper options; they are "selectors". Your comments above about
1) needing to OR them and 2) emitting a message when a VACUUM command
doesn't actually do anything are evidence of that fact.

> The other advantage I see with age-related options is that it can be
> useful for non-imminent-wraparound situations as well.  For example,
> maybe a user just wants to manually vacuum everything (including
> indexes) with an age above 500M on the weekends.

There is already vacuumdb for that, and I think its method of
selecting tables is sound -- I'm not convinced that pushing table
selection to the server command as "options" is an improvement.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Schema variables - new implementation for Postgres 15

2022-01-13 Thread Dean Rasheed
On Wed, 3 Nov 2021 at 13:05, Tomas Vondra  wrote:
>
> 2) I find this a bit confusing:
>
> SELECT non_existent_variable;
> test=# select s;
> ERROR:  column "non_existent_variable" does not exist
> LINE 1: select non_existent_variable;
>
> I wonder if this means using SELECT to read variables is a bad idea, and
> we should have a separate command, just like we have LET (instead of
> just using UPDATE in some way).
>

Hmm. This way of reading variables worries me for a different reason
-- I think it makes it all too easy to break existing applications by
inadvertently (or deliberately) defining variables that conflict with
column names referred to in existing queries.

For example, if I define a variable called "relkind", then psql's \sv
meta-command is broken because the query it performs can't distinguish
between the column and the variable.

Similarly, there's ambiguity between alias.colname and
schema.variablename. So, for example, if I do the following:

CREATE SCHEMA n;
CREATE VARIABLE n.nspname AS int;

then lots of things are broken, including pg_dump and a number of psql
meta-commands. I don't think it's acceptable to make it so easy for a
user to break the system in this way.

Those are examples that a malicious user might use, but even without
such examples, I think it would be far too easy to inadvertently break
a large application by defining a variable that conflicted with a
column name you didn't know about.

Regards,
Dean




Re: Custom Operator for citext LIKE predicates question

2022-01-13 Thread Efrain J. Berdecia
Thank you for the feedback.
In our setup it has actually worked per the explains provided, making the query
run in milliseconds instead of seconds.
We weren't sure if this should be something that could be added natively with
future Postgres deployments.

Thanks,
Efrain J. Berdecia

On Thursday, January 13, 2022, 12:58:27 AM EST, Tom Lane 
 wrote:  
 
 "Efrain J. Berdecia"  writes:
> After attempting to use gin and gist indexes for our queries that run against
> citext columns, our team has come up with the following to make our queries
> run from 2 mins to 25ms:
>
> CREATE EXTENSION pg_trgm;
> CREATE EXTENSION btree_gin;  -- may not be needed, checking
>
> CREATE OPERATOR CLASS gin_trgm_ops_ci_new
> FOR TYPE citext USING gin AS
>     OPERATOR 1 % (text, text),
>     FUNCTION 1 btint4cmp (int4, int4),
>     FUNCTION 2 gin_extract_value_trgm (text, internal),
>     FUNCTION 3 gin_extract_query_trgm (text, internal, int2, internal, internal, internal, internal),
>     FUNCTION 4 gin_trgm_consistent (internal, int2, text, int4, internal, internal, internal, internal),
>     STORAGE int4;
>
> ALTER OPERATOR FAMILY gin_trgm_ops_ci_new USING gin ADD
>     OPERATOR 3 ~~ (citext, citext),
>     OPERATOR 4 ~~* (citext, citext);
>
> ALTER OPERATOR FAMILY gin_trgm_ops_ci_new USING gin ADD
>     OPERATOR 7 %> (text, text),
>     FUNCTION 6 (text, text) gin_trgm_triconsistent (internal, int2, text, int4, internal, internal, internal);
>
> Our question is, does anyone see any flaw in this?

Umm ... does it actually work?  I'd expect that you get case-sensitive
comparison behavior in such an index, because those support functions
are for plain text and they're not going to know that you'd like
case-insensitive behavior.

You generally can't make a new gin or gist opclass without actually
writing some C code, because the support functions embody all
the semantics of the operators.

            regards, tom lane
  

Re: Pluggable toaster

2022-01-13 Thread Nikita Malakhov
Hi all!

Simon, thank you for your review.
I'll try to give a brief explanation on some topics you've mentioned.
My colleagues would correct me if I miss the point and provide some more
details.

>Agreed, Oleg has made some very clear analysis of the value of having
>a higher degree of control over toasting from within the datatype.
Currently, the biggest flaw we see in TOAST functionality is that it
provides no means of extension or modification short of modifying
the core code itself. It is not possible to use any TOAST strategy
other than the ones existing in the core, and the same issue applies to
assigning different TOAST methods to columns and datatypes.
The main point of this patch is to provide an open API and
syntax for the creation of new Toasters as pluggable extensions, and
to make the existing (default) toaster work via this API without
affecting its function. Also, the default toaster is strongly cross-tied
with Heap access, with somewhat unclear code relations (headers,
function locations and calls, etc.) that are not well structured
logically and ask to be straightened out.

>In my understanding, we want to be able to
>1. Access data from a toasted object one slice at a time, by using
>knowledge of the structure
>2. If toasted data is updated, then update a minimum number of
>slice(s), without rewriting the existing slices
>3. If toasted data is expanded, then allow new slices to be appended to
>the object without rewriting the existing slices
There are two main ideas behind Pluggable Toaster patch -
First - to provide an extensible API for all Postgres developers, to
be able to develop and plug in custom toasters as independent
extensions for different data types and columns, to use different
toast strategies, access and compression methods, and so on;
Second - to refactor the current Toast functionality, to improve the
Toast code structure and make it more logically structured and
understandable, and to 'detach' the default toaster ('generic', as it is
currently named, though maybe 'heap' would be the best name for it) from
the DBMS core code, route it through the new API and hide all existing
internal Toast-specific functionality behind the new API.

All the points you mentioned are made available for development by
this patch (and, actually, some are being developed - in the
bytea_appendable_toaster part of this patch or jSONb toaster by Nikita
Glukhov, he could provide much better explanation on this topic).

>> Modification of the current toaster for all tasks and cases looks too
>> complex; moreover, it will not work for custom data types. Postgres
>> is an extensible database, why not extend its extensibility even
>> further, to have pluggable TOAST! We propose an idea to separate the
>> toaster from heap using a toaster API similar to the table AM API etc.
>> The following patches are applicable over the patch in [1]

>ISTM that we would want the toast algorithm to be associated with the
>datatype, not the column?
>Can you explain your thinking?
This possibility is considered for future development.

>We already have Expanded toast format, in-memory, which was designed
>specifically to allow us to access sub-structure of the datatype
>in-memory. So I was expecting to see an Expanded, on-disk, toast
>format that roughly matched that concept, since Tom has already shown
>us the way. (varatt_expanded). This would be usable by both JSON and
>PostGIS.
The main disadvantage is that it does not support the usage of any
other toasting strategies, or compression methods other than pglz and
lz4.

>Some other thoughts:

>I imagine the data type might want to keep some kind of dictionary
>inside the main toast pointer, so we could make allowance for some
>optional datatype-specific private area in the toast pointer itself,
>allowing a mix of inline and out-of-line data, and/or a table of
>contents to the slices.
It is partly implemented in jSONb custom Toaster, as I mentioned
above, and also could be considered for future improvement of existing
Toaster as an extension.

>I'm thinking we could also tackle these things at the same time:
>* We want to expand TOAST to 64-bit pointers, so we can have more
>pointers in a table
This issue is being discussed but not currently implemented, it was
considered as one of the possible future improvements.

>* We want to avoid putting the data length into the toast pointer, so
>we can allow the toasted data to be expanded without rewriting
>everything (to avoid O(N^2) cost)
May I correct you - the actual relation is O(N), not O(N^2).
Currently data length is stored outside the customized toaster data, in
the varatt_custom structure that is supposed to be used by all custom
(extended) toasters. Data that is specific to some custom Toaster will
be stored inside the va_toasterdata structure.

Looking forward to your thoughts on our work.

--
Best regards,
Nikita A. Malakhov

On Wed, Jan 5, 2022 at 5:46 PM Simon Riggs 
wrote:

> On Thu, 30 Dec 2021 at 16:40, Teodor Sigaev  wrote:
>
> > We are working on custom 

Re: Isolation levels on primary and standby

2022-01-13 Thread Andrey Borodin



> On 13 Jan 2022, at 16:16, RKN Sai Krishna  wrote:
> 
> It looks like we could have different isolation levels on primary and
> standby servers in the context of replication. If the primary crashes
> and a standby server is made as primary, there could be change in
> query results because of isolation levels. Is that expected?

Hi, RKN!

Transaction isolation level can be set at the beginning of each individual 
transaction.
You can read more about transaction isolation level at [0].

There are many settings which can be configured differently on a standby, e.g. 
default_transaction_isolation [1]. This is expected behaviour, AFAIK.

Thanks!

Best regards, Andrey Borodin.


[0] https://www.postgresql.org/docs/current/transaction-iso.html
[1] https://www.postgresql.org/docs/current/runtime-config-client.html



Re: Support tab completion for upper character inputs in psql

2022-01-13 Thread Peter Eisentraut

On 07.01.22 06:17, tanghy.f...@fujitsu.com wrote:

On Friday, January 7, 2022 1:08 PM, Japin Li  wrote:

+/*
+ * pg_string_tolower - Fold a string to lower case if the string is not quoted
+ * and only contains ASCII characters.
+ * For German/Turkish etc text, no change will be made.
+ *
+ * The returned value has to be freed.
+ */
+static char *
+pg_string_tolower_if_ascii(const char *text)
+{

s/pg_string_tolower/pg_string_tolower_if_ascii/ for comments.



Thanks for your review.
Comment fixed in the attached V11 patch.


As I just posted over at [0], the tab completion of enum values appears 
to be broken at the moment, so I can't really analyze what impact your 
patch would have on it.  (But it makes me suspicious about the test case 
in your patch.)  I suspect it would treat enum labels as 
case-insensitive, which would be wrong.  But we need to fix that issue 
first before we can proceed here.


The rest of the patch seems ok in principle, since AFAICT enums are the 
only query result in tab-complete.c that are not identifiers and thus 
subject to case issues.


I would perhaps move the pg_string_tolower_if_ascii() calls to before 
escape_string() in each case.  It won't make a difference to the result, 
but it seems conceptually better.



[0]: 
https://www.postgresql.org/message-id/8ca82d89-ec3d-8b28-8291-500efaf23...@enterprisedb.com





tab completion of enum values is broken

2022-01-13 Thread Peter Eisentraut

This doesn't work anymore:

create type e2 as enum ('foo', 'bar');
alter type e2 rename value 'b

This now results in

alter type e2 rename value 'b'

Bisecting blames

commit cd69ec66c88633c09bc9a984a7f0930e09c7c96e
Author: Tom Lane 
Date:   Thu Jan 23 11:07:12 2020 -0500

Improve psql's tab completion for filenames.

which did deal with quoting of things to be completed, so it seems very 
plausible to be some collateral damage.


The queries issued by the completion engine are

bad

LOG:  statement: SELECT pg_catalog.quote_literal(enumlabel)   FROM 
pg_catalog.pg_enum e, pg_catalog.pg_type t  WHERE t.oid = e.enumtypid 
AND substring(pg_catalog.quote_literal(enumlabel),1,1)='b'AND 
(pg_catalog.quote_ident(typname)='e2' OR '"' || typname || 
'"'='e2')AND pg_catalog.pg_type_is_visible(t.oid)	LIMIT 1000


good

LOG:  statement: SELECT pg_catalog.quote_literal(enumlabel)   FROM 
pg_catalog.pg_enum e, pg_catalog.pg_type t  WHERE t.oid = e.enumtypid 
AND substring(pg_catalog.quote_literal(enumlabel),1,2)='''b'AND 
(pg_catalog.quote_ident(typname)='e2' OR '"' || typname || 
'"'='e2')AND pg_catalog.pg_type_is_visible(t.oid)	LIMIT 1000


I tried quickly fiddling with the substring() call to correct for the 
changed offset somehow, but didn't succeed.  Needs more analysis.
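
The offset mismatch can be seen with a toy model (hypothetical Python, not
psql's actual completion code) of the two queries' WHERE clauses:

```python
def quote_literal(s):
    # Models pg_catalog.quote_literal(): wrap in single quotes, double any
    # embedded quotes.
    return "'" + s.replace("'", "''") + "'"

labels = ["foo", "bar"]

# "bad" query: substring(quote_literal(enumlabel),1,1)='b' -- the opening
# quote that quote_literal() prepends is not accounted for, so no label
# can ever match.
bad = [l for l in labels if quote_literal(l)[:1] == "b"]

# "good" query: substring(quote_literal(enumlabel),1,2)='''b' -- the
# comparison text keeps the opening quote, so the prefixes line up.
good = [l for l in labels if quote_literal(l)[:2] == "'b"]

print(bad, good)  # [] ['bar']
```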





Isolation levels on primary and standby

2022-01-13 Thread RKN Sai Krishna
Hello All,

It looks like we could have different isolation levels on primary and
standby servers in the context of replication. If the primary crashes
and a standby server is made as primary, there could be change in
query results because of isolation levels. Is that expected?

Thanks,
RKN




Re: Stream replication test fails of cfbot/windows server 2019

2022-01-13 Thread Pavel Borisov
>
> For the record, cfbot only started running the recovery tests on
> Windows a couple of weeks ago (when the new improved .cirrus.yml
> landed in the tree).  I don't know if it's significant that Pavel's
> patch is failing every time:
>
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3464
>
> ... while one mentioned by Michail has lower frequency random failures:
>
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/2979


Thomas, it's not exactly so. The patch
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3464
failed
for the described reason only once.
The previous redness was due to compiler warnings in previous patch versions.

Furthermore, I've sent an updated contrib patch with very minor
improvements (completely unrelated to stream replication), and now the test
passes.
I suppose there's some intermittent problem with the Windows cfbot
infrastructure which arises randomly and affects some random
patches being tested. I have a feeling that all patches pushed around
yesterday morning were affected and now everything works well.

It's a pity I still have no idea about the source of the problems, besides my
observations of differing cfbot behavior from time to time.
--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: MDAM techniques and Index Skip Scan patch

2022-01-13 Thread Julien Rouhaud
Hi,

On Sat, Oct 23, 2021 at 07:30:47PM +, Floris Van Nee wrote:
> 
> From the patch series above, v9-0001/v9-0002 is the UniqueKeys patch series,
> which focuses on the planner. v2-0001 is Dmitry's patch that extends it to
> make it possible to use UniqueKeys for the skip scan. Both of these are
> unfortunately still older patches, but because they are planner machinery
> they are less relevant to the discussion about the executor here.  Patch
> v2-0002 contains most of my work and introduces all the executor logic for
> the skip scan and hooks up the planner for DISTINCT queries to use the skip
> scan.  Patch v2-0003 is a planner hack that makes the planner pick a skip
> scan on virtually every possibility. This also enables the skip scan on the
> queries above that don't have a DISTINCT but where it could be useful.

The patchset doesn't apply anymore:
http://cfbot.cputube.org/patch_36_1741.log
=== Applying patches on top of PostgreSQL commit ID 
a18b6d2dc288dfa6e7905ede1d4462edd6a8af47 ===
=== applying patch ./v2-0001-Extend-UniqueKeys.patch
[...]
patching file src/include/optimizer/paths.h
Hunk #2 FAILED at 299.
1 out of 2 hunks FAILED -- saving rejects to file 
src/include/optimizer/paths.h.rej

Could you send a rebased version?  In the meantime I will change the status on
the cf app to Waiting on Author.




Re: Add sub-transaction overflow status in pg_stat_activity

2022-01-13 Thread Dilip Kumar
On Tue, Dec 14, 2021 at 6:23 PM Ashutosh Sharma 
wrote:

In the latest patch I have fixed comments given here except a few.

I have looked into the v2 patch and here are my comments:

+   PG_RETURN_INT32(local_beentry->subxact_overflowed);
+}

Should this be PG_RETURN_BOOL instead of PG_RETURN_INT32??

With the new patch this is not relevant because we are returning a tuple.

+{ oid => '6107', descr => 'statistics: cached subtransaction count of
backend',
+  proname => 'pg_stat_get_backend_subxact_count', provolatile => 's',
proparallel => 'r',
+  prorettype => 'int4', proargtypes => 'int4',
+  prosrc => 'pg_stat_get_backend_subxact_count' },
+{ oid => '6108', descr => 'statistics: subtransaction cache of backend
overflowed',
+  proname => 'pg_stat_get_backend_subxact_overflow', provolatile => 's',
proparallel => 'r',
+  prorettype => 'bool', proargtypes => 'int4',
+  prosrc => 'pg_stat_get_backend_subxact_overflow' },

The description says that the two new functions show the statistics for
"cached subtransaction count of backend" and "subtransaction cache of
backend overflowed". But, when these functions are called it shows these
stats for the non-backend processes like checkpointer, walwriter etc as
well. Should that happen?

I am following a similar description as pg_stat_get_backend_pid,
pg_stat_get_backend_idset and other relevant functions.

typedef struct LocalPgBackendStatus
 * not.
 */
TransactionId backend_xmin;
+
+   /*
+* Number of cached subtransactions in the current session.
+*/
+   int subxact_count;
+
+   /*
+* The number of subtransactions in the current session exceeded the
cached
+* subtransaction limit.
+*/
+   bool subxact_overflowed;

All the variables inside this LocalPgBackendStatus structure are prefixed
with "backend" word. Can we do the same for the newly added variables as
well?

Done

+ * Get the xid and xmin, nsubxid and overflow status of the backend.
The

Should this be put as - "xid, xmin, nsubxid and overflow" instead of "xid
and xmin, nsubxid and overflow"?

I missed fixing this one in the last patch, so I have updated it again.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From e7dd74b5171327109a3c1ecdb48c104e93630b2b Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Sun, 12 Dec 2021 17:10:55 +0530
Subject: [PATCH v4] Add functions to show subtransaction count and overflow
 status

If some backends have a lot of nested subtransactions,
or the subtransaction cache has overflowed, there is no way to
detect that.  So this patch makes that easy by providing a
function to get that information.
---
 doc/src/sgml/monitoring.sgml| 18 ++
 src/backend/storage/ipc/sinvaladt.c | 13 +++---
 src/backend/utils/activity/backend_status.c |  4 ++-
 src/backend/utils/adt/pgstatfuncs.c | 38 +
 src/include/catalog/pg_proc.dat |  7 ++
 src/include/storage/sinvaladt.h |  4 ++-
 src/include/utils/backend_status.h  | 11 +
 7 files changed, 89 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 62f2a33..4388151 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5482,6 +5482,24 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   

 
+ pg_stat_get_backend_subxact
+
+pg_stat_get_backend_subxact ( integer )
+record
+   
+   
+Returns a record of information about the backend's subtransactions.
+The fields returned are subxact_count, which
+identifies the number of active subtransactions, and
+subxact_overflow, which shows whether the backend's
+subtransaction cache has overflowed or not.
+   
+   
+  
+
+  
+   
+
  pg_stat_get_backend_userid
 
 pg_stat_get_backend_userid ( integer )
diff --git a/src/backend/storage/ipc/sinvaladt.c b/src/backend/storage/ipc/sinvaladt.c
index cb3ee82..0b7b183 100644
--- a/src/backend/storage/ipc/sinvaladt.c
+++ b/src/backend/storage/ipc/sinvaladt.c
@@ -395,17 +395,20 @@ BackendIdGetProc(int backendID)
 
 /*
  * BackendIdGetTransactionIds
- *		Get the xid and xmin of the backend. The result may be out of date
- *		arbitrarily quickly, so the caller must be careful about how this
- *		information is used.
+ *		Get the xid, xmin, nsubxid and overflow status of the backend. The
+ *		result may be out of date arbitrarily quickly, so the caller must be
+ *		careful about how this information is used.
  */
 void
-BackendIdGetTransactionIds(int backendID, TransactionId *xid, TransactionId *xmin)
+BackendIdGetTransactionIds(int backendID, TransactionId *xid,
+		   TransactionId *xmin, int *nsubxid, bool *overflowed)
 {
 	SISeg	   *segP = shmInvalBuffer;
 
 	*xid = InvalidTransactionId;
 	*xmin = InvalidTransactionId;
+	*nsubxid = 0;
+	*overflowed = false;
 
 	/* Need to 
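
Conceptually, the per-backend state the patch exposes mirrors PostgreSQL's
fixed-size subxid cache (PGPROC_MAX_CACHED_SUBXIDS, 64 entries per backend).
A toy Python sketch of the count/overflow semantics -- not the actual C
implementation:

```python
PGPROC_MAX_CACHED_SUBXIDS = 64  # per-backend subxid cache size in PostgreSQL


class BackendSubxactStatus:
    """Toy model of a backend's subtransaction cache."""

    def __init__(self):
        self.subxids = []
        self.overflowed = False

    def assign_subxact(self, xid):
        if len(self.subxids) < PGPROC_MAX_CACHED_SUBXIDS:
            self.subxids.append(xid)
        else:
            # Cache is full: the xid is not cached, and visibility checks
            # must fall back to pg_subtrans lookups -- which is what makes
            # overflow worth monitoring in the first place.
            self.overflowed = True


b = BackendSubxactStatus()
for xid in range(100, 170):  # 70 nested subtransactions
    b.assign_subxact(xid)
print(len(b.subxids), b.overflowed)  # 64 True
```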

Re: Add sub-transaction overflow status in pg_stat_activity

2022-01-13 Thread Dilip Kumar
On Fri, Dec 17, 2021 at 9:32 AM Justin Pryzby  wrote:

> On Fri, Dec 17, 2021 at 09:00:04AM +0530, Dilip Kumar wrote:
> > On Tue, Dec 14, 2021 at 3:57 AM Bossart, Nathan 
> wrote:
> > >
> > > On 12/13/21, 6:30 AM, "Dilip Kumar"  wrote:
> > > > On Tue, Dec 7, 2021 at 11:11 AM Justin Pryzby 
> wrote:
> > > >> Since I think this field is usually not interesting to most users of
> > > >> pg_stat_activity, maybe this should instead be implemented as a
> function like
> > > >> pg_backend_get_subxact_status(pid).
> > > >>
> > > >> People who want to could use it like:
> > > >> SELECT * FROM pg_stat_activity psa,
> pg_backend_get_subxact_status(pid) sub;
> > > >
> > > > I have provided two function, one for subtransaction counts and other
> > > > whether subtransaction cache is overflowed or not, we can use like
> > > > this,  if we think this is better way to do it then we can also add
> > > > another function for the lastOverflowedXid
> > >
> > > The general approach looks good to me.  I think we could have just one
> > > function for all three values, though.
> >
> > If we create just one function then the output type will be a tuple
> > then we might have to add another view on top of that.  Is there any
> > better way to do that?
>
> I don't think you'd need to add a view on top of it.
>
> Compare:
>
> postgres=# SELECT 1, pg_config() LIMIT 1;
>  ?column? | pg_config
> --+
> 1 | (BINDIR,/usr/pgsql-14/bin)
>
> postgres=# SELECT 1, c FROM pg_config() c LIMIT 1;
>  ?column? | c
> --+
> 1 | (BINDIR,/usr/pgsql-14/bin)
>
> postgres=# SELECT 1, c.* FROM pg_config() c LIMIT 1;
>  ?column? |  name  |  setting
> --++---
> 1 | BINDIR | /usr/pgsql-14/bin
>

Okay, that makes sense, I have modified it to make a single function.



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 5e5dd3c6a4056b1ef821fa5d7dccf5044f1fa48c Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Sun, 12 Dec 2021 17:10:55 +0530
Subject: [PATCH v3] Add functions to show subtransaction count and overflow
 status

If some backends have a lot of nested subtransactions,
or the subtransaction cache has overflowed, there is no way to
detect that.  So this patch makes that easy by providing a
function to get that information.
---
 doc/src/sgml/monitoring.sgml| 18 ++
 src/backend/storage/ipc/sinvaladt.c | 13 +++---
 src/backend/utils/activity/backend_status.c |  4 ++-
 src/backend/utils/adt/pgstatfuncs.c | 38 +
 src/include/catalog/pg_proc.dat |  7 ++
 src/include/storage/sinvaladt.h |  4 ++-
 src/include/utils/backend_status.h  | 11 +
 7 files changed, 89 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 62f2a33..4388151 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5482,6 +5482,24 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   

 
+ pg_stat_get_backend_subxact
+
+pg_stat_get_backend_subxact ( integer )
+record
+   
+   
+Returns a record of information about the backend's subtransactions.
+The fields returned are subxact_count, which
+identifies the number of active subtransactions, and
+subxact_overflow, which shows whether the backend's
+subtransaction cache has overflowed or not.
+   
+   
+  
+
+  
+   
+
  pg_stat_get_backend_userid
 
 pg_stat_get_backend_userid ( integer )
diff --git a/src/backend/storage/ipc/sinvaladt.c b/src/backend/storage/ipc/sinvaladt.c
index cb3ee82..8713c88 100644
--- a/src/backend/storage/ipc/sinvaladt.c
+++ b/src/backend/storage/ipc/sinvaladt.c
@@ -395,17 +395,20 @@ BackendIdGetProc(int backendID)
 
 /*
  * BackendIdGetTransactionIds
- *		Get the xid and xmin of the backend. The result may be out of date
- *		arbitrarily quickly, so the caller must be careful about how this
- *		information is used.
+ *		Get the xid and xmin, nsubxid and overflow status of the backend. The
+ *		result may be out of date arbitrarily quickly, so the caller must be
+ *		careful about how this information is used.
  */
 void
-BackendIdGetTransactionIds(int backendID, TransactionId *xid, TransactionId *xmin)
+BackendIdGetTransactionIds(int backendID, TransactionId *xid,
+		   TransactionId *xmin, int *nsubxid, bool *overflowed)
 {
 	SISeg	   *segP = shmInvalBuffer;
 
 	*xid = InvalidTransactionId;
 	*xmin = InvalidTransactionId;
+	*nsubxid = 0;
+	*overflowed = false;
 
 	/* Need to lock out additions/removals of backends */
 	LWLockAcquire(SInvalWriteLock, LW_SHARED);
@@ -419,6 +422,8 @@ BackendIdGetTransactionIds(int backendID, TransactionId *xid, TransactionId *xmi
 		{
 			*xid = proc->xid;
 			*xmin = 

Re: Implementing Incremental View Maintenance

2022-01-13 Thread Julien Rouhaud
Hi,

On Thu, Nov 25, 2021 at 04:37:10PM +0900, Yugo NAGATA wrote:
> On Wed, 24 Nov 2021 04:31:25 +
> "r.takahash...@fujitsu.com"  wrote:
> 
> > 
> > I checked the same procedure on v24 patch.
> > But following error occurs instead of the original error.
> > 
> > ERROR:  relation "ivm_t_index" already exists
> 
> Thank you for pointing out it!
> 
> Hmmm, an index is created when IMMV is defined, so CREATE INDEX called
> after this would fail... Maybe, we should not create any index automatically
> if IMMV is created WITH NO DATA.
> 
> I'll fix it after some investigation.

Are you still investigating on that problem?  Also, the patchset doesn't apply
anymore:
http://cfbot.cputube.org/patch_36_2138.log
=== Applying patches on top of PostgreSQL commit ID 
a18b6d2dc288dfa6e7905ede1d4462edd6a8af47 ===
[...]
=== applying patch 
./v24-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch
patching file src/bin/pg_dump/pg_dump.c
Hunk #1 FAILED at 6393.
Hunk #2 FAILED at 6596.
Hunk #3 FAILED at 6719.
Hunk #4 FAILED at 6796.
Hunk #5 succeeded at 14953 (offset -915 lines).
4 out of 5 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.c.rej

There isn't any answer to your following email summarizing the feature yet, so
I'm not sure what should be the status of this patch, as there's no ideal
category for that.  For now I'll change the patch to Waiting on Author on the
cf app, feel free to switch it back to Needs Review if you think it's more
suitable, at least for the design discussion need.




Re: row filtering for logical replication

2022-01-13 Thread Amit Kapila
On Wed, Jan 12, 2022 at 7:19 PM houzj.f...@fujitsu.com
 wrote:
>
> On Wed, Jan 12, 2022 5:38 PM Amit Kapila  wrote:
>
> Attach the v63 patch set which include the following changes.
>

Few comments:
=
1.
+
+ 
+  
+  prqual pg_node_tree
+  
+  Expression tree (in nodeToString()
+  representation) for the relation's qualifying condition
+ 

Let's slightly modify this as: "Expression tree (in
nodeToString() representation) for the relation's
qualifying condition. Null if there is no qualifying condition."

2.
+   A WHERE clause allows simple expressions. The simple
+   expression cannot contain any aggregate or window functions, non-immutable
+   functions, user-defined types, operators or functions.

This part in the docs should be updated to say something similar to
what we have in the commit message for this part or maybe additionally
in some way we can say which other forms of expressions are not
allowed.

3.
+   for which the expression returns
+   false or null will not be published.
+   If the subscription has several publications in which
+   the same table has been published with different WHERE

In the above text line spacing appears a bit odd to me. There doesn't
seem to be a need for extra space after line-2 and line-3 in
above-quoted text.

4.
/*
+ * Return the relid of the topmost ancestor that is published via this

We normally seem to use Returns in similar places.

5.
+ * The simple expression contains the following restrictions:
+ * - User-defined operators are not allowed;
+ * - User-defined functions are not allowed;
+ * - User-defined types are not allowed;
+ * - Non-immutable built-in functions are not allowed;
+ * - System columns are not allowed.

Why are system columns not allowed in the above context?

6.
+static void
+transformPubWhereClauses(List *tables, const char *queryString)

To keep the function naming similar to other nearby functions, it is
better to name this as TransformPubWhereClauses.

7. In AlterPublicationTables(), won't it be better to call
transformPubWhereClauses() after
CheckObjSchemaNotAlreadyInPublication() to avoid extra processing in
case of errors?

8.
+ /*
+ * Check if the relation is member of the existing schema in the
+ * publication or member of the schema list specified.
+ */
  CheckObjSchemaNotAlreadyInPublication(rels, schemaidlist,
PUBLICATIONOBJ_TABLE);

I don't see the above comment addition has anything to do with this
patch. Can we remove it?

9.
 CheckCmdReplicaIdentity(Relation rel, CmdType cmd)
 {
  PublicationActions *pubactions;
+ AttrNumber bad_rfcolnum;

  /* We only need to do checks for UPDATE and DELETE. */
  if (cmd != CMD_UPDATE && cmd != CMD_DELETE)
  return;

+ if (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL)
+ return;
+
+ /*
+ * It is only safe to execute UPDATE/DELETE when all columns referenced in
+ * the row filters from publications which the relation is in are valid -
+ * i.e. when all referenced columns are part of REPLICA IDENTITY, or the
+ * table does not publish UPDATES or DELETES.
+ */
+ bad_rfcolnum = GetRelationPublicationInfo(rel, true);

Can we name this variable as invalid_rf_column?

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2022-01-13 Thread Fabrice Chapuis
first phase: postgres reads WAL files and generates 1420 snap files.
second phase: I guess, but on this point maybe you can clarify, postgres
has to decode the snap files and remove them if no statement must be
applied on a replicated table.
It is from this point that the worker process exits after the 1 minute timeout.

On Wed, Jan 12, 2022 at 11:54 AM Amit Kapila 
wrote:

> On Tue, Jan 11, 2022 at 8:13 PM Fabrice Chapuis 
> wrote:
>
>> Can you explain why you think this will help in solving your current
>> problem?
>>
>> Indeed your are right this function won't help, we have to look elsewhere.
>>
>> It is still not clear to me why the problem happened? IIUC, after
>> restoring 4096 changes from snap files, we send them to the subscriber, and
>> then apply worker should apply those one by one. Now, is it taking one
>> minute to restore 4096 changes due to which apply worker is timed out?
>>
>> Now I can easily reproduce the problem.
>> In a first phase, snap files are generated and stored in pg_replslot.
>> This process ends when 1420 files are present in pg_replslot (this is in
>> relation with statements that must be replayed from WAL). In the
>> pg_stat_replication view, the state field is set to *catchup*.
>> In a 2nd phase, the snap files must be decoded. However after one minute
>> (wal_receiver_timeout parameter set to 1 minute) the worker process stops
>> with a timeout.
>>
>>
> What exactly do you mean by the first and second phase in the above
> description?
>
> --
> With Regards,
> Amit Kapila.
>


Re: Add spin_delay() implementation for Arm in s_lock.h

2022-01-13 Thread Blake, Geoff
Tom, Andres,

I spun up a 64-core Graviton2 instance (where I reported seeing improvement 
with this patch) and ran the provided regression test with and without my 
proposed patch on top of mainline PG.  I ran 4 runs each with 63 workers, where 
we should see the most contention and most impact from the patch.  I am 
reporting the average and standard deviation; the average with the patch is 10% 
lower latency, but there is overlap in the standard deviations.  I'll gather 
additional data at lower worker counts and post later to see what the trend is.

Cmd: postgres=# SELECT test_shm_mq_pipelined(16384, 'xyzzy', 1000, workers);

Avg +/- standard dev
63 workers w/o patch: 552443ms +/- 22841ms
63 workers w/   patch: 502727ms +/- 45253ms

Best results
w/o patch: 521216ms
w/   patch: 436442ms
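
For reference, the relative improvements implied by these numbers (a quick
back-of-the-envelope check, using the figures quoted above):

```python
# Figures from the message above (latency in ms).
avg_base, avg_patch = 552443, 502727
best_base, best_patch = 521216, 436442

avg_gain = round(100 * (avg_base - avg_patch) / avg_base, 1)
best_gain = round(100 * (best_base - best_patch) / best_base, 1)
print(avg_gain, best_gain)  # 9.0 16.3
```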

Thanks,
Geoff




Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2022-01-13 Thread Pavel Borisov
By the way, I've forgotten to add one part of my code into the CF patch
related to the treatment of NULL values in checking btree unique
constraints.
PFA v9 of the patch with these minor code and test additions.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


v9-0001-Add-option-for-amcheck-and-pg_amcheck-to-check-un.patch
Description: Binary data


Re: Add Boolean node

2022-01-13 Thread Pavel Stehule
Hi

On Mon, 3 Jan 2022 at 14:18, Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

>
> On 03.01.22 12:04, Peter Eisentraut wrote:
> > On 27.12.21 10:02, Peter Eisentraut wrote:
> >> This patch adds a new node type Boolean, to go alongside the "value"
> >> nodes Integer, Float, String, etc.  This seems appropriate given that
> >> Boolean values are a fundamental part of the system and are used a lot.
> >>
> >> Before, SQL-level Boolean constants were represented by a string with
> >> a cast, and internal Boolean values in DDL commands were usually
> >> represented by Integer nodes.  This takes the place of both of these
> >> uses, making the intent clearer and having some amount of type safety.
> >
> > Here is an update of this patch set based on the feedback.  First, I
> > added a patch that makes some changes in AlterRole() that my original
> > patch might have broken or at least made more confusing.  Unlike in
> > CreateRole(), we use three-valued logic here, so that a variable like
> > issuper would have 0 = no, 1 = yes, -1 = not specified, keep previous
> > value.  I'm simplifying this, by instead using the dissuper etc.
> > variables to track whether a setting was specified.  This makes
> > everything a bit simpler and makes the subsequent patch easier.
> >
> > Second, I added the suggestion by Tom Lane to rename the fields in the
> > used-to-be-Value nodes to be different in each node type (ival, fval,
> > etc.).  I agree that this makes things a bit cleaner and reduces the
> > chances of mix-ups.
> >
> > And third, the original patch that introduces the Boolean node with some
> > small changes based on the feedback.
>
> Another very small update, attempting to appease the cfbot.


This is an almost trivial patch.

There are no problems with patching, compilation, or tests.

make check-world passed.

There are no objections from me or from the community.

I'll mark this patch as ready for committer.

Regards

Pavel
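For readers without the earlier messages: the patch adds a Boolean node alongside the Integer/Float/String value nodes. A minimal standalone sketch of that shape follows; the NodeTag enum here is a stand-in for illustration, and the field and constructor names follow my reading of the patch, so they may differ from the committed definitions:

```c
#include <stdbool.h>

/* Standalone stand-in for the real NodeTag enum; illustration only. */
typedef enum NodeTag { T_Integer, T_Float, T_Boolean, T_String } NodeTag;

/* The new "value" node: a tagged bool, so DDL code can carry a Boolean
 * directly instead of an Integer 0/1 or a string with a cast. */
typedef struct Boolean
{
    NodeTag type;
    bool    boolval;    /* per the ival/fval/... field renaming */
} Boolean;

/* Constructor in the style of makeInteger()/makeString(). */
static Boolean
make_boolean(bool val)
{
    Boolean b;
    b.type = T_Boolean;
    b.boolval = val;
    return b;
}
```

The type-safety gain is that a DDL option carrying `make_boolean(true)` can no longer be confused with an integer-valued option at compile time.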


Re: row filtering for logical replication

2022-01-13 Thread Alvaro Herrera
>  /*
> + * Only 3 publication actions are used for row filtering ("insert", "update",
> + * "delete"). See RelationSyncEntry.exprstate[].
> + */
> +typedef enum RowFilterPubAction
> +{
> + PUBACTION_INSERT,
> + PUBACTION_UPDATE,
> + PUBACTION_DELETE,
> + NUM_ROWFILTER_PUBACTIONS  /* must be last */
> +} RowFilterPubAction;

Please do not add NUM_ROWFILTER_PUBACTIONS as an enum value.  It's a bit
of a lie and confuses things, because your enum now has 4 possible
values, not 3.  I suggest to
#define NUM_ROWFILTER_PUBACTIONS (PUBACTION_DELETE+1)
instead.
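Spelled out, the suggested arrangement keeps the enum honest about its three values while still giving array-sizing code a count to use (an illustrative sketch, not the patch itself):

```c
/* Only real publication actions live in the enum... */
typedef enum RowFilterPubAction
{
    PUBACTION_INSERT,
    PUBACTION_UPDATE,
    PUBACTION_DELETE
} RowFilterPubAction;

/* ...and the array size is derived from the last member. */
#define NUM_ROWFILTER_PUBACTIONS (PUBACTION_DELETE + 1)
```

An `exprstate[NUM_ROWFILTER_PUBACTIONS]` array still works as before, but a switch over RowFilterPubAction no longer has a fourth, never-valid case to account for.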

> + int idx_ins = PUBACTION_INSERT;
> + int idx_upd = PUBACTION_UPDATE;
> + int idx_del = PUBACTION_DELETE;

I don't understand the purpose of these variables; can't you just use
the constants?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Hay que recordar que la existencia en el cosmos, y particularmente la
elaboración de civilizaciones dentro de él no son, por desgracia,
nada idílicas" (Ijon Tichy)




Re: row filtering for logical replication

2022-01-13 Thread Amit Kapila
On Thu, Jan 13, 2022 at 12:19 PM Peter Smith  wrote:
>
> 7. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
>
> + /*
> + * For the following occasions where there is only one tuple, we can
> + * evaluates the row filter for the not null tuple and return.
> + *
> + * For INSERT: we only have new tuple.
> + *
> + * For UPDATE: if no old tuple, it means none of the replica identity
> + * columns changed and this would reduce to a simple update. we only need
> + * to evaluate the row filter for new tuple.
> + *
> + * FOR DELETE: we only have old tuple.
> + */
>
> There are several things not quite right with that comment:
> a. I thought now it should refer to "slots" instead of "tuples"
>

I feel tuple still makes sense as it makes the comments/code easy to understand.

-- 
With Regards,
Amit Kapila.




Re: ICU for global collation

2022-01-13 Thread Peter Eisentraut

On 11.01.22 12:08, Julien Rouhaud wrote:

So, unless there are concerns, I'm going to see about making a patch to call
pg_newlocale_from_collation() even with the default collation. That would
make the actual feature patch quite a bit smaller, since we won't have to
patch every call site of pg_newlocale_from_collation().

+1 for me!


Here is that patch.

If this is applied, then in my estimation all these hunks will
completely disappear from the global ICU patch.  So this would be a
significant win.

From 2ee4b77221020e81ec83cf37d36e3955bf619d80 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 13 Jan 2022 09:14:41 +0100
Subject: [PATCH] Call pg_newlocale_from_collation() also with default
 collation

Previously, callers of pg_newlocale_from_collation() did not call it
if the collation was DEFAULT_COLLATION_OID and instead proceeded with
a pg_locale_t of 0.  Instead, now we call it anyway and have it return
0 if the default collation was passed.  It already did this, so we
just have to adjust the callers.  This simplifies all the call sites
and also makes future enhancements easier.

After discussion and testing, the previous comment in pg_locale.c
about avoiding this for performance reasons may have been mistaken
since it was testing a very different patch version way back when.

Discussion: https://www.postgresql.org/message-id/ed3baa81-7fac-7788-cc12-41e3f7917...@enterprisedb.com
---
 src/backend/access/hash/hashfunc.c   |  4 +-
 src/backend/regex/regc_pg_locale.c   | 40 ++--
 src/backend/utils/adt/formatting.c   | 96 +---
 src/backend/utils/adt/like.c | 37 ++-
 src/backend/utils/adt/like_support.c | 27 
 src/backend/utils/adt/pg_locale.c|  3 -
 src/backend/utils/adt/varchar.c  | 26 +---
 src/backend/utils/adt/varlena.c  | 34 ++
 8 files changed, 135 insertions(+), 132 deletions(-)

diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index 0521c69dd5..b57ed946c4 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -278,7 +278,7 @@ hashtext(PG_FUNCTION_ARGS)
 errmsg("could not determine which collation to use for string hashing"),
 errhint("Use the COLLATE clause to set the collation explicitly.")));
 
-   if (!lc_collate_is_c(collid) && collid != DEFAULT_COLLATION_OID)
+   if (!lc_collate_is_c(collid))
mylocale = pg_newlocale_from_collation(collid);
 
if (!mylocale || mylocale->deterministic)
@@ -334,7 +334,7 @@ hashtextextended(PG_FUNCTION_ARGS)
 errmsg("could not determine which collation to use for string hashing"),
 errhint("Use the COLLATE clause to set the collation explicitly.")));
 
-   if (!lc_collate_is_c(collid) && collid != DEFAULT_COLLATION_OID)
+   if (!lc_collate_is_c(collid))
mylocale = pg_newlocale_from_collation(collid);
 
if (!mylocale || mylocale->deterministic)
diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index e0d93eab32..6e84f42cb2 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -231,6 +231,18 @@ static const unsigned char pg_char_properties[128] = {
 void
 pg_set_regex_collation(Oid collation)
 {
+   if (!OidIsValid(collation))
+   {
+   /*
+* This typically means that the parser could not resolve a
+* conflict of implicit collations, so report it that way.
+*/
+   ereport(ERROR,
+   (errcode(ERRCODE_INDETERMINATE_COLLATION),
+errmsg("could not determine which collation to use for regular expression"),
+errhint("Use the COLLATE clause to set the collation explicitly.")));
+   }
+
if (lc_ctype_is_c(collation))
{
/* C/POSIX collations use this path regardless of database encoding */
@@ -240,28 +252,12 @@ pg_set_regex_collation(Oid collation)
}
else
{
-   if (collation == DEFAULT_COLLATION_OID)
-   pg_regex_locale = 0;
-   else if (OidIsValid(collation))
-   {
-   /*
-* NB: pg_newlocale_from_collation will fail if not HAVE_LOCALE_T;
-* the case of pg_regex_locale != 0 but not HAVE_LOCALE_T does not
-* have to be considered below.
-*/
-   pg_regex_locale = pg_newlocale_from_collation(collation);
-   }
-   else
-   {
-   /*
-* This typically means that the parser could not resolve a
-* conflict of implicit collations, so 

Re: Improve error handling of HMAC computations and SCRAM

2022-01-13 Thread Sergey Shinderuk

On 13.01.2022 10:24, Michael Paquier wrote:

Thanks for the review.  The comments about pg_hmac_ctx->data were
wrong from the beginning, coming, I guess, from one of the earlier
patch versions where this was discussed.  So I have applied that
independently.

I have also spent a good amount of time on that to close the loop and
make sure that no code paths are missing an error context, adjusted a
couple of comments to explain more the role of *errstr in all the
SCRAM routines, and finally applied it on HEAD.

Thanks!

--
Sergey Shinderuk    https://postgrespro.com/



