Re: Support logical replication of DDLs

2023-07-17 Thread Michael Paquier
On Tue, Jul 18, 2023 at 02:28:08PM +0900, Masahiko Sawada wrote:
> I've considered some alternative approaches but I prefer the second
> approach. A long test time should not be a big problem unless we run it
> by default. We can prepare a buildfarm animal that is configured to
> run the DDL deparse tests.

An extra option is to have some tests in core, then control their
execution with a new value in PG_TEST_EXTRA so that one has an easy way
to run the tests that a buildfarm machine would run.  We have already
solved any problems related to full pg_regress runs in TAP tests, as
proved by 002_pg_upgrade.pl and 027_stream_regress.pl, but I doubt
that everybody would accept the workload of an extra full run of the
main regression test suite by default for the sake of what's being
developed on this thread.
--
Michael


signature.asc
Description: PGP signature


Re: Add TOAST support for more system tables

2023-07-17 Thread Michael Paquier
On Mon, Jul 17, 2023 at 06:31:04PM -0400, Tom Lane wrote:
> Sofia Kopikova  writes:
> > This patch adds TOAST support for system tables pg_class,
> > pg_attribute and pg_largeobject_metadata, as they include ACL columns,
> > which may be potentially large in size.
> 
> We have been around on this topic before, cf discussion leading up to
> commit 96cdeae07.  Allowing toasted data in pg_class or pg_attribute
> seems quite scary to me because of the potential for recursive access,
> particularly during cache-flush scenarios.  (That is, you need to be
> able to read those catalogs on the way to fetching a toasted value,
> so how can you be sure that doesn't devolve into an infinite loop?)

Yep.  I have something to add here.  The last time I poked at that, I
was wondering about two code paths that have specific comments on this
matter.  Based on my notes:
1) finish_heap_swap() in cluster.c:
 * pg_class doesn't have a toast relation, so we don't need to update the
 * corresponding toast relation. Note that there's little point moving all
 * relfrozenxid updates here since swap_relation_files() needs to write to
 * pg_class for non-mapped relations anyway.
2) extract_autovac_opts() in autovacuum.c:
 * we acquired the pg_class row.  If pg_class had a TOAST table, this would
 * be a risk; fortunately, it doesn't.

What has been posted makes zero adjustments in these areas.
--
Michael




Re: Support logical replication of DDLs

2023-07-17 Thread Masahiko Sawada
On Tue, Jul 11, 2023 at 8:01 PM Zhijie Hou (Fujitsu) wrote:
>
> Hi,
>
> We have been researching how to create a test that detects failures resulting
> from future syntax changes, where the deparser fails to update properly.
>
> The basic idea comes from what Robert Haas suggested in [1]: when running the
> regression tests (tests in parallel_schedule), we replace each DDL statement
> being executed with its deparsed version and execute the deparsed statement,
> so that we can run the whole regression suite with the deparsed statements
> and expect the output to match the existing expected/*.out files. As
> developers typically add new regression tests to exercise new syntax, we
> expect this test to automatically identify most new syntax changes.
>
> One thing to note is that when entering the event trigger (where the
> deparsing happens), the DDL has already been executed. So we need to get the
> deparsed statement before the execution and replace the current statement
> with it. To achieve this, the current approach is to create another database
> with the deparser trigger, and in the ProcessUtility hook (e.g.
> tdeparser_ProcessUtility in the patch) we redirect the local statement to
> that remote database; the statement will then be deparsed and stored
> somewhere, and we can query the remote database to get the deparsed
> statement and use it to replace the original statement.
>
> The process looks like:
> 1) create another database with the deparser event trigger in it before
> running the regression test.
> 2) run the regression test (each statement is redirected to the remote
> database, which returns the deparsed statement)
> 3) compare the output files with the expected ones.
>
> Attached is the POC patch (0004) for this approach. Note that the new test
> module only tests test_setup.sql, create_index.sql, create_table.sql and
> alter_table.sql, as we only support deparsing TABLE-related commands for
> now. The current test cannot pass because of some bugs in the deparser; we
> will fix these bugs soon.
>
>
> TO IMPROVE:
>
> 1. The current approach needs to handle the ERRORs, WARNINGs, and NOTICEs
> from the remote database. Currently it directly rethrows any ERRORs
> encountered in the remote database. For WARNINGs and NOTICEs, however, we
> only rethrow them along with ERRORs. This is done to prevent duplicate
> messages in the output file during local statement execution, which would be
> inconsistent with the existing expected output. Note that this approach may
> potentially miss some bugs, as there could be additional WARNINGs or NOTICEs
> caused by the deparser in the remote database.
>
> 2. Variable references and assignments (xxx \gset and select :var_name) are
> not sent to the server (only the substituted value is sent), but it's
> possible that the variable in another session should be set to a different
> value, so this can cause inconsistent output.
>
> 3. CREATE INDEX CONCURRENTLY will create an invalid index internally even if
> it reports an ERROR later. But since we directly rethrow the remote ERROR in
> the main database, we never execute the "CREATE INDEX CONCURRENTLY" in the
> main database. This means that we cannot see the invalid index in the main
> database.
>
> To improve the above points, another variant is to run the regression test
> twice. The first run is solely intended to collect all the deparsed
> statements. We can dump these statements from the database and then reload
> them in the second regression run. This allows us to utilize the deparsed
> statements to replace the local statements in the second run. This approach
> does not need to handle any remote messages or client-variable handling
> during execution, although it could take more time to finish the test.
>

I've considered some alternative approaches but I prefer the second
approach. A long test time should not be a big problem unless we run it
by default. We can prepare a buildfarm animal that is configured to
run the DDL deparse tests.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: logicalrep_message_type throws an error

2023-07-17 Thread Masahiko Sawada
On Tue, Jul 18, 2023 at 12:15 PM Amit Kapila  wrote:
>
> On Mon, Jul 17, 2023 at 7:54 PM Alvaro Herrera  wrote:
> >
> > On 2023-Jul-17, Ashutosh Bapat wrote:
> >
> > > Prologue of psprintf() says
> > >
> > > * Errors are not returned to the caller, but are reported via elog(ERROR)
> > > * in the backend, or printf-to-stderr-and-exit() in frontend builds.
> > > * One should therefore think twice about using this in libpq.
> > >
> > > If an error occurs in psprintf(), it will throw an error which will
> > > override the original error. I think we should avoid any stuff that
> > > throws further errors.
> >
> > Ooh, yeah, this is an excellent point.  I agree it would be better to
> > avoid psprintf() here and anything that adds more failure modes.
> >
>
> I have tried to check whether we have such usage in any other error
> callbacks. Though I haven't scrutinized each and every error callback,
> I found a few of them where an error can be raised. For example,
>
> rm_redo_error_callback()->initStringInfo()
> CopyFromErrorCallback()->limit_printout_length()
> shared_buffer_write_error_callback()->relpathperm()->relpathbackend()->GetRelationPath()->psprintf()
>
> >  Let's
> > just do the thing in the original patch you submitted, to ensure all
> > these strings are compile-time constants; that's likely the most robust.
> >

Or can we use snprintf() writing "??? (%d)" to a fixed-length char[8 +
11] buffer allocated on the stack instead?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Inefficiency in parallel pg_restore with many tables

2023-07-17 Thread Nathan Bossart
On Sun, Jul 16, 2023 at 08:54:24PM -0700, Nathan Bossart wrote:
> This seems worth a try.  IIUC you are suggesting making binaryheap.c
> frontend-friendly and expanding its API a bit.  If no one has volunteered,
> I could probably hack something together.

I spent some time on the binaryheap changes.  I haven't had a chance to
plug it into the ready_list yet.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/src/backend/lib/binaryheap.c b/src/backend/lib/binaryheap.c
index 1737546757..36b6f5c51f 100644
--- a/src/backend/lib/binaryheap.c
+++ b/src/backend/lib/binaryheap.c
@@ -11,10 +11,17 @@
  *-
  */
 
+#ifndef FRONTEND
 #include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
 
 #include 
 
+#ifdef FRONTEND
+#include "common/logging.h"
+#endif
 #include "lib/binaryheap.h"
 
 static void sift_down(binaryheap *heap, int node_off);
@@ -35,7 +42,11 @@ binaryheap_allocate(int capacity, binaryheap_comparator compare, void *arg)
 	binaryheap *heap;
 
 	sz = offsetof(binaryheap, bh_nodes) + sizeof(Datum) * capacity;
+#ifdef FRONTEND
+	heap = (binaryheap *) pg_malloc(sz);
+#else
 	heap = (binaryheap *) palloc(sz);
+#endif
 	heap->bh_space = capacity;
 	heap->bh_compare = compare;
 	heap->bh_arg = arg;
@@ -109,7 +120,11 @@ void
 binaryheap_add_unordered(binaryheap *heap, Datum d)
 {
 	if (heap->bh_size >= heap->bh_space)
+#ifdef FRONTEND
+		pg_fatal("out of binary heap slots");
+#else
 		elog(ERROR, "out of binary heap slots");
+#endif
 	heap->bh_has_heap_property = false;
 	heap->bh_nodes[heap->bh_size] = d;
 	heap->bh_size++;
@@ -141,7 +156,11 @@ void
 binaryheap_add(binaryheap *heap, Datum d)
 {
 	if (heap->bh_size >= heap->bh_space)
+#ifdef FRONTEND
+		pg_fatal("out of binary heap slots");
+#else
 		elog(ERROR, "out of binary heap slots");
+#endif
 	heap->bh_nodes[heap->bh_size] = d;
 	heap->bh_size++;
 	sift_up(heap, heap->bh_size - 1);
@@ -196,6 +215,65 @@ binaryheap_remove_first(binaryheap *heap)
 	return result;
 }
 
+/*
+ * binaryheap_remove
+ *
+ * Removes the given datum from the heap.  The caller must ensure that the
+ * datum exists in the heap.  Always O(n).
+ */
+void
+binaryheap_remove(binaryheap *heap, Datum d)
+{
+	Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
+
+	for (int i = 0; i < heap->bh_size; i++)
+	{
+		if (heap->bh_compare(heap->bh_nodes[i],
+			 d,
+			 heap->bh_arg) == 0)
+		{
+			binaryheap_remove_node(heap, &heap->bh_nodes[i]);
+			return;
+		}
+	}
+
+#ifdef FRONTEND
+	pg_fatal("datum not found in heap");
+#else
+	elog(ERROR, "datum not found in heap");
+#endif
+}
+
+/*
+ * binaryheap_remove_node
+ *
+ * Removes the given node from the heap.  The caller must ensure that the node
+ * exists in the heap.  O(log n) worst case.
+ */
+void
+binaryheap_remove_node(binaryheap *heap, Datum *n)
+{
+	int			cmp;
+
+	Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
+	Assert(n >= &heap->bh_nodes[0]);
+	Assert(n <= &heap->bh_nodes[heap->bh_size - 1]);
+
+	/* compare last node to the one that is being removed */
+	cmp = heap->bh_compare(heap->bh_nodes[--heap->bh_size],
+		   *n,
+		   heap->bh_arg);
+
+	/* remove the last node, placing it in the vacated entry */
+	*n = heap->bh_nodes[heap->bh_size];
+
+	/* sift as needed to preserve the heap property */
+	if (cmp > 0)
+		sift_up(heap, n - &heap->bh_nodes[0]);
+	else if (cmp < 0)
+		sift_down(heap, n - &heap->bh_nodes[0]);
+}
+
 /*
  * binaryheap_replace_first
  *
diff --git a/src/include/lib/binaryheap.h b/src/include/lib/binaryheap.h
index 52f7b06b25..411f7358ba 100644
--- a/src/include/lib/binaryheap.h
+++ b/src/include/lib/binaryheap.h
@@ -47,6 +47,8 @@ extern void binaryheap_build(binaryheap *heap);
 extern void binaryheap_add(binaryheap *heap, Datum d);
 extern Datum binaryheap_first(binaryheap *heap);
 extern Datum binaryheap_remove_first(binaryheap *heap);
+extern void binaryheap_remove(binaryheap *heap, Datum d);
+extern void binaryheap_remove_node(binaryheap *heap, Datum *n);
 extern void binaryheap_replace_first(binaryheap *heap, Datum d);
 
 #define binaryheap_empty(h)			((h)->bh_size == 0)


Volatile write caches on macOS and Windows, redux

2023-07-17 Thread Thomas Munro
Hi,

Continuing a topic from earlier threads[1][2], I've been wondering
about how to de-klugify wal_sync_method=fsync_writethrough (a setting
that actually affects much more than just WAL), and how to do the
right thing for our users on macOS and Windows by default.  Commit
d0c28601 was a very small cleanup in this area.  Here are some bigger
ideas I'd like to try out.

Short version:

 * Make wal_sync_method=fdatasync the default everywhere
 * Drop wal_sync_method=fsync_writethrough
 * Add new macOS-only level for the fsync GUC: fsync=full
 * Make fsync=full redirect both pg_fsync() and pg_fdatasync()
 * Make fsync=full the default on macOS

Motivation:

I think expectations might have changed quite a bit since ~2000.  Back
then, fsync() didn't flush write caches on any OS (you were supposed
to use battery-backed controllers and SCSI as found on expensive
proprietary Unix systems if you were serious, IDE/ATA protocols didn't
originally have flush commands, and some consumer drives famously
ignored them or lied, so users of cheap drives were advised to turn
write caches off).  Around 2005, Linux decided to start sending the
flush command in fsync().  Windows' FlushFileBuffers() does the same,
and I gathered from Raymond Chen's blog that by the Windows 8
timeframe all consumer drive vendors supported and respected the flush
command.  macOS *still* doesn't send it for fsync(), but has had
fcntl(F_FULLFSYNC) since 2003.  In Apple's defence, they seem to have
been ahead of the curve on this problem[3]... I suppose they didn't
anticipate that everyone else was going to do it in their main
fsync()/fdatasync() call, they blazed their own trail, and now it all
looks a bit weird.

In other words, back then all systems running PostgreSQL risked data
loss unless you had fancy hardware or turned off unsafe caching.  But
now, due to the changing landscape and our policy choices, that is
true only for rarer systems by default while most in our community are
on Linux where this is all just a historical footnote.  People's
baseline expectations have moved, and although we try to document the
situation, they are occasionally very surprised: "Loaded footgun
open_datasync on Windows" was Laurenz Albe's reaction[4] to those
paragraphs.  Surely we should be able to recover after power loss by
default even on a lowly desktop PC or basic server loaded with SATA
drives, out of the box?

Proposal for Windows:

The existing default use of FILE_FLAG_WRITE_THROUGH is probably a
better choice on hardware where it works reliably (cache disabled,
non-volatile cache, or working FUA support), since it skips a system
call and doesn't wait for incidental other stuff in the cache to
flush, but it's well documented that Windows' SATA drivers neither
pass the "FUA" flag down to the device nor fall back to sending a full
cache flush command.  It's also easy to see in the pg_test_fsync
numbers, which are too good to be true on consumer gear.  Therefore
wal_sync_method=fdatasync is a better default level.  We map that to
NtFlushBuffersFileEx(FLUSH_FLAGS_FILE_DATA_SYNC_ONLY).  (The "SYNC" in
that flag name means flush the drive cache; the "DATA...ONLY" in that
flag name means skip non-essential stuff like file modification time
etc just like fdatasync() in POSIX, and goes visibly faster thanks to
not journaling metadata.)

Proposal for macOS:

Our current default isn't nice to users who run a database on
mains-powered Macs.  I don't have one myself to try it, but "man
fsync" clearly states that you can lose data and it is easily
demonstrated with a traditional cord-yanking test[5].  You could
certainly lose some recent commits; you could probably also get more
subtle corruption or a total recovery failure like [6] too, if for
example the control file can make it to durable storage and while
pointing to a checkpoint that did not (maybe a ZFS-like atomic
root-switch prevents that sort of disorder in APFS, I dunno, but I
read some semi-informed speculation that it doesn't work that way
*shrug*).

We do currently offer a non-default setting
wal_sync_method=fsync_writethrough to address all this already.
Despite its name, it affects every caller of pg_fsync() (control file,
data files, etc).  It's certainly essential to flush all those files
fully too as part of our recovery protocol, but they're not "WAL".
The new idea here is to provide a separate way of controlling that
global behaviour, and I propose fsync=full.  Furthermore, I think that
setting should also affect pg_fdatasync(), given that Apple doesn't
even really have fdatasync() (perhaps if they carry out their threat
to implement it, they'll also invent F_FULLFDATASYNC; for now it
*seems* to be basically just another name for fsync() albeit
undeclared by ).

It's possible that fcntl(F_FULLFSYNC) might fail with ENOSUPP or other
errors in obscure cases (eg unusual file systems).  In that case, you
could manually lower fsync to just "on" and do your own research on
whether power loss can toast 

Re: logicalrep_message_type throws an error

2023-07-17 Thread Amit Kapila
On Mon, Jul 17, 2023 at 7:54 PM Alvaro Herrera  wrote:
>
> On 2023-Jul-17, Ashutosh Bapat wrote:
>
> > Prologue of psprintf() says
> >
> > * Errors are not returned to the caller, but are reported via elog(ERROR)
> > * in the backend, or printf-to-stderr-and-exit() in frontend builds.
> > * One should therefore think twice about using this in libpq.
> >
> > If an error occurs in psprintf(), it will throw an error which will
> > override the original error. I think we should avoid any stuff that
> > throws further errors.
>
> Ooh, yeah, this is an excellent point.  I agree it would be better to
> avoid psprintf() here and anything that adds more failure modes.
>

I have tried to check whether we have such usage in any other error
callbacks. Though I haven't scrutinized each and every error callback,
I found a few of them where an error can be raised. For example,

rm_redo_error_callback()->initStringInfo()
CopyFromErrorCallback()->limit_printout_length()
shared_buffer_write_error_callback()->relpathperm()->relpathbackend()->GetRelationPath()->psprintf()

>  Let's
> just do the thing in the original patch you submitted, to ensure all
> these strings are compile-time constants; that's likely the most robust.
>

So will we be okay with something like the below:

ERROR:  invalid logical replication message type "??? (88)"
CONTEXT:  processing remote data for replication origin "pg_16638"
during message type "???" in transaction 796, finished at
0/1626698

-- 
With Regards,
Amit Kapila.




RE: Partial aggregates pushdown

2023-07-17 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

> From: Alexander Pyhalov 
> Sent: Friday, July 14, 2023 10:40 PM
> 1) In foreign_join_ok() should we set fpinfo->user if
> fpinfo->check_partial_aggregate_support is set, like it's done for
> fpinfo->use_remote_estimate? It seems we can end up with fpinfo->user =
> NULL if use_remote_estimate is not set.
You are right. I will modify this patch according to your advice.
Thank you for the advice.

> 2) It seems we found an additional issue with the original patch, which is
> present in the current one. I'm attaching a patch which seems to fix it,
> but I'm not quite sure about it.
Thank you for pointing out the issue.
If a query's GROUP BY clause contains a variable-based expression (not a
plain variable) and the query's SELECT clause contains another expression,
the partial aggregate could be unsafe to push down.

An example of such a query:
SELECT (b/2)::numeric, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2;

Your patch disables partial aggregate pushdown for such queries.
I'll see if we can modify the patch to safely do a partial aggregate
pushdown for such queries as well.
Such a query expects the variable in the SELECT clause expression to be
included in the target of the grouped rel (see
make_partial_grouping_target), but the original GROUP BY clause has no
reference to this variable; this seems to be the direct cause (see
foreign_grouping_ok).
I will examine whether a safe pushdown can be achieved by matching the
GROUP BY clause information referenced by foreign_grouping_ok against the
grouped rel target information.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation


Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-07-17 Thread Peter Smith
On Tue, Jul 18, 2023 at 11:25 AM Peter Smith  wrote:
>
> On Tue, Jul 18, 2023 at 1:54 AM Melih Mutlu  wrote:
> >
> > Hi,
> >
> > PFA updated patches. Rebased 0003 with minor changes. Addressed Peter's 
> > reviews for 0001 and 0002 with some small comments below.
> >
>
> Thanks, I will take another look at these soon. FYI, the 0001 patch
> does not apply cleanly. It needs to be rebased again because
> get_worker_name() function was recently removed from HEAD.
>

Sorry, to be more correct -- it applied OK, but failed to build.

> replication/logical/worker.o: In function `InitializeLogRepWorker':
> /home/postgres/oss_postgres_misc/src/backend/replication/logical/worker.c:4605:
> undefined reference to `get_worker_name'
>
> --
> Kind Regards,
> Peter Smith.
> Fujitsu Australia




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-07-17 Thread Peter Smith
On Tue, Jul 18, 2023 at 1:54 AM Melih Mutlu  wrote:
>
> Hi,
>
> PFA updated patches. Rebased 0003 with minor changes. Addressed Peter's 
> reviews for 0001 and 0002 with some small comments below.
>

Thanks, I will take another look at these soon. FYI, the 0001 patch
does not apply cleanly. It needs to be rebased again because
get_worker_name() function was recently removed from HEAD.

replication/logical/worker.o: In function `InitializeLogRepWorker':
/home/postgres/oss_postgres_misc/src/backend/replication/logical/worker.c:4605:
undefined reference to `get_worker_name'

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: CommandStatus from insert returning when using a portal.

2023-07-17 Thread David G. Johnston
On Wed, Jul 12, 2023 at 2:49 PM Tom Lane  wrote:

> Dave Cramer  writes:
> > Obviously I am biased by the JDBC API which would like to have
> > PreparedStatement.execute() return the number of rows inserted
> > without having to wait to read all of the rows returned
>
> Umm ... you do realize that we return the rows on-the-fly?
> The server does not know how many rows got inserted/returned
> until it's run the query to completion, at which point all
> the data has already been sent to the client.
>

Doesn't this code contradict that statement?

src/backend/tcop/pquery.c
    /*
     * If we have not yet run the command, do so, storing its
     * results in the portal's tuplestore.  But we don't do that
     * for the PORTAL_ONE_SELECT case.
     */
    if (portal->strategy != PORTAL_ONE_SELECT && !portal->holdStore)
        FillPortalStore(portal, isTopLevel);

    /*
     * Now fetch desired portion of results.
     */
    nprocessed = PortalRunSelect(portal, true, count, dest);


Not sure we'd want to lock ourselves into this implementation but at least
as it stands now we could send a message with the portal size after calling
FillPortalStore and prior to calling PortalRunSelect.

David J.


Re: Dumping policy on a table belonging to an extension is expected?

2023-07-17 Thread Stephen Frost
Greetings,

* Amul Sul (sula...@gmail.com) wrote:
> I have a ROW LEVEL SECURITY policy on a table that is part of an extension,
> and while dumping the database where that extension is installed, pg_dump
> dumps the policy of that table even though it does not dump the table
> itself.
> 
> Here is a quick test, where I have added the following SQL to the
> adminpack--1.0.sql extension file:
> 
> CREATE TABLE public.foo (i int CHECK(i > 0));
> ALTER TABLE public.foo ENABLE ROW LEVEL SECURITY;
> CREATE POLICY foo_policy ON public.foo USING (true);
> 
> After installing and creating this extension, the dump output will have the
> policy without the table:
> 
> --
> -- Name: foo; Type: ROW SECURITY; Schema: public; Owner: amul
> --
> 
> ALTER TABLE public.foo ENABLE ROW LEVEL SECURITY;
> 
> --
> -- Name: foo foo_policy; Type: POLICY; Schema: public; Owner: amul
> --
> 
> CREATE POLICY foo_policy ON public.foo USING (true);
> 
> 
> I am not sure if that is expected behaviour. The code comment in
> checkExtensionMembership() seems to be doing this intentionally:
> 
>  * In 9.6 and above, mark the member object to have any non-initial ACL,
>  * policies, and security labels dumped.
> 
> The question is why we were doing this. Shouldn't we skip this policy if it
> is part of the create-extension script?
> 
> Also, if you try to drop this policy, it gets dropped without any
> warning/error, unlike tables or other objects, which are not allowed to be
> dropped at all.

At least at the time, it wasn't really envisioned that policies would be
part of an extension's creation script.  That was probably short-sighted
and it'd be better if we handled that cleanly, but to do so we'd need
something akin to pg_init_privs where we track what policies are created
as part of the creation script vs. what are created afterwards and then
dump out the post-installation policy changes (note that we'd need to
track if any installation-time policies were dropped or changed too...)
as part of the pg_dump.

It'd be helpful if you could maybe provide some use-cases around this?
Permission changes such as those handled by pg_init_privs are a bit more
understandable since an extension script might want to revoke access
from PUBLIC for functions or to GRANT access to PUBLIC for other things,
or to leverage predefined roles, but how does that work for policies?
Most extensions aren't likely to be creating their own roles or
depending on non-predefined roles to already exist in the system as
otherwise they'd end up failing to install if those roles didn't
exist...

Thanks,

Stephen




Re: Atomic ops for unlogged LSN

2023-07-17 Thread Andres Freund
Hi,

On 2023-07-17 16:15:52 -0700, Nathan Bossart wrote:
> On Mon, Jul 17, 2023 at 07:08:03PM -0400, Stephen Frost wrote:
> > Awesome.  Was there any other feedback on this change which basically is
> > just getting rid of a spinlock and replacing it with using atomics?
> > It's still in needs-review status but there's been a number of
> > comments/reviews (drive-by, at least) but without any real ask for any
> > changes to be made.
> 
> LGTM

Why don't we just use a barrier around reading the value? It's not like
CreateCheckPoint() is frequent?

Greetings,

Andres Freund




Re: Add TOAST support for more system tables

2023-07-17 Thread David Rowley
On Tue, 18 Jul 2023 at 10:31, Tom Lane  wrote:
> I wonder whether we'd be better off shoving the ACL data out of
> these catalogs and putting it somewhere else (compare pg_attrdef).

relpartbound is another column that could cause a pg_class row to grow
too large.  I did have a patch [1] to move that column into
pg_partition. I imagine it's very bit rotted now.

David

[1] 
https://postgr.es/m/CAKJS1f9QjUwQrio20Pi%3DyCHmnouf4z3SfN8sqXaAcwREG6k0zQ%40mail.gmail.com




Re: unrecognized node type while displaying a Path due to dangling pointer

2023-07-17 Thread David Rowley
On Mon, 17 Jul 2023 at 15:31, Tom Lane  wrote:
> > I also didn't do anything about ExtensibleNode types. I assume just
> > copying the ExtensibleNode isn't good enough. To flat copy the actual
> > node I think would require adding a new function to
> > ExtensibleNodeMethods.
>
> Yeah, the problem I've got with this approach is that flat-copying
> FDW and Custom paths would require extending the respective APIs.
> While that's a perfectly reasonable ask if we only need to do this
> in HEAD, it would be a nonstarter for released branches.  Is it
> okay to only fix this issue in HEAD?

CustomPaths, I didn't think about those. That certainly makes it more
complex.  I also now see the header comment for struct CustomPath
mentioning that we don't copy Paths:

 * Core code must avoid assuming that the CustomPath is only as large as
 * the structure declared here; providers are allowed to make it the first
 * element in a larger structure.  (Since the planner never copies Paths,
 * this doesn't add any complication.)  However, for consistency with the
 * FDW case, we provide a "custom_private" field in CustomPath; providers
 * may prefer to use that rather than define another struct type.

Are there any legitimate reasons to look at the input_rel's pathlist
again aside from debugging? I can't think of any. Perhaps back
branches can be fixed by just emptying the path lists and NULLifying
the cheapest paths as you mentioned last week.

> > I was also unsure what we should do when shallow copying a List.
>
> The proposal is to shallow-copy a Path node.  List is not a kind
> of Path, so how does List get into it?  (Lists below Paths would
> not get copied, by definition.)

The patch contained infrastructure to copy any Node type. Not just
Paths. Perhaps that's more than what's needed, but it seemed more
effort to limit it just to Path types than to make it "work" for all
Node types.

David.




Re: Atomic ops for unlogged LSN

2023-07-17 Thread Nathan Bossart
On Mon, Jul 17, 2023 at 07:08:03PM -0400, Stephen Frost wrote:
> Awesome.  Was there any other feedback on this change which basically is
> just getting rid of a spinlock and replacing it with using atomics?
> It's still in needs-review status but there's been a number of
> comments/reviews (drive-by, at least) but without any real ask for any
> changes to be made.

LGTM

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Atomic ops for unlogged LSN

2023-07-17 Thread Stephen Frost
Greetings,

* Nathan Bossart (nathandboss...@gmail.com) wrote:
> On Mon, Jun 12, 2023 at 07:24:18PM -0400, Stephen Frost wrote:
> > * Nathan Bossart (nathandboss...@gmail.com) wrote:
> >> Is it?  I see uses in GiST indexing (62401db), so it's not immediately
> >> obvious to me how it is debugging-only.  If it is, then I think this patch
> >> ought to clearly document it so that nobody else tries to use it for
> >> non-debugging-only stuff.
> > 
> > I don't see it as a debugging value.  I'm not sure where that came
> > from..?  We do use it in places and if anything, I expect it to be used
> > more, not less, in the future as a persistant generally increasing
> > value (could go backwards on a crash-restart but in such case all
> > unlogged tables are truncated).
> 
> This is my understanding as well.
> 
> >> My concern would be whether GetFakeLSNForUnloggedRel or CreateCheckPoint
> >> might see an old value of unloggedLSN.  From the following note in
> >> README.barrier, it sounds like this would be a problem even if we ensured
> >> full barrier semantics:
> 
> Never mind.  I think I'm forgetting that the atomics support in Postgres
> deals with cache coherency.
> 
> > The concern around unlogged LSN, imv anyway, is less about shared memory
> > access and making sure that all callers understand that it can move
> > backwards on a crash/restart.  I don't think that's an issue for current
> > users but we just need to make sure to try and comment sufficiently
> > regarding that such that new users understand that about this particular
> > value.
> 
> Right.

Awesome.  Was there any other feedback on this change, which is basically
just getting rid of a spinlock and replacing it with atomics?
It's still in needs-review status, but there have been a number of
comments/reviews (drive-by, at least) without any real ask for any
changes to be made.

Thanks!

Stephen




Re: Add TOAST support for more system tables

2023-07-17 Thread Tom Lane
Sofia Kopikova  writes:
> This patch adds TOAST support for system tables pg_class,
> pg_attribute and pg_largeobject_metadata, as they include ACL columns,
> which may be potentially large in size.

We have been around on this topic before, cf discussion leading up to
commit 96cdeae07.  Allowing toasted data in pg_class or pg_attribute
seems quite scary to me because of the potential for recursive access,
particularly during cache-flush scenarios.  (That is, you need to be
able to read those catalogs on the way to fetching a toasted value,
so how can you be sure that doesn't devolve into an infinite loop?)

I wonder whether we'd be better off shoving the ACL data out of
these catalogs and putting it somewhere else (compare pg_attrdef).

regards, tom lane




Re: ObjectIdGetDatum() missing from SearchSysCache*() callers

2023-07-17 Thread Michael Paquier
On Mon, Jul 17, 2023 at 05:33:42PM +0300, Aleksander Alekseev wrote:
> I can't be 100% sure but it looks like that's all of them. PFA the
> updated patch v2.

Thanks.  Yes, this stuff is easy to miss.  I was just grepping for a
few patterns and missed these two.
--
Michael




Add TOAST support for more system tables

2023-07-17 Thread Sofia Kopikova

Hi, hackers!

I've tried sending this patch to the community before; let me try a second
time. The patch is revised and improved compared to the previous version.

This patch adds TOAST support for the system tables pg_class,
pg_attribute and pg_largeobject_metadata, as they include ACL columns,
which may be potentially large in size. The patch also fixes a possible
pg_upgrade bug (a problem with seeing a non-empty new cluster).

During development it turned out that the heap_inplace_update function
is not suitable for use with TOAST, so it could produce wrong
statistics updates (for example, during VACUUM). This problem is fixed
by adding a new heap_inplace_update_prepare_tuple function -- we assume
TOASTed attributes are never changed by an in-place update, and just
replace them with the old values.

I also added a pg_catalog_toast1 test that checks for the "invalid tuple
length" error when creating an index with a toasted pg_class. The test
grants and drops roles on a certain table many times to make the ACL
column long, and then creates an index on that table.

I wonder what other bugs can happen there, but if anyone can give me a
hint, I'll try to fix them. Anyway, in PostgresPro we didn't encounter
any problems with this feature.

First attempt here:
https://www.postgresql.org/message-id/1643112264.186902...@f325.i.mail.ru

This time I'll do it better

--
Sofia Kopikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From 6812e6cd47c03e22f953732a24223411d80d6c95 Mon Sep 17 00:00:00 2001
From: Sofia Kopikova 
Date: Mon, 17 Jul 2023 11:56:25 +0300
Subject: [PATCH] Add TOAST support for several system tables

ACL lists may be large, so use TOAST for the system tables pg_class,
pg_attribute and pg_largeobject_metadata, which have aclitem[] columns.

In 96cdeae07f9 some system catalogs were excluded from having TOAST
tables added to them due to several possible bugs. Here the bug with
pg_upgrade seeing a non-empty new cluster is fixed. A workaround is
added to the heap_inplace_update function for its correct work with
TOAST. A test for the "invalid tuple length" error case when creating
an index with a toasted pg_class is also added.
---
 src/backend/access/heap/heapam.c  | 64 +--
 src/backend/catalog/catalog.c |  2 +
 src/bin/pg_upgrade/check.c|  3 +-
 src/include/catalog/pg_attribute.h|  2 +
 src/include/catalog/pg_class.h|  2 +
 src/include/catalog/pg_largeobject_metadata.h |  2 +
 src/test/regress/expected/misc_sanity.out | 30 -
 .../regress/expected/pg_catalog_toast1.out| 25 
 src/test/regress/parallel_schedule|  3 +
 src/test/regress/sql/misc_sanity.sql  | 10 +--
 src/test/regress/sql/pg_catalog_toast1.sql| 20 ++
 11 files changed, 134 insertions(+), 29 deletions(-)
 create mode 100644 src/test/regress/expected/pg_catalog_toast1.out
 create mode 100644 src/test/regress/sql/pg_catalog_toast1.sql

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7ed72abe597..e043a8e4401 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5860,6 +5860,53 @@ heap_abort_speculative(Relation relation, ItemPointer tid)
 	pgstat_count_heap_delete(relation);
 }
 
+/*
+ * Prepare a tuple for in-place update.  TOASTed attributes can't be modified
+ * by an in-place update.  At the same time, the new tuple is flattened, so we
+ * just replace TOASTed attributes with their values from the old tuple.
+ */
+static HeapTuple
+heap_inplace_update_prepare_tuple(Relation relation,
+  HeapTuple oldtup,
+  HeapTuple newtup)
+{
+	TupleDesc	desc = relation->rd_att;
+	HeapTuple	result;
+	Datum	   *oldvals,
+			   *newvals;
+	bool	   *oldnulls,
+			   *newnulls;
+	int			i,
+natts = desc->natts;
+
+	oldvals = (Datum *) palloc(sizeof(Datum) * natts);
+	newvals = (Datum *) palloc(sizeof(Datum) * natts);
+	oldnulls = (bool *) palloc(sizeof(bool) * natts);
+	newnulls = (bool *) palloc(sizeof(bool) * natts);
+
+	heap_deform_tuple(oldtup, desc, oldvals, oldnulls);
+	heap_deform_tuple(newtup, desc, newvals, newnulls);
+
+	for (i = 0; i < natts; i++)
+	{
+		Form_pg_attribute att = &desc->attrs[i];
+
+		if (att->attlen == -1 &&
+			!oldnulls[i] &&
+			VARATT_IS_EXTENDED(oldvals[i]))
+		{
+			Assert(!newnulls[i]);
+			newvals[i] = oldvals[i];
+		}
+	}
+
+	result = heap_form_tuple(desc, newvals, newnulls);
+
+	result->t_self = newtup->t_self;
+
+	return result;
+}
+
 /*
  * heap_inplace_update - update a tuple "in place" (ie, overwrite it)
  *
@@ -5889,6 +5936,8 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 	HeapTupleHeader htup;
 	uint32		oldlen;
 	uint32		newlen;
+	HeapTupleData oldtup;
+	HeapTuple	newtup;
 
 	/*
 	 * For now, we don't allow parallel updates.  Unlike a regular update,
@@ -5914,16 +5963,23 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 
 	htup = (HeapTupleHeader) PageGetItem(page, lp);
 
+	oldtup.t_tableOid = RelationGetRelid(relation);
+	

Re: harmonize password reuse in vacuumdb, clusterdb, and reindexdb

2023-07-17 Thread Nathan Bossart
On Wed, Jun 28, 2023 at 10:24:09PM -0700, Nathan Bossart wrote:
> On Wed, Jun 28, 2023 at 09:20:03PM -0700, Gurjeet Singh wrote:
>> The comment on top of connect_utils.c:connectDatabase() seems pertinent:
>> 
>>> (Callers should not pass
>>> * allow_password_reuse=true unless reconnecting to the same database+user
>>> * as before, else we might create password exposure hazards.)
>> 
>> The callers of {cluster|reindex}_one_database() (which in turn call
>> connectDatabase()) clearly pass different database names in successive
>> calls to these functions. So the patch seems to be in conflict with
>> the recommendation in the comment.
>> 
>> I'm not sure if the concern raised in that comment is a legitimate
>> one, though. I mean, if the password is reused to connect to a
>> different database in the same cluster/instance, which I think is
>> always the case with these utilities, the password will exposed in the
>> server logs (if at all). And since the admins of the instance already
>> have full control over the passwords of the user, I don't think this
>> patch will give them any more information than what they can get
>> anyways.
>> 
>> It is a valid concern, though, if the utility connects to a different
>> instance in the same run/invocation, and hence exposes the password
>> from the first instance to the admins of the second cluster.
> 
> The same commit that added this comment (ff402ae) also set the
> allow_password_reuse parameter to true in vacuumdb's connectDatabase()
> calls.  I found a message from the corresponding thread that provides some
> additional detail [0].  I wonder if this comment should instead recommend
> against using the allow_password_reuse flag unless reconnecting to the same
> host/port/user target.  Connecting to different databases with the same
> host/port/user information seems okay.  Maybe I am missing something... 

Here is a new version of the patch in which I've updated this comment as
proposed.  Gurjeet, do you have any other concerns about this patch?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 85d0a1bbe62c4cc01b3fdba7c653f95b8472cc7a Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 27 Jun 2023 21:38:24 -0700
Subject: [PATCH v2 1/1] harmonize password reuse in vacuumdb, clusterdb, and
 reindexdb

---
 doc/src/sgml/ref/reindexdb.sgml | 14 --
 doc/src/sgml/ref/vacuumdb.sgml  | 13 -
 src/bin/scripts/clusterdb.c |  2 +-
 src/bin/scripts/reindexdb.c |  2 +-
 src/fe_utils/connect_utils.c|  2 +-
 5 files changed, 3 insertions(+), 30 deletions(-)

diff --git a/doc/src/sgml/ref/reindexdb.sgml b/doc/src/sgml/ref/reindexdb.sgml
index 8cb8bf4fa3..8d9ced212f 100644
--- a/doc/src/sgml/ref/reindexdb.sgml
+++ b/doc/src/sgml/ref/reindexdb.sgml
@@ -432,20 +432,6 @@ PostgreSQL documentation
 
  
 
-
- 
-  Notes
-
-  
-   reindexdb might need to connect several
-   times to the PostgreSQL server, asking
-   for a password each time. It is convenient to have a
-   ~/.pgpass file in such cases. See  for more information.
-  
- 
-
-
  
   Examples
 
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index da2393783b..09356ea4fa 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -605,19 +605,6 @@ PostgreSQL documentation
 
  
 
-
- 
-  Notes
-
-  
-   vacuumdb might need to connect several
-   times to the PostgreSQL server, asking
-   for a password each time. It is convenient to have a
-   ~/.pgpass file in such cases. See  for more information.
-  
- 
-
  
   Examples
 
diff --git a/src/bin/scripts/clusterdb.c b/src/bin/scripts/clusterdb.c
index 58a774013b..65428031c7 100644
--- a/src/bin/scripts/clusterdb.c
+++ b/src/bin/scripts/clusterdb.c
@@ -195,7 +195,7 @@ cluster_one_database(const ConnParams *cparams, const char *table,
 
 	PGconn	   *conn;
 
-	conn = connectDatabase(cparams, progname, echo, false, false);
+	conn = connectDatabase(cparams, progname, echo, false, true);
 
	initPQExpBuffer(&sql);
 
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index 5b297d1dc1..002c41f221 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -315,7 +315,7 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 	bool		failed = false;
 	int			items_count = 0;
 
-	conn = connectDatabase(cparams, progname, echo, false, false);
+	conn = connectDatabase(cparams, progname, echo, false, true);
 
 	if (concurrently && PQserverVersion(conn) < 12)
 	{
diff --git a/src/fe_utils/connect_utils.c b/src/fe_utils/connect_utils.c
index 7a1edea7c8..7d45f5c609 100644
--- a/src/fe_utils/connect_utils.c
+++ b/src/fe_utils/connect_utils.c
@@ -25,7 +25,7 @@
  *
  * If allow_password_reuse is true, we will try to re-use any password
  * given during previous calls to this routine.  (Callers should not pass
- * allow_password_reuse=true unless reconnecting to the same database+user
+ * allow_password_reuse=true unless reconnecting to the 

Re: SLRUs in the main buffer pool - Page Header definitions

2023-07-17 Thread Stephen Frost
Greetings,

[snipped quoted bits]

Would really be helpful to keep who the author of each quoted snippet
was when you quote them; dropping that makes it look like one person
wrote all of them, and that's confusing.

* Bagga, Rishu (bagri...@amazon.com) wrote:
> The third patch brings back the the SLRU control structure, to keep it 
> as an extensible feature for now, and renames the handler for the 
> components we are moving into the buffercache to NREL (short for non 
> relational). nrel.c is essentially a copy of Munro’s modified slru.c, 
> and I have restored the original slru.c. This allows for existing 
> extensions utilizing SLRUs to keep working, and the test_slru unit tests 
> to pass, as well as introducing a more accurate name for the handling of 
> components (CLOG, Multixact Offsets/Members, Async, Serial, Subtrans) 
> that are no longer handled by an SLRU, but are still non relational 
> components. To address Andres’s concern - I modified the slru stats test 
> code to still track all these current components and maintain the 
> behavior, and confirmed as those tests pass as well.

Haven't really looked over the patches yet but I wanted to push back on
this a bit- you're suggesting that we'd continue to maintain and update
slru.c for the benefit of extensions which use it while none of the core
code uses it?  For how long?  For my 2c, at least, I'd rather we tell
extension authors that they need to update their code instead.  There's
reasons why we're moving the SLRUs into the main buffer pool and having
page headers for them and using the existing page code to read/write
them and extension authors should be eager to gain those advantages too.
Not sure how much concern to place on extensions that aren't willing to
adjust to changes like these.

> The fourth patch adds the page headers to these Non Relational (NREL) 
> components, and provides the upgrade story to rewrite the clog and 
> multixact files with page headers across upgrades.

Nice.

> With the changes from all four patches, they pass all tests with make 
> installcheck-world, as well as test_slru.

Awesome, will try to take a look soon.

Thanks,

Stephen




Re: Fine-tune TLS 1.3 cipher suites and curves lists

2023-07-17 Thread Daniel Gustafsson
> On 17 Jul 2023, at 15:16, Seraphime Kirkovski  wrote:

> I see in the source code that only TLS 1.2 and below cipher lists can be 
> configured:
>  
> https://github.com/postgres/postgres/blob/master/src/backend/libpq/be-secure-openssl.c#L281
>  
> and Postgres relies on the OpenSSL defaults for TLS 1.3 ciphersuites.
>  
> My first question is whether there is a reason not to support setting TLS 1.3 
> cipher suites through configuration ? Maybe there are Postgres builds with 
> BoringSSL ? (Just speculating ?)

I think the main reason is that no one has done it, and no one has requested it.
I have no way of knowing for certain, but I doubt too many postgres users
change this setting.

> Another thing I was curious about is why does postgres opts to support 
> setting only a single elliptic group 
> (https://github.com/postgres/postgres/blob/master/src/backend/libpq/be-secure-openssl.c#L1303)
>  instead of calling out to an SSL function like SSL_CTX_set1_curves_list ?
> 
> Would the community be interested in seeing patches for setting TLS 1.3 
> ciphersuites and expanding the configuration option for EC settings to 
> support lists instead of single values ? 

I would be interested in seeing them, and would offer to review them.

The main challenge is IMO to properly document these settings such that
postgres users know what they are, and when they should think about changing
them.  Postgres also supports very old OpenSSL versions, so any change and
setting must in some way make sense for those installations (which may be a
no-op, a warning at startup for non-applicable settings, or something else).
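For context, a sketch of how such configuration might look in postgresql.conf. The ssl_ciphers and ssl_ecdh_curve lines are the GUCs that exist today (shown with their defaults); the commented-out names are hypothetical, purely illustrative placeholders for what is being proposed:

```
# Existing GUC, applies only to TLS 1.2 and below (default shown):
ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'
# Existing single-curve setting (default shown):
ssl_ecdh_curve = 'prime256v1'
# Hypothetical additions under discussion (names illustrative only):
#ssl_cipher_suites = 'TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256'
#ssl_groups = 'x25519:prime256v1'
```

Whatever names are chosen, the documentation point above applies: each setting would need to explain how it interacts (or no-ops) with older OpenSSL versions.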

--
Daniel Gustafsson





Re: MERGE ... RETURNING

2023-07-17 Thread Jeff Davis
On Fri, 2023-07-14 at 09:55 +0100, Dean Rasheed wrote:
> I found a 10-year-old thread discussing adding support for OLD/NEW to
> RETURNING [1], but it doesn't look like anything close to a
> committable solution was developed, or even a design that might lead
> to one. That's a shame, because there seemed to be a lot of demand
> for
> the feature, but it's not clear how much effort it would be to
> implement.

It looks like progress was made in the direction of using a table alias
with executor support to bring the right attributes along.

There was some concern about how exactly the table alias should work
such that it doesn't look too much like a join. Not sure how much of a
problem that is.

> > Maybe instead of a function it could be a special table reference
> > like:
> > 
> >   MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val?
> > 
> 
> Well, that's a little more concise, but I'm not sure that it really
> buys us that much, to be worth the extra complication. Presumably
> something in the planner would turn that into something the executor
> could handle, which might just end up being the existing functions
> anyway.

The benefits are:

1. It is naturally constrained to the right context. It doesn't require
global variables and the PG_TRY/PG_FINALLY, and can't be called in the
wrong contexts (like SELECT).

2. More likely to be consistent with eventual support for NEW/OLD
(actually BEFORE/AFTER for reasons the prior thread discussed).

I'm not sure how much extra complication it would cause, though.
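For illustration, the two forms discussed above might look like this. This is a sketch only: neither syntax is committed, and pg_merge_action() stands in for the function-style interface from the proposed patch:

```sql
MERGE INTO tgt USING src ON tgt.id = src.id
    WHEN MATCHED THEN UPDATE SET val = src.val
    WHEN NOT MATCHED THEN INSERT (id, val) VALUES (src.id, src.val)
    RETURNING pg_merge_action(), id, val;      -- function form
    -- RETURNING MERGE.action, id, val;        -- table-reference form
```

The table-reference form would need the planner to resolve MERGE.* into something the executor can evaluate, which is the extra complication in question.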

Regards,
Jeff Davis





Re: Fix search_path for all maintenance commands

2023-07-17 Thread Jeff Davis
On Mon, 2023-07-17 at 10:58 -0700, Nathan Bossart wrote:
> On Sat, Jul 15, 2023 at 02:13:33PM -0700, Noah Misch wrote:
> > The 2018 security fixes instigated many function repairs that
> > $SUBJECT would
> > otherwise instigate.  That wasn't too painful.  The net new pain of
> > $SUBJECT
> > will be less, since the 2018 security fixes prepared the path. 
> > Hence, I
> > remain +1 for the latest Davis proposal.
> 
> I concur.

Based on feedback, I plan to commit soon.

Tom's objection seemed specific to v16, and Robert's concern seemed to
be about having the MAINTAIN privilege without this patch. If I missed
any objections to this patch, please let me know.

If we hear about breakage that suggests we need an escape hatch GUC, we
have time to add one later.

I remain open to considering more complete fixes for the search_path
problems.

Regards,
Jeff Davis





Re: Should we remove db_user_namespace?

2023-07-17 Thread Nathan Bossart
On Sun, Jul 16, 2023 at 01:24:06PM +0200, Magnus Hagander wrote:
> I'd lean towards "no". A hard break, when it's a major release, is
> better than a "it stopped having effect but didn't tell you anything"
> break. Especially when it comes to things like startup scripts etc.

Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: sslinfo extension - add notbefore and notafter timestamps

2023-07-17 Thread Cary Huang
Hello

 > > Perhaps calling "tm2timestamp(_time, 0, NULL, )" without checking 
 > > the return code would be just fine. I see some other usages of 
 > > tm2timestamp() in other code areas also skip checking the return code.
 > 
 > I think we want to know about any failures, but we can probably make it into 
 > an
 > elog() instead, as it should never fail.

Yes, sure. I have corrected the error message to elog(ERROR, "timestamp out of 
range") on a rare tm2timestamp() failure. Please see the attached patch based 
on your v5. "v6-0001-Set-fixed-dates-for-test-certificates-validity.patch" is 
exactly the same as 
"v5-0001-Set-fixed-dates-for-test-certificates-validity.patch"; I just bumped 
the version to be consistent. 

thank you very much


Cary Huang
-
HighGo Software Inc. (Canada)
cary.hu...@highgo.ca
www.highgo.ca



v6-0001-Set-fixed-dates-for-test-certificates-validity.patch
Description: Binary data


v6-0002-Add-notBefore-and-notAfter-to-SSL-cert-info-displ.patch
Description: Binary data


Re: Looking for context around which event triggers are permitted

2023-07-17 Thread Garrett Thornburg
That makes sense and is similar to the problem I'm hoping to solve for our
team. We had a DB upgrade that corrupted a few indexes. GitLab went through
something similar as part of their OS/DB upgrade. We had to concurrently
reindex everything. This took a few days and just to make sure we completed
this, we reindexed again. If we had had a way to log the event to a table
for each index, it would have made our lives a lot easier.

At a higher level though, it really made me wish there were a way to
audit these things. It sounds like that is what event triggers were
designed for, and adding a few more operations could prove useful.
Example: you can track CREATE/ALTER/DROP over a table's lifecycle,
capturing timestamps in a table, but you can't do the same for indexes
without a REINDEX event trigger.

On Mon, Jul 17, 2023 at 10:31 AM Alvaro Herrera 
wrote:

> On 2023-Jul-17, Garrett Thornburg wrote:
>
> > That's a good point, Isaac. Select into, security label, comment, etc are
> > all maintenance style commands but are already added to the matrix. I do
> > think there's a good case to include other maintenance related commands
> as
> > event triggers. Suppose you want to know the last time a table was
> vacuumed
> > or the last time a table was reindexed. If you can trigger off of these
> > maintenance commands, there's a lot you could build on top of postgres to
> > make the maintenance experience easier. Seems like a positive thing.
> >
> > The code exists but they are disabled at the moment. Happy to enable
> those
> > with a patch if it's as Aleksander said. Meaning, no real reason they
> were
> > disabled other than someone thought folks wouldn't need them.
>
> Yeah, as I recall, initially there were two use cases considered for
> event triggers:
>
> 1. DDL replication.  For this, you need to capture commands that somehow
> modify the set of objects that exist in the database.  So creating an
> index or COMMENT are important, but reindexing one isn't.
>
> 2. DDL auditing.  Pretty much the same as above.  You don't really care
> when vacuuming occurs, but if a table changes ownership or a security
> label is modified, that needs to be kept track of.
>
>
> Later, a further use case was added to enable people to avoid long-running
> table locking behavior: you only want to let your devs run ALTER TABLE
> in production if it's going to finish really quick.  So table_rewriting
> appeared and allowed some further options.  (As for SELECT INTO, it may
> be that it is only there because it's very close in implementation to
> CREATE TABLE AS, which naturally needs to be logged for auditing
> purposes ... but I'm not sure.)
>
>
> I'm wondering why you want REINDEX reported to an event trigger.  What's
> your use case?
>
> --
> Álvaro Herrera PostgreSQL Developer  —
> https://www.EnterpriseDB.com/
>


Re: Fix search_path for all maintenance commands

2023-07-17 Thread Nathan Bossart
On Sat, Jul 15, 2023 at 02:13:33PM -0700, Noah Misch wrote:
> The 2018 security fixes instigated many function repairs that $SUBJECT would
> otherwise instigate.  That wasn't too painful.  The net new pain of $SUBJECT
> will be less, since the 2018 security fixes prepared the path.  Hence, I
> remain +1 for the latest Davis proposal.

I concur.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Report distinct wait events when waiting for WAL "operation"

2023-07-17 Thread Andres Freund
Hi,

In a number of workloads one can see two wait events prominently:
LWLock:WALWrite and LWLock:WALInsert. Unfortunately for both that is not very
informative:

LWLock:WALWrite can be reported because there is genuine contention on the
LWLock, or, more commonly, because several backends are waiting for another to
finish IO. In the latter case we are not actually waiting to acquire the lock,
we are waiting for the lock to be released *without* then acquiring it.

LWLock:WALInsert can be reported because there are not enough WALInsert locks
(c.f. NUM_XLOGINSERT_LOCKS) or because we are waiting for another backend to
finish copying a WAL record into wal_buffers.  In the latter case we are
therefore not waiting to acquire an LWLock.


I think both of these cases are relevant to distinguish from an operational
perspective. Secondarily, I've received many questions about making those
locks more scalable / granular, when in most of the cases the issue was not
actual lock contention.

Today it's surprisingly hard to figure out whether the issue is lock
contention or the speed of copying buffers for WAL insert locks / computing
the last part of the CRC checksum.


Therefore I'm proposing that LWLockAcquireOrWait() and LWLockWaitForVar() not
use the "generic" LWLockReportWaitStart(), but use caller provided wait
events.  The attached patch adds two new wait events for the existing callers.

I waffled a bit about which wait event section to add these to. Ended up with
"IPC", but would be happy to change that.

WAIT_EVENT_WAL_WAIT_INSERT	WALWaitInsert	"Waiting for WAL record to be copied into buffers."
WAIT_EVENT_WAL_WAIT_WRITE	WALWaitWrite	"Waiting for WAL buffers to be written or flushed to disk."


Previously it was e.g. not really possible to distinguish that something like
this:

┌────────────────┬─────────────────┬────────────┬───────┐
│  backend_type  │ wait_event_type │ wait_event │ count │
├────────────────┼─────────────────┼────────────┼───────┤
│ client backend │ LWLock          │ WALInsert  │    32 │
│ client backend │ (null)          │ (null)     │     9 │
│ walwriter      │ IO              │ WALWrite   │     1 │
│ client backend │ Client          │ ClientRead │     1 │
│ client backend │ LWLock          │ WALWrite   │     1 │
└────────────────┴─────────────────┴────────────┴───────┘

is a workload with a very different bottleneck than this:

┌────────────────┬─────────────────┬───────────────┬───────┐
│  backend_type  │ wait_event_type │  wait_event   │ count │
├────────────────┼─────────────────┼───────────────┼───────┤
│ client backend │ IPC             │ WALWaitInsert │    22 │
│ client backend │ LWLock          │ WALInsert     │    13 │
│ client backend │ LWLock          │ WALBufMapping │     5 │
│ walwriter      │ (null)          │ (null)        │     1 │
│ client backend │ Client          │ ClientRead    │     1 │
│ client backend │ (null)          │ (null)        │     1 │
└────────────────┴─────────────────┴───────────────┴───────┘

even though they are very different

FWIW, the former is bottlenecked by the number of WAL insertion locks, the
second is bottlenecked by copying WAL into buffers due to needing to flush
them.
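The tables above come from repeatedly sampling pg_stat_activity and aggregating by wait event; a sketch of that aggregation (it needs a live server under load, so no output is shown here):

```sql
SELECT backend_type, wait_event_type, wait_event, count(*)
FROM pg_stat_activity
GROUP BY backend_type, wait_event_type, wait_event
ORDER BY count(*) DESC;
```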

Greetings,

Andres Freund
>From 9dcf4a45b6ed7d8fca1a1cd6782f11dff3a84406 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 17 Jul 2023 09:20:05 -0700
Subject: [PATCH v1] Caller specified wait events for LWLockWaitForVar(),
 LWLockAcquireOrWait()

Reporting waits within those functions as LWLock wait events is misleading, as
we do not actually wait for the lock themselves.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/include/storage/lwlock.h  |  6 --
 src/backend/access/transam/xlog.c |  6 --
 src/backend/storage/lmgr/lwlock.c | 19 +--
 .../utils/activity/wait_event_names.txt   |  2 ++
 4 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 34169e5889e..0f345076f41 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -127,7 +127,8 @@ extern PGDLLIMPORT bool Trace_lwlocks;
 
 extern bool LWLockAcquire(LWLock *lock, LWLockMode mode);
 extern bool LWLockConditionalAcquire(LWLock *lock, LWLockMode mode);
-extern bool LWLockAcquireOrWait(LWLock *lock, LWLockMode mode);
+extern bool LWLockAcquireOrWait(LWLock *lock, LWLockMode mode,
+uint32 wait_event_info);
 extern void LWLockRelease(LWLock *lock);
 extern void LWLockReleaseClearVar(LWLock *lock, uint64 *valptr, uint64 val);
 extern void LWLockReleaseAll(void);
@@ -135,7 +136,8 @@ extern bool LWLockHeldByMe(LWLock *lock);
 extern bool LWLockAnyHeldByMe(LWLock *lock, int nlocks, size_t stride);
 extern bool LWLockHeldByMeInMode(LWLock *lock, LWLockMode mode);
 
-extern bool LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval);
+extern bool LWLockWaitForVar(LWLock *lock, uint64 *valptr, 

Re: generic plans and "initial" pruning

2023-07-17 Thread Thom Brown
On Thu, 13 Jul 2023 at 13:59, Amit Langote  wrote:
> In an absolutely brown-paper-bag moment, I realized that I had not
> updated src/backend/executor/README to reflect the changes to the
> executor's control flow that this patch makes.   That is, after
> scrapping the old design back in January whose details *were*
> reflected in the patches before that redesign.
>
> Anyway, the attached fixes that.
>
> Tom, do you think you have bandwidth in the near future to give this
> another look?  I think I've addressed the comments that you had given
> back in April, though as mentioned in the previous message, there may
> still be some funny-looking aspects still remaining.  In any case, I
> have no intention of pressing ahead with the patch without another
> committer having had a chance to sign off on it.

I've only just started taking a look at this, and my first test drive
yields very impressive results:

8192 partitions (3 runs, 1 rows)
Head 391.294989 382.622481 379.252236
Patched 13088.145995 13406.135531 13431.828051

Looking at your changes to README, I would like to suggest rewording
the following:

+table during planning.  This means that inheritance child tables, which are
+added to the query's range table during planning, if they are present in a
+cached plan tree would not have been locked.

To:

This means that inheritance child tables present in a cached plan
tree, which are added to the query's range table during planning,
would not have been locked.

Also, further down:

s/intiatialize/initialize/

I'll carry on taking a closer look and see if I can break it.

Thom




Re: Looking for context around which event triggers are permitted

2023-07-17 Thread Aleksander Alekseev
Hi,

> Happy to enable those with a patch if it's as Aleksander said. Meaning, no 
> real reason they were disabled other than someone thought folks wouldn't need 
> them.

Sure, please feel free to submit the patch and we will see how it
goes. I don't foresee a strong push-back from the community, but that
being said, you can never be certain.

Ideally the patch should include corresponding tests and changes to
the documentation. If you will experience difficulties with those,
that's fine, submit the patch as is. Somebody (me, perhaps) will add
them if necessary.

-- 
Best regards,
Aleksander Alekseev




Re: Looking for context around which event triggers are permitted

2023-07-17 Thread Alvaro Herrera
On 2023-Jul-17, Garrett Thornburg wrote:

> That's a good point, Isaac. Select into, security label, comment, etc are
> all maintenance style commands but are already added to the matrix. I do
> think there's a good case to include other maintenance related commands as
> event triggers. Suppose you want to know the last time a table was vacuumed
> or the last time a table was reindexed. If you can trigger off of these
> maintenance commands, there's a lot you could build on top of postgres to
> make the maintenance experience easier. Seems like a positive thing.
> 
> The code exists but they are disabled at the moment. Happy to enable those
> with a patch if it's as Aleksander said. Meaning, no real reason they were
> disabled other than someone thought folks wouldn't need them.

Yeah, as I recall, initially there were two use cases considered for
event triggers:

1. DDL replication.  For this, you need to capture commands that somehow
modify the set of objects that exist in the database.  So creating an
index or COMMENT are important, but reindexing one isn't.

2. DDL auditing.  Pretty much the same as above.  You don't really care
when vacuuming occurs, but if a table changes ownership or a security
label is modified, that needs to be kept track of.


Later, a further use case was added to enable people to avoid long-running
table locking behavior: you only want to let your devs run ALTER TABLE
in production if it's going to finish really quick.  So table_rewriting
appeared and allowed some further options.  (As for SELECT INTO, it may
be that it is only there because it's very close in implementation to
CREATE TABLE AS, which naturally needs to be logged for auditing
purposes ... but I'm not sure.)
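As a sketch of use case 2 (DDL auditing), a minimal event trigger might look like this; the audit table and function names are illustrative only:

```sql
-- Record the command tag of every completed DDL command.
CREATE TABLE ddl_audit (logged_at timestamptz DEFAULT now(), tag text);

CREATE FUNCTION log_ddl() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
BEGIN
    -- TG_TAG carries the command tag, e.g. 'CREATE INDEX'.
    INSERT INTO ddl_audit (tag) VALUES (tg_tag);
END;
$$;

CREATE EVENT TRIGGER audit_ddl ON ddl_command_end
    EXECUTE FUNCTION log_ddl();
```

Extending the supported tags to REINDEX/VACUUM/ANALYZE would let the same pattern cover maintenance commands.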


I'm wondering why you want REINDEX reported to an event trigger.  What's
your use case?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Looking for context around which event triggers are permitted

2023-07-17 Thread Garrett Thornburg
That's a good point, Isaac. Select into, security label, comment, etc are
all maintenance style commands but are already added to the matrix. I do
think there's a good case to include other maintenance related commands as
event triggers. Suppose you want to know the last time a table was vacuumed
or the last time a table was reindexed. If you can trigger off of these
maintenance commands, there's a lot you could build on top of postgres to
make the maintenance experience easier. Seems like a positive thing.

The code exists but they are disabled at the moment. Happy to enable those
with a patch if it's as Aleksander said. Meaning, no real reason they were
disabled other than someone thought folks wouldn't need them.


Re: Looking for context around which event triggers are permitted

2023-07-17 Thread Isaac Morland
On Mon, 17 Jul 2023 at 11:26, Aleksander Alekseev 
wrote:

> Hi,
>
> > I was working on a project with event triggers and was wondering if
> there was any context from the developers around why some things make this
> list and others do not. Example: REVOKE/ GRANT are in the event trigger
> matrix [1] but REINDEX is not. Just wondering if there's a mailing list
> thread or a commit message that has more info. I can't seem to find
> anything in the postgres list archives. Thanks!
> >
> > [1] https://www.postgresql.org/docs/15/event-trigger-matrix.html
>
> Good question. My guess would be that no one really needed an event
> trigger for REINDEX so far.
>

My answer is not authoritative, but I notice that ANALYZE and VACUUM are
also not there. Those, together with REINDEX, are maintenance commands,
which normally should not affect which queries you can run or their
results. If we think of the queries we can run and the objects we can run
them against as forming an abstraction with maintenance commands breaking
the abstraction, then we can think of event triggers as operating against
the abstraction layer, not the underlying maintenance layer.

On the other hand, the event triggers include tags related to indexes,
which themselves (except for enforcement of uniqueness) in some sense sit
below the abstraction: presence of an index can affect the query plan and
how efficient it is, but shouldn't change the result of a query or whether
it is a valid query. So this is not a fully satisfactory explanation.


Re: Looking for context around which event triggers are permitted

2023-07-17 Thread Aleksander Alekseev
Hi,

> I was working on a project with event triggers and was wondering if there was 
> any context from the developers around why some things make this list and 
> others do not. Example: REVOKE/ GRANT are in the event trigger matrix [1] but 
> REINDEX is not. Just wondering if there's a mailing list thread or a commit 
> message that has more info. I can't seem to find anything in the postgres 
> list archives. Thanks!
>
> [1] https://www.postgresql.org/docs/15/event-trigger-matrix.html

Good question. My guess would be that no one really needed an event
trigger for REINDEX so far.

-- 
Best regards,
Aleksander Alekseev




Fine-tune TLS 1.3 cipher suites and curves lists

2023-07-17 Thread Seraphime Kirkovski
Hi all,

I’m a security engineer and I’m looking into restricting the set of allowed 
ciphers on Postgres and configuring a concrete set of curves on our Postgres 
instances.

I see in the source code that only TLS 1.2 and below cipher lists can be 
configured:

https://github.com/postgres/postgres/blob/master/src/backend/libpq/be-secure-openssl.c#L281

and Postgres relies on the OpenSSL defaults for TLS 1.3 ciphersuites.

My first question is whether there is a reason not to support setting TLS 1.3 
cipher suites through configuration? Maybe there are Postgres builds with 
BoringSSL? (Just speculating?)

Another thing I was curious about is why postgres opts to support setting 
only a single elliptic group 
(https://github.com/postgres/postgres/blob/master/src/backend/libpq/be-secure-openssl.c#L1303)
 instead of calling out to an SSL function like SSL_CTX_set1_curves_list?

Would the community be interested in seeing patches for setting TLS 1.3 
ciphersuites and expanding the configuration option for EC settings to support 
lists instead of single values?

Thanks,
Seraphime Kirkovski



Re: Improve heapgetpage() performance, overhead from serializable

2023-07-17 Thread Andres Freund
Hi,

On 2023-07-17 09:55:07 +0800, Zhang Mingli wrote:
> LGTM, and I have a foolish question:
>
>   if (likely(all_visible))
>   {
>   if (likely(!check_serializable))
>   scan->rs_ntuples = heapgetpage_collect(scan, snapshot, 
> page, buffer,
>   
>    block, lines, 1, 0);
>   else
>   scan->rs_ntuples = heapgetpage_collect(scan, snapshot, 
> page, buffer,
>   
>    block, lines, 1, 1);
>   }
>   else
>   {
>   if (likely(!check_serializable))
>   scan->rs_ntuples = heapgetpage_collect(scan, snapshot, 
> page, buffer,
>   
>    block, lines, 0, 0);
>   else
>   scan->rs_ntuples = heapgetpage_collect(scan, snapshot, 
> page, buffer,
>   
>    block, lines, 0, 1);
>
>
> Does it make sense to combine the if/else condition and pass it as the inline
> function's param?
>
> Like:
> scan->rs_ntuples = heapgetpage_collect(scan, snapshot, page, buffer,
>   
>    block, lines, all_visible, check_serializable);

I think that makes it less likely that the compiler actually generates a
constant-folded version for each of the branches. Perhaps worth some
experimentation.

Greetings,

Andres Freund




Looking for context around which event triggers are permitted

2023-07-17 Thread Garrett Thornburg
Hey list,

I was working on a project with event triggers and was wondering if there
was any context from the developers around why some things make this list
and others do not. Example: REVOKE/ GRANT are in the event trigger matrix
[1] but REINDEX is not. Just wondering if there's a mailing list thread or
a commit message that has more info. I can't seem to find anything in the
postgres list archives. Thanks!

[1] https://www.postgresql.org/docs/15/event-trigger-matrix.html


Re: ObjectIdGetDatum() missing from SearchSysCache*() callers

2023-07-17 Thread Aleksander Alekseev
Hi,

> > And inside pg_sequence_parameters:
> > pgstuple = SearchSysCache1(SEQRELID, relid);
>
> Found another one in partcache.c:
>
> ```
> /* Get pg_class.relpartbound */
> tuple = SearchSysCache1(RELOID, RelationGetRelid(rel));
> ```
>
> I can't be 100% sure but it looks like that's all of them. PFA the
> updated patch v2.

Added a CF entry, just in case:
https://commitfest.postgresql.org/44/4448/


-- 
Best regards,
Aleksander Alekseev




Re: ObjectIdGetDatum() missing from SearchSysCache*() callers

2023-07-17 Thread Aleksander Alekseev
Hi,

> And inside pg_sequence_parameters:
> pgstuple = SearchSysCache1(SEQRELID, relid);

Found another one in partcache.c:

```
/* Get pg_class.relpartbound */
tuple = SearchSysCache1(RELOID, RelationGetRelid(rel));
```

I can't be 100% sure but it looks like that's all of them. PFA the
updated patch v2.

-- 
Best regards,
Aleksander Alekseev


v2-0001-Pass-Datum-to-SearchSysCache-instead-of-Oid.patch
Description: Binary data


Re: [PATCH] Automatic HASH and LIST partition creation

2023-07-17 Thread Stéphane Tachoires
Hi,
I found that thread (and the patch), but it seems to be pretty dead.
The patch didn't apply, due to gen_node_support.pl.
Can I hope for a rebirth?

I've made a rebased patch, in case of no response...
It's just the patch from
https://www.postgresql.org/message-id/calt9zeg9okz9-dv9yyzaeexnpzp0+telfsz7qst28acmerv...@mail.gmail.com
rebased on 17dev

Perhaps it's too early for a commit; automatic range partitioning is still
missing and, according to
https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements,
the syntax is arguable.

If 'USING' is out of the question (already a keyword for CREATE TABLE) and
'CONFIGURATION()' is not what we want, we should reach a final decision
first.
I suggest OVER, which is a keyword but unused in CREATE TABLE (and ALTER
TABLE). Whatever...

For RANGE partitioning I think of four syntaxes (inspired by pg_partman)
PARTITION BY RANGE(stamp) CONFIGURATION (SPAN interval CENTER datetime BACK
integer AHEAD integer [DEFAULT [PARTITION] [defname]])
PARTITION BY RANGE(stamp) CONFIGURATION (SPAN interval
START firstfrombound END lasttobound [DEFAULT [PARTITION] [defname]])
PARTITION BY RANGE(region_id) CONFIGURATION (STEP integer START integer END
integer [DEFAULT [PARTITION] [defname]])
PARTITION BY RANGE(name) CONFIGURATION (BOUNDS (boundlist) [START
firstfrombound] [END lasttobound] [DEFAULT [PARTITION] [defname]])

The last one should solve the addition-operator problem with non-numeric,
non-datetime ranges.
Plus, it allows non uniform range (thinking about an "encyclopedia volume"
partitioning, you know 'A', 'B-CL', 'CL-D'...)

CREATE table (LIKE other INCLUDING PARTITIONS) should create 'table'
partitioned the same as 'other'
and
CREATE table (LIKE other INCLUDING PARTITIONS) PARTITION BY partspec
CONFIGURATION(), should create 'table' partitioned by partspec and sub
partitioned as 'other'.

Then CREATE could accept multiple PARTITION BY CONFIGURATION().

For ALTER TABLE (and automatic maintenance) to be usable, we will need
SPLIT and MERGE CONCURRENTLY (pg_pathman ?) enhanced by CREATE TABLE LIKE
to handle subpartitioning. But that's another story.

Stéphane.


Le jeu. 2 déc. 2021 à 12:20, Daniel Gustafsson  a écrit :

> This thread has stalled since July with review comments unanswered, I'm
> marking
> the patch Returned with Feedback.  Please feel free to resubmit when/if a
> new
> patch is available.
>
> --
> Daniel Gustafsson   https://vmware.com/
>
>
>
>

-- 
"Où se posaient les hirondelles avant l'invention du téléphone ?"
  -- Grégoire Lacroix
From ff875672556a2da09cfcc6c976178abd91c74622 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?St=C3=A9phane=20Tachoires?= 
Date: Mon, 17 Jul 2023 13:25:29 +0200
Subject: [PATCH v7] This is a simple rebase on most recent 17dev. Due to lack
 of answer from author Pavel Borisov 

Previous meaningful commit from author was:
From ef0fcba607641aef94a9ce9b0393e607ba476ba4 Mon Sep 17 00:00:00 2001
From: Pavel Borisov 
Date: Fri, 9 Jul 2021 00:34:21 +0400
Subject: [PATCH v5] Automatically generate partitions by LIST and HASH

A patch adds CREATE TABLE statement syntax and functions to create both
a parent partitioned table and child partitions at once based on a
partitioning rules. The created partitions set can be manipulated using
existing expressions like DETACH, CREATE TABLE.. PARTITION OF etc.

It is the first step towards more complicated:
(1) partitioning BY RANGE,
(2) automatic deferred child partition creation on a first try to
insert a row which fulfills a condition for the partition, etc.
---
 doc/src/sgml/ref/create_table.sgml |  49 +++
 src/backend/parser/gram.y  |  81 +++-
 src/backend/parser/parse_utilcmd.c | 142 +++
 src/include/nodes/parsenodes.h |  23 +
 src/include/partitioning/partdefs.h|   2 +
 src/test/regress/expected/create_table.out | 464 +
 src/test/regress/sql/create_table.sql  | 265 
 7 files changed, 1024 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 10ef699fab..5a87fa1a7e 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -29,6 +29,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 ] )
 [ INHERITS ( parent_table [, ... ] ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ]
+[ CONFIGURATION ( partition_bound_auto_spec ) ]
 [ USING method ]
 [ WITH ( storage_parameter [= value] [, ... ] ) | WITHOUT OIDS ]
 [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
@@ -41,6 +42,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 [, ... ]
 ) ]
 [ PARTITION BY { RANGE | LIST | HASH } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ]
+[ CONFIGURATION ( partition_bound_auto_spec ) ]
 [ USING method ]
 [ WITH ( storage_parameter [= 

Re: logicalrep_message_type throws an error

2023-07-17 Thread Alvaro Herrera
On 2023-Jul-17, Ashutosh Bapat wrote:

> Prologue of psprintf() says
> 
> * Errors are not returned to the caller, but are reported via elog(ERROR)
> * in the backend, or printf-to-stderr-and-exit() in frontend builds.
> * One should therefore think twice about using this in libpq.
> 
> If an error occurs in psprintf(), it will throw an error which will
> override the original error. I think we should avoid any stuff that
> throws further errors.

Ooh, yeah, this is an excellent point.  I agree it would be better to
avoid psprintf() here and anything that adds more failure modes.  Let's
just do the thing in the original patch you submitted, to ensure all
these strings are compile-time constants; that's likely the most robust.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)




Re: Getting rid of OverrideSearhPath in namespace.c

2023-07-17 Thread Aleksander Alekseev
Hi,

> As a follow-up for the CVE-2023-2454 fix, I think that it makes sense to
> completely remove unsafe functions
> PushOverrideSearchPath()/PopOverrideSearchPath(), which are not used in the
> core now.
> Please look at the patch attached.
>
> [...]
>
> What do you think?

+1 to remove dead code.

The proposed patch however removes get_collation_oid(), apparently by
mistake. Other than that the patch looks fine and passes `make
installcheck-world`.

I added an entry to the nearest CF [1].

> Beside that, maybe it's worth to rename three functions in "Override" in
> their names: GetOverrideSearchPath(), CopyOverrideSearchPath(),
> OverrideSearchPathMatchesCurrent(), and then maybe struct OverrideSearchPath.
> Noah Misch proposed name GetSearchPathMatcher() for the former.

+1 as well. I added the corresponding 0002 patch.

[1] https://commitfest.postgresql.org/44/4447/

-- 
Best regards,
Aleksander Alekseev


v2-0002-Rename-OverrideSearchPath-to-SearchPathMatcher.patch
Description: Binary data


v2-0001-Remove-unused-search_path-override-stack.patch
Description: Binary data


Re: logicalrep_message_type throws an error

2023-07-17 Thread Ashutosh Bapat
On Sat, Jul 15, 2023 at 12:57 PM Amit Kapila  wrote:
> >
> > Since the numerical value is important only in invalid message type
> > cases, how about using a format like "??? (88)" in unknown message
> > type cases, in both error and context messages?
> >
>
> Do you have something like attached in mind?

Prologue of psprintf() says

* Errors are not returned to the caller, but are reported via elog(ERROR)
* in the backend, or printf-to-stderr-and-exit() in frontend builds.
* One should therefore think twice about using this in libpq.

If an error occurs in psprintf(), it will throw an error which will
override the original error. I think we should avoid any stuff that
throws further errors.

-- 
Best Wishes,
Ashutosh Bapat




Re: ObjectIdGetDatum() missing from SearchSysCache*() callers

2023-07-17 Thread Zhang Mingli
Hi,

Regards,
Zhang Mingli
On Jul 17, 2023 at 21:09 +0800, Zhang Mingli , wrote:
> sequence_options
And inside pg_sequence_parameters:
pgstuple = SearchSysCache1(SEQRELID, relid);


Re: ObjectIdGetDatum() missing from SearchSysCache*() callers

2023-07-17 Thread Zhang Mingli
Hi

Regards,
Zhang Mingli
On Jul 17, 2023 at 19:10 +0800, Michael Paquier , wrote:
> Hi all,
>
> While scanning the code, I have noticed that a couple of code paths
> that do syscache lookups are passing down directly Oids rather than
> Datums. I think that we'd better be consistent here, even if there is
> no actual bug.
>
> I have noticed 11 callers of SearchSysCache*() that pass down
> an Oid instead of a Datum.
>
> Thoughts or comments?
> --
> Michael
LGTM, and there are two functions missed, in sequence_options

 pgstuple = SearchSysCache1(SEQRELID, relid);

Shall we fix that too?


Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

2023-07-17 Thread Jelte Fennema
Rebased again to resolve some conflicts


v22-0003-Add-non-blocking-version-of-PQcancel.patch
Description: Binary data


v22-0002-Return-2-from-pqReadData-on-EOF.patch
Description: Binary data


v22-0004-Start-using-new-libpq-cancel-APIs.patch
Description: Binary data


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-07-17 Thread Amit Kapila
On Fri, Jun 30, 2023 at 7:29 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> I have analyzed more, and concluded that there is no difference between 
> a manual
> and a shutdown checkpoint.
>
> The difference was whether the CHECKPOINT record has been decoded or not.
> The overall workflow of this test was:
>
> 1. do INSERT
> (2. do CHECKPOINT)
> (3. decode CHECKPOINT record)
> 4. receive feedback message from standby
> 5. do shutdown CHECKPOINT
>
> At step 3, the walsender decoded that WAL and set candidate_xmin_lsn. The 
> stacktrace was:
> standby_decode()->SnapBuildProcessRunningXacts()->LogicalIncreaseXminForSlot().
>
> At step 4, the confirmed_flush of the slot was updated, but 
> ReplicationSlotSave()
> was executed only when the slot->candidate_xmin_lsn had valid lsn. If step 2 
> and
> 3 are misssed, the dirty flag is not set and the change is still on the 
> memory.
>
> Finally, the CHECKPOINT was executed at step 5. If steps 2 and 3 are missed 
> and
> the patch from Julien is not applied, the updated value will be discarded. 
> This
> is what I observed. The patch forces saving the logical slot at the shutdown
> checkpoint, so the confirmed_lsn is saved to disk at step 5.
>

I see your point but there are comments in walsender.c which indicates
that we also wait for step-5 to get replicated. See [1] and comments
atop walsender.c. If this is true then we don't need a special check
as you have in patch 0003 or at least it doesn't seem to be required
in all cases.

[1] -
/*
* When SIGUSR2 arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait for
* them to be replicated to the standby, and exit. ...
*/

-- 
With Regards,
Amit Kapila.




Re: Protect extension' internal tables - how?

2023-07-17 Thread Aleksander Alekseev
Hi,

> Could you please advise or give some hint on what is the correct (and
> secure) way to implement this?
>
> Currently I use the owner of the extension as owner when creating
> such a table inside the function, but maybe there are some pitfalls
> in this kind of solution?

If the goal is to protect the user from an _accidental_ access to the
tables, placing them into a separate schema _my_extension_private or
something will be enough.

Otherwise consider using corresponding access control abilities of
PostgreSQL and creating functions with SECURITY DEFINER [1]. Be
mindful that your functions will become a target for privilege
escalation, so you should be extra careful with the implementation.

[1]: https://www.postgresql.org/docs/current/sql-createfunction.html

-- 
Best regards,
Aleksander Alekseev
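A minimal sketch of the second option, combining both suggestions (all object names here are hypothetical; note the pinned search_path, which is the standard mitigation for the privilege-escalation risk mentioned):

```sql
-- Private schema guards against accidental access; SECURITY DEFINER plus
-- a pinned search_path guards the function against object hijacking via
-- the caller's search_path.
CREATE SCHEMA my_extension_private;
REVOKE ALL ON SCHEMA my_extension_private FROM PUBLIC;

CREATE TABLE my_extension_private.config (key text PRIMARY KEY, value text);

CREATE FUNCTION my_extension_get(p_key text) RETURNS text
LANGUAGE sql SECURITY DEFINER
SET search_path = my_extension_private, pg_temp
AS $$ SELECT value FROM config WHERE key = p_key $$;
```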




Re: ObjectIdGetDatum() missing from SearchSysCache*() callers

2023-07-17 Thread Aleksander Alekseev
Hi,

> I have noticed 11 callers of SearchSysCache*() that pass down
> an Oid instead of a Datum.

Good catch.

> I think that we'd better be consistent here, even if there is
> no actual bug.
>

+1

-- 
Best regards,
Aleksander Alekseev




Re: Make mesage at end-of-recovery less scary.

2023-07-17 Thread Aleksander Alekseev
Hi,

> Thanks for checking it!
>
> I think 4ac30ba4f2 is that, which changes a few error
> messages. Addition to rebasing, I rewrote some code comments of
> xlogreader.c and revised the additional test script.

Thanks for working on this, it bugged me for a while. I noticed that
cfbot is not happy with the patch so I rebased it.
postgresql:pg_waldump test suite didn't pass after the rebase. I fixed
it too. Other than that the patch LGTM so I'm not changing its status
from "Ready for Committer".

It looks like the patch was moved between the commitfests since
2020... If there is anything that may help merging it into PG17 please
let me know.

-- 
Best regards,
Aleksander Alekseev


v26-0001-Make-End-Of-Recovery-error-less-scary.patch
Description: Binary data


ObjectIdGetDatum() missing from SearchSysCache*() callers

2023-07-17 Thread Michael Paquier
Hi all,

While scanning the code, I have noticed that a couple of code paths
that do syscache lookups are passing down directly Oids rather than
Datums.  I think that we'd better be consistent here, even if there is
no actual bug.

I have noticed 11 callers of SearchSysCache*() that pass down
an Oid instead of a Datum.

Thoughts or comments?
--
Michael
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 67b743e251..eb2b8d84c3 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1330,7 +1330,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
 	indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
 
 	/* Fetch options of index if any */
-	classTuple = SearchSysCache1(RELOID, oldIndexId);
+	classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(oldIndexId));
 	if (!HeapTupleIsValid(classTuple))
 		elog(ERROR, "cache lookup failed for relation %u", oldIndexId);
 	optionDatum = SysCacheGetAttr(RELOID, classTuple,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 4dc029f91f..727f151750 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -10147,7 +10147,7 @@ CloneFkReferenced(Relation parentRel, Relation partitionRel)
 		Oid			deleteTriggerOid,
 	updateTriggerOid;
 
-		tuple = SearchSysCache1(CONSTROID, constrOid);
+		tuple = SearchSysCache1(CONSTROID, ObjectIdGetDatum(constrOid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for constraint %u", constrOid);
 		constrForm = (Form_pg_constraint) GETSTRUCT(tuple);
@@ -10353,7 +10353,7 @@ CloneFkReferencing(List **wqueue, Relation parentRel, Relation partRel)
 		Oid			insertTriggerOid,
 	updateTriggerOid;
 
-		tuple = SearchSysCache1(CONSTROID, parentConstrOid);
+		tuple = SearchSysCache1(CONSTROID, ObjectIdGetDatum(parentConstrOid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for constraint %u",
  parentConstrOid);
@@ -13723,7 +13723,7 @@ ATExecAlterColumnGenericOptions(Relation rel,
 
 	/* First, determine FDW validator associated to the foreign table. */
 	ftrel = table_open(ForeignTableRelationId, AccessShareLock);
-	tuple = SearchSysCache1(FOREIGNTABLEREL, rel->rd_id);
+	tuple = SearchSysCache1(FOREIGNTABLEREL, ObjectIdGetDatum(rel->rd_id));
 	if (!HeapTupleIsValid(tuple))
 		ereport(ERROR,
 (errcode(ERRCODE_UNDEFINED_OBJECT),
@@ -16186,7 +16186,8 @@ ATExecGenericOptions(Relation rel, List *options)
 
 	ftrel = table_open(ForeignTableRelationId, RowExclusiveLock);
 
-	tuple = SearchSysCacheCopy1(FOREIGNTABLEREL, rel->rd_id);
+	tuple = SearchSysCacheCopy1(FOREIGNTABLEREL,
+ObjectIdGetDatum(rel->rd_id));
 	if (!HeapTupleIsValid(tuple))
 		ereport(ERROR,
 (errcode(ERRCODE_UNDEFINED_OBJECT),
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 6b42d4fc34..ce77a055e5 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -1935,7 +1935,7 @@ AddRoleMems(Oid currentUserId, const char *rolename, Oid roleid,
 HeapTuple	mrtup;
 Form_pg_authid mrform;
 
-mrtup = SearchSysCache1(AUTHOID, memberid);
+mrtup = SearchSysCache1(AUTHOID, ObjectIdGetDatum(memberid));
 if (!HeapTupleIsValid(mrtup))
 	elog(ERROR, "cache lookup failed for role %u", memberid);
 mrform = (Form_pg_authid) GETSTRUCT(mrtup);
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 7c5d9110fb..5436cc302d 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -4313,7 +4313,7 @@ get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
 			Datum		datum;
 			PartitionBoundSpec *bspec;
 
-			tuple = SearchSysCache1(RELOID, inhrelid);
+			tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(inhrelid));
 			if (!HeapTupleIsValid(tuple))
 elog(ERROR, "cache lookup failed for relation %u", inhrelid);
 
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index 7a2b5e57ff..65f3d5a5e6 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -183,7 +183,7 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
 		PartitionBoundSpec *boundspec = NULL;
 
 		/* Try fetching the tuple from the catcache, for speed. */
-		tuple = SearchSysCache1(RELOID, inhrelid);
+		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(inhrelid));
 		if (HeapTupleIsValid(tuple))
 		{
 			Datum		datum;
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index 883e09393a..27eabb80ab 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -5334,13 +5334,13 @@ get_rolespec_tuple(const RoleSpec *role)
 
 		case ROLESPEC_CURRENT_ROLE:
 		case ROLESPEC_CURRENT_USER:
-			tuple = SearchSysCache1(AUTHOID, GetUserId());
+			tuple = SearchSysCache1(AUTHOID, ObjectIdGetDatum(GetUserId()));
 			if (!HeapTupleIsValid(tuple))
 elog(ERROR, "cache 

Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

2023-07-17 Thread Amit Kapila
On Mon, Jul 17, 2023 at 11:51 AM Önder Kalacı  wrote:
>
>> >
>> > The last line seems repetitive to me. So, I have removed it. Apart
>> > from that patch looks good to me. Sergie, Peter, and others, any
>> > thoughts?
>>
>> The v5 patch LGTM.
>>
>
> Overall looks good to me as well. Please consider the following as an 
> optional improvement.
>
> My only minor concern here is the use of the term "default operator class". 
> It is accurate to use it. However, as far as I know, not many users can 
> follow that easily. I think the "pkey/repl full" suggestion gives some tip, 
> but I wonder if we add something like the following to the text such that 
> users can understand more:
>
>>  do not have a default operator class for B-tree or Hash.
>>
>> + If  there is no default operator class, usually the type does not have an 
>> equality operator.
>>

This sounds a bit generic to me. If required, we can give an example
so that it is easier to understand. But OTOH, I see that we use
"default operator class" in the docs and error messages, so this
should be probably okay.


-- 
With Regards,
Amit Kapila.
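As one possible concrete illustration of the category under discussion, a catalog query like the following (illustrative only) lists base types without a default B-tree or Hash operator class:

```sql
-- Types with no default btree/hash operator class, i.e. the ones the
-- documentation change is about (e.g. point shows up here).
SELECT t.typname
FROM pg_type t
WHERE t.typtype = 'b'
  AND NOT EXISTS (
        SELECT 1
        FROM pg_opclass c
        JOIN pg_am a ON a.oid = c.opcmethod
        WHERE c.opcintype = t.oid
          AND c.opcdefault
          AND a.amname IN ('btree', 'hash'));
```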




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-07-17 Thread Peter Smith
Hi Kuroda-san, I haven't looked at this thread for a very long time so
to re-familiarize myself with it I read all the latest v16-0001 patch.

Here are a number of minor review comments I noted in passing:

==
Commit message

1.
For pg_dump this commit includes a new option called
"--logical-replication-slots-only".
This option can be used to dump logical replication slots. When this option is
specified, the slot_name, plugin, and two_phase parameters are extracted from
pg_replication_slots. An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

~

This part doesn't do the actual execution, so maybe slightly reword this.

BEFORE
An SQL file is then generated which executes
pg_create_logical_replication_slot() with the extracted parameters.

SUGGESTION
An SQL file that executes pg_create_logical_replication_slot() with
the extracted parameters is generated.

~~~

2.
For pg_upgrade, when '--include-logical-replication-slots' is
specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and
restores from the
dump. Note that we cannot dump replication slots at the same time as the schema
dump because we need to separate the timing of restoring replication slots and
other objects. Replication slots, in  particular, should not be restored before
executing the pg_resetwal command because it will remove WALs that are required
by the slots.

~~~

Maybe "restores from the dump" can be described more?

BEFORE
...and restores from the dump.

SUGGESTION
...and restores the slots using the
pg_create_logical_replication_slots() statements that the dump
generated (see above).

==
src/bin/pg_dump/pg_dump.c

3. help

+
+ /*
+ * The option --logical-replication-slots-only is used only by pg_upgrade
+ * and should not be called by users, which is why it is not listed.
+ */
  printf(_("  --no-commentsdo not dump comments\n"));
~

/not listed./not exposed by the help./

~~~

4. getLogicalReplicationSlots

+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 16)
+ return;

PG16 is already in beta. I think this should now be changed to 17, right?

==
src/bin/pg_upgrade/check.c

5. check_new_cluster

+ /*
+ * Do additional works if --include-logical-replication-slots is required.
+ * These must be done before check_new_cluster_is_empty() because the
+ * slot_arr attribute of the new_cluster will be checked in the function.
+ */

SUGGESTION (minor rewording/grammar)
Do additional work if --include-logical-replication-slots was
specified. This must be done before check_new_cluster_is_empty()
because the slot_arr attribute of the new_cluster will be checked in
that function.

~~~

6. check_new_cluster_is_empty

+ /*
+ * If --include-logical-replication-slots is required, check the
+ * existence of slots.
+ */
+ if (user_opts.include_logical_slots)
+ {
+ LogicalSlotInfoArr *slot_arr = &new_cluster.dbarr.dbs[dbnum].slot_arr;
+
+ /* if nslots > 0, report just first entry and exit */
+ if (slot_arr->nslots)
+ pg_fatal("New cluster database \"%s\" is not empty: found logical
replication slot \"%s\"",
+ new_cluster.dbarr.dbs[dbnum].db_name,
+ slot_arr->slots[0].slotname);
+ }
+

6a.
There are a number of places in this function using
"new_cluster.dbarr.dbs[dbnum].XXX"

It is OK but maybe it would be tidier to up-front assign a local
variable for this?

DbInfo *pDbInfo = &new_cluster.dbarr.dbs[dbnum];

~

6b.
The above code adds an unnecessary blank line in the loop that was not
there previously.

~~~

7. check_for_parameter_settings

+/*
+ * Verify parameter settings for creating logical replication slots
+ */
+static void
+check_for_parameter_settings(ClusterInfo *new_cluster)

7a.
I felt this might have some missing words so it was meant to say:

SUGGESTION
Verify the parameter settings necessary for creating logical replication slots.

~

7b.
Maybe you can give this function a better name because there is no
hint in this generic name that it has anything to do with replication
slots.

~~~

8.
+ /* --include-logical-replication-slots can be used since PG16. */
+ if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1500)
+ return;

PG16 is already in beta, so the version number (1500) and the comment
mentioning PG16 are outdated aren't they?

==
src/bin/pg_upgrade/info.c

9.
 static void print_rel_infos(RelInfoArr *rel_arr);
-
+static void print_slot_infos(LogicalSlotInfoArr *slot_arr);

The removal of the existing blank line seems not a necessary part of this patch.

~~~

10. get_logical_slot_infos_per_db

+ char query[QUERY_ALLOC];
+
+ query[0] = '\0'; /* initialize query string to empty */
+
+ snprintf(query, sizeof(query),
+ "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");

Does the initial assignment query[0] = '\0'; achieve anything? IIUC,
the next 

Re: Some revises in adding sorting path

2023-07-17 Thread Richard Guo
On Mon, Jul 10, 2023 at 5:37 PM Daniel Gustafsson  wrote:

> > On 28 Mar 2023, at 21:59, David Rowley  wrote:
> > I'll mark this as waiting on author while you work on that.
>
> Richard: have you had a chance to incorporate the tests proposed by David
> in
> this thread into your patch?


I just updated the patches according to David's reviews.  So I'll change
it back to needs review.

Thanks
Richard


Re: Some revises in adding sorting path

2023-07-17 Thread Richard Guo
On Wed, Mar 29, 2023 at 4:00 AM David Rowley  wrote:

> If you write some tests for this code, it will be useful to prove that
> it actually does something, and also that it does not break again in
> the future. I don't really want to just blindly copy the pattern used
> in 3c6fc5820 for creating incremental sort paths if it's not useful
> here. It would be good to see tests that make an Incremental Sort path
> using the code you're changing.


Thanks for the suggestion.  I've managed to come up with a query that
gets the codes being changed here to perform an incremental sort.

set min_parallel_index_scan_size to 0;
set enable_seqscan to off;

Without making those parallel paths:

explain (costs off)
select * from tenk1 where four = 2 order by four, hundred,
parallel_safe_volatile(thousand);
  QUERY PLAN
----------------------------------------------------------
 Incremental Sort
   Sort Key: hundred, (parallel_safe_volatile(thousand))
   Presorted Key: hundred
   ->  Gather Merge
 Workers Planned: 3
 ->  Parallel Index Scan using tenk1_hundred on tenk1
   Filter: (four = 2)
(7 rows)

and with those parallel paths:

explain (costs off)
select * from tenk1 where four = 2 order by four, hundred,
parallel_safe_volatile(thousand);
  QUERY PLAN
----------------------------------------------------------
 Gather Merge
   Workers Planned: 3
   ->  Incremental Sort
 Sort Key: hundred, (parallel_safe_volatile(thousand))
 Presorted Key: hundred
 ->  Parallel Index Scan using tenk1_hundred on tenk1
   Filter: (four = 2)
(7 rows)

I've added two tests for the code changes in create_ordered_paths in the
new patch.


> Same for the 0003 patch.


For the code changes in gather_grouping_paths, I've managed to come up
with a query that makes an explicit Sort atop cheapest partial path.

explain (costs off)
select count(*) from tenk1 group by twenty, parallel_safe_volatile(two);
 QUERY PLAN
------------------------------------------------------------
 Finalize GroupAggregate
   Group Key: twenty, (parallel_safe_volatile(two))
   ->  Gather Merge
 Workers Planned: 4
 ->  Sort
   Sort Key: twenty, (parallel_safe_volatile(two))
   ->  Partial HashAggregate
 Group Key: twenty, parallel_safe_volatile(two)
 ->  Parallel Seq Scan on tenk1
(9 rows)

Without this logic the plan would look like:

explain (costs off)
select count(*) from tenk1 group by twenty, parallel_safe_volatile(two);
 QUERY PLAN
------------------------------------------------------------
 Finalize GroupAggregate
   Group Key: twenty, (parallel_safe_volatile(two))
   ->  Sort
 Sort Key: twenty, (parallel_safe_volatile(two))
 ->  Gather
   Workers Planned: 4
   ->  Partial HashAggregate
 Group Key: twenty, parallel_safe_volatile(two)
 ->  Parallel Seq Scan on tenk1
(9 rows)

This test is also added in the new patch.

But I did not find a query that makes an incremental sort in this case.
After trying for a while it seems to me that we do not need to consider
incremental sort in this case, because for a partial path of a grouped
or partially grouped relation, it is either unordered (HashAggregate or
Append), or it has been ordered by the group_pathkeys (GroupAggregate).
It seems there is no case that we'd have a partial path that is
partially sorted.

So update the patches to v2.

Thanks
Richard


v2-0001-Revise-how-we-sort-partial-paths-in-create_ordered_paths.patch
Description: Binary data


v2-0002-Revise-how-we-sort-partial-paths-in-gather_grouping_paths.patch
Description: Binary data


Re: CommandStatus from insert returning when using a portal.

2023-07-17 Thread Alvaro Herrera
On 2023-Jul-14, Dave Cramer wrote:

> David,
> 
> I will try to get a tcpdump file. Doing this in libpq seems challenging as
> I'm not aware of how to create a portal in psql.

You can probably have a look at src/test/modules/libpq_pipeline/libpq_pipeline.c
as a basis to write some test code for this.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"If you want to have good ideas, you must have many ideas.  Most of them
will be wrong, and what you have to learn is which ones to throw away."
 (Linus Pauling)




Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-17 Thread Önder Kalacı
Hi,


> I've attached a draft patch for this idea.


I think it needs a rebase after edca3424342da323499a1998d18a888283e52ac7.

Also, as discussed in [1], I think one improvement we had was to
keep IsIndexUsableForReplicaIdentityFull() in a way that is easier to
read and better documented in the code. So, it would be nice to stick to that.

Overall, the proposed changes make sense to me as well. Once the patch is
ready, I'm happy to review & test in detail.

Thanks,
Onder


[1]
https://www.postgresql.org/message-id/CAA4eK1Jcyrxt_84wt2%3DQnOcwwJEC2et%2BtCLjAuTXzE6N3FXqQw%40mail.gmail.com


Re: remaining sql/json patches

2023-07-17 Thread Erik Rijkers

Op 7/17/23 om 07:00 schreef jian he:

hi.
It seems there is no explanation of json_api_common_syntax in
functions-json.html.

I can get json_query full synopsis from functions-json.html as follows:
json_query ( context_item, path_expression [ PASSING { value AS
varname } [, ...]] [ RETURNING data_type [ FORMAT JSON [ ENCODING UTF8
] ] ] [ { WITHOUT | WITH { CONDITIONAL | [UNCONDITIONAL] } } [ ARRAY ]
WRAPPER ] [ { KEEP | OMIT } QUOTES [ ON SCALAR STRING ] ] [ { ERROR |
NULL | EMPTY { [ ARRAY ] | OBJECT } | DEFAULT expression } ON EMPTY ]
[ { ERROR | NULL | EMPTY { [ ARRAY ] | OBJECT } | DEFAULT expression }
ON ERROR ])

It also seems there is no full synopsis for json_table, only a partial
one-by-one explanation.



FWIW, Re: json_api_common_syntax

An (old) pdf that I have (ISO/IEC TR 19075-6 First edition 2017-03)
contains the below specification.  It's probably the source of the 
particular term.  It's easy to see how it maps onto the current v7 
SQL/JSON implementation.   (I don't know if it has changed in later 
incarnations.)



-- 8< 
5.2  JSON API common syntax

The SQL/JSON query functions all need a path specification, the JSON 
value to be input to that path specification for querying and 
processing, and optional parameter values passed to the path 
specification. They use a common syntax:


<JSON API common syntax> ::=
  <JSON context item> <comma> <JSON path specification>
    [ AS <JSON table path name> ]
    [ <JSON passing clause> ]

<JSON context item> ::=
  <JSON value expression>

<JSON path specification> ::=
  <character string literal>

<JSON passing clause> ::=
  PASSING <JSON argument> [ { <comma> <JSON argument> }... ]

<JSON argument> ::=
  <JSON value expression> AS <identifier>

-- 8< 

And yes, we might need a readable translation of that in the docs, 
although it might be easier to just get rid of the term 
'json_api_common_syntax'.


HTH,

Erik Rijkers




Re: doc: clarify the limitation for logical replication when REPILICA IDENTITY is FULL

2023-07-17 Thread Önder Kalacı
Hi,

>
> > The last line seems repetitive to me. So, I have removed it. Apart
> > from that patch looks good to me. Sergie, Peter, and others, any
> > thoughts?
>
> The v5 patch LGTM.
>
>
Overall looks good to me as well. Please consider the following as an
optional improvement.

My only minor concern here is the use of the term "default operator class".
It is accurate, but as far as I know, not many users can follow it
easily. I think the "pkey/repl full" suggestion gives some hint,
but I wonder if we could add something like the following to the text so that
users can understand more:

 do not have a default operator class for B-tree or Hash.

+ If there is no default operator class, usually the type does not have an
+ equality operator.

However, this limitation ..


Thanks,
Onder