Re: Allow placeholders in ALTER ROLE w/o superuser
Thanks a lot for the feedback Nathan. Taking your options into consideration, the correct behaviour for me would be:

- The ALTER ROLE placeholder should always be stored with a PGC_USERSET GucContext. It's a placeholder anyway, so it should be the least restrictive one. If the user wants to define it as PGC_SUSET or another context, this should be done through a custom extension.
- When an extension claims the placeholder, we should check the DefineCustomXXXVariable GucContext against PGC_USERSET. If there's a match, then the value gets applied; otherwise WARN or ERROR.

The role GUCs get applied at login time, right? So at that point we can WARN or ERROR about the defined role GUCs.

What do you think?

On Mon, 18 Jul 2022 at 19:03, Nathan Bossart wrote:
> On Fri, Jul 01, 2022 at 04:40:27PM -0700, Nathan Bossart wrote:
> > On Sun, Jun 05, 2022 at 11:20:38PM -0500, Steve Chavez wrote:
> >> However, defining placeholders at the role level requires superuser:
> >>
> >> alter role myrole set my.username to 'tomas';
> >> ERROR:  permission denied to set parameter "my.username"
> >>
> >> Which is inconsistent and surprising behavior. I think it doesn't make
> >> sense since you can already set them at the session or transaction
> >> level (SET LOCAL my.username = 'tomas'). Enabling this would allow sidecar
> >> services to store metadata scoped to its pertaining role.
> >>
> >> I've attached a patch that removes this restriction. From my testing, this
> >> doesn't affect permission checking when an extension defines its custom GUC
> >> variables.
> >>
> >> DefineCustomStringVariable("my.custom", NULL, NULL, _custom, NULL,
> >> PGC_SUSET, ..);
> >>
> >> Using PGC_SUSET or PGC_SIGHUP will fail accordingly. Also no tests fail
> >> when doing "make installcheck".
> >
> > IIUC you are basically proposing to revert a6dcd19 [0], but it is not clear
> > to me why that is safe. Am I missing something?
>
> I spent some more time looking into this, and I think I've constructed a
> simple example that demonstrates the problem with removing this
> restriction.
>
> postgres=# CREATE ROLE test CREATEROLE;
> CREATE ROLE
> postgres=# CREATE ROLE other LOGIN;
> CREATE ROLE
> postgres=# GRANT CREATE ON DATABASE postgres TO other;
> GRANT
> postgres=# SET ROLE test;
> SET
> postgres=> ALTER ROLE other SET plperl.on_plperl_init = 'test';
> ALTER ROLE
> postgres=> \c postgres other
> You are now connected to database "postgres" as user "other".
> postgres=> CREATE EXTENSION plperl;
> CREATE EXTENSION
> postgres=> SHOW plperl.on_plperl_init;
>  plperl.on_plperl_init
> -----------------------
>  test
> (1 row)
>
> In this example, the non-superuser role sets a placeholder GUC for another
> role. This GUC becomes a PGC_SUSET GUC when plperl is loaded, so a
> non-superuser role will have successfully set a PGC_SUSET GUC. If we had a
> record of who ran ALTER ROLE, we might be able to apply appropriate
> permissions checking when the module is loaded, but this information
> doesn't exist in pg_db_role_setting. IIUC we have the following options:
>
> 1. Store information about who ran ALTER ROLE. I think there are a
>    couple of problems with this. For example, what happens if the
>    grantor was dropped or its privileges were altered? Should we
>    instead store the context of the user (i.e., PGC_USERSET or
>    PGC_SUSET)? Do we need to add entries to pg_depend?
> 2. Ignore or ERROR for any ALTER ROLE settings for custom GUCs. Since
>    we don't know who ran ALTER ROLE, we can't trust the value.
> 3. Require superuser to use ALTER ROLE for a placeholder. This is what
>    we do today. Since we know a superuser set the value, we can always
>    apply it when the custom GUC is finally defined.
>
> If this is an accurate representation of the options, it seems clear why
> the superuser restriction is in place.
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
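To make the behaviour proposed upthread concrete, here is a tiny model of the rule being suggested. This is hypothetical illustration code, not PostgreSQL source: the enum is trimmed to three contexts, and the helper name is made up. The idea is that a placeholder stored via ALTER ROLE always carries PGC_USERSET, and when an extension later claims the variable with DefineCustomXXXVariable, the stored value is only applied if the extension also declared it PGC_USERSET; anything stricter draws a WARNING or ERROR.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed-down model of GUC contexts; the real enum in guc.h has more. */
typedef enum GucContext
{
	PGC_SIGHUP,
	PGC_SUSET,
	PGC_USERSET
} GucContext;

/*
 * Under the proposal, every placeholder stored by ALTER ROLE carries the
 * least restrictive context, regardless of who ran the command.
 */
static const GucContext placeholder_context = PGC_USERSET;

/*
 * Decide whether a stored placeholder value may be applied once an
 * extension claims the variable: only if the declared context matches
 * the (always-PGC_USERSET) placeholder context.
 */
static bool
placeholder_value_applies(GucContext declared)
{
	return declared == placeholder_context;
}
```

Under this model, an extension declaring its GUC as PGC_SUSET or PGC_SIGHUP would reject the role-level placeholder value at load time rather than silently privilege-escalating it.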
Re: Windows now has fdatasync()
On Tue, Jul 19, 2022 at 4:54 PM Michael Paquier wrote:
> Do you still need HAVE_DECL_FDATASYNC?

I guess so, because that is currently used for macOS, and with this patch it would also be used to control the declaration for Windows. The alternative would be to explicitly test for WIN32 or __darwin__.

The reason we need it for macOS is that they have had an fdatasync function for many years now, and configure detects it, but they haven't ever declared it in a header, so we (accidentally?) do it in c.h. We didn't set that up for Apple! The commit that added it was 33cc5d8a, which was about a month before Apple shipped the first version of OS X (and long before they defined the function). So there must have been another Unix with that problem, lost in the mists of time.
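The c.h arrangement described above boils down to supplying the prototype ourselves when the C library ships the function but its headers do not declare it. A minimal standalone illustration (hedged: HAVE_DECL_FDATASYNC is hard-coded to 0 here purely for demonstration; in a real build it would come from configure's declaration probe):

```c
#include <assert.h>

/*
 * Pretend configure found fdatasync() in libc but no declaration in any
 * system header (the macOS situation described above); it would then
 * define HAVE_DECL_FDATASYNC as 0 and the caller supplies the prototype.
 */
#define HAVE_DECL_FDATASYNC 0

#if !HAVE_DECL_FDATASYNC
extern int	fdatasync(int fildes);
#endif
```

With the declaration in place, calls such as fdatasync(fd) compile without implicit-declaration warnings and link against the function the C library already provides.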
Re: Making pg_rewind faster
On Mon, Jul 18, 2022 at 05:14:00PM +, Justin Kwan wrote:
> Thank you for taking a look at this and that sounds good. I will
> send over a patch compatible with Postgres v16.

+$node_2->psql(
+	'postgres',
+	"SELECT extract(epoch from modification) FROM pg_stat_file('pg_wal/00010003');",
+	stdout => \my $last_common_tli1_wal_last_modified_at);

Please note that you should not rely on the FS-level stats for anything that touches the WAL segments. A rough guess about what you could do here to make sure that only the set of WAL segments you are looking for is being copied over would be to either:
- Scan the logs produced by pg_rewind and see if the segments are copied or not, depending on the divergence point (aka the last checkpoint before WAL forked).
- Clean up pg_wal/ in the target node before running pg_rewind, checking that only the segments you want are available once the operation completes.
--
Michael
Re: System catalog documentation chapter
On Tue, Jul 19, 2022 at 1:22 PM Tom Lane wrote:
> Bruce Momjian writes:
> > On Sat, Jul 16, 2022 at 10:53:17AM +0200, Peter Eisentraut wrote:
> >> Maybe this whole notion that "system views" is one thing is not suitable.
>
> > Are you thinking we should just call the chapter "System Catalogs and
> > Views" and just place them alphabetically in a single chapter?
>
> I didn't think that was Peter's argument at all. He's complaining
> that "system views" isn't a monolithic category, which is a reasonable
> point, especially since we have a bunch of built-in views that appear
> in other chapters. But to then also confuse them with catalogs isn't
> improving the situation.

My original post was prompted when I was scrolling in the table-of-contents for chapter 53 "System Catalogs", unable to find a Catalog because I did not realise it was really a View. It was only when I couldn't find it alphabetically that I noticed there was *another* appended list of Views, but then the "System Views" heading seemed strangely buried at the same heading level as everything else... and although there was an "Overview" section for Catalogs there was no "Overview" section for the Views...

Maybe I was only seeing the tip of the iceberg. I'm not sure anymore what the best solution is. I do prefer the recent changes over how it used to be, but perhaps they also introduce a whole new set of problems.

---
(It used to look like this)

Chapter 53. "System Catalogs"
=============================
53.1. Overview
53.2. pg_aggregate
53.3. pg_am
53.4. pg_amop
53.5. pg_amproc
...
53.66. System Views  <--- 2nd heading just hiding here
53.67. pg_available_extensions
53.68. pg_available_extension_versions
53.69. pg_backend_memory_contexts
53.70. pg_config
...

--
Kind Regards,
Peter Smith.
Fujitsu Australia.
Re: Windows now has fdatasync()
On Mon, Jul 18, 2022 at 03:26:36PM +1200, Thomas Munro wrote:
> My plan now is to commit this patch so that problem #1 is solved, prod
> conchuela's owner to upgrade to solve #2, and wait until Tom shuts
> down prairiedog to solve #3. Then we could consider removing the
> HAVE_FDATASYNC probe and associated #ifdefs when convenient. For that
> reason, I'm not too bothered about the slight weirdness of defining
> HAVE_FDATASYNC on Windows even though that doesn't come from
> configure; it'd hopefully be short-lived. Better ideas welcome,
> though. Does that make sense?

Do you still need HAVE_DECL_FDATASYNC?
--
Michael
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Tue, Jul 19, 2022 at 6:34 AM Masahiko Sawada wrote:
>
> On Mon, Jul 18, 2022 at 1:12 PM Amit Kapila wrote:
> >
> > On Fri, Jul 15, 2022 at 8:09 PM Masahiko Sawada wrote:
> > >
> > > This patch should have the fix for the issue that Shi yu reported. Shi
> > > yu, could you please test it again with this patch?
> > >
> >
> > Can you explain the cause of the failure and your fix for the same?
>
> @@ -1694,6 +1788,8 @@ out:
>  	/* be tidy */
>  	if (ondisk)
>  		pfree(ondisk);
> +	if (catchange_xip)
> +		pfree(catchange_xip);
>
> Regarding the above code in the previous version patch, looking at the
> generated assembler code shared by Shi yu offlist, I realized that the
> "if (catchange_xip)" is removed (folded) by gcc optimization. This is
> because we dereference catchange_xip before the null-pointer check, as
> follows:
>
> +	/* copy catalog modifying xacts */
> +	sz = sizeof(TransactionId) * catchange_xcnt;
> +	memcpy(ondisk_c, catchange_xip, sz);
> +	COMP_CRC32C(ondisk->checksum, ondisk_c, sz);
> +	ondisk_c += sz;
>
> Since sz is 0 in this case, memcpy doesn't actually do anything.
>
> By checking the assembler code, I've confirmed that gcc does this
> optimization on this code, and that setting the
> -fno-delete-null-pointer-checks flag prevents the if statement from
> being folded. Also, I've confirmed that adding a check for
> "catchange.xcnt > 0" before the null-pointer check can also prevent
> that. Adding a check for "catchange.xcnt > 0" looks more robust. I've
> added a similar check for builder->committed.xcnt as well for
> consistency; builder->committed.xip could have no transactions.

Good work. I wonder whether, without comments, this may create a problem in the future. OTOH, I don't see adding a check for "catchange.xcnt > 0" before freeing the memory as any less robust.
Also, for consistency, we can use a similar check based on xcnt in SnapBuildRestore to free the memory in the below code:

+	/* set catalog modifying transactions */
+	if (builder->catchange.xip)
+		pfree(builder->catchange.xip);

--
With Regards,
Amit Kapila.
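The hazard discussed in this thread can be reproduced outside PostgreSQL: memcpy's pointer parameters are declared nonnull, so passing a NULL source even with a zero length is undefined behaviour, and gcc is then entitled to assume the pointer is non-NULL and fold away a later NULL test on it. A standalone sketch of the safe pattern (hypothetical names, not the actual snapbuild.c code):

```c
#include <stddef.h>
#include <string.h>

typedef unsigned int TransactionId;

/*
 * Serialize a possibly-empty xid array into a buffer.  Guarding on the
 * count keeps memcpy() from ever seeing a NULL source pointer, so the
 * compiler cannot use the call to "prove" the pointer is non-NULL and
 * delete a later "if (xip)" check.  Returns the number of bytes written.
 */
static size_t
serialize_xids(const TransactionId *xip, size_t xcnt, unsigned char *out)
{
	size_t		sz = sizeof(TransactionId) * xcnt;

	if (xcnt > 0)				/* never reach memcpy() with xip == NULL */
		memcpy(out, xip, sz);
	return sz;
}
```

Without the xcnt guard, a call like serialize_xids(NULL, 0, buf) would be undefined behaviour even though memcpy copies zero bytes, which is exactly the case gcc exploited here.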
Re: In-placre persistance change of a relation
(Mmm. I haven't noticed an annoying misspelling in the subejct X( ) > Thank you for checking that! It got a wider attack by b0a55e4329 > (RelFileNumber). The commit message suggests "relfilenode" as files Then, now I stepped on my own foot. Rebased also on nodefuncs autogeneration. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From cdd0282eb373ce79b8495b9a1160fb8c5122315e Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 19 Jul 2022 13:23:01 +0900 Subject: [PATCH v23 1/2] In-place table persistence change Even though ALTER TABLE SET LOGGED/UNLOGGED does not require data rewriting, currently it runs heap rewrite which causes large amount of file I/O. This patch makes the command run without heap rewrite. Addition to that, SET LOGGED while wal_level > minimal emits WAL using XLOG_FPI instead of massive number of HEAP_INSERT's, which should be smaller. Also this allows for the cleanup of files left behind in the crash of the transaction that created it. --- src/backend/access/rmgrdesc/smgrdesc.c| 49 ++ src/backend/access/transam/README | 10 + src/backend/access/transam/xact.c | 7 + src/backend/access/transam/xlogrecovery.c | 18 + src/backend/catalog/storage.c | 561 +- src/backend/commands/tablecmds.c | 267 -- src/backend/replication/basebackup.c | 9 +- src/backend/storage/buffer/bufmgr.c | 85 src/backend/storage/file/fd.c | 4 +- src/backend/storage/file/reinit.c | 340 + src/backend/storage/smgr/md.c | 95 +++- src/backend/storage/smgr/smgr.c | 32 ++ src/backend/storage/sync/sync.c | 21 +- src/bin/pg_rewind/parsexlog.c | 22 + src/bin/pg_rewind/pg_rewind.c | 1 - src/common/relpath.c | 47 +- src/include/catalog/storage.h | 3 + src/include/catalog/storage_xlog.h| 43 +- src/include/common/relpath.h | 9 +- src/include/storage/bufmgr.h | 2 + src/include/storage/fd.h | 1 + src/include/storage/md.h | 8 +- src/include/storage/reinit.h | 10 +- src/include/storage/smgr.h| 17 + src/tools/pgindent/typedefs.list | 7 + 25 files changed, 1485 insertions(+), 
183 deletions(-) diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c index e0ee8a078a..2f92c06f70 100644 --- a/src/backend/access/rmgrdesc/smgrdesc.c +++ b/src/backend/access/rmgrdesc/smgrdesc.c @@ -40,6 +40,46 @@ smgr_desc(StringInfo buf, XLogReaderState *record) xlrec->blkno, xlrec->flags); pfree(path); } + else if (info == XLOG_SMGR_UNLINK) + { + xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec; + char *path = relpathperm(xlrec->rlocator, xlrec->forkNum); + + appendStringInfoString(buf, path); + pfree(path); + } + else if (info == XLOG_SMGR_MARK) + { + xl_smgr_mark *xlrec = (xl_smgr_mark *) rec; + char *path = GetRelationPath(xlrec->rlocator.dbOid, + xlrec->rlocator.spcOid, + xlrec->rlocator.relNumber, + InvalidBackendId, + xlrec->forkNum, xlrec->mark); + char *action = ""; + + switch (xlrec->action) + { + case XLOG_SMGR_MARK_CREATE: +action = "CREATE"; +break; + case XLOG_SMGR_MARK_UNLINK: +action = "DELETE"; +break; + } + + appendStringInfo(buf, "%s %s", action, path); + pfree(path); + } + else if (info == XLOG_SMGR_BUFPERSISTENCE) + { + xl_smgr_bufpersistence *xlrec = (xl_smgr_bufpersistence *) rec; + char *path = relpathperm(xlrec->rlocator, MAIN_FORKNUM); + + appendStringInfoString(buf, path); + appendStringInfo(buf, " persistence %d", xlrec->persistence); + pfree(path); + } } const char * @@ -55,6 +95,15 @@ smgr_identify(uint8 info) case XLOG_SMGR_TRUNCATE: id = "TRUNCATE"; break; + case XLOG_SMGR_UNLINK: + id = "UNLINK"; + break; + case XLOG_SMGR_MARK: + id = "MARK"; + break; + case XLOG_SMGR_BUFPERSISTENCE: + id = "BUFPERSISTENCE"; + break; } return id; diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README index 734c39a4d0..f08bd7f42d 100644 --- a/src/backend/access/transam/README +++ b/src/backend/access/transam/README @@ -724,6 +724,16 @@ we must panic and abort recovery. The DBA will have to manually clean up then restart recovery. 
This is part of the reason for not writing a WAL entry until we've successfully done the original action. + +The Smgr MARK files + + +An smgr mark file is an empty file that is created when a new relation +storage file is created to signal that the storage file needs to be +cleaned up at recovery time. In contrast to the four actions above, +failure to remove smgr mark files will lead to data loss, in which +case the server will shut down. +
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Tue, Jul 19, 2022 at 9:11 AM Masahiko Sawada wrote:
> I'd like to keep the first version simple. We can improve it and add
> more optimizations later. Using a radix tree for vacuum TID storage
> would still be a big win compared to using a flat array, even without
> all these optimizations. In terms of the single-value leaves method, I'm
> also concerned about an extra pointer traversal and extra memory
> allocation. It's the most flexible, but the multi-value leaves method is also
> flexible enough for many use cases. Using the single-value method
> seems to be too much as the first step for me.
>
> Overall, using 64-bit keys and 64-bit values would be a reasonable
> choice for me as the first step. It can cover wider use cases
> including the vacuum TID use case. And possibly it can cover other use
> cases by combining it with a hash table, or using a tree of trees, for
> example.

These two aspects would also bring it closer to Andres' prototype, which 1) makes review easier and 2) makes it easier to preserve optimization work already done, so +1 from me.

--
John Naylor
EDB: http://www.enterprisedb.com
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Mon, Jul 18, 2022 at 9:10 PM John Naylor wrote:
> On Tue, Jul 19, 2022 at 9:24 AM Andres Freund wrote:
> > FWIW, I think the best path forward would be to do something similar to the
> > simplehash.h approach, so it can be customized to the specific user.
>
> I figured that would come up at some point. It may be worth doing in the
> future, but I think it's way too much to ask for the first use case.

I have a prototype patch that creates a read-only snapshot of the visibility map, and has vacuumlazy.c work off of that when determining which pages to skip. The patch also gets rid of the SKIP_PAGES_THRESHOLD stuff.

This is very effective with TPC-C, principally because it really cuts down on the number of scanned_pages that are scanned only because the VM bit is unset concurrently by DML. The window for this is very large when the table is large (and naturally takes a long time to scan), resulting in many more "dead but not yet removable" tuples being encountered than necessary. Which itself causes bogus information in the FSM -- information about the space that VACUUM could free from the page, which is often highly misleading.

There are remaining questions about how to do this properly. Right now I'm just copying pages from the VM into local memory, right after OldestXmin is first acquired -- we "lock in" a snapshot of the VM at the earliest opportunity, which is what lazy_scan_skip() actually works off now. There needs to be some consideration given to the resource management aspects of this -- it needs to use memory sensibly, which the current prototype patch doesn't do at all. I'm probably going to seriously pursue this as a project soon, and will probably need some kind of data structure for the local copy. The raw pages are usually quite space inefficient, considering we only need an immutable snapshot of the VM. I wonder if it makes sense to use this as part of this project.
It will be possible to know the exact heap pages that will become scanned_pages before scanning even one page with this design (perhaps with caveats about low memory conditions). It could also be very effective as a way of speeding up TID lookups in the reasonably common case where most scanned_pages don't have any LP_DEAD items -- just look it up in our local/materialized copy of the VM first. But even when LP_DEAD items are spread fairly evenly, it could still give us reliable information about the distribution of LP_DEAD items very early on. Maybe the two data structures could even be combined in some way? You can use more memory for the local copy of the VM if you know that you won't need the memory for dead_items. It's kinda the same problem, in a way. -- Peter Geoghegan
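A local immutable snapshot of the VM along the lines described above could be as simple as a bitmap keyed by heap block number. A minimal sketch under stated assumptions: all names here are made up, and the real visibility map tracks two bits per heap page (all-visible and all-frozen), while this toy tracks only one.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef unsigned int BlockNumber;

/*
 * Hypothetical local snapshot of the visibility map: one bit per heap
 * page for "all-visible", populated once (e.g. right after OldestXmin
 * is acquired) and never updated afterwards, so every lookup during the
 * scan sees the same set of skippable pages.
 */
typedef struct vm_snapshot
{
	BlockNumber nblocks;
	unsigned char *bits;		/* nblocks bits, rounded up to whole bytes */
} vm_snapshot;

static vm_snapshot *
vm_snapshot_create(BlockNumber nblocks)
{
	vm_snapshot *snap = malloc(sizeof(vm_snapshot));

	snap->nblocks = nblocks;
	snap->bits = calloc((nblocks + 7) / 8, 1);	/* all pages "not visible" */
	return snap;
}

/* Record that a heap page was all-visible at snapshot time. */
static void
vm_snapshot_set_all_visible(vm_snapshot *snap, BlockNumber blk)
{
	snap->bits[blk / 8] |= (unsigned char) (1 << (blk % 8));
}

/* Cheap lookup used when deciding whether a page can be skipped. */
static bool
vm_snapshot_is_all_visible(const vm_snapshot *snap, BlockNumber blk)
{
	return (snap->bits[blk / 8] >> (blk % 8)) & 1;
}
```

At one bit per heap page this is 8 KB of local memory per GB of 8 KB-page heap, which illustrates why a dedicated structure beats keeping raw copies of VM pages around.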
Re: Allowing REINDEX to have an optional name
On Mon, Jul 18, 2022 at 09:26:53PM -0500, Justin Pryzby wrote: > Sorry, I meant to send this earlier.. No problem. > It looks like you named the table "toast_relfilenodes", but then also store > to it data for non-toast tables. How about naming that index_relfilenodes? One difference with what I posted previously and 5fb5b6 is the addition of an extra regclass that stores the parent table, for reference in the output. > It's also a bit weird to call the column "relname" but use it to store the > ::regclass. You later need to cast the column to text, so you may as well > store it as text, either relname or oid::regclass. I have used "indname" at the end. > It seems like cluster.sql does this more succinctly. Except that this does not include the relfilenodes from the toast indexes, which is something I wanted to add a check for when it comes to both user tables and catalogs. > Why {4,5} ? Looks like a brain fade from here, while looking the relation names this generated. This could just match with an integer. -- Michael diff --git a/src/bin/scripts/t/090_reindexdb.pl b/src/bin/scripts/t/090_reindexdb.pl index 864707ff92..b5fff5a9cf 100644 --- a/src/bin/scripts/t/090_reindexdb.pl +++ b/src/bin/scripts/t/090_reindexdb.pl @@ -40,7 +40,7 @@ my $toast_index = $node->safe_psql('postgres', # REINDEX operations. A set of relfilenodes is saved from the catalogs # and then compared with pg_class. $node->safe_psql('postgres', - 'CREATE TABLE toast_relfilenodes (parent regclass, indname regclass, relfilenode oid);' + 'CREATE TABLE index_relfilenodes (parent regclass, indname regclass, relfilenode oid);' ); # Save the relfilenode of a set of toast indexes, one from the catalog # pg_constraint and one from the test table. 
@@ -58,8 +58,8 @@ my $fetch_index_relfilenodes = qq{SELECT i.indrelid, a.oid, a.relfilenode JOIN pg_index i ON (i.indexrelid = a.oid) WHERE a.relname IN ('pg_constraint_oid_index', 'test1x')}; my $save_relfilenodes = - "INSERT INTO toast_relfilenodes $fetch_toast_relfilenodes;" - . "INSERT INTO toast_relfilenodes $fetch_index_relfilenodes;"; + "INSERT INTO index_relfilenodes $fetch_toast_relfilenodes;" + . "INSERT INTO index_relfilenodes $fetch_index_relfilenodes;"; # Query to compare a set of relfilenodes saved with the contents of pg_class. # Note that this does not join using OIDs, as CONCURRENTLY would change them @@ -68,10 +68,10 @@ my $save_relfilenodes = # based on the name is enough to ensure a fixed output, where the name of the # parent table is included to provide more context. my $compare_relfilenodes = qq(SELECT b.parent::regclass, - regexp_replace(b.indname::text, '(pg_toast.pg_toast_)\\d{4,5}(_index)', '\\1\\2'), + regexp_replace(b.indname::text, '(pg_toast.pg_toast_)\\d+(_index)', '\\1\\2'), CASE WHEN a.relfilenode = b.relfilenode THEN 'relfilenode is unchanged' ELSE 'relfilenode has changed' END - FROM toast_relfilenodes b + FROM index_relfilenodes b JOIN pg_class a ON b.indname::text = a.oid::regclass::text ORDER BY b.parent::text, b.indname::text); @@ -91,7 +91,7 @@ test1|test1x|relfilenode has changed), # Re-save and run the second one. $node->safe_psql('postgres', - "TRUNCATE toast_relfilenodes; $save_relfilenodes"); + "TRUNCATE index_relfilenodes; $save_relfilenodes"); $node->issues_sql_like( [ 'reindexdb', '-s', 'postgres' ], qr/statement: REINDEX SYSTEM postgres;/, signature.asc Description: PGP signature
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Tue, Jul 19, 2022 at 9:24 AM Andres Freund wrote: > FWIW, I think the best path forward would be to do something similar to the > simplehash.h approach, so it can be customized to the specific user. I figured that would come up at some point. It may be worth doing in the future, but I think it's way too much to ask for the first use case. -- John Naylor EDB: http://www.enterprisedb.com
Re: Rename some rel truncation related constants at the top of vacuumlazy.c
On Mon, Jul 18, 2022 at 8:55 PM Tom Lane wrote:
> Um ... you seem to have removed some useful comments?

I don't think that the stuff about making them into a GUC is useful myself.

> Personally I wouldn't do this, as I don't think the renaming
> brings much benefit, and it will create a hazard for back-patching
> any fixes that might be needed in that code. I'm not hugely upset
> about it, but that's the way I'd vote if asked.

In that case I withdraw the patch. FWIW I wrote the patch during the course of work on new feature development, in a patch that added a couple of similar constants a bit further down. Seemed neater this way, but it's certainly not worth arguing over.

--
Peter Geoghegan
Re: Rename some rel truncation related constants at the top of vacuumlazy.c
Peter Geoghegan writes: > I propose to rename some of the rel truncation related constants at > the top of vacuumlazy.c, per the attached patch. The patch > consolidates related constants into a single block/grouping, and > imposes a uniform naming scheme. Um ... you seem to have removed some useful comments? Personally I wouldn't do this, as I don't think the renaming brings much benefit, and it will create a hazard for back-patching any fixes that might be needed in that code. I'm not hugely upset about it, but that's the way I'd vote if asked. regards, tom lane
Rename some rel truncation related constants at the top of vacuumlazy.c
I propose to rename some of the rel truncation related constants at the top of vacuumlazy.c, per the attached patch. The patch consolidates related constants into a single block/grouping, and imposes a uniform naming scheme. -- Peter Geoghegan 0001-vacuumlazy.c-rename-rel-truncation-constants.patch Description: Binary data
Re: Allowing REINDEX to have an optional name
On Sun, Jul 17, 2022 at 10:58:26AM +0100, Simon Riggs wrote:
> Sounds great, looks fine. Thanks for your review.

Ok, cool. At the end, I have decided to split the tests and the main patch into two different commits, as each is useful on its own. Doing so also helps in seeing the difference in behavior when issuing a REINDEX DATABASE. Another thing that was itching me with the test is that it was not possible to tell the difference between the toast index of the catalog and that of the user table, so I have added the parent table name as an extra thing stored in the table storing the relfilenodes.
--
Michael
Re: Doc about how to set max_wal_senders when setting minimal wal_level
On Mon, Jul 18, 2022 at 8:16 PM Bruce Momjian wrote:
> On Mon, Jul 18, 2022 at 07:39:55PM -0700, David G. Johnston wrote:
> > On Mon, Jul 18, 2022 at 6:27 PM Japin Li wrote:
> > >
> > > +0.90
> >
> > Consider changing:
> >
> > "makes any base backups taken before this unusable"
> >
> > to:
> >
> > "makes existing base backups unusable"
> >
> > As I try to justify this, though, it isn't quite true, maybe:
> >
> > "makes point-in-time recovery, using existing base backups, unable to replay
> > future WAL."
>
> I went with simpler wording.

+1 Thanks!

David J.
Re: Doc about how to set max_wal_senders when setting minimal wal_level
On Mon, Jul 18, 2022 at 07:39:55PM -0700, David G. Johnston wrote: > On Mon, Jul 18, 2022 at 6:27 PM Japin Li wrote: > > > +0.90 > > Consider changing: > > "makes any base backups taken before this unusable" > > to: > > "makes existing base backups unusable" > > As I try to justify this, though, it isn't quite true, maybe: > > "makes point-in-time recovery, using existing base backups, unable to replay > future WAL." I went with simpler wording. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 37fd80388c..c2bdacb6a7 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2764,9 +2764,10 @@ include_dir 'conf.d' levels. This parameter can only be set at server start. -In minimal level, no information is logged for -permanent relations for the remainder of a transaction that creates or -rewrites them. This can make operations much faster (see +The minimal level generates the least WAL +volume. It logs no row information for permanent relations +in transactions that create or +rewrite them. This can make operations much faster (see ). Operations that initiate this optimization include: @@ -2778,19 +2779,20 @@ include_dir 'conf.d' REINDEX TRUNCATE -But minimal WAL does not contain enough information to reconstruct the -data from a base backup and the WAL logs, so replica or -higher must be used to enable WAL archiving -() and streaming replication. +However, minimal WAL does not contain sufficient information for +point-in-time recovery, so replica or +higher must be used to enable continuous archiving +() and streaming binary replication. +In fact, the server will not even start in this mode if +max_wal_senders is non-zero. Note that changing wal_level to -minimal makes any base backups taken before -unavailable for archive recovery and standby server, which may -lead to data loss. 
+minimal makes previous base backups unusable +for point-in-time recovery and standby servers. In logical level, the same information is logged as -with replica, plus information needed to allow -extracting logical change sets from the WAL. Using a level of +with replica, plus information needed to +extract logical change sets from the WAL. Using a level of logical will increase the WAL volume, particularly if many tables are configured for REPLICA IDENTITY FULL and many UPDATE and DELETE statements are
Re: doc: New cumulative stats subsystem obsoletes comment in maintenance.sgml
Hi,

On 2022-07-18 19:47:39 -0700, David G. Johnston wrote:
> On Thu, Jul 14, 2022 at 4:31 PM Andres Freund wrote:
> > It might make sense to still say semi-accurate, but adjust the explanation
> > to say that stats reporting is not instantaneous?
>
> Unless that delay manifests in executing an UPDATE in a session then
> looking at these views in the same session and not seeing that update
> reflected I wouldn't mention it.

Depending on which stats you're looking at, yes, that could totally happen. I don't think the issue is not seeing changes from the current transaction though - it's that *after* commit you might not see them for a while (they're transmitted not more than once a second, and can be delayed up to 60s if there's contention).

Greetings,

Andres Freund
Re: Error "initial slot snapshot too large" in create replication slot
At Tue, 5 Jul 2022 11:32:42 -0700, Jacob Champion wrote in > On Thu, Mar 31, 2022 at 11:53 PM Kyotaro Horiguchi > wrote: > > So this is that. v5 creates a regular snapshot. > > This patch will need a quick rebase over 905c020bef9, which added > `extern` to several missing locations. Thanks! Just rebased. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From abcf0a0e0b3e2de9927d8943a3e3c145ab189508 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 19 Jul 2022 11:50:29 +0900 Subject: [PATCH v6] Create correct snapshot during CREATE_REPLICATION_SLOT Snapshot can hold top XIDs up to the number of server processes but SnapBuildInitialSnapshot tries to store all top-level and sub XIDs to there and easily fails with "initial slot snapshot too large" error. We could instead create a "takenDuringRecovery" snapshot, which later leads to unnecessary visibility checks. Therefore we take trouble to create a regular snapshot by identifying whether each xids is top-level and storing it in the appropriate xid array. Author: Dilip Kumar and Kyotaro Horiguchi --- .../replication/logical/reorderbuffer.c | 20 + src/backend/replication/logical/snapbuild.c | 45 +++ src/include/replication/reorderbuffer.h | 1 + 3 files changed, 58 insertions(+), 8 deletions(-) diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 88a37fde72..44ea3f31aa 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -3332,6 +3332,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid) } +/* + * ReorderBufferXidIsKnownSubXact + * Returns true if the xid is a known subtransaction. + */ +bool +ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid) +{ + ReorderBufferTXN *txn; + + txn = ReorderBufferTXNByXid(rb, xid, false, +NULL, InvalidXLogRecPtr, false); + + /* a known subtxn? 
*/ + if (txn && rbtxn_is_known_subxact(txn)) + return true; + + return false; +} + + /* * --- * Disk serialization support diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index 73c0f15214..2b5282788a 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -532,7 +532,10 @@ SnapBuildInitialSnapshot(SnapBuild *builder) Snapshot snap; TransactionId xid; TransactionId *newxip; - int newxcnt = 0; + TransactionId *newsubxip; + int newxcnt; + int newsubxcnt; + bool overflowed = false; Assert(!FirstSnapshotSet); Assert(XactIsoLevel == XACT_REPEATABLE_READ); @@ -568,9 +571,17 @@ SnapBuildInitialSnapshot(SnapBuild *builder) MyProc->xmin = snap->xmin; - /* allocate in transaction context */ + /* + * Allocate in transaction context. + * + * We could use only subxip to store all xids (takenduringrecovery + * snapshot) but that causes useless visibility checks later so we hasle to + * create a normal snapshot. + */ newxip = (TransactionId *) palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount()); + newsubxip = (TransactionId *) + palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount()); /* * snapbuild.c builds transactions in an "inverted" manner, which means it @@ -578,6 +589,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder) * classical snapshot by marking all non-committed transactions as * in-progress. This can be expensive. 
*/ + newxcnt = 0; + newsubxcnt = 0; + for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);) { void *test; @@ -591,12 +605,24 @@ SnapBuildInitialSnapshot(SnapBuild *builder) if (test == NULL) { - if (newxcnt >= GetMaxSnapshotXidCount()) -ereport(ERROR, - (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), - errmsg("initial slot snapshot too large"))); - - newxip[newxcnt++] = xid; + /* Store the xid to the appropriate xid array */ + if (ReorderBufferXidIsKnownSubXact(builder->reorder, xid)) + { +if (!overflowed) +{ + if (newsubxcnt >= GetMaxSnapshotSubxidCount()) + overflowed = true; + else + newsubxip[newsubxcnt++] = xid; +} + } + else + { +if (newxcnt >= GetMaxSnapshotXidCount()) + elog(ERROR, + "too many transactions while creating snapshot"); +newxip[newxcnt++] = xid; + } } TransactionIdAdvance(xid); @@ -606,6 +632,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder) snap->snapshot_type = SNAPSHOT_MVCC; snap->xcnt = newxcnt; snap->xip = newxip; + snap->subxcnt = newsubxcnt; + snap->subxip = newsubxip; + snap->suboverflowed = overflowed; return snap; } diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h index d109d0baed..736f22a04e 100644 --- a/src/include/replication/reorderbuffer.h +++ b/src/include/replication/reorderbuffer.h @@ -669,6 +669,7 @@
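The core of the patch is the loop that sorts each in-progress xid into the right array: known subtransactions go to subxip (silently setting suboverflowed when that array fills), everything else goes to xip (a hard error when it fills). A rough Python model of that logic — names and limits are invented for illustration, this is not the C implementation:

```python
# Toy model of the xid-classification loop added to
# SnapBuildInitialSnapshot(); names and limits are illustrative only.

def build_snapshot_arrays(xids, is_known_subxact, max_xcnt, max_subxcnt):
    """Split xids into top-level (xip) and subtransaction (subxip) arrays.

    Mirrors the patch's behavior: overflowing subxip just sets the
    'suboverflowed' flag, while overflowing xip is a hard error.
    """
    xip, subxip = [], []
    overflowed = False
    for xid in xids:
        if is_known_subxact(xid):
            if not overflowed:
                if len(subxip) >= max_subxcnt:
                    overflowed = True
                else:
                    subxip.append(xid)
        else:
            if len(xip) >= max_xcnt:
                raise RuntimeError("too many transactions while creating snapshot")
            xip.append(xid)
    return xip, subxip, overflowed
```

The point of the split is that a snapshot sized for top-level xids alone no longer fails just because many subtransaction xids are in flight.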
Re: doc: New cumulative stats subsystem obsoletes comment in maintenance.sgml
On Thu, Jul 14, 2022 at 4:31 PM Andres Freund wrote: > Hi, > > I had missed David's original email on this topic... > > On 2022-07-14 18:58:09 -0400, Bruce Momjian wrote: > > On Wed, Apr 20, 2022 at 04:40:44PM -0700, David G. Johnston wrote: > > > The new cumulative stats subsystem no longer has a "lost under heavy > load" > > > problem so that parenthetical should go (or at least be modified). > > > > > > These stats can be reset so some discussion about how the system uses > them > > > given that possibility seems like it would be good to add here. I'm > not sure > > > what that should look like though. > > > > > > diff --git a/doc/src/sgml/maintenance.sgml > b/doc/src/sgml/maintenance.sgml > > > index 04a04e0e5f..360807c8f9 100644 > > > --- a/doc/src/sgml/maintenance.sgml > > > +++ b/doc/src/sgml/maintenance.sgml > > > @@ -652,9 +652,8 @@ vacuum insert threshold = vacuum base insert > threshold + > > > vacuum insert scale fac > > > tuples to be frozen by earlier vacuums. The number of obsolete > tuples and > > > the number of inserted tuples are obtained from the cumulative > statistics > > > system; > > > it is a semi-accurate count updated by each > UPDATE, > > > -DELETE and INSERT > operation. (It is > > > -only semi-accurate because some information might be lost under > heavy > > > -load.) If the relfrozenxid value of > the table > > > +DELETE and INSERT operation. > > > +If the relfrozenxid value of the table > > > is more than vacuum_freeze_table_age > transactions old, > > > an aggressive vacuum is performed to freeze old tuples and advance > > > relfrozenxid; otherwise, only pages > that have > > > been modified > > > > Yes, I agree and plan to apply this patch soon. > > It might make sense to still say semi-accurate, but adjust the explanation > to > say that stats reporting is not instantaneous? 
Unless that delay manifests as executing an UPDATE in a session, then looking at these views in the same session and not seeing that update reflected, I wouldn't mention it. Concurrency aspects are reasonably expected here. But if we do want to mention it, maybe: "...it is an eventually-consistent count updated by..." David J.
Re: Windows default locale vs initdb
On Tue, Jul 19, 2022 at 10:58 AM Thomas Munro wrote: > Here's a patch. I added this to the next commitfest, and cfbot promptly told me about some warnings I needed to fix. That'll teach me to post a patch tested with "ci-os-only: windows". Looking more closely at some error messages that report GetLastError() where I'd mixed up %d and %lu, I see also that I didn't quite follow existing conventions for wording when reporting Windows error numbers, so I fixed that too. In the "startcreate" step on CI you can see that it says: The database cluster will be initialized with locale "en-US". The default database encoding has accordingly been set to "WIN1252". The default text search configuration will be set to "english". As for whether "accordingly" still applies, by the logic of win32_langinfo()... Windows still considers WIN1252 to be the default ANSI code page for "en-US", though it'd work with UTF-8 too. I'm not sure what to make of that. The goal here was to give Windows users good defaults, but WIN1252 is probably not what most people actually want. Hmph. From 95f2684150e2938f2e555d16bbed4295a6dad279 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 19 Jul 2022 06:31:17 +1200 Subject: [PATCH v2 1/2] Default to BCP 47 locale in initdb on Windows. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Avoid selecting traditional Windows locale names written with English words, because they are unstable and not recommended for use in databases. Since setlocale() returns such names, on Windows use GetUserDefaultLocaleName() if the user didn't provide an explicit locale. Also update the documentation to recommend BCP 47 over the traditional names when providing explicit values to initdb.
Reviewed-by: Juan José Santamaría Flecha Discussion: https://postgr.es/m/CA%2BhUKGJ%3DXThErgAQRoqfCy1bKPxXVuF0%3D2zDbB%2BSxDs59pv7Fw%40mail.gmail.com --- doc/src/sgml/charset.sgml | 10 -- src/bin/initdb/initdb.c | 31 +-- 2 files changed, 37 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index 445fd175d8..b656ca489f 100644 --- a/doc/src/sgml/charset.sgml +++ b/doc/src/sgml/charset.sgml @@ -83,8 +83,14 @@ initdb --locale=sv_SE system under what names depends on what was provided by the operating system vendor and what was installed. On most Unix systems, the command locale -a will provide a list of available locales. -Windows uses more verbose locale names, such as German_Germany -or Swedish_Sweden.1252, but the principles are the same. + + + +Windows uses BCP 47 language tags, like ICU. +For example, sv-SE represents Swedish as spoken in Sweden. +Windows also supports more verbose locale names based on English words, +such as German_Germany or Swedish_Sweden.1252, +but these are not recommended. 
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 89b888eaa5..3af08b7b99 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -59,6 +59,10 @@ #include "sys/mman.h" #endif +#ifdef WIN32 +#include +#endif + #include "access/xlog_internal.h" #include "catalog/pg_authid_d.h" #include "catalog/pg_class_d.h" /* pgrminclude ignore */ @@ -2007,6 +2011,7 @@ locale_date_order(const char *locale) static void check_locale_name(int category, const char *locale, char **canonname) { + char *locale_copy; char *save; char *res; @@ -2022,10 +2027,30 @@ check_locale_name(int category, const char *locale, char **canonname) /* for setlocale() call */ if (!locale) - locale = ""; + { +#ifdef WIN32 + wchar_t wide_name[LOCALE_NAME_MAX_LENGTH]; + char name[LOCALE_NAME_MAX_LENGTH]; + + /* use Windows API to find the default in BCP47 format */ + if (GetUserDefaultLocaleName(wide_name, LOCALE_NAME_MAX_LENGTH) == 0) + pg_fatal("failed to get default locale name: error code %lu", + GetLastError()); + if (WideCharToMultiByte(CP_ACP, 0, wide_name, -1, name, +LOCALE_NAME_MAX_LENGTH, NULL, NULL) == 0) + pg_fatal("failed to convert locale name: error code %lu", + GetLastError()); + locale_copy = pg_strdup(name); +#else + /* use environment to find the default */ + locale_copy = pg_strdup(""); +#endif + } + else + locale_copy = pg_strdup(locale); /* set the locale with setlocale, to see if it accepts it. */ - res = setlocale(category, locale); + res = setlocale(category, locale_copy); /* save canonical name if requested. */ if (res && canonname) @@ -2054,6 +2079,8 @@ check_locale_name(int category, const char *locale, char **canonname) pg_fatal("invalid locale settings; check LANG and LC_* environment variables"); } } + + free(locale_copy); } /* -- 2.35.1 From 1e0b75b4c8958397a8e660fa0b8759f1da78a753 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 19 Jul 2022 08:53:08 +1200 Subject: [PATCH v2 2/2] Remove support for old Windows
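The decision the patch adds to check_locale_name() boils down to: if the user gave no explicit locale, on Windows ask the OS for the user's default BCP 47 tag (via GetUserDefaultLocaleName()) instead of passing "" to setlocale(), which would hand back a legacy English-worded name. A small Python sketch of that decision, with the platform calls stubbed out as assumptions:

```python
# Toy model of the locale-default selection added to initdb's
# check_locale_name(); the callable stands in for the Windows API.

def choose_locale(explicit, on_windows, get_user_default_locale_name):
    """Return the locale string to hand to setlocale().

    On Windows, an empty request is resolved to the user's default
    BCP 47 tag (e.g. "en-US") rather than letting setlocale("")
    pick a legacy name like "English_United States.1252".
    """
    if explicit:
        return explicit
    if on_windows:
        name = get_user_default_locale_name()
        if not name:
            raise RuntimeError("failed to get default locale name")
        return name
    return ""  # Unix: let setlocale() consult the environment
```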
Re: Doc about how to set max_wal_senders when setting minimal wal_level
On Mon, Jul 18, 2022 at 6:27 PM Japin Li wrote: > > On Tue, 19 Jul 2022 at 03:58, Bruce Momjian wrote: > > On Fri, Jul 15, 2022 at 09:29:20PM +0800, Japin Li wrote: > >> > >> On Fri, 15 Jul 2022 at 08:49, Bruce Momjian wrote: > >> > On Tue, Jul 5, 2022 at 08:02:33PM -0400, Tom Lane wrote: > >> >> "Precondition" is an overly fancy word that makes things less clear > >> >> not more so. Does it mean that setting wal_level = minimal will fail > >> >> if you don't do these other things, or does it just mean that you > >> >> won't be getting the absolute minimum WAL volume? If the former, > >> >> I think it'd be better to say something like "To set wal_level to > minimal, > >> >> you must also set [these variables], which has the effect of > disabling > >> >> both WAL archiving and streaming replication." > >> > > >> > I have created the attached patch to try to improve this text. > >> > >> IMO we can add the following sentence for wal_level description, since > >> if wal_level = minimal and max_wal_senders > 0, we cannot start the > database. > >> > >> To set wal_level to minimal, you must also set max_wal_senders to 0, > >> which has the effect of disabling both WAL archiving and streaming > >> replication. > > > > Okay, text added in the attached patch. > > Thanks for updating the patch! LGTM. > > +0.90 Consider changing: "makes any base backups taken before this unusable" to: "makes existing base backups unusable" As I try to justify this, though, it isn't quite true, maybe: "makes point-in-time recovery, using existing base backups, unable to replay future WAL." David J.
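The startup-time rule being documented here can be stated as a single check: wal_level = minimal is only accepted when max_wal_senders is 0. A hedged Python sketch of that check (the real one lives in the server's C startup code, and the error wording below is illustrative):

```python
# Toy model of the consistency check between wal_level and
# max_wal_senders that the documentation change describes.

def check_wal_settings(wal_level, max_wal_senders):
    """Raise if the combination cannot start, mirroring the documented rule."""
    if wal_level == "minimal" and max_wal_senders > 0:
        raise ValueError(
            "WAL streaming (max_wal_senders > 0) requires wal_level "
            '"replica" or "logical"')
```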
Re: Allowing REINDEX to have an optional name
Sorry, I meant to send this earlier.. On Sun, Jul 17, 2022 at 03:19:47PM +0900, Michael Paquier wrote: > The second thing is test coverage. Using a REINDEX DATABASE/SYSTEM > +my $catalog_toast_index = $node->safe_psql('postgres', > + "SELECT indexrelid::regclass FROM pg_index WHERE indrelid = > '$toast_table'::regclass;" > +); > + > +# Set of SQL queries to cross-check the state of relfilenodes across > +# REINDEX operations. A set of relfilenodes is saved from the catalogs > +# and then compared with pg_class. > +$node->safe_psql('postgres', > + 'CREATE TABLE toast_relfilenodes (relname regclass, relfilenode oid);'); It looks like you named the table "toast_relfilenodes", but then also store to it data for non-toast tables. It's also a bit weird to call the column "relname" but use it to store the ::regclass. You later need to cast the column to text, so you may as well store it as text, either relname or oid::regclass. It seems like cluster.sql does this more succinctly. -- Check that clustering sets new relfilenodes: CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ; CLUSTER clstrpart USING clstrpart_idx; CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ; SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C"; > +# Save the relfilenode of a set of toast indexes, one from the catalog > +# pg_constraint and one from the test table. This data is used for checks > +# after some of the REINDEX operations done below, checking if they are > +# changed. 
> +my $fetch_toast_relfilenodes = qq{SELECT c.oid::regclass, c.relfilenode > + FROM pg_class a > +JOIN pg_class b ON (a.oid = b.reltoastrelid) > +JOIN pg_index i on (a.oid = i.indrelid) > +JOIN pg_class c on (i.indexrelid = c.oid) > + WHERE b.oid IN ('pg_constraint'::regclass, 'test1'::regclass)}; > +# Same for relfilenodes of normal indexes. This saves the relfilenode > +# from a catalog of pg_constraint, and the one from the test table. > +my $fetch_index_relfilenodes = qq{SELECT oid, relfilenode > + FROM pg_class > + WHERE relname IN ('pg_constraint_oid_index', 'test1x')}; > +my $save_relfilenodes = > + "INSERT INTO toast_relfilenodes $fetch_toast_relfilenodes;" > + . "INSERT INTO toast_relfilenodes $fetch_index_relfilenodes;"; > + > +# Query to compare a set of relfilenodes saved with the contents of pg_class. > +# Note that this does not join using OIDs, as CONCURRENTLY would change them > +# when reindexing. A filter is applied on the toast index names, even if > this > +# does not make a difference between the catalog and normal ones, the > ordering > +# based on the name is enough to ensure a fixed output. > +my $compare_relfilenodes = > + qq(SELECT regexp_replace(b.relname::text, > '(pg_toast.pg_toast_)\\d{4,5}(_index)', '\\1\\2'), Why {4,5} ? > + CASE WHEN a.relfilenode = b.relfilenode THEN 'relfilenode is unchanged' > +ELSE 'relfilenode has changed' END > + FROM toast_relfilenodes b > +JOIN pg_class a ON b.relname::text = a.oid::regclass::text > + ORDER BY b.relname::text); > + > +# Save the set of relfilenodes and compare them. 
> +$node->safe_psql('postgres', $save_relfilenodes); > +$node->issues_sql_like( > + [ 'reindexdb', 'postgres' ], > + qr/statement: REINDEX DATABASE postgres;/, > + 'SQL REINDEX run'); > +my $relnode_info = $node->safe_psql('postgres', $compare_relfilenodes); > +is( $relnode_info, qq(pg_constraint_oid_index|relfilenode is unchanged > +pg_toast.pg_toast__index|relfilenode has changed > +pg_toast.pg_toast__index|relfilenode is unchanged > +test1x|relfilenode has changed), 'relfilenode change after REINDEX > DATABASE'); > + > +# Re-save and run the second one. > +$node->safe_psql('postgres', > + "TRUNCATE toast_relfilenodes; $save_relfilenodes"); > +$node->issues_sql_like( > + [ 'reindexdb', '-s', 'postgres' ], > + qr/statement: REINDEX SYSTEM postgres;/, > + 'reindex system tables'); > +$relnode_info = $node->safe_psql('postgres', $compare_relfilenodes); > +is( $relnode_info, qq(pg_constraint_oid_index|relfilenode has changed > +pg_toast.pg_toast__index|relfilenode is unchanged > +pg_toast.pg_toast__index|relfilenode has changed > +test1x|relfilenode is unchanged), 'relfilenode change after REINDEX SYSTEM');
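The normalization step that the "Why {4,5}?" question is about replaces the OID embedded in a toast index name so the expected test output is stable across runs. A Python equivalent of that regexp, written with `\d+` instead of a fixed quantifier so it matches an OID of any width (which would also sidestep the question):

```python
import re

# Normalize toast index names such as "pg_toast.pg_toast_16385_index"
# so test output does not depend on the OID assigned in a given run.
TOAST_INDEX_RE = re.compile(r"(pg_toast\.pg_toast_)\d+(_index)")

def normalize_toast_index(name):
    return TOAST_INDEX_RE.sub(r"\1\2", name)
```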
Re: [PoC] Improve dead tuple storage for lazy vacuum
Hi, On 2022-07-08 11:09:44 +0900, Masahiko Sawada wrote: > I think that at this stage it's better to define the design first. For > example, key size and value size: are these sizes fixed, or can they be > set to an arbitrary size? Given the use case of buffer mapping, we would > need a wider key to store RelFileNode, ForkNumber, and BlockNumber. On > the other hand, limiting the key size to a 64-bit integer makes the > logic simple, and possibly it could still be used in buffer mapping > cases by using a tree of trees. For value size, if we support > different value sizes specified by the user, we can either embed > multiple values in the leaf node (called Multi-value leaves in the ART > paper) or introduce a leaf node that stores one value (called > Single-value leaves). FWIW, I think the best path forward would be to do something similar to the simplehash.h approach, so it can be customized to the specific user. Greetings, Andres Freund
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Jul 14, 2022 at 1:17 PM John Naylor wrote: > > On Tue, Jul 12, 2022 at 8:16 AM Masahiko Sawada wrote: > > > > > I think that at this stage it's better to define the design first. For > > > > example, key size and value size, and these sizes are fixed or can be > > > > set the arbitary size? > > > > > > I don't think we need to start over. Andres' prototype had certain > > > design decisions built in for the intended use case (although maybe > > > not clearly documented as such). Subsequent patches in this thread > > > substantially changed many design aspects. If there were any changes > > > that made things wonderful for vacuum, it wasn't explained, but Andres > > > did explain how some of these changes were not good for other uses. > > > Going to fixed 64-bit keys and values should still allow many future > > > applications, so let's do that if there's no reason not to. > > > > I thought Andres pointed out that given that we store BufferTag (or > > part of that) into the key, the fixed 64-bit keys might not be enough > > for buffer mapping use cases. If we want to use wider keys more than > > 64-bit, we would need to consider it. > > It sounds like you've answered your own question, then. If so, I'm > curious what your current thinking is. > > If we *did* want to have maximum flexibility, then "single-value > leaves" method would be the way to go, since it seems to be the > easiest way to have variable-length both keys and values. I do have a > concern that the extra pointer traversal would be a drag on > performance, and also require lots of small memory allocations. Agreed. > I also have some concerns about also simultaneously trying to design > for the use for buffer mappings. I certainly want to make this good > for as many future uses as possible, and I'd really like to preserve > any optimizations already fought for. 
However, to make concrete > progress on the thread subject, I also don't think it's the most > productive use of time to get tied up about the fine details of > something that will not likely happen for several years at the > earliest. I'd like to keep the first version simple. We can improve it and add more optimizations later. Using a radix tree for vacuum TID storage would still be a big win compared to using a flat array, even without all these optimizations. As for the single-value leaves method, I'm also concerned about the extra pointer traversal and extra memory allocation. It's the most flexible, but the multi-value leaves method is also flexible enough for many use cases. Using the single-value method seems to be too much as the first step for me. Overall, using 64-bit keys and 64-bit values would be a reasonable choice for me as the first step. It can cover wider use cases, including the vacuum TID use case. And possibly it can cover further use cases by combining it with a hash table or using a tree of trees, for example. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
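The shape being converged on above — fixed 64-bit keys and 64-bit values — can be sketched quickly. This toy Python radix tree consumes the key in 8-bit chunks and stores the value directly in the final node (closer in spirit to multi-value leaves, with no separate per-value leaf allocation). It models only the structure, not the C node types, sizes, or optimizations discussed in this thread:

```python
# Toy radix tree over fixed 64-bit keys: each level consumes one byte
# of the key, most-significant byte first.  Values live directly in the
# final node, so no per-value leaf object is allocated.

class RadixTree:
    def __init__(self):
        self.root = {}

    @staticmethod
    def _chunks(key):
        """Split a 64-bit key into eight byte-sized chunks, MSB first."""
        return [(key >> shift) & 0xFF for shift in range(56, -1, -8)]

    def set(self, key, value):
        node = self.root
        *inner, last = self._chunks(key)
        for c in inner:
            node = node.setdefault(c, {})
        # Tag value slots so they never collide with child-node keys.
        node[("v", last)] = value

    def get(self, key):
        node = self.root
        *inner, last = self._chunks(key)
        for c in inner:
            node = node.get(c)
            if node is None:
                return None
        return node.get(("v", last))
```

Using Python dicts defeats the memory-compactness that motivates a real radix tree; the sketch is only meant to show the key-chunking and leaf placement.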
Re: PATCH: Add Table Access Method option to pgbench
On Mon, Jul 18, 2022 at 01:53:21PM +0300, Alexander Korotkov wrote: > Looks good to me as well. I'm going to push this if no objections. FWIW, I find the extra mention of PGOPTIONS, added for the specific point of table AMs within the environment-variables part, a bit confusing, because we already mention PGOPTIONS for serializable transactions a bit further down. Hence, my choice would be the addition of an extra paragraph in the "Notes", named "Table Access Methods", just before or after "Good Practices". My 2c. -- Michael
Re: NAMEDATALEN increase because of non-latin languages
On Mon, Jul 18, 2022 at 9:58 AM Andres Freund wrote: > > 0001 is just boilerplate, same as v1 > > If we were to go for this, I wonder if we should backpatch the cast-containing > version of GETSTRUCT for less pain backpatching bugfixes. That'd likely require > using a different name for the cast-containing one. The new version in this series was meant to be temporary scaffolding, but in the back of my mind I wondered if we should go ahead and keep the simple cast for catalogs that have no varlenas or alignment issues. It sounds like you'd be in favor of that. > > 0003 generates static inline functions that work the same as the current > > GETSTRUCT macro, i.e. just cast to the right pointer and return it. > > It seems likely that inline functions are going to be too large for > this. There's a lot of GETSTRUCTs in a lot of files, and emitting a copy of the > function every time doesn't seem great. Ok. > > current offset is the previous offset plus the previous type length, plus > > any alignment padding suitable for the current type (there is none here, so > > the alignment aspect is not tested). I'm hoping something like this will be > > sufficient for what's in the current structs, but of course will need > > additional work when expanding those to include pointers to varlen > > attributes. I've not yet inspected the emitted assembly language, but > > regression tests pass. > > Hm. Wouldn't it make sense to just use the normal tuple deforming routines and > then map the results to the structs? I wasn't sure if they'd be suitable for this, but if they are, that'd make this easier and more maintainable. I'll look into it. -- John Naylor EDB: http://www.enterprisedb.com
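The offset rule quoted above — each member's offset is the previous offset plus the previous type's length, plus any alignment padding needed by the current type — is easy to model. A Python sketch of what such a generator would compute (the field sizes and alignments below are invented examples, not catalog definitions):

```python
# Toy model of computing member offsets for a C-style struct:
# each field starts at the previous end, rounded up to its alignment.

def field_offsets(fields):
    """fields: list of (name, size, align) tuples; returns {name: offset}."""
    offsets = {}
    pos = 0
    for name, size, align in fields:
        pos = (pos + align - 1) // align * align  # pad up to alignment
        offsets[name] = pos
        pos += size
    return offsets
```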
Re: Doc about how to set max_wal_senders when setting minimal wal_level
On Tue, 19 Jul 2022 at 03:58, Bruce Momjian wrote: > On Fri, Jul 15, 2022 at 09:29:20PM +0800, Japin Li wrote: >> >> On Fri, 15 Jul 2022 at 08:49, Bruce Momjian wrote: >> > On Tue, Jul 5, 2022 at 08:02:33PM -0400, Tom Lane wrote: >> >> "Precondition" is an overly fancy word that makes things less clear >> >> not more so. Does it mean that setting wal_level = minimal will fail >> >> if you don't do these other things, or does it just mean that you >> >> won't be getting the absolute minimum WAL volume? If the former, >> >> I think it'd be better to say something like "To set wal_level to minimal, >> >> you must also set [these variables], which has the effect of disabling >> >> both WAL archiving and streaming replication." >> > >> > I have created the attached patch to try to improve this text. >> >> IMO we can add the following sentence for the wal_level description, since >> if wal_level = minimal and max_wal_senders > 0, we cannot start the database. >> >> To set wal_level to minimal, you must also set max_wal_senders to 0, >> which has the effect of disabling both WAL archiving and streaming >> replication. > > Okay, text added in the attached patch. Thanks for updating the patch! LGTM. -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: First draft of the PG 15 release notes
> Increase hash_mem_multiplier default to 2.0 (Peter Geoghegan) > This allows query hash operations to use double the amount of work_mem memory > as other operations. I wonder if it's worth pointing out that a query may end up using not just 2x more memory (since work_mem*hash_mem_multiplier is per node), but 2**N more, for N nodes. > Remove pg_dump's --no-synchronized-snapshots option since all supported > server versions support synchronized snapshots (Tom Lane) It'd be better to put that after the note about dropping support for upgrading clusters older than v9.2 in psql/pg_dump/pg_upgrade. > Enable system and TOAST btree indexes to efficiently store duplicates (Peter > Geoghegan) Say "btree indexes on system [and TOAST] tables" > Prevent changes to columns only indexed by BRIN indexes from disabling HOT > updates (Josef Simanek) This was reverted > Generate periodic log message during slow server starts (Nitin Jadhav, Robert > Haas) messages plural > Messages report the cause of the delay. The time interval for notification is > controlled by the new server variable log_startup_progress_interval. *The messages > Add server variable shared_memory_size to report the size of allocated shared > memory (Nathan Bossart) > Add server variable shared_memory_size_in_huge_pages to report the number of > huge memory pages required (Nathan Bossart) Maybe these should say server *parameter* since they're not really "variable". > 0. Add support for LZ4 and Zstandard compression of server-side base backups > (Jeevan Ladhe, Robert Haas) > 1. Allow pg_basebackup to use LZ4 and Zstandard compression on server-side > base backup files (Dipesh Pandit, Jeevan Ladhe) > 2. Allow pg_basebackup's --compress option to control the compression method > and options (Michael Paquier, Robert Haas) >New options include server-gzip (gzip on the server), client-gzip (same as > gzip). > 3. 
Allow pg_basebackup to compress on the server side and decompress on the > client side before storage (Dipesh Pandit) > This is accomplished by specifying compression on the server side and > plain output format. I still think these expose the incremental development rather than the user-facing change. 1. It seems wrong to say "server-side" since client-side compression with LZ4/zstd is also supported. 2. It's confusing to say that the new options are server-gzip and client-gzip, since it just mentioned new algorithms; 3. I'm not sure this needs to be mentioned at all; maybe it should be a "detail" following the item about server-side compression. > Tables added to the listed schemas in the future will also be replicated. "Tables later added" is clearer. Otherwise "in the future" sounds like maybe in v16 or v17 we'll start replicating those tables. > Allow subscribers to stop logical replication application on error (Osumi > Takamichi, Mark Dilger) "application" sounds off. > Add new default WAL-logged method for database creation (Dilip Kumar) "New default" sounds off. Say "Add new WAL-logged method for database creation, used by default". > Have pg_upgrade preserve relfilenodes, tablespace, and database OIDs between > old and new clusters (Shruthi KC, Antonin Houska) "tablespace OIDs" or "tablespace and database OIDs and relfilenodes" > Limit support of pg_upgrade to old servers running PostgreSQL 9.2 and later > (Tom Lane) The word "old" doesn't appear in the two release-note items about pg_dump and psql, and "old" makes it sound like "antique" rather than "source". > Some internal-use-only types have also been assigned this column. this *value > Allow custom scan providers to indicate if they support projections (Sven > Klemm) > The default is now that custom scan providers can't support projections, so > they need to be updated for this release. Per the commit message, they don't "need" to be updated.
I think this should say "The default now assumes that a custom scan provider does not support projections; to retain optimal performance, they should be updated to indicate whether that's supported."
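On the hash_mem_multiplier item near the top of this review: because the limit is work_mem × hash_mem_multiplier per hash-based plan node, the total a query can consume scales with the number of such nodes. A trivial sketch of the arithmetic (illustrative numbers only, not a statement about any particular plan):

```python
# Per-node memory allowance under hash_mem_multiplier: each hash-based
# plan node may use work_mem * hash_mem_multiplier, so a plan with N
# such nodes can use N times that amount in total for hashing.

def hash_mem_budget(work_mem_kb, hash_mem_multiplier, n_hash_nodes):
    per_node = work_mem_kb * hash_mem_multiplier
    return per_node, per_node * n_hash_nodes
```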
Re: System catalog documentation chapter
Bruce Momjian writes: > On Sat, Jul 16, 2022 at 10:53:17AM +0200, Peter Eisentraut wrote: >> Maybe this whole notion that "system views" is one thing is not suitable. > Are you thinking we should just call the chapter "System Catalogs and > Views" and just place them alphabetically in a single chapter? I didn't think that was Peter's argument at all. He's complaining that "system views" isn't a monolithic category, which is a reasonable point, especially since we have a bunch of built-in views that appear in other chapters. But to then also confuse them with catalogs isn't improving the situation. The views that are actually reinterpretations of catalog contents should probably be documented near the catalogs. But a lot of stuff in that chapter is no such thing. For example, it's really unclear why pg_backend_memory_contexts is documented here and not somewhere near the stats views. We also have stuff like pg_available_extensions, pg_file_settings, and pg_timezone_names, which are reporting ground truth of some sort that didn't come from the catalogs. I'm not sure if those belong near the catalogs or not. The larger point, perhaps, is that this whole area is underneath "Part VII: Internals", and that being the case what you would expect to find here is stuff that we don't intend people to interact with in day-to-day usage. Most of the "system views" are specifically intended for day-to-day use, maybe only by DBAs, but nonetheless they are user-facing in a way that the catalogs aren't. Maybe we should move them all to Part IV, in a chapter or chapters adjacent to the Information Schema chapter. Or maybe try to separate "user" views from "DBA" views, and put user views in Part IV while DBA views go into a new chapter in Part III, near the stats views. regards, tom lane
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Mon, Jul 18, 2022 at 12:28 PM shiy.f...@fujitsu.com wrote: > > On Fri, Jul 15, 2022 10:39 PM Masahiko Sawada wrote: > > > > This patch should have the fix for the issue that Shi yu reported. Shi > > yu, could you please test it again with this patch? > > > > Thanks for updating the patch! > I have tested and confirmed that the problem I found has been fixed. Thank you for testing! Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: [PATCH] Introduce array_shuffle() and array_sample()
On Tue, Jul 19, 2022 at 8:15 AM Martin Kalcher wrote: > Am 18.07.22 um 21:29 schrieb Tom Lane: > > The preferred thing to do is to add it to our "commitfest" queue, > > which will ensure that it gets looked at eventually. The currently > > open cycle is 2022-09 [1] (see the "New Patch" button there). > > Thanks Tom, did that. FYI that brought your patch to the attention of this CI robot, which is showing a couple of warnings. See the FAQ link there for an explanation of that infrastructure. http://cfbot.cputube.org/martin-kalcher.html
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Mon, Jul 18, 2022 at 1:12 PM Amit Kapila wrote: > > On Fri, Jul 15, 2022 at 8:09 PM Masahiko Sawada wrote: > > > > This patch should have the fix for the issue that Shi yu reported. Shi > > yu, could you please test it again with this patch? > > > > Can you explain the cause of the failure and your fix for the same? @@ -1694,6 +1788,8 @@ out: /* be tidy */ if (ondisk) pfree(ondisk); + if (catchange_xip) + pfree(catchange_xip); Regarding the above code in the previous version patch, looking at the generated assembler code shared by Shi yu offlist, I realized that the “if (catchange_xip)” is removed (folded) by gcc optimization. This is because we dereference catchange_xip before null-pointer check as follow: + /* copy catalog modifying xacts */ + sz = sizeof(TransactionId) * catchange_xcnt; + memcpy(ondisk_c, catchange_xip, sz); + COMP_CRC32C(ondisk->checksum, ondisk_c, sz); + ondisk_c += sz; Since sz is 0 in this case, memcpy doesn’t do anything actually. By checking the assembler code, I’ve confirmed that gcc does the optimization for these code and setting -fno-delete-null-pointer-checks flag prevents the if statement from being folded. Also, I’ve confirmed that adding the check if "catchange.xcnt > 0” before the null-pointer check also can prevent that. Adding a check if "catchange.xcnt > 0” looks more robust. I’ve added a similar check for builder->committed.xcnt as well for consistency. builder->committed.xip could have no transactions. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Costing elided SubqueryScans more nearly correctly
On Tue, Jul 19, 2022 at 1:30 AM Tom Lane wrote: > Alvaro Herrera writes: > > On 2022-Jul-18, Richard Guo wrote: > >> BTW, not related to this patch, the new lines for parallel_aware check > >> in setrefs.c are very wide. How about wrap them to keep consistent with > >> arounding codes? > > > Not untrue! Something like this, you mean? Fixed the nearby typo while > > at it. > > WFM. (I'd fixed the comment typo in my patch, but I don't mind if > you get there first.) +1 The fix looks good to me. Thanks Richard
Re: System catalog documentation chapter
On Sat, Jul 16, 2022 at 10:53:17AM +0200, Peter Eisentraut wrote: > Now that I see the result, I don't think this is really the right > improvement yet. > > The new System Views chapter lists views that are effectively quasi-system > catalogs, such as pg_shadow or pg_replication_origin_status -- the fact that > these are views and not tables is secondary. On the other hand, it lists > views that are more on the level of information schema views, that is, they > are explicitly user-facing wrappers around information available elsewhere, > such as pg_sequences, pg_views. > > I think most of them are in the second category. So having this chapter in > the "Internals" part seems wrong. But then moving it, say, closer to where > the information schema is documented wouldn't be right either, unless we > move the views in the first category elsewhere. > > Also, consider that we document the pg_stats_ views in yet another place, > and it's not really clear why something like pg_replication_slots, which > might often be used together with stats views, is documented so far away > from them. > > Maybe this whole notion that "system views" is one thing is not suitable. Are you thinking we should just call the chapter "System Catalogs and Views" and just place them alphabetically in a single chapter? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: Allow placeholders in ALTER ROLE w/o superuser
On Fri, Jul 01, 2022 at 04:40:27PM -0700, Nathan Bossart wrote: > On Sun, Jun 05, 2022 at 11:20:38PM -0500, Steve Chavez wrote: >> However, defining placeholders at the role level requires superuser: >> >> alter role myrole set my.username to 'tomas'; >> ERROR: permission denied to set parameter "my.username" >> >> This is inconsistent and surprising behavior. I think it doesn't make >> sense since you can already set them at the session or transaction >> level (SET LOCAL my.username = 'tomas'). Enabling this would allow sidecar >> services to store metadata scoped to the pertaining role. >> >> I've attached a patch that removes this restriction. From my testing, this >> doesn't affect permission checking when an extension defines its custom GUC >> variables. >> >>DefineCustomStringVariable("my.custom", NULL, NULL, _custom, NULL, >> PGC_SUSET, ..); >> >> Using PGC_SUSET or PGC_SIGHUP will fail accordingly. Also no tests fail >> when doing "make installcheck". > > IIUC you are basically proposing to revert a6dcd19 [0], but it is not clear > to me why that is safe. Am I missing something? I spent some more time looking into this, and I think I've constructed a simple example that demonstrates the problem with removing this restriction. postgres=# CREATE ROLE test CREATEROLE; CREATE ROLE postgres=# CREATE ROLE other LOGIN; CREATE ROLE postgres=# GRANT CREATE ON DATABASE postgres TO other; GRANT postgres=# SET ROLE test; SET postgres=> ALTER ROLE other SET plperl.on_plperl_init = 'test'; ALTER ROLE postgres=> \c postgres other You are now connected to database "postgres" as user "other". postgres=> CREATE EXTENSION plperl; CREATE EXTENSION postgres=> SHOW plperl.on_plperl_init; plperl.on_plperl_init --- test (1 row) In this example, the non-superuser role sets a placeholder GUC for another role. This GUC becomes a PGC_SUSET GUC when plperl is loaded, so a non-superuser role will have successfully set a PGC_SUSET GUC.
If we had a record of who ran ALTER ROLE, we might be able to apply appropriate permissions checking when the module is loaded, but this information doesn't exist in pg_db_role_setting. IIUC we have the following options:

1. Store information about who ran ALTER ROLE. I think there are a couple of problems with this. For example, what happens if the grantor was dropped or its privileges were altered? Should we instead store the context of the user (i.e., PGC_USERSET or PGC_SUSET)? Do we need to add entries to pg_depend?

2. Ignore or ERROR for any ALTER ROLE settings for custom GUCs. Since we don't know who ran ALTER ROLE, we can't trust the value.

3. Require superuser to use ALTER ROLE for a placeholder. This is what we do today. Since we know a superuser set the value, we can always apply it when the custom GUC is finally defined.

If this is an accurate representation of the options, it seems clear why the superuser restriction is in place.

-- Nathan Bossart Amazon Web Services: https://aws.amazon.com
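For reference, the restriction that option 3 preserves is the behavior reported at the start of this thread; a minimal sketch from a non-superuser session (role and parameter names taken from the original report):

```sql
-- Current behavior: a non-superuser cannot persist a placeholder via ALTER ROLE
ALTER ROLE myrole SET my.username TO 'tomas';
-- ERROR:  permission denied to set parameter "my.username"

-- ...even though the same placeholder can be set at session or transaction level:
BEGIN;
SET LOCAL my.username = 'tomas';
COMMIT;
```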
Re: Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)
On Fri, May 13, 2022 at 09:10:52AM -0400, Robert Haas wrote: > I suggest that if log_startup_progress_interval doesn't meet your > needs here, we should try to understand why not and maybe enhance it, > instead of adding a separate facility. +1. AFAICT it should be possible to make the log_startup_progress_interval machinery extensible enough to be reused for several other tasks that can take a while but have little visibility. Since this thread has been inactive for over 2 months, I'm marking the commitfest entry as Waiting on Author. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Windows default locale vs initdb
On Wed, Dec 15, 2021 at 11:32 PM Juan José Santamaría Flecha wrote: > On Sun, May 16, 2021 at 6:29 AM Noah Misch wrote: >> On Mon, Apr 19, 2021 at 05:42:51PM +1200, Thomas Munro wrote: >> > The question we asked ourselves >> > multiple times in the other thread was how we're supposed to get to >> > the modern BCP 47 form when creating the template databases. It looks >> > like one possibility, since Vista, is to call >> > GetUserDefaultLocaleName()[2] >> >> > No patch, but I wondered if any Windows hackers have any feedback on >> > relative sanity of trying to fix all these problems this way. >> >> Sounds reasonable. If PostgreSQL v15 would otherwise run on Windows Server >> 2003 R2, this is a good time to let that support end. >> > The value returned by GetUserDefaultLocaleName() is a system configured > parameter, independent of what you set with setlocale(). It might be > reasonable for initdb but not for a backend in most cases. Agreed. Only for initdb, and only if you didn't specify a locale name on the command line. > You can get the locale POSIX-ish name using GetLocaleInfoEx(), but this is no > longer recommended, because using LCIDs is no longer recommended [1]. > Although, this would work for legacy locales. Please find attached a POC > patch showing this approach. Now that museum-grade Windows has been defenestrated, we are free to call GetUserDefaultLocaleName(). Here's a patch. One thing you did in your patch that I disagree with, I think, was to convert a BCP 47 name to a POSIX name early, that is, s/-/_/. I think we should use the locale name exactly as Windows (really, under the covers, ICU) spells it. There is only one place in the tree today that really wants a POSIX locale name, and that's LC_MESSAGES, accessed by GNU gettext, not Windows. We already had code to cope with that. 
I think we should also convert to POSIX format when making the collname in your pg_import_system_collations() proposal, so that COLLATE "en_US" works (= a SQL identifier), but that's another thread[1]. I don't think we should do it in collcollate or datcollate, which is a string for the OS to interpret. With my garbage collector hat on, I would like to rip out all of the support for traditional locale names, eventually. Deleting kludgy code is easy and fun -- 0002 is a first swing at that -- but there remains an important unanswered question. How should someone pg_upgrade a "English_Canada.1521" cluster if we now reject that name? We'd need to do a conversion to "en-CA", or somehow tell the user to. H. [1] https://www.postgresql.org/message-id/flat/CAC%2BAXB0WFjJGL1n33bRv8wsnV-3PZD0A7kkjJ2KjPH0dOWqQdg%40mail.gmail.com From d6d677fd185242590f0f716cf69d09e735122ff7 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 19 Jul 2022 06:31:17 +1200 Subject: [PATCH 1/2] Default to BCP 47 locale in initdb on Windows. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Avoid selecting traditional Windows locale names written with English words, because they are unstable and not recommended for use in databases. Since setlocale() returns such names, instead use GetUserDefaultLocaleName() if the user didn't provide an explicit locale. Also update the documentation to recommend BCP 47 over the traditional names when providing explicit values to initdb. 
Reviewed-by: Juan José Santamaría Flecha Discussion: https://postgr.es/m/CA%2BhUKGJ%3DXThErgAQRoqfCy1bKPxXVuF0%3D2zDbB%2BSxDs59pv7Fw%40mail.gmail.com --- doc/src/sgml/charset.sgml | 10 -- src/bin/initdb/initdb.c | 28 +++- 2 files changed, 35 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index 445fd175d8..22e33f0f57 100644 --- a/doc/src/sgml/charset.sgml +++ b/doc/src/sgml/charset.sgml @@ -83,8 +83,14 @@ initdb --locale=sv_SE system under what names depends on what was provided by the operating system vendor and what was installed. On most Unix systems, the command locale -a will provide a list of available locales. -Windows uses more verbose locale names, such as German_Germany -or Swedish_Sweden.1252, but the principles are the same. + + + +Windows uses BCP 47 language tags. +For example, sv-SE represents Swedish as spoken in Sweden. +Windows also supports more verbose locale names based on English words, +such as German_Germany or Swedish_Sweden.1252, +but these are not recommended. diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 89b888eaa5..57c5ecf3cf 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -59,6 +59,10 @@ #include "sys/mman.h" #endif +#ifdef WIN32 +#include +#endif + #include "access/xlog_internal.h" #include "catalog/pg_authid_d.h" #include "catalog/pg_class_d.h" /* pgrminclude ignore */ @@ -2022,7 +2026,27 @@ check_locale_name(int category, const char *locale, char **canonname) /* for
Re: Commitfest Update
On 7/18/22 15:32, Justin Pryzby wrote: > On Mon, Jul 18, 2022 at 12:00:01PM -0700, Jacob Champion wrote: >> And thank you for speaking up so quickly! It's a lot easier to undo >> partial damage :D (Speaking of which: where is that CF update stream you >> mentioned?) > > https://commitfest.postgresql.org/activity/ Thank you. At this point, I think I've repaired all the entries. --Jacob
Re: [PATCH] Introduce array_shuffle() and array_sample()
On 19.07.22 at 00:18, Tom Lane wrote: > Independently of the dimensionality question --- I'd imagined that array_sample would select a random subset of the array elements but keep their order intact. If you want the behavior shown above, you can do array_shuffle(array_sample(...)). But if we randomize it, and that's not what the user wanted, she has no recourse. Now, if you're convinced that the set of people wanting sampling-without-shuffling is the empty set, then making everybody else call two functions is a loser. But I'm not convinced. At the least, I'd like to see the argument made why nobody would want that. On the contrary! I am pretty sure there are people out there wanting sampling-without-shuffling. I will think about that.
Re: [PATCH] Introduce array_shuffle() and array_sample()
"David G. Johnston" writes: > On Mon, Jul 18, 2022 at 3:18 PM Tom Lane wrote: >> Independently of the dimensionality question --- I'd imagined that >> array_sample would select a random subset of the array elements >> but keep their order intact. If you want the behavior shown >> above, you can do array_shuffle(array_sample(...)). But if we >> randomize it, and that's not what the user wanted, she has no >> recourse. > And for those that want to know in what order those elements were chosen > they have no recourse in the other setup. Um ... why is "the order in which the elements were chosen" a concept we want to expose? ISTM sample() is a black box in which notionally the decisions could all be made at once. > I really think this function needs to grow an algorithm argument that can > be used to specify stuff like ordering, replacement/without-replacement, > etc...just some enums separated by commas that can be added to the call. I think you might run out of gold paint somewhere around here. I'm still not totally convinced we should bother with the sample() function at all, let alone that it needs algorithm variants. At some point we say to the user "here's a PL, write what you want for yourself". regards, tom lane
Re: Commitfest Update
On Mon, Jul 18, 2022 at 12:00:01PM -0700, Jacob Champion wrote: > And thank you for speaking up so quickly! It's a lot easier to undo > partial damage :D (Speaking of which: where is that CF update stream you > mentioned?) https://commitfest.postgresql.org/activity/ -- Justin
Re: [PATCH] Introduce array_shuffle() and array_sample()
On Mon, Jul 18, 2022 at 3:18 PM Tom Lane wrote: > Independently of the dimensionality question --- I'd imagined that > array_sample would select a random subset of the array elements > but keep their order intact. If you want the behavior shown > above, you can do array_shuffle(array_sample(...)). But if we > randomize it, and that's not what the user wanted, she has no > recourse. And for those that want to know in what order those elements were chosen they have no recourse in the other setup. I really think this function needs to grow an algorithm argument that can be used to specify stuff like ordering, replacement/without-replacement, etc...just some enums separated by commas that can be added to the call. David J.
Re: [PATCH] Introduce array_shuffle() and array_sample()
Martin Kalcher writes: > If we go with (1) array_shuffle() and array_sample() should shuffle each > element individually and always return a one-dimensional array. >select array_shuffle('{{1,2},{3,4},{5,6}}'); >--- > {1,4,3,5,6,2} >select array_sample('{{1,2},{3,4},{5,6}}', 3); >-- > {1,4,3} Independently of the dimensionality question --- I'd imagined that array_sample would select a random subset of the array elements but keep their order intact. If you want the behavior shown above, you can do array_shuffle(array_sample(...)). But if we randomize it, and that's not what the user wanted, she has no recourse. Now, if you're convinced that the set of people wanting sampling-without-shuffling is the empty set, then making everybody else call two functions is a loser. But I'm not convinced. At the least, I'd like to see the argument made why nobody would want that. regards, tom lane
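To make the two proposed semantics concrete (a hedged sketch: array_sample and array_shuffle exist only in the patch under discussion, and their outputs are random, so the result shown is merely illustrative):

```sql
-- Order-preserving sample: a random subset that keeps the input order,
-- e.g. {1,3,4}
SELECT array_sample('{1,2,3,4,5}'::int[], 3);

-- Callers who also want a randomized order compose the two functions:
SELECT array_shuffle(array_sample('{1,2,3,4,5}'::int[], 3));
```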
Re: allow specifying action when standby encounters incompatible parameter settings
On Fri, Jun 24, 2022 at 11:42:29AM +0100, Simon Riggs wrote: > This patch would undo a very important change - to keep servers > available by default and go back to the old behavior for a huge fleet > of Postgres databases. The old behavior of shutdown-on-change caused > catastrophe many times for users and in one case brought down a rather > large and important service provider, whose CTO explained to me quite > clearly how stupid he thought that old behavior was. So I will not > easily agree to allowing it to be put back into PostgreSQL, simply to > avoid adding a small amount of easy code into an orchestration layer > that could and should implement documented best practice. > > I am otherwise very appreciative of your insightful contributions, > just not this specific one. Given this feedback, I intend to mark the associated commitfest entry as Withdrawn at the conclusion of the current commitfest. > Happy to discuss how we might introduce new parameters/behavior to > reduce the list of parameters that need to be kept in lock-step. Me, too. I don't have anything concrete to propose at the moment. I thought Horiguchi-san's idea of automatically running ALTER SYSTEM was intriguing, but I have not explored that in depth. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Use -fvisibility=hidden for shared libraries
Hi, On 2022-07-18 00:05:16 -0700, Andres Freund wrote: > It looks like we might be missing out on benefiting from this on more > platforms - the test right now is in the gcc specific section of configure.ac, > but it looks like at least Intel's, sun's and AIX's compilers might all > support -fvisibility with the same syntax. > > Given that that's just about all compilers we support using configure, perhaps > we should just move that out of the compiler specific section? Doesn't look > like there's much precedent for that so far... Here's a potential patch along those lines. I wonder if we also should move the -fno-strict-aliasing, -fwrapv tests out. But that'd be something for later. Greetings, Andres Freund >From 713236eb696b6b60bbde3582d0fd31f1de23d7b9 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Mon, 18 Jul 2022 15:05:58 -0700 Subject: [PATCH v4] configure: Check if -fvisibility is supported for all compilers. --- configure| 305 ++- configure.ac | 31 -- 2 files changed, 177 insertions(+), 159 deletions(-) diff --git a/configure b/configure index a4f4d321fb7..df68b86b09b 100755 --- a/configure +++ b/configure @@ -6304,154 +6304,6 @@ if test x"$pgac_cv_prog_CC_cflags__ftree_vectorize" = x"yes"; then fi - # - # If the compiler knows how to hide symbols add the switch needed for that - # to CFLAGS_SL_MODULE and define HAVE_VISIBILITY_ATTRIBUTE. - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports -fvisibility=hidden, for CFLAGS_SL_MODULE" >&5 -$as_echo_n "checking whether ${CC} supports -fvisibility=hidden, for CFLAGS_SL_MODULE... " >&6; } -if ${pgac_cv_prog_CC_cflags__fvisibility_hidden+:} false; then : - $as_echo_n "(cached) " >&6 -else - pgac_save_CFLAGS=$CFLAGS -pgac_save_CC=$CC -CC=${CC} -CFLAGS="${CFLAGS_SL_MODULE} -fvisibility=hidden" -ac_save_c_werror_flag=$ac_c_werror_flag -ac_c_werror_flag=yes -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ - -int -main () -{ - - ; - return 0; -} -_ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - pgac_cv_prog_CC_cflags__fvisibility_hidden=yes -else - pgac_cv_prog_CC_cflags__fvisibility_hidden=no -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -ac_c_werror_flag=$ac_save_c_werror_flag -CFLAGS="$pgac_save_CFLAGS" -CC="$pgac_save_CC" -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CC_cflags__fvisibility_hidden" >&5 -$as_echo "$pgac_cv_prog_CC_cflags__fvisibility_hidden" >&6; } -if test x"$pgac_cv_prog_CC_cflags__fvisibility_hidden" = x"yes"; then - CFLAGS_SL_MODULE="${CFLAGS_SL_MODULE} -fvisibility=hidden" -fi - - - if test "$pgac_cv_prog_CC_cflags__fvisibility_hidden" = yes; then - -$as_echo "#define HAVE_VISIBILITY_ATTRIBUTE 1" >>confdefs.h - - fi - # For C++ we additionally want -fvisibility-inlines-hidden - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports -fvisibility=hidden, for CXXFLAGS_SL_MODULE" >&5 -$as_echo_n "checking whether ${CXX} supports -fvisibility=hidden, for CXXFLAGS_SL_MODULE... " >&6; } -if ${pgac_cv_prog_CXX_cxxflags__fvisibility_hidden+:} false; then : - $as_echo_n "(cached) " >&6 -else - pgac_save_CXXFLAGS=$CXXFLAGS -pgac_save_CXX=$CXX -CXX=${CXX} -CXXFLAGS="${CXXFLAGS_SL_MODULE} -fvisibility=hidden" -ac_save_cxx_werror_flag=$ac_cxx_werror_flag -ac_cxx_werror_flag=yes -ac_ext=cpp -ac_cpp='$CXXCPP $CPPFLAGS' -ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_cxx_compiler_gnu - -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ - -int -main () -{ - - ; - return 0; -} -_ACEOF -if ac_fn_cxx_try_compile "$LINENO"; then : - pgac_cv_prog_CXX_cxxflags__fvisibility_hidden=yes -else - pgac_cv_prog_CXX_cxxflags__fvisibility_hidden=no -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu - -ac_cxx_werror_flag=$ac_save_cxx_werror_flag -CXXFLAGS="$pgac_save_CXXFLAGS" -CXX="$pgac_save_CXX" -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CXX_cxxflags__fvisibility_hidden" >&5 -$as_echo "$pgac_cv_prog_CXX_cxxflags__fvisibility_hidden" >&6; } -if test x"$pgac_cv_prog_CXX_cxxflags__fvisibility_hidden" = x"yes"; then - CXXFLAGS_SL_MODULE="${CXXFLAGS_SL_MODULE} -fvisibility=hidden" -fi - - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports -fvisibility-inlines-hidden, for CXXFLAGS_SL_MODULE" >&5 -$as_echo_n "checking whether ${CXX} supports -fvisibility-inlines-hidden, for CXXFLAGS_SL_MODULE... " >&6; } -if ${pgac_cv_prog_CXX_cxxflags__fvisibility_inlines_hidden+:} false; then : - $as_echo_n "(cached) " >&6 -else -
Re: pg_parameter_aclcheck() and trusted extensions
On Thu, Jul 14, 2022 at 03:57:35PM -0700, Nathan Bossart wrote: > However, ALTER ROLE RESET ALL will be blocked, while resetting only the > individual GUC will go through. > > postgres=> ALTER ROLE other RESET ALL; > ALTER ROLE > postgres=> SELECT * FROM pg_db_role_setting; >setdatabase | setrole |setconfig > -+-+- > 0 | 16385 | {zero_damaged_pages=on} > (1 row) > > postgres=> ALTER ROLE other RESET zero_damaged_pages; > ALTER ROLE > postgres=> SELECT * FROM pg_db_role_setting; >setdatabase | setrole | setconfig > -+-+--- > (0 rows) > > I think this is because GUCArrayReset() is the only caller of > validate_option_array_item() that sets skipIfNoPermissions to true. The > others fall through to set_config_option(), which does a > pg_parameter_aclcheck(). So, you are right. Here's a small patch that seems to fix this case. However, I wonder if a better way to fix this is to provide a way to stop set_config_option() from throwing errors (e.g., setting elevel to -1). That way, we could remove the manual permissions checks in favor of always using the real ones, which might help prevent similar bugs in the future. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 0328029d43..fbc1014824 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -11689,7 +11689,8 @@ validate_option_array_item(const char *name, const char *value, * We cannot do any meaningful check on the value, so only permissions * are useful to check. 
*/ - if (superuser()) + if (superuser() || + pg_parameter_aclcheck(gconf->name, GetUserId(), ACL_SET) == ACLCHECK_OK) return true; if (skipIfNoPermissions) return false; @@ -11703,6 +11704,8 @@ validate_option_array_item(const char *name, const char *value, /* ok */ ; else if (gconf->context == PGC_SUSET && superuser()) /* ok */ ; + else if (pg_parameter_aclcheck(gconf->name, GetUserId(), ACL_SET) == ACLCHECK_OK) + /* ok */ ; else if (skipIfNoPermissions) return false; /* if a permissions error should be thrown, let set_config_option do it */
Re: [PATCH] Introduce array_shuffle() and array_sample()
On 18.07.22 at 23:03, Tom Lane wrote: > I wrote: >> Martin had originally proposed (2), which I rejected on the grounds that we don't treat multi-dimensional arrays as arrays-of-arrays for any other purpose. > Actually, after poking at it for awhile, that's an overstatement. It's true that the type system doesn't think N-D arrays are arrays-of-arrays, but there are individual functions/operators that do. Thanks Robert for pointing out the inconsistent behavior of array_sample(). That needs to be fixed. As Tom's investigation showed, there is no consensus in the code on whether multi-dimensional arrays are treated as arrays-of-arrays or not. We need to decide what the correct treatment should be. If we go with (1), array_shuffle() and array_sample() should shuffle each element individually and always return a one-dimensional array. select array_shuffle('{{1,2},{3,4},{5,6}}'); --- {1,4,3,5,6,2} select array_sample('{{1,2},{3,4},{5,6}}', 3); -- {1,4,3} If we go with (2), both functions should only operate on the first dimension, shuffling whole subarrays and keeping the dimensions intact. select array_shuffle('{{1,2},{3,4},{5,6}}'); - {{3,4},{1,2},{5,6}} select array_sample('{{1,2},{3,4},{5,6}}', 2); --- {{3,4},{1,2}} I do not feel qualified to make that decision. (2) complicates the code a bit, but that should not be the main argument here. Martin
Re: [PATCH] Introduce array_shuffle() and array_sample()
I wrote: > Martin had originally proposed (2), which I rejected on the grounds > that we don't treat multi-dimensional arrays as arrays-of-arrays for > any other purpose. Actually, after poking at it for awhile, that's an overstatement. It's true that the type system doesn't think N-D arrays are arrays-of-arrays, but there are individual functions/operators that do. For instance, @> is in the its-a-flat-list-of-elements camp: regression=# select array[[1,2],[3,4]] @> array[1,3]; ?column? -- t (1 row) but || wants to preserve dimensionality: regression=# select array[[1,2],[3,4]] || array[1]; ERROR: cannot concatenate incompatible arrays DETAIL: Arrays with differing dimensions are not compatible for concatenation. Various other functions dodge the issue by refusing to work on arrays of more than one dimension. There seem to be more functions that think arrays are flat than otherwise, but it's not as black-and-white as I was thinking. regards, tom lane
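For completeness, the third camp (functions that refuse to work on arrays of more than one dimension) can be illustrated with array_position; the error text below is quoted from memory and should be treated as approximate:

```sql
-- Flat-list camp: containment ignores dimensionality
SELECT array[[1,2],[3,4]] @> array[1,3];   -- t

-- Refuse-to-play camp: array_position only accepts one-dimensional arrays
SELECT array_position(array[[1,2],[3,4]], 2);
-- ERROR:  searching for elements in multidimensional arrays is not supported
```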
Re: [Commitfest 2022-07] Begins Now
Hi, On 2022-07-18 13:34:52 -0700, Jacob Champion wrote: > On 7/18/22 12:32, Andres Freund wrote: > > I'm not following - I'm talking about the patch author needing a while to > > address the higher level feedback given by a reviewer. The author might put > > out a couple new versions, which each might still benefit from review. In > > that > > - pretty common imo - situation I don't think it's useful for the reviewer > > that provided the higher level feedback to be removed from the patch. > > Okay, I think I get it now. Thanks. > > There's still something off in that case that I can't quite > articulate... Is it your intent to use Reviewer as a signal that "I'll > come back to this eventually"? That, and as a way to find out what I possibly should look at again. > As a signal to other prospective reviewers that you're handling the patch? Definitely not. I think no reviewer on a patch should be taken as that. There are often many angles to a patch, and leaving trivial patches aside, no reviewer is an expert in all of them. > How should a CFM move things forward when they come to a patch that's been > responded to by the author but the sole Reviewer has been silent? Ping the reviewer and/or thread, ensure the patch is in needs-review state. I don't think removing reviewers in the CF app would help with that anyway - often some reviewers explicitly state that they're only reviewing a specific part of the patch, or that they looked at everything but lack expertise to be confident in their positions etc. Such reviewers might do more rounds of feedback to newer patches, but the patch might still need more feedback. ISTM that you're trying to get patches to have zero reviewers if they need more reviewers, because that can serve as a signal in the CF app. But to me that's a bad proxy. Greetings, Andres Freund
Re: doc: Clarify Routines and Extension Membership
On Thu, Jul 14, 2022 at 06:27:17PM -0700, David G. Johnston wrote: > Thank you and apologies for being quiet here and on a few of the other > threads. > I've been on vacation and flagged as ToDo some of the non-simple feedback > items > that have come this way. No need to worry --- we will incorporate your suggestions whenever you can supply them. I know you waited months for these to be addressed. > The change to restrict and description in drop extension needs to be fixed up > (the other pages look good). > > "This option prevents the specified extensions from being dropped if there > exists non-extension-member objects that depends on any the extensions. This > is > the default." > > At minimum: "...that depend on any of the extensions." Agreed. > I did just now confirm that if any of the named extensions failed to be > dropped > the entire command fails. There is no partial success mode. > > I'd like to avoid non-extension-member, and one of the main points is that the > routine dependency is member-like, not actual membership. Hence the separate > wording. Okay. > I thus propose to replace the drop extension / restrict paragraph and replace > it with the following: > > "This option prevents the specified extensions from being dropped if other > objects - besides these extensions, their members, and their explicitly > dependent routines - depend on them. This is the default." Good. > Also, I'm thinking to change, on the same page (description): > > "Dropping an extension causes its component objects," > > to be: > > "Dropping an extension causes its member objects," > > I'm not sure why I originally chose component over member... All done, in the attached patch. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. 
Mark Batterson diff --git a/doc/src/sgml/ref/drop_extension.sgml b/doc/src/sgml/ref/drop_extension.sgml index c01ddace84..549b8d3b52 100644 --- a/doc/src/sgml/ref/drop_extension.sgml +++ b/doc/src/sgml/ref/drop_extension.sgml @@ -30,7 +30,7 @@ DROP EXTENSION [ IF EXISTS ] name [ DROP EXTENSION removes extensions from the database. - Dropping an extension causes its component objects, and other explicitly + Dropping an extension causes its member objects, and other explicitly dependent routines (see , the depends on extension action), to be dropped as well. @@ -79,9 +79,9 @@ DROP EXTENSION [ IF EXISTS ] name [ RESTRICT - This option prevents the specified extensions from being dropped - if there exists non-extension-member objects that depends on any - the extensions. This is the default. + This option prevents the specified extensions from being dropped if + other objects, besides these extensions, their members, and their + explicitly dependent routines, depend on them. This is the default.
Re: [PATCH] Introduce array_shuffle() and array_sample()
Robert Haas writes: > On Mon, Jul 18, 2022 at 3:03 PM Martin Kalcher > wrote: >> array_shuffle(anyarray) -> anyarray >> array_sample(anyarray, integer) -> anyarray > I think it's questionable whether the behavior of array_shuffle() is > correct for a multi-dimensional array. The implemented behavior is to > keep the dimensions as they were, but permute the elements across all > levels at random. But there are at least two other behaviors that seem > potentially defensible: (1) always return a 1-dimensional array, (2) > shuffle the sub-arrays at the top-level without the possibility of > moving elements within or between sub-arrays. What behavior we decide > is best here should be documented. Martin had originally proposed (2), which I rejected on the grounds that we don't treat multi-dimensional arrays as arrays-of-arrays for any other purpose. Maybe we should have, but that ship sailed decades ago, and I doubt that shuffle() is the place to start changing it. I could get behind your option (1) though, to make it clearer that the input array's dimensionality is not of interest. Especially since, as you say, (1) is pretty much the only sensible choice for array_sample. > I also think you should add test cases involving multi-dimensional > arrays, as well as arrays with non-default bounds. e.g. trying > shuffling or sampling some values like > '[8:10][-6:-5]={{1,2},{3,4},{5,6}}'::int[] This'd only matter if we decide not to ignore the input's dimensionality. regards, tom lane
Re: [Commitfest 2022-07] Begins Now
On 7/18/22 12:32, Andres Freund wrote: > I'm not following - I'm talking about the patch author needing a while to > address the higher level feedback given by a reviewer. The author might put > out a couple new versions, which each might still benefit from review. In that > - pretty common imo - situation I don't think it's useful for the reviewer > that provided the higher level feedback to be removed from the patch. Okay, I think I get it now. Thanks. There's still something off in that case that I can't quite articulate... Is it your intent to use Reviewer as a signal that "I'll come back to this eventually"? As a signal to other prospective reviewers that you're handling the patch? How should a CFM move things forward when they come to a patch that's been responded to by the author but the sole Reviewer has been silent? --Jacob
Re: pg15b2: large objects lost on upgrade
On Mon, Jul 18, 2022 at 4:06 PM Andres Freund wrote: > How about adding a new binary_upgrade_* helper function for this purpose > instead, instead of tying it into truncate? I considered that briefly, but it would need to do a lot of things that TRUNCATE already knows how to do, so it does not seem like a good idea. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Introduce array_shuffle() and array_sample()
On Mon, Jul 18, 2022 at 3:03 PM Martin Kalcher wrote: > Thanks for all your feedback and help. I got a patch that i consider > ready for review. It introduces two new functions: > >array_shuffle(anyarray) -> anyarray >array_sample(anyarray, integer) -> anyarray > > array_shuffle() shuffles an array (obviously). array_sample() picks n > random elements from an array. I like this idea. I think it's questionable whether the behavior of array_shuffle() is correct for a multi-dimensional array. The implemented behavior is to keep the dimensions as they were, but permute the elements across all levels at random. But there are at least two other behaviors that seem potentially defensible: (1) always return a 1-dimensional array, (2) shuffle the sub-arrays at the top-level without the possibility of moving elements within or between sub-arrays. What behavior we decide is best here should be documented. array_sample() will return elements in random order when sample_size < array_size, but in the original order when sample_size >= array_size. Similarly, it will always return a 1-dimensional array in the former case, but will keep the original dimensions in the latter case. That seems pretty hard to defend. I think it should always return a 1-dimensional array with elements in random order, and I think this should be documented. I also think you should add test cases involving multi-dimensional arrays, as well as arrays with non-default bounds. e.g. trying shuffling or sampling some values like '[8:10][-6:-5]={{1,2},{3,4},{5,6}}'::int[] -- Robert Haas EDB: http://www.enterprisedb.com
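As a sketch of the suggested test material (array_shuffle is from the proposed patch; its output is random and is therefore not shown):

```sql
-- The literal parses as a 3x2 int[] with non-default bounds on both dimensions
SELECT array_dims('[8:10][-6:-5]={{1,2},{3,4},{5,6}}'::int[]);
-- [8:10][-6:-5]

-- Whatever dimension handling is chosen for shuffle/sample should be pinned
-- down by regression tests on values like this one:
SELECT array_shuffle('[8:10][-6:-5]={{1,2},{3,4},{5,6}}'::int[]);
```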
Re: [RFC] building postgres with meson - v10
Hi,

On 2022-07-18 11:33:09 +0200, Peter Eisentraut wrote:
> The following patches are ok to commit IMO:
>
> a1c5542929 prereq: Deal with paths containing \ and spaces in
>            basebackup_to_shell tests
> e37951875d meson: prereq: psql: Output dir and dependency generation for
>            sql_help
> 18cc9fbd02 meson: prereq: ecpg: Add and use output directory argument for
>            preproc/*.pl
> 58a32694e9 meson: prereq: Move snowball_create.sql creation into perl file
> 59b8bffdaf meson: prereq: Add output path arg in generate-lwlocknames.pl
> 2db97b00d5 meson: prereq: generate-errcodes.pl: Accept output file
> fb8f52f21d meson: prereq: unicode: Allow to specify output directory
> 8f1e4410d6 meson: prereq: Refactor dtrace postprocessing make rules
> 3d18a20b11 meson: prereq: Add --outdir option to gen_node_support.pl

I pushed these. Thanks for the reviews and patches! The symbol export
stuff has also been pushed (discussed in a separate thread). It's nice to
see the meson patchset length reduced by this much.

I pushed a rebased version of the remaining branches to git. I'll be on
vacation for a bit, so I'm not sure I can get a new version with further
cleanups out before then.

Given that we can't use src/tools/gen_versioning_script.pl for the make
build, due to not depending on perl for tarball builds, I'm inclined to
rewrite it in Python (which we depend on via meson anyway) and consider it
a meson-specific wrapper?

Bilal, Peter previously commented on the pg_regress change for ecpg,
perhaps you can comment on that? In
https://postgr.es/m/0e81e45c-c9a5-e95b-2782-ab2dfec8bf57%40enterprisedb.com

On 2022-07-06 11:03:31 +0200, Peter Eisentraut wrote:
> dff7b5a960 meson: prereq: regress: allow to specify directory containing
>            expected files.
>
> This could use a bit more explanation, but it doesn't look
> controversial so far.

Greetings,

Andres Freund
Re: [PATCH] Introduce array_shuffle() and array_sample()
On 18.07.22 21:29, Tom Lane wrote:
> The preferred thing to do is to add it to our "commitfest" queue, which
> will ensure that it gets looked at eventually. The currently open cycle
> is 2022-09 [1] (see the "New Patch" button there).

Thanks Tom, did that. I am not sure whether "SQL Commands" is the correct
topic for this patch, but I guess it's not that important. I am impressed
by all the infrastructure built around this project.

Martin
Re: System column support for partitioned tables using heap
On Sun, Jul 17, 2022 at 9:04 PM Morris de Oryx wrote: > This fails on a partitioned table because xmax() may not exist. In fact, it > does exist in all of those tables, but the system doesn't know how to > guarantee that. I know which tables are partitioned, and can downgrade the > result on partitioned tables to the count(*) I've been using to date. But now > I'm wondering if working with xmax() like this is a poor idea going forward. > I don't want to lean on a feature/behavior that's likely to change. For > example, I noticed the other day that MERGE does not support RETURNING. > > I'd appreciate any insight or advice you can offer. What is motivating you to want to see the xmax value here? It's not an unreasonable thing to want to do, IMHO, but it's a little bit niche so I'm just curious what the motivation is. I do agree with you that it would be nice if this worked better than it does, but I don't really know exactly how to make that happen. The column list for a partitioned table must be fixed at the time it is created, but we do not know what partitions might be added in the future, and thus we don't know whether they will have an xmax column. I guess we could have tried to work things out so that a 0 value would be passed up from children that lack an xmax column, and that would allow the parent to have such a column, but I don't feel too bad that we didn't do that ... should I? -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg15b2: large objects lost on upgrade
On Mon, Jul 18, 2022 at 02:57:40PM -0400, Robert Haas wrote: > So I tried implementing this but I didn't get it quite right the first > time. It's not enough to call smgrdounlinkall() instead of > RelationDropStorage(), because just as RelationDropStorage() does not > actually drop the storage but only schedules it to be dropped later, > so also smgrdounlinkall() does not in fact unlink all, but only some. > It leaves the first segment of the main relation fork around to guard > against the hazards discussed in the header comments for mdunlink(). > But those hazards don't really matter here either, because, again, any > failure will necessarily require that the entire new cluster be > destroyed, and also because there shouldn't be any concurrent activity > in the new cluster while pg_upgrade is running. So I adjusted > smgrdounlinkall() to actually remove everything when IsBinaryUpgrade = > true. And then it all seems to work: pg_upgrade of a cluster that has > had a rewrite of pg_largeobject works, and pg_upgrade of a cluster > that has not had such a rewrite works too. Wa-hoo. Using the IsBinaryUpgrade flag makes sense to me. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: pg15b2: large objects lost on upgrade
Hi,

On 2022-07-18 14:57:40 -0400, Robert Haas wrote:
> As to whether this is a good fix, I think someone could certainly
> argue otherwise. This is all a bit grotty. However, I don't find it
> all that bad. As long as we're moving files between one PG cluster
> and another using an external tool rather than logic inside the
> server itself, I think we're bound to have some hacks someplace to
> make it all work. To me, extending them to a few more places to avoid
> leaving files behind on disk seems like a good trade-off. Your mileage
> may vary.

How about adding a new binary_upgrade_* helper function for this purpose
instead of tying it into truncate?

Greetings,

Andres Freund
Re: Commitfest Update
On Mon, Jul 18, 2022 at 12:06:34PM -0700, Jacob Champion wrote: > On 7/17/22 08:17, Nathan Bossart wrote: >> Yeah, I happened to look in my commitfest update folder this morning and >> was surprised to learn that I was no longer reviewing 3612. I spent a good >> amount of time getting that patch into a state where I felt it was >> ready-for-committer, and I intended to follow up on any further >> developments in the thread. It's not uncommon for me to use the filter >> functionality in the app to keep track of patches I'm reviewing. > > I'm sorry again for interrupting that flow. Thank you for speaking up > and establishing the use case. No worries. Thanks for managing this commitfest! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Doc about how to set max_wal_senders when setting minimal wal_level
On Fri, Jul 15, 2022 at 09:29:20PM +0800, Japin Li wrote: > > On Fri, 15 Jul 2022 at 08:49, Bruce Momjian wrote: > > On Tue, Jul 5, 2022 at 08:02:33PM -0400, Tom Lane wrote: > >> "Precondition" is an overly fancy word that makes things less clear > >> not more so. Does it mean that setting wal_level = minimal will fail > >> if you don't do these other things, or does it just mean that you > >> won't be getting the absolute minimum WAL volume? If the former, > >> I think it'd be better to say something like "To set wal_level to minimal, > >> you must also set [these variables], which has the effect of disabling > >> both WAL archiving and streaming replication." > > > > I have created the attached patch to try to improve this text. > > IMO we can add the following sentence for wal_level description, since > if wal_level = minimal and max_wal_senders > 0, we cannot start the database. > > To set wal_level to minimal, you must also set max_wal_senders to 0, > which has the effect of disabling both WAL archiving and streaming > replication. Okay, text added in the attached patch. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 37fd80388c..4c0489c9c3 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2764,9 +2764,10 @@ include_dir 'conf.d' levels. This parameter can only be set at server start. -In minimal level, no information is logged for -permanent relations for the remainder of a transaction that creates or -rewrites them. This can make operations much faster (see +The minimal level generates the least WAL +volume. It logs no row information for permanent relations +in transactions that create or +rewrite them. This can make operations much faster (see ). 
Operations that initiate this optimization include: @@ -2778,19 +2779,20 @@ include_dir 'conf.d' REINDEX TRUNCATE -But minimal WAL does not contain enough information to reconstruct the -data from a base backup and the WAL logs, so replica or -higher must be used to enable WAL archiving -() and streaming replication. +However, minimal WAL does not contain sufficient information for +point-in-time recovery, so replica or +higher must be used to enable continuous archiving +() and streaming binary replication. +In fact, the server will not even start in this mode if +max_wal_senders is non-zero. Note that changing wal_level to -minimal makes any base backups taken before -unavailable for archive recovery and standby server, which may -lead to data loss. +minimal makes any base backups taken before this +unusable for point-in-time recovery and standby servers. In logical level, the same information is logged as -with replica, plus information needed to allow -extracting logical change sets from the WAL. Using a level of +with replica, plus information needed to +extract logical change sets from the WAL. Using a level of logical will increase the WAL volume, particularly if many tables are configured for REPLICA IDENTITY FULL and many UPDATE and DELETE statements are
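The requirement the patched documentation describes can be summarized as a
postgresql.conf fragment (illustrative only; the two parameter names are
the real GUCs discussed above):

```ini
# wal_level = minimal generates the least WAL, but rules out WAL
# archiving, streaming replication, and point-in-time recovery.
wal_level = minimal

# The server will refuse to start with wal_level = minimal unless
# max_wal_senders is zero.
max_wal_senders = 0
```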
Re: Use fadvise in wal replay
On Thu, Jun 23, 2022 at 5:49 AM Jakub Wartak wrote:
> Cool. As for the GUC, I'm afraid there's going to be resistance to
> adding yet another GUC (to avoid many knobs). Ideally it would be nice
> if we had some advanced/deep/hidden parameters, but there is no such
> thing. Maybe another option would be to use (N *
> maintenance_io_concurrency * XLOG_BLCKSZ), so for N=1 that's 80kB and
> for N=2 160kB (pretty close to the default value, and still tweakable
> by the end user). Let's wait and see what others say?

I don't think adding more parameters is a problem intrinsically. A good
question to ask, though, is how the user is supposed to know what value
they should configure. If we don't have any idea what value is likely to
be optimal, odds are users won't either.

It's not very clear to me that we have any kind of agreement on what the
basic approach should be here, though.

--
Robert Haas
EDB: http://www.enterprisedb.com
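The arithmetic behind Jakub's suggestion checks out against the stock
defaults — XLOG_BLCKSZ of 8 kB and maintenance_io_concurrency of 10 are
the shipped values, though either can differ per build or configuration:

```python
XLOG_BLCKSZ = 8192                 # default WAL block size, in bytes
MAINTENANCE_IO_CONCURRENCY = 10    # default value of the GUC

def readahead_bytes(n):
    # Prefetch window from the proposed formula
    # N * maintenance_io_concurrency * XLOG_BLCKSZ.
    return n * MAINTENANCE_IO_CONCURRENCY * XLOG_BLCKSZ

print(readahead_bytes(1) // 1024)  # 80 (kB), matching the email
print(readahead_bytes(2) // 1024)  # 160 (kB)
```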
Re: [Commitfest 2022-07] Begins Now
Hi,

On 2022-07-18 12:22:25 -0700, Jacob Champion wrote:
> [dev hat]
>
> On 7/15/22 18:07, Andres Freund wrote:
> > IDK, I've plenty of times given feedback and it took months till it
> > all was implemented. What's the point of doing further rounds of
> > review until then?
>
> I guess I would wonder why we're optimizing for that case. Is it helpful
> for that patch to stick around in an active CF for months?

I'm not following - I'm talking about the patch author needing a while to
address the higher-level feedback given by a reviewer. The author might
put out a couple of new versions, which each might still benefit from
review. In that - pretty common imo - situation I don't think it's useful
for the reviewer that provided the higher-level feedback to be removed
from the patch.

Greetings,

Andres Freund
Re: [PATCH] Introduce array_shuffle() and array_sample()
Martin Kalcher writes: > Is someone interested in looking at it? What are the next steps? The preferred thing to do is to add it to our "commitfest" queue, which will ensure that it gets looked at eventually. The currently open cycle is 2022-09 [1] (see the "New Patch" button there). I believe you have to have signed up as a community member[2] in order to put yourself in as the patch author. I think "New Patch" will work anyway, but it's better to have an author listed. regards, tom lane [1] https://commitfest.postgresql.org/39/ [2] https://www.postgresql.org/account/login/
Re: replacing role-level NOINHERIT with a grant-level option
On Thu, Jul 14, 2022 at 10:53 AM tushar wrote: > GRANT "g2" TO "u1" WITH GRANTED BY "edb"; Another good catch. Here is v5 with a fix for that problem. -- Robert Haas EDB: http://www.enterprisedb.com v5-0001-Allow-grant-level-control-of-role-inheritance-beh.patch Description: Binary data
Re: [Commitfest 2022-07] Begins Now
[dev hat]

On 7/15/22 18:07, Andres Freund wrote:
> IDK, I've plenty of times given feedback and it took months till it all
> was implemented. What's the point of doing further rounds of review
> until then?

I guess I would wonder why we're optimizing for that case. Is it helpful
for that patch to stick around in an active CF for months? There's an
established need for keeping a "TODO item" around and not letting it fall
off, but I think that should remain separate in an application which seems
to be focused on organizing active volunteers.

And if that's supposed to be what Waiting on Author is for, then I think
we need more guidance on how to use that status effectively. Some
reviewers seem to use it as a "replied" flag. I think there's a meaningful
difference between being soft-blocked on review feedback and hard-blocked
on new implementation. And maybe there's even a middle state, where the
patch just needs someone to do a mindless rebase.

I think you're in a better position than most to "officially" decide that
a patch can no longer benefit from review. Most of us can't do that, I
imagine -- nor should we.

Thanks,
--Jacob
Re: fix crash with Python 3.11
On 23.06.22 09:41, Markus Wanner wrote:
> On 6/21/22 18:33, Tom Lane wrote:
> > My inclination at this point is to not back-patch the second change
> > 12d768e70 ("Don't use static storage for
> > SaveTransactionCharacteristics"). It's not clear that the benefit
> > would be worth even a small risk of somebody being unhappy about the
> > API break.
>
> Actually, the backport of 2e517818f ("Fix SPI's handling of errors")
> already broke the API for code using SPICleanup, as that function had
> been removed. Granted, it's not documented, but still exported. I
> propose to re-introduce a no-op placeholder similar to what we have for
> SPI_start_transaction, somewhat like the attached patch.

I have applied your patch to branches 11 through 14.
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
On 2022-07-15 Fr 17:07, Andres Freund wrote:
> Hi,
>
> On 2022-07-08 17:05:49 -0400, Andrew Dunstan wrote:
>> On 2022-07-05 Tu 15:04, Andrew Dunstan wrote:
>>> On 2022-07-05 Tu 14:36, Andres Freund wrote:
>> I think Andrew's beta 2 comment was more about my other architectural
>> complaints around the json expression eval stuff.
> Right. That's being worked on but it's not going to be a mechanical fix.

Any updates here?

>>> Not yet. A colleague and I are working on it. I'll post a status this
>>> week if we can't post a fix.
>> We're still working on it. We've made substantial progress but there are
>> some tests failing that we need to fix.
> I think we need to resolve this soon - or consider the alternatives. A
> lot of the new json stuff doesn't seem fully baked, so I'm starting to
> wonder if we have to consider pushing it a release further down.
>
> Perhaps you could post your current state? I might be able to help
> resolve some of the problems.

Ok. Here is the state of things. This has proved to be rather more
intractable than I expected. Almost all the legwork here has been done by
Amit Langote, for which he deserves both my thanks and considerable
credit, but I take responsibility for it.

I just discovered today that this scheme is failing under
"force_parallel_mode = regress". I have as yet no idea if that can be
fixed simply or not. Apart from that, I think the main outstanding issue
is to fill in the gaps in llvm_compile_expr(). If you have help to offer,
that would be very welcome.

I'd still very much like to get this done, but if the decision is that
we've run out of time, I'll be sad but understand.
cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com From 7c0d831f3baf6fb6ec27d1e033209535945e4858 Mon Sep 17 00:00:00 2001 From: Andrew Dunstan Date: Mon, 18 Jul 2022 11:03:13 -0400 Subject: [PATCH 1/3] in JsonExprState just store a pointer to the input FmgrInfo --- src/backend/executor/execExpr.c | 5 - src/backend/executor/execExprInterp.c | 2 +- src/include/executor/execExpr.h | 2 +- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index c8d7145fe3..a55e5000e2 100644 --- a/src/backend/executor/execExpr.c +++ b/src/backend/executor/execExpr.c @@ -2606,11 +2606,14 @@ ExecInitExprRec(Expr *node, ExprState *state, (jexpr->result_coercion && jexpr->result_coercion->via_io)) { Oid typinput; + FmgrInfo *finfo; /* lookup the result type's input function */ getTypeInputInfo(jexpr->returning->typid, , >input.typioparam); - fmgr_info(typinput, >input.func); + finfo = palloc0(sizeof(FmgrInfo)); + fmgr_info(typinput, finfo); + jsestate->input.finfo = finfo; } jsestate->args = NIL; diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 723770fda0..0512a81c7c 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -4664,7 +4664,7 @@ ExecEvalJsonExprCoercion(ExprEvalStep *op, ExprContext *econtext, /* strip quotes and call typinput function */ char *str = *isNull ? 
NULL : JsonbUnquote(jb); - return InputFunctionCall(>input.func, str, + return InputFunctionCall(jsestate->input.finfo, str, jsestate->input.typioparam, jexpr->returning->typmod); } diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h index 1e3f1bbee8..e55a572854 100644 --- a/src/include/executor/execExpr.h +++ b/src/include/executor/execExpr.h @@ -754,7 +754,7 @@ typedef struct JsonExprState struct { - FmgrInfo func; /* typinput function for output type */ + FmgrInfo *finfo; /* typinput function for output type */ Oid typioparam; } input; /* I/O info for output type */ -- 2.34.1 From ee39dadaa18f45c59224d4da908b36db7a2a8b0f Mon Sep 17 00:00:00 2001 From: amitlan Date: Mon, 18 Jul 2022 11:04:17 -0400 Subject: [PATCH 2/3] Evaluate various JsonExpr sub-expressions using parent ExprState These include PASSING args, ON ERROR, ON EMPTY default expressions, and result_coercion expression. To do so, this moves a bunch of code from ExecEvalJson(), ExecEvalJsonExpr(), and ExecEvalJsonExprCoercion(), all of which would previously run under the single step EEOP_JSONEXPR steps into a number of new (sub-) ExprEvalSteps that are now added for a given JsonExpr. TODO: update llvm_compile_expr() to handle newly added EEOP_JSONEXPR_* steps. --- src/backend/executor/execExpr.c | 288 +++- src/backend/executor/execExprInterp.c | 366 ++ src/backend/jit/llvm/llvmjit_expr.c | 35 ++- src/include/executor/execExpr.h | 136 -- 4 files changed, 639 insertions(+), 186 deletions(-) diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index a55e5000e2..4dbb24aaee 100644 ---
Re: Commitfest Update
On 7/17/22 08:17, Nathan Bossart wrote: > On Fri, Jul 15, 2022 at 09:37:14PM -0500, Justin Pryzby wrote: >> I'm not suggesting to give the community regulars special treatment, but you >> could reasonably assume that when they added themselves and then "didn't >> remove >> themself", it was on purpose and not by omission. I think most of those >> people >> would be surprised to learn that they're no longer considered to be reviewing >> the patch. > > Yeah, I happened to look in my commitfest update folder this morning and > was surprised to learn that I was no longer reviewing 3612. I spent a good > amount of time getting that patch into a state where I felt it was > ready-for-committer, and I intended to follow up on any further > developments in the thread. It's not uncommon for me to use the filter > functionality in the app to keep track of patches I'm reviewing. I'm sorry again for interrupting that flow. Thank you for speaking up and establishing the use case. > Of course, there are probably patches where I could be removed from the > reviewers field. I can try to stay on top of that better. If you think I > shouldn't be marked as a reviewer and that it's hindering further review > for a patch, feel free to message me publicly or privately about it. Sure. I don't plan on removing anyone else from a Reviewer list this commitfest, but if I do come across a reason I'll make sure to ask first. :) --Jacob
Re: Commitfest Update
On 7/15/22 19:59, Michael Paquier wrote: > On this point, I'd like to think that a window of two weeks is a right > balance. That's half of the commit fest, so that leaves plenty of > time for one to answer. There is always the case where one is on > vacations for a period longer than that, but it is also possible for > an author to add a new entry in a future CF for the same patch. That seems reasonable. My suggestion was going to be more aggressive, at five days, but really anywhere in that range seems good. --Jacob
[PATCH] Introduce array_shuffle() and array_sample()
Thanks for all your feedback and help. I got a patch that i consider ready for review. It introduces two new functions: array_shuffle(anyarray) -> anyarray array_sample(anyarray, integer) -> anyarray array_shuffle() shuffles an array (obviously). array_sample() picks n random elements from an array. Is someone interested in looking at it? What are the next steps? MartinFrom 5498bb2d9f1fab4cad56cd0d3a6eeafa24a26c8e Mon Sep 17 00:00:00 2001 From: Martin Kalcher Date: Sun, 17 Jul 2022 18:06:04 +0200 Subject: [PATCH] Introduce array_shuffle() and array_sample() * array_shuffle() shuffles to elements of an array. * array_sample() chooses n elements from an array by random. Shuffling of arrays can already be accomplished with SQL using unnest() and array_agg(order by random()). But that is very slow compared to the new functions. --- doc/src/sgml/func.sgml | 34 ++ src/backend/utils/adt/arrayfuncs.c | 163 +++ src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/arrays.out | 30 + src/test/regress/sql/arrays.sql | 11 ++ 5 files changed, 244 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index b6783b7ad0..c676031b4a 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -19395,6 +19395,40 @@ SELECT NULLIF(value, '(none)') ... + + + + array_sample + +array_sample ( array anyarray, n integer ) +anyarray + + +Returns n randomly chosen elements from array. + + +array_sample(ARRAY[1,2,3,4,5,6], 3) +{4,5,1} + + + + + + + array_shuffle + +array_shuffle ( anyarray ) +anyarray + + +Shuffles the elements of the array. + + +array_shuffle(ARRAY[1,2,3,4,5,6]) +{4,5,1,3,2,6} + + + diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c index fb167f226a..a6769a8ebf 100644 --- a/src/backend/utils/adt/arrayfuncs.c +++ b/src/backend/utils/adt/arrayfuncs.c @@ -6872,3 +6872,166 @@ trim_array(PG_FUNCTION_ARGS) PG_RETURN_DATUM(result); } + +/* + * Shuffle the elements of an array. 
+ */ +Datum +array_shuffle(PG_FUNCTION_ARGS) +{ + ArrayType *array, + *result; + int16 elmlen; + bool elmbyval; + char elmalign; + Oid elmtyp; + TypeCacheEntry *typentry; + Datum *elms, +elm; + bool *nuls, +nul; + int nelms, +i, +j; + + array = PG_GETARG_ARRAYTYPE_P(0); + nelms = ArrayGetNItems(ARR_NDIM(array), ARR_DIMS(array)); + + /* There is no point in shuffling arrays with less then two elements. */ + if (nelms < 2) + PG_RETURN_ARRAYTYPE_P(array); + + elmtyp = ARR_ELEMTYPE(array); + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra; + + if (typentry == NULL || typentry->type_id != elmtyp) + { + typentry = lookup_type_cache(elmtyp, 0); + fcinfo->flinfo->fn_extra = (void *) typentry; + } + elmlen = typentry->typlen; + elmbyval = typentry->typbyval; + elmalign = typentry->typalign; + + deconstruct_array(array, elmtyp, elmlen, elmbyval, elmalign, + , , ); + + /* Shuffle elements and nulls using Fisher-Yates shuffle algorithm. */ + for (i = nelms - 1; i > 0; i--) + { + j = random() % (i + 1); + elm = elms[i]; + nul = nuls[i]; + elms[i] = elms[j]; + nuls[i] = nuls[j]; + elms[j] = elm; + nuls[j] = nul; + } + + result = construct_md_array(elms, nuls, +ARR_NDIM(array), ARR_DIMS(array), ARR_LBOUND(array), +elmtyp, elmlen, elmbyval, elmalign); + + pfree(elms); + pfree(nuls); + PG_FREE_IF_COPY(array, 0); + + PG_RETURN_ARRAYTYPE_P(result); +} + +/* + * Choose N random elements from an array. 
+ */ +Datum +array_sample(PG_FUNCTION_ARGS) +{ + ArrayType *array, + *result; + int16 elmlen; + bool elmbyval; + char elmalign; + Oid elmtyp; + TypeCacheEntry *typentry; + Datum *elms, +elm; + bool *nuls, +nul; + int nelms, +rnelms, +rdims[1], +rlbs[1], +i, +j; + + array = PG_GETARG_ARRAYTYPE_P(0); + nelms = ArrayGetNItems(ARR_NDIM(array), ARR_DIMS(array)); + elmtyp = ARR_ELEMTYPE(array); + rnelms = PG_GETARG_INT32(1); + + if (rnelms < 0) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("second parameter must not be negative"))); + + /* Return an empty array if the requested sample size is zero. */ + if (rnelms == 0) + { + PG_FREE_IF_COPY(array, 0); + PG_RETURN_ARRAYTYPE_P(construct_empty_array(elmtyp)); + } + + /* + * Return the original array if the requested sample size is greater than + * or equal to its own size. + */ + if (rnelms >= nelms) + PG_RETURN_ARRAYTYPE_P(array); + + typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra; + + if (typentry == NULL || typentry->type_id != elmtyp) + { + typentry = lookup_type_cache(elmtyp, 0); + fcinfo->flinfo->fn_extra = (void *)
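The shuffle loop in the array_shuffle() patch above is, as its comment
says, the Fisher-Yates algorithm. A minimal Python rendering of the same
loop (a sketch only — the patch's C code additionally swaps the `nuls`
flags in lockstep with the element Datums):

```python
import random

def fisher_yates(items, rng=random):
    # Walk from the last position down, swapping each position with a
    # uniformly chosen position at or below it.
    for i in range(len(items) - 1, 0, -1):
        # The C patch uses random() % (i + 1); randrange() gives the
        # same uniform choice without modulo bias.
        j = rng.randrange(i + 1)
        items[i], items[j] = items[j], items[i]
    return items
```

Each of the n! permutations is produced with equal probability, which is
the property a shuffle built from unnest() and array_agg(order by
random()) also has, but at much higher cost.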
Re: Commitfest Update
On 7/18/22 06:13, Justin Pryzby wrote: > On Mon, Jul 18, 2022 at 03:05:51PM +0200, Alvaro Herrera wrote: >> Maybe we should have two reviewers columns -- one for history-tracking >> purposes (and commit msg credit) and another for current ones. > > Maybe. Or, the list of reviewers shouldn't be shown prominently in the list > of > patches. But changing that would currently break cfbot's web scraping. I think separating use cases of "what you can currently do for this patch" and "what others have historically done for this patch" is important. Whether that's best done with more columns or with some other workflow, I'm not sure. It seems like being able to mark items on a personal level, in a way that doesn't interfere with recordkeeping being done centrally, could help as well. --Jacob
Re: Commitfest Update
Justin,

(Consolidating replies here.)

On 7/15/22 19:13, Justin Pryzby wrote:
> cfbot is Thomas's project, so moving it to run on the postgres VM was
> one step, but I imagine the "integration with cfapp" requires
> coordination with Magnus.
>
> What patch ?

https://www.postgresql.org/message-id/CAAWbhmg84OsO5VkaSjX4jokHy8mdpWpNKFgZJHHbb4mprXmtiQ%40mail.gmail.com

It was intended to be a toe in the water -- see if I'm following
conventions, and if I even have the right list.

>>> Similarly, patches could be summarily set to "waiting on author" if
>>> they didn't recently apply, compile, and pass tests. That's the
>>> minimum standard. However, I think it's better not to do this
>>> immediately after the patch stops applying/compiling/passing tests,
>>> since it's usually easy enough to review it.
>>
>> It's hard to argue with that, but without automation, this is plenty of
>> busy work too.
>
> I don't think that's busywork, since it's understood to require human
> judgement, like 1) dealing with false-positive test failures; 2)
> checking if there's actually anything left for the author to do; 3)
> checking if it passed tests recently; 4) evaluating existing opinions in
> the thread and making a judgement call.

[Dev hat; strong opinions ahead.]

I maintain that 1) and 3) are busy work. You should not have to do those
things, in the ideal end state.

>> I think this may have been the goal but I don't think it actually works
>> in practice. The app refuses to let you carry a RwF patch to a new CF.
>
> I was able to do what Peter said. I don't know why the cfapp has that
> restriction, it seems like an artificial constraint.

Thanks, I'll work on a patch.

> On Fri, Jul 15, 2022 at 05:23:48PM -0700, Jacob Champion wrote:
> I'm not suggesting to give the community regulars special treatment, but
> you could reasonably assume that when they added themselves and then
> "didn't remove themself", it was on purpose and not by omission.
I think most of those > people > would be surprised to learn that they're no longer considered to be reviewing > the patch. For some people, I can maybe(?) assume that, but I'm being honest when I say that I don't really know who that's true for. I definitely don't think it's true for everybody. And once I start making those decisions as a CFM, then it really does boil down to who I know and have interacted with before. > Can you give an example of a patch where you sent a significant review, and > added yourself as a reviewer, but wouldn't mind if someone summarily removed > you, in batch ? Literally all of them. That's probably the key disconnect here, and why I didn't think too hard about doing it. (I promise, I didn't think to myself "I would really hate it if someone did this to me", and then go ahead and do it to twenty-some other people. :D) I come from OSS communities that discourage cookie-licking, whether accidental or on purpose. I don't like marking myself as a Reviewer in general (although I have done it, because it seems like the thing to do here?). Simultaneous reviews are never "wasted work" and I'd just rather not call dibs on a patch. So I wouldn't have a problem with someone coming along, seeing that I haven't interacted with a patch for a while, and removing my name. I trust that committers will give credit if credit is due. > The stated goal was to avoid the scenario that a would-be reviewer decides not > to review a patch because cfapp already shows someone else as a reviewer. I'm > sure that could happen, but I doubt it's something that happens frequently.. I do that every commitfest. It's one of the first things I look at to determine what to pay attention to, because I'm trying to find patches that have slipped through the cracks. As Tom pointed out, others do it too, though I don't know how many or if their motivations match mine. >> Why do you think it's useful to remove annotations that people added ? 
(And, >> if >> it were useful, why shouldn't that be implemented in the cfapp, which already >> has all the needed information.) > > Or, to say it differently, since "reviewers" are preserved when a patch is > moved to the next CF, it comes as a surprise when by some other mechanism or > policy the field doesn't stay there. (If it's intended to be more like a > per-CF field, I think its behavior should be changed in the cfapp, to avoid > manual effort, and to avoid other people executing it differently.) It was my assumption, based on the upthread discussion, that that was the end goal, and that we just hadn't implemented it yet for lack of time. >> It undoes work that you and others have done to make the historical >> record more accurate, and I think that's understandably frustrating. But >> I thought we were trying to move away from that usage of it altogether. > > I gather that your goal was to make the "reviewers" field more like "people > who > are reviewing the current version of the patch", to make it easy to > find/isolate
Re: pg15b2: large objects lost on upgrade
On Tue, Jul 12, 2022 at 4:51 PM Robert Haas wrote: > I have a few more ideas to try here. It occurs to me that we could fix > this more cleanly if we could get the dump itself to set the > relfilenode for pg_largeobject to the desired value. Right now, it's > just overwriting the relfilenode stored in the catalog without > actually doing anything that would cause a change on disk. But if we > could make it change the relfilenode in a more principled way that > would actually cause an on-disk change, then the orphaned-file problem > would be fixed, because we'd always be installing the new file over > top of the old file. I'm going to investigate how hard it would be to > make that work. Well, it took a while to figure out how to make that work, but I believe I've got it now. Attached please find a couple of patches that should get the job done. They might need a bit of polish, but I think the basic concepts are sound. My first thought was to have the dump issue VACUUM FULL pg_largeobject after first calling binary_upgrade_set_next_heap_relfilenode() and binary_upgrade_set_next_index_relfilenode(), and have the VACUUM FULL use the values configured by those calls for the new heap and index OID. I got this working in standalone testing, only to find that this doesn't work inside pg_upgrade. The complaint is "ERROR: VACUUM cannot run inside a transaction block", and I think that happens because pg_restore sends the entire TOC entry for a single object to the server as a single query, and here it contains multiple SQL commands. That multi-command string ends up being treated like an implicit transaction block. So my second thought was to have the dump still call binary_upgrade_set_next_heap_relfilenode() and binary_upgrade_set_next_index_relfilenode(), but then afterwards call TRUNCATE rather than VACUUM FULL. 
I found that a simple change to RelationSetNewRelfilenumber() did the trick: I could then easily change the heap and index relfilenodes for pg_largeobject to any new values I liked. However, I realized that this approach had a problem: what if the pg_largeobject relation had never been rewritten in the old cluster? Then the heap and index relfilenodes would be unchanged. It's also possible that someone might have run REINDEX in the old cluster, in which case it might happen that the heap relfilenode was unchanged, but the index relfilenode had changed. I spent some time fumbling around with trying to get the non-transactional truncate path to cover these cases, but the fact that we might need to change the relfilenode for the index but not the heap makes this approach seem pretty awful. But it occurred to me that in the case of a pg_upgrade, we don't really need to keep the old storage around until commit time. We can unlink it first, before creating the new storage, and then if the old and new storage happen to be the same, it still works. I can think of two possible objections to this way forward. First, it means that the operation isn't properly transactional. However, if the upgrade fails at this stage, the new cluster is going to have to be nuked and recreated anyway, so the fact that things might be in an unclean state after an ERROR is irrelevant. Second, one might wonder whether such a fix is really sufficient. For example, what happens if the relfilenode allocated to pg_largeobject in the old cluster is assigned to its index in the new cluster, and vice versa? To make that work, we would need to remove all storage for all relfilenodes first, and then recreate them all afterward. However, I don't think we need to make that work. If an object in the old cluster has a relfilenode < 16384, then that should mean it's never been rewritten and therefore its relfilenode in the new cluster should be the same. 
The only way this wouldn't be true is if we shuffled around the initial relfilenode assignments in a new version of PG so that the same values were used but now for different objects, which would be a really dumb idea. And on the other hand, if the object in the old cluster has a relfilenode > 16384, then that relfilenode value should be unused in the new cluster. If not, the user has tinkered with the new cluster more than they ought. So I tried implementing this but I didn't get it quite right the first time. It's not enough to call smgrdounlinkall() instead of RelationDropStorage(), because just as RelationDropStorage() does not actually drop the storage but only schedules it to be dropped later, so also smgrdounlinkall() does not in fact unlink all, but only some. It leaves the first segment of the main relation fork around to guard against the hazards discussed in the header comments for mdunlink(). But those hazards don't really matter here either, because, again, any failure will necessarily require that the entire new cluster be destroyed, and also because there shouldn't be any concurrent activity in the new cluster while pg_upgrade is running. So I adjusted smgrdounlinkall() to
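The unlink-before-create ordering argued for above can be sketched outside PostgreSQL with plain files. This is a minimal analogy, not pg_upgrade code: the file names stand in for relfilenode segments, and `set_new_storage` stands in for the adjusted RelationSetNewRelfilenumber() path.

```python
# Sketch: remove the old storage *before* creating the new storage, so the
# sequence also works when the old and new relfilenodes are the same file.
# All names here are hypothetical stand-ins for relation storage files.
import os

def write_file(path, data):
    with open(path, "w") as f:
        f.write(data)

def set_new_storage(old_path, new_path):
    """Unlink old storage first, then create the new storage."""
    try:
        os.unlink(old_path)        # a failure here just forces a cluster redo
    except FileNotFoundError:
        pass
    write_file(new_path, "new")    # stands in for creating the relation file

# Case 1: distinct old/new relfilenodes -- old file gone, new file present.
write_file("old_16384", "old")
set_new_storage("old_16384", "new_16385")

# Case 2: old and new relfilenodes coincide (a never-rewritten catalog) --
# the same path still ends up holding the new contents, with no window where
# the new file would be clobbered by dropping the old one at commit time.
write_file("same_2613", "old")
set_new_storage("same_2613", "same_2613")
```

The non-transactional behavior is harmless here for the reason given above: any failure mid-sequence means the whole new cluster gets recreated anyway.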
Re: proposal: possibility to read dumped table's name from file
ne 17. 7. 2022 v 16:01 odesílatel Justin Pryzby napsal: > Thanks for updating the patch. > > This failed to build on windows. > http://cfbot.cputube.org/pavel-stehule.html > > Yes, there was a significant problem with the function exit_nicely, that is differently implemented in pg_dump and pg_dumpall. > Some more comments inline. > > On Sun, Jul 17, 2022 at 08:20:47AM +0200, Pavel Stehule wrote: > > The attached patch implements the --filter option for pg_dumpall and for > > pg_restore too. > > > diff --git a/doc/src/sgml/ref/pg_dump.sgml > b/doc/src/sgml/ref/pg_dump.sgml > > index 5efb442b44..ba2920dbee 100644 > > --- a/doc/src/sgml/ref/pg_dump.sgml > > +++ b/doc/src/sgml/ref/pg_dump.sgml > > @@ -779,6 +779,80 @@ PostgreSQL documentation > > > > > > > > + > > + --filter= class="parameter">filename > > + > > + > > +Specify a filename from which to read patterns for objects to > include > > +or exclude from the dump. The patterns are interpreted > according to the > > +same rules as the corresponding options: > > +-t/--table for tables, > > +-n/--schema for schemas, > > +--include-foreign-data for data on foreign > servers and > > +--exclude-table-data for table data. > > +To read from STDIN use > - as the STDIN comma > fixed > > > + > > +Lines starting with # are considered > comments and > > +are ignored. Comments can be placed after filter as well. Blank > lines > > change "are ignored" to "ignored", I think. > changed > > > diff --git a/doc/src/sgml/ref/pg_dumpall.sgml > b/doc/src/sgml/ref/pg_dumpall.sgml > > index 8a081f0080..137491340c 100644 > > --- a/doc/src/sgml/ref/pg_dumpall.sgml > > +++ b/doc/src/sgml/ref/pg_dumpall.sgml > > @@ -122,6 +122,29 @@ PostgreSQL documentation > > > > > > > > + > > + --filter= class="parameter">filename > > + > > + > > +Specify a filename from which to read patterns for databases > excluded > > +from dump. The patterns are interpretted according to the same > rules > > +like --exclude-database. 
> > same rules *as* > fixed > > > +To read from STDIN use > - as the > > comma > fixed > > > +filename. The --filter option can be > specified in > > +conjunction with the above listed options for including or > excluding > > For dumpall, remove "for including or" > change "above listed options" to "exclude-database" ? > fixed > > > diff --git a/doc/src/sgml/ref/pg_restore.sgml > b/doc/src/sgml/ref/pg_restore.sgml > > index 526986eadb..5f16c4a333 100644 > > --- a/doc/src/sgml/ref/pg_restore.sgml > > +++ b/doc/src/sgml/ref/pg_restore.sgml > > @@ -188,6 +188,31 @@ PostgreSQL documentation > > > > > > > > + > > + --filter= class="parameter">filename > > + > > + > > +Specify a filename from which to read patterns for objects > excluded > > +or included from restore. The patterns are interpretted > according to the > > +same rules like --schema, > --exclude-schema, > > s/like/as/ > changed > > > +--function, --index, > --table > > +or --trigger. > > +To read from STDIN use > - as the > > STDIN comma > fixed > > > +/* > > + * filter_get_keyword - read the next filter keyword from buffer > > + * > > + * Search for keywords (limited to containing ascii alphabetic > characters) in > > remove "containing" > fixed > > > + /* > > + * If the object name pattern has been quoted we must take care > parse out > > + * the entire quoted pattern, which may contain whitespace and can > span > > + * over many lines. > > quoted comma > *to parse > remove "over" > fixed > > > + * The pattern is either simple without any whitespace, or properly > quoted > > double space > fixed > > > + * in case there is whitespace in the object name. The pattern handling > follows > > s/is/may be/ > fixed > > > + if (size == 7 && pg_strncasecmp(keyword, > "include", 7) == 0) > > + *is_include = true; > > + else if (size == 7 && pg_strncasecmp(keyword, > "exclude", 7) == 0) > > + *is_include = false; > > Can't you write strncasecmp(keyword, "include", size) to avoid hardcoding > "7" ? 
> I need to compare the size of the keyword with expected size, but I can use strlen(conststr). I wrote new macro is_keyword_str to fix this issue fixed > > > + > > + if (size == 4 && pg_strncasecmp(keyword, "data", > 4) == 0) > > + *objtype = FILTER_OBJECT_TYPE_DATA; > > + else if (size == 8 && pg_strncasecmp(keyword, > "database", 8) == 0) > > + *objtype = FILTER_OBJECT_TYPE_DATABASE; > > + else if (size == 12 && pg_strncasecmp(keyword, > "foreign_data", 12) == 0) > > +
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Tomas Vondra writes: > Thanks. I'll switch this to "needs review" now. OK, I looked through this, and attach some review suggestions in the form of a delta patch. (0001 below is your two patches merged, 0002 is my delta.) A lot of the delta is comment-smithing, but not all. After reflection I think that what you've got, ie use reltuples but don't try to sample if reltuples <= 0, is just fine. The remote would only have reltuples <= 0 in a never-analyzed table, which shouldn't be a situation that persists for long unless the table is tiny. Also, if reltuples is in error, the way to bet is that it's too small thanks to the table having been enlarged. But an error in that direction doesn't hurt us: we'll overestimate the required sample_frac and pull back more data than we need, but we'll still end up with a valid sample of the right size. So I doubt it's worth the complication to try to correct based on relpages etc. (Note that any such correction would almost certainly end in increasing our estimate of reltuples. But it's safer to have an underestimate than an overestimate.) I messed around with the sample_frac choosing logic slightly, to make it skip pointless calculations if we decide right off the bat to disable sampling. That way we don't need to worry about avoiding zero divide, nor do we have to wonder if any of the later calculations could misbehave. I left your logic about "disable if saving fewer than 100 rows" alone, but I have to wonder if using an absolute threshold rather than a relative one is well-conceived. Sampling at a rate of 99.9 percent seems pretty pointless, but this code is perfectly capable of accepting that if reltuples is big enough. So personally I'd do that more like if (sample_frac > 0.95) method = ANALYZE_SAMPLE_OFF; which is simpler and would also eliminate the need for the previous range-clamp step. I'm not sure what the right cutoff is, but your "100 tuples" constant is just as arbitrary. I rearranged the docs patch too. 
Where you had it, analyze_sampling was between fdw_startup_cost/fdw_tuple_cost and the following para discussing them, which didn't seem to me to flow well at all. I ended up putting analyze_sampling in its own separate list. You could almost make a case for giving it its own , but I concluded that was probably overkill. One thing I'm not happy about, but did not touch here, is the expense of the test cases you added. On my machine, that adds a full 10% to the already excessively long runtime of postgres_fdw.sql --- and I do not think it's buying us anything. It is not this module's job to test whether bernoulli sampling works on partitioned tables. I think you should just add enough to make sure we exercise the relevant code paths in postgres_fdw itself. With these issues addressed, I think this'd be committable. regards, tom lane diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index a9766f9734..ea2139fbfa 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -2369,14 +2369,56 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel) appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ); } +/* + * Construct SELECT statement to acquire a number of rows of a relation. + * + * Note: Maybe this should compare relpages and current relation size + * and adjust reltuples accordingly? + */ +void +deparseAnalyzeTuplesSql(StringInfo buf, Relation rel) +{ + StringInfoData relname; + + /* We'll need the remote relation name as a literal. */ + initStringInfo(&relname); + deparseRelation(&relname, rel); + + appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = "); + deparseStringLiteral(buf, relname.data); + appendStringInfoString(buf, "::pg_catalog.regclass"); +} + /* * Construct SELECT statement to acquire sample rows of given relation. * * SELECT command is appended to buf, and list of columns retrieved * is returned to *retrieved_attrs. 
+ * + * XXX We allow customizing the sampling method, but we only support methods + * we can decide based on server version. Allowing custom TSM modules (for + * example tsm_system_rows) might be useful, but it would require detecting + * which extensions are installed, to allow automatic fall-back. Moreover, the + * methods use different parameters (not sampling rate). So we don't do this + * for now, leaving it for future improvements. + * + * XXX Using remote random() to sample rows has advantages & disadvantages. + * The advantages are that this works on all PostgreSQL versions (unlike + * TABLESAMPLE), and that it does the sampling on the remote side (unlike + * the old approach, which transfers everything and then discards most data). + * We could also do "ORDER BY random() LIMIT x", which would always pick + * the expected number of rows, but it requires sorting so it's a bit more + * expensive. + * + * The disadvantage is that we still have to read all rows and
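The sample_frac handling discussed above (skip the later math entirely when sampling is disabled, and use a relative cutoff rather than the 100-tuple absolute threshold) can be sketched as follows. The function name, the 0.95 constant, and the method labels are illustrative assumptions, not the actual postgres_fdw code:

```python
# Sketch of the sampling-rate decision: given a target sample size and the
# remote reltuples estimate, pick a sampling method and fraction.
ANALYZE_SAMPLE_OFF = "off"
ANALYZE_SAMPLE_BERNOULLI = "bernoulli"

def choose_sampling(targrows, reltuples, cutoff=0.95):
    """Return (method, sample_frac) for acquiring targrows sample rows."""
    # A never-analyzed remote table reports reltuples <= 0: disable sampling
    # right away, so none of the later calculations (including the division
    # below) can misbehave.
    if reltuples <= 0:
        return ANALYZE_SAMPLE_OFF, 1.0
    sample_frac = targrows / reltuples
    # Sampling at ~100% saves almost nothing over fetching everything; this
    # relative cutoff also subsumes clamping sample_frac into (0, 1].
    if sample_frac > cutoff:
        return ANALYZE_SAMPLE_OFF, 1.0
    return ANALYZE_SAMPLE_BERNOULLI, sample_frac
```

Note how an underestimate of reltuples is safe under this scheme: it only inflates sample_frac, pulling back more rows than needed while still yielding a valid sample.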
Re: Use "WAL segment" instead of "log segment" consistently in user-facing messages
Overall, these patches look reasonable. On Mon, Jul 18, 2022 at 04:24:12PM +0530, Bharath Rupireddy wrote: > record. Because the entire content of data pages is saved in the > - log on the first page modification after a checkpoint (assuming > + WAL record on the first page modification after a checkpoint (assuming > is not disabled), all pages > changed since the checkpoint will be restored to a consistent > state. nitpick: I would remove the word "record" in this change. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
On Mon, Jul 18, 2022 at 04:53:18PM +0530, Bharath Rupireddy wrote: > Just wondering - do we ever have a problem if we can't remove the > snapshot or mapping file? Besides running out of disk space, there appears to be a transaction ID wraparound risk with the mappings files. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Commitfest Update
On Fri, Jul 15, 2022 at 05:23:48PM -0700, Jacob Champion wrote: >> This is important stuff to discuss, for sure, but I also want to revisit >> the thing I put on pause, which is to clear out old Reviewer entries to >> make it easier for new reviewers to find work to do. If we're not using >> Reviewers as a historical record, is there any reason for me not to keep >> clearing that out? On Fri, Jul 15, 2022 at 09:37:14PM -0500, Justin Pryzby wrote: > Why do you think it's useful to remove annotations that people added? (And, if it were useful, why shouldn't that be implemented in the cfapp, which already has all the needed information.) Or, to say it differently, since "reviewers" are preserved when a patch is moved to the next CF, it comes as a surprise when by some other mechanism or policy the field doesn't stay there. (If it's intended to be more like a per-CF field, I think its behavior should be changed in the cfapp, to avoid manual effort, and to avoid other people executing it differently.) > It undoes work that you and others have done to make the historical > record more accurate, and I think that's understandably frustrating. But > I thought we were trying to move away from that usage of it altogether. I gather that your goal was to make the "reviewers" field more like "people who are reviewing the current version of the patch", to make it easy to find/isolate patch-versions which need to be reviewed, and hopefully accelerate the process. But IMO there's already no trouble finding the list of patches which need to be reviewed - it's the long list that says "Needs Review" - which is what's actually needed; that's not easy to do, which is why it's a long list, and no amount of updating the annotations will help with that. I doubt many people search for patches to review by seeking out those which have no reviewer (which is not a short list anyway). 
I think they look for the most interesting patches, or the ones that are going to be specifically useful to them. Here's an idea: send out batch mails to people who are listed as reviewers for patches which "Need Review". That starts to treat the reviewers field as a functional thing rather than purely an annotation. Be sure in your message to say "You are receiving this message because you're listed as a reviewer for a patch which -Needs Review-". I think it's reasonable to get a message like that 1 or 2 times per month (but not per-month-per-patch). Ideally it'd include the list of patches specific to that reviewer, but I think it'd be okay to get an un-personalized email reminder 1x/month with a link. BTW, one bulk update to make is for the few dozen patches that say "v15" on them, and (excluding bugfixes) those are nearly all wrong. Since the field isn't visible in cfbot, it's mostly ignored. The field is useful toward the end of a release cycle to indicate patches that aren't intended for consideration for the next release. Ideally, it'd also be used to indicate the patches *are* being considered, but it seems like nobody does that and it ends up being a surprise which patches are or are not committed (this seems weird and easy to avoid but .. ). The patches which say "v15" are probably from patches submitted during the v15 cycle, and now the version should be removed, unless there's a reason to believe the patch is going to target v16 (like if a committer has assigned themself). Thanks for receiving my criticism well :) -- Justin
Re: support for CREATE MODULE
On Thu, Mar 17, 2022 at 04:30:43PM -0700, Nathan Bossart wrote: > On Thu, Mar 17, 2022 at 04:26:31PM -0700, Swaha Miller wrote: >> On Thu, Mar 17, 2022 at 4:16 PM Nathan Bossart >> wrote: >>> It seems unlikely that this will be committed for v15. Swaha, should the >>> commitfest entry be adjusted to v16 and moved to the next commitfest? >>> >>> >> Yes please, thank you Nathan > > Done! I spoke with Swaha off-list, and we agreed that this commitfest entry can be closed for now. I'm going to mark it as returned-with-feedback. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: [Commitfest 2022-07] Begins Now
On 7/15/22 16:42, Jacob Champion wrote: > If you have thoughts/comments on this approach, please share them! Okay, plenty of feedback to sift through here. [CFM hat] First of all: mea culpa. I unilaterally made a change that I had assumed would be uncontroversial; it clearly was not, and I interrupted the flow of the CF for people when my goal was to be mostly invisible this month. (My single email to a single thread saying "any objections?" is, in retrospect, not nearly enough reach or mandate to have made this change.) Big thank you to Justin for seeing it happen and speaking up immediately. Here is a rough summary of opinions that have been shared so far; pulled from the other thread [1] as well: There are at least three major use cases for the Reviewer field at the moment. 1) As a new reviewer, find a patch that needs help moving forward. 2) As a committer, give credit to people who moved the patch forward. 3) As an established reviewer, keep track of patches "in flight." I had never realized the third case existed. To those of you who I've interrupted by modifying your checklist without permission, I'm sorry. I see that several of you have already added yourselves back, which is great; I will try to find the CF update stream that has been alluded to elsewhere and see if I can restore the original Reviewers lists that I nulled out on Friday. It was suggested that we track historical reviewers and current reviews separately from each other, to handle both cases 1 and 2. There appears to be a need for people to be able to consider a patch "blocked" pending some action, so that further review cycles aren't burned on it. Some people use Waiting on Author for that, but others use WoA as soon as an email is sent. The two cases have similarities but, to me at least, aren't the same and may be working at cross purposes. 
It is apparently possible to pull one of your closed patches from a prior commitfest into the new one, but you have to set it back to Needs Review first. I plan to work on a CF patch to streamline that, if someone does not beat me to it. Okay, I think those are the broad strokes. I will put my [dev hat] on now and respond more granularly to threads, with stronger opinions. Thanks, --Jacob [1] https://www.postgresql.org/message-id/flat/34b32cb2-a728-090a-00d5-067305874174%40timescale.com#3247e661b219f8736ae418c9b5452d63
Re: Convert planner's AggInfo and AggTransInfo to Nodes
Tom Lane writes: > I got annoyed just now upon finding that pprint() applied to the planner's > "root" pointer doesn't dump root->agginfos or root->aggtransinfos. That's > evidently because AggInfo and AggTransInfo aren't proper Nodes, just bare > structs, which presumably is because somebody couldn't be bothered to > write outfuncs support for them. I'd say that was a questionable shortcut > even when it was made, and there's certainly precious little excuse now > that gen_node_support.pl can do all the heavy lifting. Hence, PFA a > little finger exercise to turn them into Nodes. I took the opportunity > to improve related comments too, and in particular to fix some comments > that leave the impression that preprocess_minmax_aggregates still does > its own scan of the query tree. (It was momentary confusion over that > idea that got me to the point of being annoyed in the first place.) > > Any objections so far? It seems like a reasonable idea, but I don't know enough to judge the wider ramifications of it. But one thing that the patch should also do, is switch to using the l*_node() functions instead of manual casting. The ones I noticed in the patch/context are below, but there are a few more: > foreach(lc, root->agginfos) > { > AggInfo *agginfo = (AggInfo *) lfirst(lc); AggInfo *agginfo = lfirst_node(AggInfo, lc); […] > foreach(lc, transnos) > { > int transno = lfirst_int(lc); > - AggTransInfo *pertrans = (AggTransInfo *) list_nth(root->aggtransinfos, transno); > + AggTransInfo *pertrans = (AggTransInfo *) list_nth(root->aggtransinfos, > + transno); AggTransInfo *pertrans = list_nth_node(AggTransInfo, root->aggtransinfos, transno); - ilmari
Re: Making pg_rewind faster
Hi Tom, Thank you for taking a look at this and that sounds good. I will send over a patch compatible with Postgres v16. Justin From: Tom Lane Sent: July 17, 2022 2:40 PM To: Justin Kwan Cc: pgsql-hackers ; vignesh ; jk...@cloudflare.com ; vignesh ravichandran ; hlinn...@iki.fi Subject: Re: Making pg_rewind faster Justin Kwan writes: > I've also attached the pg_rewind optimization patch file for Postgres version > 14.4. The previous patch file targets version Postgres version 15 Beta 1/2. It's very unlikely that we would consider committing such changes into released branches. In fact, it's too late even for v15. You should be submitting non-bug-fix patches against master (v16-to-be). regards, tom lane
Re: Costing elided SubqueryScans more nearly correctly
Alvaro Herrera writes: > On 2022-Jul-18, Richard Guo wrote: >> BTW, not related to this patch, the new lines for parallel_aware check >> in setrefs.c are very wide. How about wrap them to keep consistent with >> arounding codes? > Not untrue! Something like this, you mean? Fixed the nearby typo while > at it. WFM. (I'd fixed the comment typo in my patch, but I don't mind if you get there first.) regards, tom lane
Re: Costing elided SubqueryScans more nearly correctly
On 2022-Jul-18, Richard Guo wrote: > BTW, not related to this patch, the new lines for parallel_aware check > in setrefs.c are very wide. How about wrap them to keep consistent with > arounding codes? Not untrue! Something like this, you mean? Fixed the nearby typo while at it. -- Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/ >From 985ffec3086fc01585cb784a58ddb8975f832350 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 18 Jul 2022 16:40:01 +0200 Subject: [PATCH] Wrap overly long lines --- src/backend/optimizer/plan/setrefs.c | 29 +++++++++++++++---------- 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index 9cef92cab2..707c1016c2 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -1637,14 +1637,16 @@ set_append_references(PlannerInfo *root, * See if it's safe to get rid of the Append entirely. For this to be * safe, there must be only one child plan and that child plan's parallel * awareness must match that of the Append's. The reason for the latter - * is that the if the Append is parallel aware and the child is not then - * the calling plan may execute the non-parallel aware child multiple - * times. + * is that if the Append is parallel aware and the child is not, then the + * calling plan may execute the non-parallel aware child multiple times. */ - if (list_length(aplan->appendplans) == 1 && - ((Plan *) linitial(aplan->appendplans))->parallel_aware == aplan->plan.parallel_aware) - return clean_up_removed_plan_level((Plan *) aplan, - (Plan *) linitial(aplan->appendplans)); + if (list_length(aplan->appendplans) == 1) + { + Plan *p = (Plan *) linitial(aplan->appendplans); + + if (p->parallel_aware == aplan->plan.parallel_aware) + return clean_up_removed_plan_level((Plan *) aplan, p); + } /* * Otherwise, clean up the Append as needed. 
It's okay to do this after @@ -1709,14 +1711,17 @@ set_mergeappend_references(PlannerInfo *root, * See if it's safe to get rid of the MergeAppend entirely. For this to * be safe, there must be only one child plan and that child plan's * parallel awareness must match that of the MergeAppend's. The reason - * for the latter is that the if the MergeAppend is parallel aware and the + * for the latter is that if the MergeAppend is parallel aware and the * child is not then the calling plan may execute the non-parallel aware * child multiple times. */ - if (list_length(mplan->mergeplans) == 1 && - ((Plan *) linitial(mplan->mergeplans))->parallel_aware == mplan->plan.parallel_aware) - return clean_up_removed_plan_level((Plan *) mplan, - (Plan *) linitial(mplan->mergeplans)); + if (list_length(mplan->mergeplans) == 1) + { + Plan *p = (Plan *) linitial(mplan->mergeplans); + + if (p->parallel_aware == mplan->plan.parallel_aware) + return clean_up_removed_plan_level((Plan *) mplan, p); + } /* * Otherwise, clean up the MergeAppend as needed. It's okay to do this -- 2.30.2
Convert planner's AggInfo and AggTransInfo to Nodes
I got annoyed just now upon finding that pprint() applied to the planner's "root" pointer doesn't dump root->agginfos or root->aggtransinfos. That's evidently because AggInfo and AggTransInfo aren't proper Nodes, just bare structs, which presumably is because somebody couldn't be bothered to write outfuncs support for them. I'd say that was a questionable shortcut even when it was made, and there's certainly precious little excuse now that gen_node_support.pl can do all the heavy lifting. Hence, PFA a little finger exercise to turn them into Nodes. I took the opportunity to improve related comments too, and in particular to fix some comments that leave the impression that preprocess_minmax_aggregates still does its own scan of the query tree. (It was momentary confusion over that idea that got me to the point of being annoyed in the first place.) Any objections so far? I'm kind of tempted to mount an effort to get rid of as many of pathnodes.h's "read_write_ignore" annotations as possible. Some are necessary to prevent infinite recursion, and others represent considered judgments that they'd bloat node dumps more than they're worth --- but I think quite a lot of them arose from plain laziness about updating outfuncs.c. With the infrastructure we have now, that's no longer a good reason. In particular, I'm tempted to make a dump of PlannerInfo include all the baserel RelOptInfos (not joins though; there could be a mighty lot of those.) I think we didn't print the simple_rel_array[] array before mostly because outfuncs didn't use to have reasonable support for printing arrays. Thoughts? 
regards, tom lane diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c index 9330908cbf..0f5d8fd978 100644 --- a/src/backend/optimizer/plan/planagg.c +++ b/src/backend/optimizer/plan/planagg.c @@ -137,8 +137,8 @@ preprocess_minmax_aggregates(PlannerInfo *root) return; /* - * Scan the tlist and HAVING qual to find all the aggregates and verify - * all are MIN/MAX aggregates. Stop as soon as we find one that isn't. + * Examine all the aggregates and verify all are MIN/MAX aggregates. Stop + * as soon as we find one that isn't. */ aggs_list = NIL; if (!can_minmax_aggs(root, &aggs_list)) @@ -227,21 +227,21 @@ preprocess_minmax_aggregates(PlannerInfo *root) /* * can_minmax_aggs - * Walk through all the aggregates in the query, and check - * if they are all MIN/MAX aggregates. If so, build a list of the - * distinct aggregate calls in the tree. + * Examine all the aggregates in the query, and check if they are + * all MIN/MAX aggregates. If so, build a list of MinMaxAggInfo + * nodes for them. * * Returns false if a non-MIN/MAX aggregate is found, true otherwise. - * - * This does not descend into subqueries, and so should be used only after - * reduction of sublinks to subplans. There mustn't be outer-aggregate - * references either. */ static bool can_minmax_aggs(PlannerInfo *root, List **context) { ListCell *lc; + /* + * This function used to have to scan the query for itself, but now we can + * just thumb through the AggInfo list made by preprocess_aggrefs. 
+ */ foreach(lc, root->agginfos) { AggInfo*agginfo = (AggInfo *) lfirst(lc); diff --git a/src/backend/optimizer/prep/prepagg.c b/src/backend/optimizer/prep/prepagg.c index 404a5f1dac..8f4686 100644 --- a/src/backend/optimizer/prep/prepagg.c +++ b/src/backend/optimizer/prep/prepagg.c @@ -229,7 +229,7 @@ preprocess_aggref(Aggref *aggref, PlannerInfo *root) } else { - AggInfo*agginfo = palloc(sizeof(AggInfo)); + AggInfo*agginfo = makeNode(AggInfo); agginfo->finalfn_oid = aggfinalfn; agginfo->representative_aggref = aggref; @@ -266,7 +266,7 @@ preprocess_aggref(Aggref *aggref, PlannerInfo *root) same_input_transnos); if (transno == -1) { - AggTransInfo *transinfo = palloc(sizeof(AggTransInfo)); + AggTransInfo *transinfo = makeNode(AggTransInfo); transinfo->args = aggref->args; transinfo->aggfilter = aggref->aggfilter; @@ -452,7 +452,8 @@ find_compatible_trans(PlannerInfo *root, Aggref *newagg, bool shareable, foreach(lc, transnos) { int transno = lfirst_int(lc); - AggTransInfo *pertrans = (AggTransInfo *) list_nth(root->aggtransinfos, transno); + AggTransInfo *pertrans = (AggTransInfo *) list_nth(root->aggtransinfos, + transno); /* * if the transfns or transition state types are not the same then the diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 69ba254372..e650af5ff2 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -442,15 +442,15 @@ struct PlannerInfo * Information about aggregates. Filled by preprocess_aggrefs(). */ /* AggInfo structs */ - List *agginfos pg_node_attr(read_write_ignore); + List *agginfos; /* AggTransInfo structs */ - List *aggtransinfos
Re: Transparent column encryption
On Mon, Jul 18, 2022 at 6:53 AM Peter Eisentraut wrote: > I think a way to look at this is that this column encryption feature > isn't suitable for disguising the existence or absence of data, it can > only disguise the particular data that you know exists. +1. Even there, what can be accomplished with a feature that only encrypts individual column values is by nature somewhat limited. If you have a text column that, for one row, stores the value 'a', and for some other row, stores the entire text of Don Quixote in the original Spanish, it is going to be really difficult to keep an adversary who can read from the disk from distinguishing those rows. If you want to fix that, you're going to need to do block-level encryption or something of that sort. And even then, you still have another version of the problem, because now imagine you have one *table* that contains nothing but the value 'a' and another that contains nothing but the entire text of Don Quixote, it is going to be possible for an adversary to tell those tables apart, even with the corresponding files encrypted in their entirety. But I don't think that this means that a feature like this has no value. I think it just means that we need to clearly document how the feature works and not over-promise. -- Robert Haas EDB: http://www.enterprisedb.com
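The point about value sizes can be demonstrated with a toy cipher. The XOR keystream below is a deliberately fake stand-in (not the patch's actual encryption scheme, and not real cryptography): any scheme that encrypts each value without padding leaks the plaintext length the same way.

```python
# Toy demonstration: per-value encryption cannot hide how large each value
# is.  An adversary reading the stored bytes can distinguish 'a' from a
# novel-sized text purely by ciphertext length.
import os

def toy_encrypt(plaintext: bytes) -> bytes:
    # One-shot XOR keystream; a stand-in for any length-preserving cipher.
    keystream = os.urandom(len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, keystream))

short_ct = toy_encrypt(b"a")
novel_ct = toy_encrypt(b"x" * 400_000)   # stand-in for the text of Don Quixote
```

Even though both ciphertexts are individually indistinguishable from random bytes, their lengths alone reveal which row held the one-character value.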
Re: [RFC] building postgres with meson - v10
Hi,

On 2022-07-18 11:12:10 +0300, Aleksander Alekseev wrote:
> > Just a quick question - is there a reason for changing the subject of
> > the emails?
> >
> > Not all email clients handle this well, e.g. Google Mail considers
> > this being 10 separate threads. The CF application and/or
> > pgsql-hackers@ archive also don't recognise this as a continuation of
> > the original thread. So all the discussions in -v8, -v9, etc.
> > threads get lost.
> >
> > May I suggest using a single thread?
>
> OK, the part about the archive is wrong - I scrolled right to the end
> of the thread, didn't notice the v10 patch above and assumed it was lost.
> Sorry for the confusion. However, the part about various email clients
> is accurate.

For me the thread is too long to look through without some separation. I wouldn't put the version in the subject for a small patchset / thread, but at this size I think it's reasonable.

Greetings,

Andres Freund
Re: doc: mentioned CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF
On 2022-07-18 Mo 10:33, Justin Pryzby wrote:
> It's easy to use CREATE TABLE..LIKE + ALTER..ATTACH PARTITION to avoid
> acquiring a strong lock when creating a new partition.
> But it's also easy to forget.
>
> commit 76c0d1198cf2908423b321cd3340d296cb668c8e
> Author: Justin Pryzby
> Date:   Mon Jul 18 09:24:55 2022 -0500
>
>     doc: mention CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF
>
>     See also: 898e5e3290a72d288923260143930fb32036c00c
>     Should backpatch to v12
>
> diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
> index 6bbf15ed1a4..db7d8710bae 100644
> --- a/doc/src/sgml/ref/create_table.sgml
> +++ b/doc/src/sgml/ref/create_table.sgml
> @@ -619,6 +619,16 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
>     with DROP TABLE requires taking an ACCESS
>     EXCLUSIVE lock on the parent table.
>
> +
> +   Note that creating a partition acquires an ACCESS
> +   EXCLUSIVE lock on the parent table.
> +   It may be preferable to first CREATE a separate table and then ATTACH it,
> +   which does not require as strong of a lock.
> +   See ATTACH PARTITION
> +   and for more information.
> +

Style nitpick. I would prefer "does not require as strong a lock."

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: limits of max, min optimization
po 18. 7. 2022 v 16:29 odesílatel Tom Lane napsal:
> Alvaro Herrera writes:
> > On 2022-Jul-18, Pavel Stehule wrote:
> >> I am trying to fix one slow query, and found that optimization of min, max
> >> functions is possible only when there is no JOIN in the query.
>
> > See preprocess_minmax_aggregates() in
> > src/backend/optimizer/plan/planagg.c
> > Maybe it is possible to hack that code so that this case can be handled
> > better.
>
> The comments show this was already thought about:
>
>  * We also restrict the query to reference exactly one table, since join
>  * conditions can't be handled reasonably.  (We could perhaps handle a
>  * query containing cartesian-product joins, but it hardly seems worth the
>  * trouble.)

Thank you for the reply

Regards

Pavel

> regards, tom lane
doc: mentioned CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF
It's easy to use CREATE TABLE..LIKE + ALTER..ATTACH PARTITION to avoid acquiring a strong lock when creating a new partition. But it's also easy to forget.

commit 76c0d1198cf2908423b321cd3340d296cb668c8e
Author: Justin Pryzby
Date:   Mon Jul 18 09:24:55 2022 -0500

    doc: mention CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF

    See also: 898e5e3290a72d288923260143930fb32036c00c
    Should backpatch to v12

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 6bbf15ed1a4..db7d8710bae 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -619,6 +619,16 @@ WITH ( MODULUS numeric_literal, REM
    with DROP TABLE requires taking an ACCESS
    EXCLUSIVE lock on the parent table.

+
+   Note that creating a partition acquires an ACCESS
+   EXCLUSIVE lock on the parent table.
+   It may be preferable to first CREATE a separate table and then ATTACH it,
+   which does not require as strong of a lock.
+   See ATTACH PARTITION
+   and for more information.
+
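For readers of the archive, the pattern the doc change recommends looks roughly like this (a sketch; the "measurement" table and the partition bounds are made-up names, not from the patch):

```sql
-- Create the new partition as a free-standing table; this takes no lock
-- on the (assumed) partitioned parent "measurement".
CREATE TABLE measurement_y2022
    (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

-- Attaching takes only SHARE UPDATE EXCLUSIVE on the parent (since v12,
-- commit 898e5e3290a7), rather than the ACCESS EXCLUSIVE lock that
-- CREATE TABLE .. PARTITION OF acquires.
ALTER TABLE measurement
    ATTACH PARTITION measurement_y2022
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
```

The weaker lock means concurrent reads and writes on the parent are not blocked while the new partition is attached.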
Re: limits of max, min optimization
Alvaro Herrera writes:
> On 2022-Jul-18, Pavel Stehule wrote:
>> I am trying to fix one slow query, and found that optimization of min, max
>> functions is possible only when there is no JOIN in the query.

> See preprocess_minmax_aggregates() in
> src/backend/optimizer/plan/planagg.c
> Maybe it is possible to hack that code so that this case can be handled
> better.

The comments show this was already thought about:

 * We also restrict the query to reference exactly one table, since join
 * conditions can't be handled reasonably.  (We could perhaps handle a
 * query containing cartesian-product joins, but it hardly seems worth the
 * trouble.)

regards, tom lane
Re: limits of max, min optimization
On 2022-Jul-18, Pavel Stehule wrote:

> Hi
>
> I am trying to fix one slow query, and found that optimization of min, max
> functions is possible only when there is no JOIN in the query.
>
> Is it true?

See preprocess_minmax_aggregates() in src/backend/optimizer/plan/planagg.c

> select max(insert_date) from foo join boo on foo.boo_id = boo.id
> where foo.item_id = 100 and boo.is_ok

Maybe it is possible to hack that code so that this case can be handled better.

--
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
Re: Use fadvise in wal replay
> On 23 Jun 2022, at 12:50, Jakub Wartak wrote:
>
> Thoughts?

I've looked into the patch one more time, and I propose to change this line

+ posix_fadvise(readFile, readOff + RACHUNK, RACHUNK, POSIX_FADV_WILLNEED);

to

+ posix_fadvise(readFile, readOff + XLOG_BLCKSZ, RACHUNK, POSIX_FADV_WILLNEED);

Currently the first 128 kB of the file are not prefetched. I expect that this change will produce similar performance results; I propose it only for consistency, so that we prefetch all the data that we have not prefetched yet and are going to read.

What do you think?

Best regards, Andrey Borodin.
Re: Proposal to introduce a shuffle function to intarray extension
John Naylor writes:
> On Mon, Jul 18, 2022 at 2:47 PM Martin Kalcher <martin.kalc...@aboutsource.net> wrote:
>> One more question. How do i pick a Oid for the functions?

> For this, we recommend running src/include/catalog/unused_oids, and it will
> give you a random range to pick from. That reduces the chance of different
> patches conflicting with each other. It doesn't really matter what the oid
> here is, since at feature freeze a committer will change them anyway.

If you want the nitty gritty details here, see

https://www.postgresql.org/docs/devel/system-catalog-initial-data.html#SYSTEM-CATALOG-OID-ASSIGNMENT

regards, tom lane
limits of max, min optimization
Hi

I am trying to fix one slow query, and found that optimization of min, max functions is possible only when there is no JOIN in the query.

Is it true? I need to do a manual transformation of the query

select max(insert_date) from foo join boo on foo.boo_id = boo.id
where foo.item_id = 100 and boo.is_ok

to

select insert_date from foo join boo on foo.boo_id = boo.id
where foo.item_id = 100 and boo.is_ok
order by insert_date desc limit 1;

Regards

Pavel
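For what it's worth, the manual ORDER BY .. LIMIT 1 rewrite above only pays off with a suitable index. A sketch (the column types and index name are assumptions, not taken from the thread):

```sql
-- Hypothetical index matching the filter and sort of the query above:
CREATE INDEX foo_item_insert_idx ON foo (item_id, insert_date DESC);

-- With this index, the rewritten query can walk the index in descending
-- insert_date order for item_id = 100 and stop at the first row whose
-- join partner satisfies boo.is_ok, instead of aggregating every match:
select insert_date from foo join boo on foo.boo_id = boo.id
where foo.item_id = 100 and boo.is_ok
order by insert_date desc limit 1;
```

This is essentially the same plan shape preprocess_minmax_aggregates() produces automatically for the single-table case.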
Re: MERGE and parsing with prepared statements
On Fri, Jul 15, 2022 at 03:43:41PM -0500, Justin Pryzby wrote:
> Should that sentence be removed from MERGE ?

Also, I think these examples should be more similar.

doc/src/sgml/ref/merge.sgml:

> MERGE INTO CustomerAccount CA
> USING RecentTransactions T
> ON T.CustomerId = CA.CustomerId
> WHEN MATCHED THEN
>   UPDATE SET Balance = Balance + TransactionValue
> WHEN NOT MATCHED THEN
>   INSERT (CustomerId, Balance)
>   VALUES (T.CustomerId, T.TransactionValue);
>
> Notice that this would be exactly equivalent to the following
> statement because the MATCHED result does not change
> during execution.
>
> MERGE INTO CustomerAccount CA
> USING (Select CustomerId, TransactionValue From RecentTransactions) AS T
> ON CA.CustomerId = T.CustomerId
> WHEN NOT MATCHED THEN
>   INSERT (CustomerId, Balance)
>   VALUES (T.CustomerId, T.TransactionValue)
> WHEN MATCHED THEN
>   UPDATE SET Balance = Balance + TransactionValue;

The "ON" lines can be the same. The "MATCHED" clauses can be in the same order.

--
Justin
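Concretely, applying both suggestions, the second example would read something like this (a sketch of the proposed harmonization, not committed documentation wording):

```sql
-- Same ON line and same WHEN-clause order as the first example;
-- only the USING source differs:
MERGE INTO CustomerAccount CA
USING (SELECT CustomerId, TransactionValue FROM RecentTransactions) AS T
ON T.CustomerId = CA.CustomerId
WHEN MATCHED THEN
  UPDATE SET Balance = Balance + TransactionValue
WHEN NOT MATCHED THEN
  INSERT (CustomerId, Balance)
  VALUES (T.CustomerId, T.TransactionValue);
```

With the incidental differences removed, the only remaining difference is the one the surrounding doc text is actually about.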
Re: Commitfest Update
On Mon, Jul 18, 2022 at 03:05:51PM +0200, Alvaro Herrera wrote: > Maybe we should have two reviewers columns -- one for history-tracking > purposes (and commit msg credit) and another for current ones. Maybe. Or, the list of reviewers shouldn't be shown prominently in the list of patches. But changing that would currently break cfbot's web scraping. -- Justin
Re: Commitfest Update
Maybe we should have two reviewers columns -- one for history-tracking purposes (and commit msg credit) and another for current ones. Personally, I don't use the CF app when building reviewer lists. I scan the threads instead. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/