Re: making relfilenodes 56 bits

2022-07-01 Thread Andres Freund
Hi,

I'm not feeling inspired by "locator", tbh. But I don't really have a great
alternative, so ...


On 2022-07-01 16:12:01 +0530, Dilip Kumar wrote:
> From f07ca9ef19e64922c6ee410707e93773d1a01d7c Mon Sep 17 00:00:00 2001
> From: dilip kumar 
> Date: Sat, 25 Jun 2022 10:43:12 +0530
> Subject: [PATCH v4 2/4] Preliminary refactoring for supporting larger
>  relfilenumber

I don't think we have abbreviated buffer as 'buff' in many places? I think we
should either spell buffer out or use 'buf'. This is in regard to BuffTag etc.



> Subject: [PATCH v4 3/4] Use 56 bits for relfilenumber to avoid wraparound

>  /*
> + * GenerateNewRelFileNumber
> + *
> + * Similar to GetNewObjectId but instead of new Oid it generates new
> + * relfilenumber.
> + */
> +RelFileNumber
> +GetNewRelFileNumber(void)
> +{
> + RelFileNumber   result;
> +
> + /* Safety check, we should never get this far in a HS standby */

Normally we don't capitalize the first character of a comment that's not a
full sentence (i.e. ending with a punctuation mark).

> + if (RecoveryInProgress())
> + elog(ERROR, "cannot assign RelFileNumber during recovery");
> +
> + LWLockAcquire(RelFileNumberGenLock, LW_EXCLUSIVE);
> +
> + /* Check for the wraparound for the relfilenumber counter */
> + if (unlikely (ShmemVariableCache->nextRelFileNumber > 
> MAX_RELFILENUMBER))
> + elog(ERROR, "relfilenumber is out of bound");
> +
> + /* If we run out of logged for use RelFileNumber then we must log more 
> */

"logged for use" - looks like you reformulated the sentence incompletely.


> + if (ShmemVariableCache->relnumbercount == 0)
> + {
> + XLogPutNextRelFileNumber(ShmemVariableCache->nextRelFileNumber +
> +  
> VAR_RFN_PREFETCH);

I know this is just copied, but I find "XLogPut" as a prefix pretty unhelpful.


> diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
> index e1f4eef..1cf039c 100644
> --- a/src/include/catalog/pg_class.h
> +++ b/src/include/catalog/pg_class.h
> @@ -31,6 +31,10 @@
>   */
>  CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP 
> BKI_ROWTYPE_OID(83,RelationRelation_Rowtype_Id) BKI_SCHEMA_MACRO
>  {
> + /* identifier of physical storage file */
> + /* relfilenode == 0 means it is a "mapped" relation, see relmapper.c */
> + int64   relfilenode BKI_DEFAULT(0);
> +
>   /* oid */
>   Oid oid;
>  
> @@ -52,10 +56,6 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP 
> BKI_ROWTYPE_OID(83,Relat
>   /* access method; 0 if not a table / index */
>   Oid relam BKI_DEFAULT(heap) BKI_LOOKUP_OPT(pg_am);
>  
> - /* identifier of physical storage file */
> - /* relfilenode == 0 means it is a "mapped" relation, see relmapper.c */
> - Oid relfilenode BKI_DEFAULT(0);
> -
>   /* identifier of table space for relation (0 means default for 
> database) */
>   Oid reltablespace BKI_DEFAULT(0) 
> BKI_LOOKUP_OPT(pg_tablespace);
>

What's the story behind moving relfilenode to the front? Alignment
consideration? Seems odd to move the relfilenode before the oid. If there's an
alignment issue, can't you just swap it with reltablespace or such to resolve
it?
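
A note for readers on what the "alignment issue" would be: int64 wants
8-byte alignment while Oid is 4 bytes, so an int64 placed after an odd
number of 4-byte fields costs a 4-byte padding hole. A standalone sketch,
field names borrowed from pg_class purely for illustration:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t Oid;

/* int64 right after a lone 4-byte Oid leaves a 4-byte hole before it */
struct Holey  { Oid oid; int64_t relfilenode; };
/* pairing two 4-byte fields first keeps the int64 naturally aligned */
struct Packed { Oid oid; Oid reltablespace; int64_t relfilenode; };

int main(void)
{
	/* both print 16 on LP64, but Packed fits one more field in the space */
	printf("%zu %zu\n", sizeof(struct Holey), sizeof(struct Packed));
	return 0;
}

Swapping with a neighboring 4-byte column, as suggested, would close the
hole without moving relfilenode ahead of the oid.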



> From f6e8e0e7412198b02671e67d1859a7448fe83f38 Mon Sep 17 00:00:00 2001
> From: dilip kumar 
> Date: Wed, 29 Jun 2022 13:24:32 +0530
> Subject: [PATCH v4 4/4] Don't delay removing Tombstone file until next
>  checkpoint
> 
> Currently, we can not remove the unused relfilenode until the
> next checkpoint because if we remove them immediately then
> there is a risk of reusing the same relfilenode for two
> different relations during single checkpoint due to Oid
> wraparound.

Well, not quite "currently", because at this point we've fixed that in a prior
commit ;)


> Now as part of the previous patch set we have made relfilenode
> 56 bit wider and removed the risk of wraparound so now we don't
> need to wait till the next checkpoint for removing the unused
> relation file and we can clean them up on commit.

Hm. Wasn't there also some issue around crash-restarts benefiting from having
those files around until later? I think what I'm remembering is what is
referenced in this comment:


> - * For regular relations, we don't unlink the first segment file of the rel,
> - * but just truncate it to zero length, and record a request to unlink it 
> after
> - * the next checkpoint.  Additional segments can be unlinked immediately,
> - * however.  Leaving the empty file in place prevents that relfilenumber
> - * from being reused.  The scenario this protects us from is:
> - * 1. We delete a relation (and commit, and actually remove its file).
> - * 2. We create a new relation, which by chance gets the same relfilenumber 
> as
> - * the just-deleted one (OIDs must've wrapped around for that to happen).
> - 

Re: Assert name/short_desc to prevent SHOW ALL segfault

2022-07-01 Thread Nathan Bossart
On Sat, Jul 02, 2022 at 12:33:28PM +0900, Michael Paquier wrote:
> Now that v16 is open for business, I have been able to look at it
> again, and applied the patch on HEAD.  My apologies for the wait.

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Assert name/short_desc to prevent SHOW ALL segfault

2022-07-01 Thread Michael Paquier
On Sat, May 28, 2022 at 05:50:18AM -0700, Nathan Bossart wrote:
> Yeah, I see no reason that this should go into v15.  I created a new
> commitfest entry so that this isn't forgotten:
> 
>   https://commitfest.postgresql.org/38/3655/
> 
> And I've reposted 0002 here so that we get some cfbot coverage in the
> meantime.

Now that v16 is open for business, I have been able to look at it
again, and applied the patch on HEAD.  My apologies for the wait.
--
Michael




Re: Support logical replication of DDLs

2022-07-01 Thread Amit Kapila
On Fri, Jul 1, 2022 at 10:22 PM vignesh C  wrote:
>
> On Wed, Jun 29, 2022 at 3:25 PM houzj.f...@fujitsu.com
>  wrote:
> >
>
> Thanks for the updated patch.
> Few comments on 0002 patch:
> 1) When we create a subscription for a publication with the existing
> default PUBLISH parameter having default value as
> 'insert,update,delete,truncate', we do an initial table sync to get
> the initial table data from the publisher to the subscriber. But in
> case of a publication created with 'ddl', the subscription expects the
> existing initial tables present in the publisher to be created
> beforehand in the subscriber. Should this be the default behavior?
> Should we do a ddl dump for all the tables and restore the ddl to the
> subscription while creating the subscription? Or is this planned as an
> option for the later version.
>

The idea is to develop initial sync (for ddl replication) as a
separate patch. But both need to be integrated at some point.

>
> 3) SYNTAX Support:
> Currently creation of "FOR TABLE" publication with ddl is supported.
> Should we allow support of ddl for "FOR TABLE" publication.
>

The above comment is unclear to me. It seems to me in the first
sentence, you are saying that the "FOR TABLE" syntax is supported and
in the second sentence, you are asking to allow support of it? I think
at this stage, the focus is to build the core part of the feature
(allow ddl replication and deparsing support), and then we can discuss
more on syntax. Having said that, it will be good if we can support
table-level DDL replication as well in the patch as you seem to be
suggesting.

-- 
With Regards,
Amit Kapila.




Re: pg15b2: large objects lost on upgrade

2022-07-01 Thread Michael Paquier
On Fri, Jul 01, 2022 at 06:14:13PM -0500, Justin Pryzby wrote:
> I reproduced the problem at 9a974cbcba but not its parent commit.
> 
> commit 9a974cbcba005256a19991203583a94b4f9a21a9
> Author: Robert Haas 
> Date:   Mon Jan 17 13:32:44 2022 -0500
> 
> pg_upgrade: Preserve relfilenodes and tablespace OIDs.

Oops.  Robert?

This reproduces as well when using pg_upgrade with the same version as
origin and target, meaning that this extra step in the TAP test is
able to reproduce the issue (extra VACUUM FULL before chdir'ing):
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -208,6 +208,8 @@ if (defined($ENV{oldinstall}))
}
 }

+$oldnode->safe_psql("regression", "VACUUM FULL pg_largeobject;");
+
# In a VPATH build, we'll be started in the source directory, but we want
--
Michael




Re: Time to remove unparenthesized syntax for VACUUM?

2022-07-01 Thread Noah Misch
On Fri, Jul 01, 2022 at 03:13:16PM -0700, Nathan Bossart wrote:
> On Fri, Jul 01, 2022 at 03:05:55PM -0700, Andres Freund wrote:
> > On 2022-07-01 14:56:42 -0700, Nathan Bossart wrote:
> >> The unparenthesized syntax for VACUUM has been marked deprecated since v9.1
> >> (ad44d50).  Should it be removed in v16?  If not, should we start emitting
> >> WARNINGs when it is used?
> > 
> > What would we gain? ISTM that the number of scripts and typing habits that'd
> > be broken would vastly exceed the benefit.
> 
> Beyond removing a few lines from gram.y and vacuum.sgml, probably not much.
> If it isn't going to be removed, IMO we should consider removing the
> deprecation notice in the docs.

Deprecation doesn't imply eventual removal.  java.io.StringBufferInputStream
has been deprecated for 25 years.  One should not expect it or the old VACUUM
syntax to go away.




Re: First draft of the PG 15 release notes

2022-07-01 Thread Noah Misch
On Fri, Jul 01, 2022 at 02:08:00PM -0400, Bruce Momjian wrote:
> On Wed, Jun 29, 2022 at 10:08:08PM -0700, Noah Misch wrote:
> > On Tue, Jun 28, 2022 at 04:35:45PM -0400, Bruce Momjian wrote:

> > > > >   permissions on the public schema has not
> > > > >   been changed.  Databases restored from previous Postgres 
> > > > > releases
> > > > >   will be restored with their current permissions.  Users wishing
> > > > >   to have the old permissions on new objects will need to grant
> > > > 
> > > > The phrase "old permissions on new objects" doesn't sound right to me, 
> > > > but I'm
> > > > not sure why.  I think you're aiming for the fact that this is just a 
> > > > default;
> > > > one can still change the ACL to anything, including to the old default. 
> > > >  If
> > > > these notes are going to mention the old default like they do so far, I 
> > > > think
> > > > they should also urge readers to understand
> > > > https://www.postgresql.org/docs/devel/ddl-schemas.html#DDL-SCHEMAS-PATTERNS
> > > > before returning to the old default.  What do you think?
> > > 
> > > Agreed, the new text is:
> > > 
> > >   Users wishing to have the former permissions will need to grant
> > >   CREATE permission for PUBLIC on
> > >   the public schema; this change can be made on
> > >   template1 to cause all new databases to have these
> > >   permissions.
> > 
> > What do you think about the "should also urge readers ..." part of my 
> > message?
> 
> I see your point, that there is no indication of why you might not want
> to restore the old permissions.  I created the attached patch which
> makes two additions to clarify this.

> --- a/doc/src/sgml/release-15.sgml
> +++ b/doc/src/sgml/release-15.sgml
> @@ -63,12 +63,11 @@ Author: Noah Misch 
>permissions on the public schema has not
>been changed.  Databases restored from previous Postgres releases
>will be restored with their current permissions.  Users wishing
> -  to have the former more-open permissions will need to grant
> +  to have the former permissions will need to grant
>CREATE permission for PUBLIC
>on the public schema; this change can be made
>on template1 to cause all new databases
> -  to have these permissions.  This change was made to increase
> -  security.
> +  to have these permissions.
>   
>  

Here's what I've been trying to ask: what do you think of linking to
https://www.postgresql.org/docs/devel/ddl-schemas.html#DDL-SCHEMAS-PATTERNS
here?  The release note text is still vague, and the docs have extensive
coverage of the topic.  The notes can just link to that extensive coverage.




Re: Issue with pg_stat_subscription_stats

2022-07-01 Thread Masahiko Sawada
On Sat, Jul 2, 2022 at 2:53 Andres Freund  wrote:

> Hi,
>
> On 2022-07-01 16:08:48 +0900, Masahiko Sawada wrote:
> > Yes, my point is that it may be misleading that the subscription stats
> > are created when a subscription is created.
>
> I think it's important to create stats at that time, because otherwise it's
> basically impossible to ensure that stats are dropped when a transaction
> rolls
> back. If some / all columns should return something else before stats are
> reported that can be addressed easily by tracking that in a separate field.
>
>
> > On the other hand, I'm not sure we agreed that the behavior that
> > Melanie reported is not a problem. The user might get confused since
> > the subscription stats works differently than other stats when a
> > reset. Previously, the primary reason why I hesitated to create the
> > subscription stats when creating a subscription is that CREATE
> > SUBSCRIPTION (with create_slot = false) can be rolled back. But with
> > the shmem stats, we can easily resolve it by using
> > pgstat_create_transactional().
>
> Yep.
>
>
> > > > The second problem is that the following code in DropSubscription()
> > > > should be updated:
> > > >
> > > > /*
> > > >  * Tell the cumulative stats system that the subscription is
> getting
> > > >  * dropped. We can safely report dropping the subscription
> statistics here
> > > >  * if the subscription is associated with a replication slot
> since we
> > > >  * cannot run DROP SUBSCRIPTION inside a transaction block.
> Subscription
> > > >  * statistics will be removed later by (auto)vacuum either if
> it's not
> > > >  * associated with a replication slot or if the message for
> dropping the
> > > >  * subscription gets lost.
> > > >  */
> > > > if (slotname)
> > > > pgstat_drop_subscription(subid);
> > > >
> > > > I think we can call pgstat_drop_subscription() even if slotname is
> > > > NULL and need to update the comment.
> > > >
> > >
> > > +1.
> > >
> > > > IIUC autovacuum is no longer
> > > > responsible for garbage collection.
> > > >
> > >
> > > Right, this is my understanding as well.
> >
> > Thank you for the confirmation.
>
> Want to propose a patch?


Yes, I’ll propose a patch.
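
A rough sketch of the likely shape, assuming the PG15 shmem-stats
internals discussed above: make creation transactional so a rolled-back
CREATE SUBSCRIPTION drops the pending entry, and drop the stats
unconditionally:

	/* in CreateSubscription() */
	pgstat_create_transactional(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid);

	/* in DropSubscription(), no longer guarded by "if (slotname)" */
	pgstat_drop_subscription(subid);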

Regards,
-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


Re: Patch proposal: New hooks in the connection path

2022-07-01 Thread Roberto Mello
On Fri, Jul 1, 2022 at 5:00 PM Nathan Bossart 
wrote:

>
>
> That being said, I don't see why this information couldn't be provided in a
> system view.  IMO it is generically useful.


+1 for a system view with appropriate permissions, in addition to the
hooks.

That would make the information easily accessible to a number of monitoring
systems besides the admin.

Roberto

—
Crunchy Data — passion for open source PostgreSQL



Re: should check interrupts in BuildRelationExtStatistics ?

2022-07-01 Thread Justin Pryzby
On Fri, Jul 01, 2022 at 07:19:11PM -0400, Tom Lane wrote:
> I tried interrupting at a random point and then stepping, and
> look what I hit after just a couple of steps:

I'd come up with the trick of setting
  SET backtrace_functions='ProcessInterrupts';

> That, um, piqued my interest.  After a bit of digging,
> I modestly propose the attached.  I'm not sure if it's
> okay to back-patch this, because maybe someone out there
> is relying on qsort() to be incapable of throwing an error
> --- but it'd solve the problem at hand and a bunch of other
> issues of the same ilk.

Confirmed this fixes the 3 types of extended stats, at least for the example
cases I made.

If it's okay to backpatch, v14 seems adequate since the problem is more
prominent with expressional statistics (I'm sure that's why I hit it) ...
otherwise v15 would do.

-- 
Justin




Re: Allow placeholders in ALTER ROLE w/o superuser

2022-07-01 Thread Nathan Bossart
On Sun, Jun 05, 2022 at 11:20:38PM -0500, Steve Chavez wrote:
> However, defining placeholders at the role level require superuser:
> 
>   alter role myrole set my.username to 'tomas';
>   ERROR:  permission denied to set parameter "my.username"
> 
> Which is inconsistent and surprising behavior. I think it doesn't make
> sense since you can already set them at the session or transaction
> level (SET LOCAL my.username = 'tomas'). Enabling this would allow sidecar
> services to store metadata scoped to their pertaining role.
> 
> I've attached a patch that removes this restriction. From my testing, this
> doesn't affect permission checking when an extension defines its custom GUC
> variables.
> 
>    DefineCustomStringVariable("my.custom", NULL, NULL, &my_custom, NULL,
>   PGC_SUSET, ..);
> 
> Using PGC_SUSET or PGC_SIGHUP will fail accordingly. Also no tests fail
> when doing "make installcheck".

IIUC you are basically proposing to revert a6dcd19 [0], but it is not clear
to me why that is safe.  Am I missing something?

[0] https://www.postgresql.org/message-id/flat/4090.1258042387%40sss.pgh.pa.us

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: margay fails assertion in stats/dsa/dsm code

2022-07-01 Thread Andres Freund
Hi,

On 2022-07-02 11:10:07 +1200, Thomas Munro wrote:
> 2022-07-01 18:25:25.848 CEST [27738:21] pg_regress/prepared_xacts
> ERROR:  could not open shared memory segment "/PostgreSQL.499018794":
> File exists
> 2022-07-01 18:25:25.848 CEST [27738:22] pg_regress/prepared_xacts
> STATEMENT:  SELECT * FROM pxtest1;
> TRAP: FailedAssertion("!hash_table->find_locked", File: "dshash.c",
> Line: 312, PID: 27784)
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'ExceptionalCondition+0x64
> [0x1008bb8b0]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'dshash_detach+0x48
> [0x10058674c]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'pgstat_detach_shmem+0x68
> [0x10075e630]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'pgstat_shutdown_hook+0x94
> [0x10075989c]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'shmem_exit+0x84
> [0x100701198]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'proc_exit_prepare+0x88
> [0x100701394]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'proc_exit+0x4
> [0x10070148c]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'StartBackgroundWorker+0x150
> [0x10066957c]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'maybe_start_bgworkers+0x604
> [0x1006717ec]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'sigusr1_handler+0x190
> [0x100672510]
> 
> So that's an exception safety problem in dshash or pgstat's new usage
> thereof, which is arguably independent of Solaris and probably
> deserves a new thread.  You don't need Solaris to see it, you can just
> add in some random fault injection.

FWIW potentially relevant thread for that aspect: 
https://postgr.es/m/20220311012712.botrpsikaufzteyt%40alap3.anarazel.de

What do you think about the proposal at the end of that email?

Greetings,

Andres Freund




Re: should check interrupts in BuildRelationExtStatistics ?

2022-07-01 Thread Tom Lane
Justin Pryzby  writes:
> On Mon, Jun 06, 2022 at 04:23:34PM +0900, Michael Paquier wrote:
>> Hmm.  I have to admit that adding a CFI() in multi_sort_compare()
>> stresses me a bit as it is dependent on the number of rows involved,
>> and it can be used as a qsort routine.

> That's exactly the problem for which I showed a backtrace - it took 10s of
> seconds to do qsort, which is (uh) a human timescale and too long to be
> unresponsive, even if I create on a table with many rows a stats object with a
> lot of columns and a high stats target.

Hmm.  On my machine, the example last shown upthread takes about 9
seconds, which I agree is a mighty long time to be unresponsive
--- but it appears that fully half of that elapses before we
reach multi_sort_compare for the first time.  The first half of
the ANALYZE run does seem to contain some CFI calls, but they
are not exactly thick on the ground there either.  So I'm feeling
like this patch isn't ambitious enough.

I tried interrupting at a random point and then stepping, and
look what I hit after just a couple of steps:

(gdb) s
qsort_arg (data=data@entry=0x13161410, n=, n@entry=1679616, 
element_size=element_size@entry=16, 
compare=compare@entry=0x649450 , 
arg=arg@entry=0x7ffec539c0f0) at ../../src/include/lib/sort_template.h:353
353 if (r == 0)
(gdb) 
358 pc -= ST_POINTER_STEP;
(gdb) 
359 DO_CHECK_FOR_INTERRUPTS();

That, um, piqued my interest.  After a bit of digging,
I modestly propose the attached.  I'm not sure if it's
okay to back-patch this, because maybe someone out there
is relying on qsort() to be incapable of throwing an error
--- but it'd solve the problem at hand and a bunch of other
issues of the same ilk.
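
For anyone wondering why the patch below is so small: the gdb trace above
shows sort_template.h already has an opt-in interrupt hook. From memory it
looks roughly like this (worth double-checking), so the backend build only
needs to define ST_CHECK_FOR_INTERRUPTS:

#ifdef ST_CHECK_FOR_INTERRUPTS
#define DO_CHECK_FOR_INTERRUPTS() CHECK_FOR_INTERRUPTS()
#else
#define DO_CHECK_FOR_INTERRUPTS()
#endif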

regards, tom lane

diff --git a/src/port/qsort.c b/src/port/qsort.c
index 7879e6cd56..34b99aac08 100644
--- a/src/port/qsort.c
+++ b/src/port/qsort.c
@@ -2,7 +2,12 @@
  *	qsort.c: standard quicksort algorithm
  */
 
-#include "c.h"
+#ifndef FRONTEND
+#include "postgres.h"
+#include "miscadmin.h"
+#else
+#include "postgres_fe.h"
+#endif
 
 #define ST_SORT pg_qsort
 #define ST_ELEMENT_TYPE_VOID
@@ -10,6 +15,11 @@
 #define ST_SCOPE
 #define ST_DECLARE
 #define ST_DEFINE
+
+#ifndef FRONTEND
+#define ST_CHECK_FOR_INTERRUPTS
+#endif
+
 #include "lib/sort_template.h"
 
 /*
diff --git a/src/port/qsort_arg.c b/src/port/qsort_arg.c
index fa7e11a3b8..685ab32913 100644
--- a/src/port/qsort_arg.c
+++ b/src/port/qsort_arg.c
@@ -2,7 +2,12 @@
  *	qsort_arg.c: qsort with a passthrough "void *" argument
  */
 
-#include "c.h"
+#ifndef FRONTEND
+#include "postgres.h"
+#include "miscadmin.h"
+#else
+#include "postgres_fe.h"
+#endif
 
 #define ST_SORT qsort_arg
 #define ST_ELEMENT_TYPE_VOID
@@ -11,4 +16,9 @@
 #define ST_COMPARE_ARG_TYPE void
 #define ST_SCOPE
 #define ST_DEFINE
+
+#ifndef FRONTEND
+#define ST_CHECK_FOR_INTERRUPTS
+#endif
+
 #include "lib/sort_template.h"


Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

2022-07-01 Thread Andres Freund
Hi,

On 2022-07-01 13:14:23 -0700, Andres Freund wrote:
> I saw a couple failures of 031 on CI for the meson patch - which obviously
> doesn't change anything around this. However it adds a lot more distributions,
> and the added ones run in docker containers on a shared host, presumably
> adding a lot of noise.  I saw this more frequently when I accidentally had the
> test runs at the number of CPUs in the host, rather than the allotted CPUs in
> the container.
>
> That made me look more into these issues. I played with adding a pg_usleep()
> to RecoveryConflictInterrupt() to simulate slow signal delivery.
>
> Found a couple things:
>
> - the pg_usleep(5000) in ResolveRecoveryConflictWithVirtualXIDs() can
>   completely swamp the target(s) on a busy system. This surely is exacerbated
>   by the usleep I added RecoveryConflictInterrupt() but a 5ms signalling pace
>   does seem like a bad idea.

This one is also implicated in
https://postgr.es/m/20220701154105.jjfutmngoedgiad3%40alvherre.pgsql
and related issues.

Besides being very short, it also seems like a bad idea to wait when we might
not need to? Seems we should only wait if we subsequently couldn't get the
lock?

Misleadingly WaitExceedsMaxStandbyDelay() also contains a usleep, which at
least I wouldn't expect given its name.


A minimal fix would be to increase the wait time, similar to how it is done
with standbyWait_us?
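
Concretely, something like exponential backoff with a cap, instead of the
flat 5ms per iteration (untested sketch; conflict_resolved stands in for
the real loop condition):

	long		wait_us = 1000;		/* start at 1ms */

	while (!conflict_resolved)
	{
		pg_usleep(wait_us);
		wait_us *= 2;				/* 1ms, 2ms, 4ms, ... */
		if (wait_us > 1000 * 1000)
			wait_us = 1000 * 1000;	/* cap at 1s, as standbyWait_us does */
	}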

Medium term it seems we ought to set the startup process's latch when handling
a conflict, and use a latch wait. But avoiding races probably isn't quite
trivial.
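
I.e. the usual latch idiom on the startup side (sketch only; the wait
event name is invented, and the backend resolving the conflict would
SetLatch() the startup process's latch when it exits):

	(void) WaitLatch(MyLatch,
					 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
					 wait_ms,
					 WAIT_EVENT_RECOVERY_CONFLICT);		/* hypothetical */
	ResetLatch(MyLatch);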


> - we process the same recovery conflict (not a single signal, but a single
>   "real conflict") multiple times in the target of a conflict, presumably
>   while handling the error. That includes handling the same interrupt once as
>   an ERROR and once as FATAL.
>
>   E.g.
>
> 2022-07-01 12:19:14.428 PDT [2000572] LOG:  recovery still waiting after 
> 10.032 ms: recovery conflict on buffer pin
> 2022-07-01 12:19:14.428 PDT [2000572] CONTEXT:  WAL redo at 0/344E098 for 
> Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 
> 1663/16385/1>
> 2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl ERROR:  
> canceling statement due to conflict with recovery at character 15
> 2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl DETAIL:  User 
> transaction caused buffer deadlock with recovery.
> 2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl STATEMENT:  
> SELECT * FROM test_recovery_conflict_table2;
> 2022-07-01 12:19:54.778 PDT [2000572] LOG:  recovery finished waiting after 
> 40359.937 ms: recovery conflict on buffer pin
> 2022-07-01 12:19:54.778 PDT [2000572] CONTEXT:  WAL redo at 0/344E098 for 
> Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 
> 1663/16385/1>
> 2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl FATAL:  
> terminating connection due to conflict with recovery
> 2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl DETAIL:  User 
> transaction caused buffer deadlock with recovery.
> 2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl HINT:  In a 
> moment you should be able to reconnect to the database and repeat your 
> command.
> 2022-07-01 12:19:54.837 PDT [2001389] 031_recovery_conflict.pl LOG:  
> statement: SELECT 1;
>
>   note that the startup process considers the conflict resolved by the time
>   the backend handles the interrupt.

I guess the reason we first get an ERROR and then a FATAL is that the second
iteration hits the if (RecoveryConflictPending && DoingCommandRead) bit,
because we end up there after handling the first error? And that's a FATAL.

I suspect that Thomas' fix will address at least part of this, as the check
whether we're still waiting for a lock will be made just before the error is
thrown.


>   I also see cases where a FATAL is repeated:
>
> 2022-07-01 12:43:18.190 PDT [2054721] LOG:  recovery still waiting after 
> 15.410 ms: recovery conflict on snapshot
> 2022-07-01 12:43:18.190 PDT [2054721] DETAIL:  Conflicting process: 2054753.
> 2022-07-01 12:43:18.190 PDT [2054721] CONTEXT:  WAL redo at 0/344AB90 for 
> Heap2/PRUNE: latestRemovedXid 732 nredirected 18 ndead 0; blkref #0: rel 
> 1663/16385/>
> 2054753: reporting recovery conflict 9
> 2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl FATAL:  
> terminating connection due to conflict with recovery
> 2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl DETAIL:  User 
> query might have needed to see row versions that must be removed.
> 2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl HINT:  In a 
> moment you should be able to reconnect to the database and repeat your 
> command.
> ...
> 2054753: reporting recovery conflict 9
> 2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl FATAL:  
> terminating connection due to conflict with recovery
> 2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl 

pg15b2: large objects lost on upgrade

2022-07-01 Thread Justin Pryzby
I noticed this during beta1, but dismissed the issue when it wasn't easily
reproducible.  Now, I saw the same problem while upgrading from beta1 to beta2,
so couldn't dismiss it.  It turns out that LOs are lost if VACUUM FULL was run.

| /usr/pgsql-15b1/bin/initdb --no-sync -D pg15b1.dat -k
| /usr/pgsql-15b1/bin/postgres -D pg15b1.dat -c logging_collector=no -p 5678 -k /tmp&
| psql -h /tmp postgres -p 5678 -c '\lo_import /etc/shells' -c 'VACUUM FULL pg_largeobject'
| rm -fr pg15b2.dat && /usr/pgsql-15b2/bin/initdb --no-sync -k -D pg15b2.dat && /usr/pgsql-15b2/bin/pg_upgrade -d pg15b1.dat -D pg15b2.dat -b /usr/pgsql-15b1/bin
| /usr/pgsql-15b2/bin/postgres -D pg15b2.dat -c logging_collector=no -p 5678 -k /tmp&

Or, for your convenience, with paths in tmp_install:
| ./tmp_install/usr/local/pgsql/bin/initdb --no-sync -D pg15b1.dat -k
| ./tmp_install/usr/local/pgsql/bin/postgres -D pg15b1.dat -c logging_collector=no -p 5678 -k /tmp&
| psql -h /tmp postgres -p 5678 -c '\lo_import /etc/shells' -c 'VACUUM FULL pg_largeobject'
| rm -fr pg15b2.dat && ./tmp_install/usr/local/pgsql/bin/initdb --no-sync -k -D pg15b2.dat && ./tmp_install/usr/local/pgsql/bin/pg_upgrade -d pg15b1.dat -D pg15b2.dat -b ./tmp_install/usr/local/pgsql/bin
| ./tmp_install/usr/local/pgsql/bin/postgres -D pg15b2.dat -c logging_collector=no -p 5678 -k /tmp&

postgres=# table pg_largeobject_metadata ;
 16384 |   10 | 

postgres=# \lo_list 
 16384 | pryzbyj | 

postgres=# \lo_export 16384 /dev/stdout
lo_export
postgres=# table pg_largeobject;

postgres=# \dt+ pg_largeobject
 pg_catalog | pg_largeobject | table | pryzbyj | permanent   | heap  | 0 bytes | 

I reproduced the problem at 9a974cbcba but not its parent commit.

commit 9a974cbcba005256a19991203583a94b4f9a21a9
Author: Robert Haas 
Date:   Mon Jan 17 13:32:44 2022 -0500

pg_upgrade: Preserve relfilenodes and tablespace OIDs.

-- 
Justin




Re: margay fails assertion in stats/dsa/dsm code

2022-07-01 Thread Thomas Munro
On Sat, Jul 2, 2022 at 1:15 AM Robert Haas  wrote:
> Changing the default on certain platforms to 'posix' or 'sysv'
> according to what works best on that platform seems reasonable to me.

Ok, I'm going to make that change in 15 + master.

> I agree that defaulting to 'mmap' doesn't seem like a lot of fun,
> although I think it could be a reasonable choice on a platform where
> everything else is broken. You could alternatively try to fix 'posix'
> by adding some kind of code to work around that platform's
> deficiencies. Insert handwaving here.

I don't think that 'posix' mode is salvageable on Solaris, but a new
GUC to control where 'mmap' mode puts its files would be nice.  Then
you could set it to '/tmp' (or some other RAM disk), and you'd have
the same end result as shm_open() on that platform, without the lock
problem.  Perhaps someone could propose a patch for 16.

As for the commit I already made, we can now see the new error:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=margay&dt=2022-07-01%2016%3A00%3A07

2022-07-01 18:25:25.848 CEST [27784:1] ERROR:  could not open shared
memory segment "/PostgreSQL.499018794": File exists

Unfortunately this particular run crashed anyway, for a new reason:
one backend didn't like the state the new error left the dshash in,
during shmem_exit:

2022-07-01 18:25:25.848 CEST [27738:21] pg_regress/prepared_xacts
ERROR:  could not open shared memory segment "/PostgreSQL.499018794":
File exists
2022-07-01 18:25:25.848 CEST [27738:22] pg_regress/prepared_xacts
STATEMENT:  SELECT * FROM pxtest1;
TRAP: FailedAssertion("!hash_table->find_locked", File: "dshash.c",
Line: 312, PID: 27784)
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'ExceptionalCondition+0x64
[0x1008bb8b0]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'dshash_detach+0x48
[0x10058674c]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'pgstat_detach_shmem+0x68
[0x10075e630]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'pgstat_shutdown_hook+0x94
[0x10075989c]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'shmem_exit+0x84
[0x100701198]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'proc_exit_prepare+0x88
[0x100701394]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'proc_exit+0x4
[0x10070148c]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'StartBackgroundWorker+0x150
[0x10066957c]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'maybe_start_bgworkers+0x604
[0x1006717ec]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'sigusr1_handler+0x190
[0x100672510]

So that's an exception safety problem in dshash or pgstat's new usage
thereof, which is arguably independent of Solaris and probably
deserves a new thread.  You don't need Solaris to see it, you can just
add in some random fault injection.




Re: Patch proposal: New hooks in the connection path

2022-07-01 Thread Nathan Bossart
On Fri, Jul 01, 2022 at 09:48:40AM +0200, Drouvot, Bertrand wrote:
>> However, I'm personally not okay with having multiple hooks
>> as proposed in the v1 patch.
> 
> I agree that it would be great to reduce the number of proposed hooks.
> 
> But,
> 
>>   Can we think of having a single hook
> 
> The proposed hooks are triggered during errors (means that the connection
> attempt break) and:
> 
> - In the connection paths that will not reach the ClientAuthentication_hook
> at all: those are the ones related to the bad startup packet and timeout
> while processing the startup packet.
> 
> or
> 
> - After the ClientAuthentication_hook is fired: those are the bad db oid,
> bad db name and bad perm ones.
> 
> So, It does look like having only one hook would require refactoring in the
> connection path and I'm not sure if this is worth it.
> 
>> or
>> enhancing the existing ClientAuthentication_hook where we pass a
>> PURPOSE parameter (CONN_SUCCESS, CONN_FAILURE, CONN_FOO, CONN_BAR)
>> to the hook?
> 
> I think one could already "predict" the bad db and bad perm errors within
> the current ClientAuthentication_hook.
> 
> But in case of multiple "possible" errors (within the same connection
> attempt) how could we know for sure the one that will be actually reported?
> That's why i think the best way is to put new hooks as close as possible to
> the place where the related errors are reported.
> 
> What do you think?

Could we model this after fmgr_hook?  The first argument in that hook
indicates where it is being called from.  This doesn't alleviate the need
for several calls to the hook in the authentication logic, but extension
authors would only need to define one hook.
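
For reference, the fmgr_hook shape being pointed at (quoting
src/include/fmgr.h from memory, so double-check):

	typedef enum FmgrHookEventType
	{
		FHET_START,
		FHET_END,
		FHET_ABORT
	} FmgrHookEventType;

	typedef void (*fmgr_hook_type) (FmgrHookEventType event,
									FmgrInfo *flinfo, Datum *arg);

A connection-path analogue could pass a similar discriminator, one value
per failure site, plus the Port.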

That being said, I don't see why this information couldn't be provided in a
system view.  IMO it is generically useful.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Reply: Remove useless param for create_groupingsets_path

2022-07-01 Thread Tom Lane
XueJing Zhao  writes:
> You are right, the patch is incorrect, and I have generated a patch once more.
> It is sent as an attachment named new.patch, please check, thanks!

LGTM.  Pushed, thanks!

regards, tom lane




Re: Emit extra debug message when executing extension script.

2022-07-01 Thread Nathan Bossart
On Fri, Jul 01, 2022 at 03:24:27PM -0700, Jeff Davis wrote:
> + ereport(DEBUG1, errmsg("executing extension update script from 
> version '%s' to '%s'", from_version, version));

nitpick: I would suggest "executing extension script for update from
version X to Y."

I personally would rather this output the name of the file.  If revealing
the directory is a concern, perhaps we could just trim everything but the
file name.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Emit extra debug message when executing extension script.

2022-07-01 Thread Jeff Davis
On Wed, 2022-06-29 at 21:39 -0400, Robert Haas wrote:
> > This should either be elog or use errmsg_internal.
> 
> Why?

I didn't see a response, so I'm still using ereport(). I attached a new
version though that doesn't emit the actual script filename; instead
just the from/to version.

The output looks nicer and I don't have to worry about whether the user
should be able to know the share directory or not.

Regards,
Jeff Davis

From 3024b3189156402e0d4250e43dcccec5aa5d01a3 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Tue, 28 Jun 2022 12:06:03 -0700
Subject: [PATCH] Emit debug message when executing extension script.

Allows extension authors to more easily debug problems related to the
sequence of update scripts that are executed.
---
 src/backend/commands/extension.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 767d9b96190..a35c49ac7e7 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -887,6 +887,11 @@ execute_extension_script(Oid extensionOid, ExtensionControlFile *control,
 
 	filename = get_extension_script_filename(control, from_version, version);
 
+	if (from_version == NULL)
+		ereport(DEBUG1, errmsg("executing extension script for version '%s'", version));
+	else
+		ereport(DEBUG1, errmsg("executing extension update script from version '%s' to '%s'", from_version, version));
+
 	/*
 	 * If installing a trusted extension on behalf of a non-superuser, become
 	 * the bootstrap superuser.  (This switch will be cleaned up automatically
-- 
2.17.1



Re: Time to remove unparenthesized syntax for VACUUM?

2022-07-01 Thread Nathan Bossart
On Fri, Jul 01, 2022 at 03:19:28PM -0700, Andres Freund wrote:
> On 2022-07-01 15:13:16 -0700, Nathan Bossart wrote:
>> On Fri, Jul 01, 2022 at 03:05:55PM -0700, Andres Freund wrote:
>> > On 2022-07-01 14:56:42 -0700, Nathan Bossart wrote:
>> >> The unparenthesized syntax for VACUUM has been marked deprecated since 
>> >> v9.1
>> >> (ad44d50).  Should it be removed in v16?  If not, should we start emitting
>> >> WARNINGs when it is used?
>> > 
>> > What would we gain? ISTM that the number of scripts and typing habits 
>> > that'd
>> > be broken would vastly exceed the benefit.
>> 
>> Beyond removing a few lines from gram.y and vacuum.sgml, probably not much.
>> If it isn't going to be removed, IMO we should consider removing the
>> deprecation notice in the docs.
> 
> Still serves as an explanation as to why newer options haven't been / won't be
> added in an unparenthesized manner. And maybe there one day will be reason to
> remove them, e.g. grammar ambiguities.

Fair point.  Thanks for the discussion.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Time to remove unparenthesized syntax for VACUUM?

2022-07-01 Thread Andres Freund
Hi,

On 2022-07-01 15:13:16 -0700, Nathan Bossart wrote:
> On Fri, Jul 01, 2022 at 03:05:55PM -0700, Andres Freund wrote:
> > On 2022-07-01 14:56:42 -0700, Nathan Bossart wrote:
> >> The unparenthesized syntax for VACUUM has been marked deprecated since v9.1
> >> (ad44d50).  Should it be removed in v16?  If not, should we start emitting
> >> WARNINGs when it is used?
> > 
> > What would we gain? ISTM that the number of scripts and typing habits that'd
> > be broken would vastly exceed the benefit.
> 
> Beyond removing a few lines from gram.y and vacuum.sgml, probably not much.
> If it isn't going to be removed, IMO we should consider removing the
> deprecation notice in the docs.

Still serves as an explanation as to why newer options haven't been / won't be
added in an unparenthesized manner. And maybe there one day will be reason to
remove them, e.g. grammar ambiguities.

Greetings,

Andres Freund




Re: EINTR in ftruncate()

2022-07-01 Thread Andres Freund
Hi,

On 2022-07-02 09:52:33 +1200, Thomas Munro wrote:
> On Sat, Jul 2, 2022 at 9:06 AM Andres Freund  wrote:
> > On 2022-07-01 13:29:44 -0700, Andres Freund wrote:
> > Chris, do you have any additional details about the machine that lead to 
> > this
> > change? OS version, whether it might have been swapping, etc?
> >
> > I wonder if what happened is that posix_fallocate() used glibc's fallback
> > implementation because the kernel was old enough to not support fallocate()
> > for tmpfs.  Looks like support for fallocate() for tmpfs was added in 3.5
> > ([1]). So e.g. a rhel 6 wouldn't have had that.
> 
> With a quick test program on my Linux 5.10 kernel I see that an
> SA_RESTART signal handler definitely causes posix_fallocate() to
> return EINTR (can post trivial program).
> 
> A drive-by look at the current/modern kernel source supports this:
> shmem_fallocate returns -EINTR directly (not -ERESTARTSYS, which seems
> to be the Linux-y way to say you want EINTR or restart as
> appropriate?), and it also undoes all partial progress too (not too
> surprising), which would explain why a perfectly timed machine gun
> stream of signals from our recovery conflict system can make an
> fallocate retry loop never terminate, for large enough sizes.

Yea :(

And even if we fix recovery to not douse other processes in signals quite
that badly, there are plenty other sources of signals that can arrive at a
steady clip. So I think we need to do something to defuse this another way.

Ideas:

1) do the fallocate in smaller chunks, thereby making it much more likely to
   complete between two signal deliveries (see the sketch after this list)
2) block signals while calling posix_fallocate(). That won't work for
   everything (e.g. rapid SIGSTOP/SIGCONT), but that's not something we'd send
   ourselves, so whatever.
3) 1+2
4) ?
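
Idea 1) might look like the following (untested sketch; the chunk size is
arbitrary, Min() as in c.h):

	#define FALLOCATE_CHUNK_SIZE	(1024 * 1024)

	static int
	fallocate_in_chunks(int fd, off_t offset, off_t size)
	{
		while (size > 0)
		{
			off_t	len = Min(size, FALLOCATE_CHUNK_SIZE);
			int		rc = posix_fallocate(fd, offset, len);

			if (rc == EINTR)
				continue;		/* retry just this small chunk */
			if (rc != 0)
				return rc;		/* real error */
			offset += len;
			size -= len;
		}
		return 0;
	}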

Greetings,

Andres Freund




Re: Time to remove unparenthesized syntax for VACUUM?

2022-07-01 Thread Nathan Bossart
On Fri, Jul 01, 2022 at 03:05:55PM -0700, Andres Freund wrote:
> On 2022-07-01 14:56:42 -0700, Nathan Bossart wrote:
>> The unparenthesized syntax for VACUUM has been marked deprecated since v9.1
>> (ad44d50).  Should it be removed in v16?  If not, should we start emitting
>> WARNINGs when it is used?
> 
> What would we gain? ISTM that the number of scripts and typing habits that'd
> be broken would vastly exceed the benefit.

Beyond removing a few lines from gram.y and vacuum.sgml, probably not much.
If it isn't going to be removed, IMO we should consider removing the
deprecation notice in the docs.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Time to remove unparenthesized syntax for VACUUM?

2022-07-01 Thread Andres Freund
Hi,

On 2022-07-01 14:56:42 -0700, Nathan Bossart wrote:
> The unparenthesized syntax for VACUUM has been marked deprecated since v9.1
> (ad44d50).  Should it be removed in v16?  If not, should we start emitting
> WARNINGs when it is used?

What would we gain? ISTM that the number of scripts and typing habits that'd
be broken would vastly exceed the benefit.

- Andres




Time to remove unparenthesized syntax for VACUUM?

2022-07-01 Thread Nathan Bossart
Hi hackers,

The unparenthesized syntax for VACUUM has been marked deprecated since v9.1
(ad44d50).  Should it be removed in v16?  If not, should we start emitting
WARNINGs when it is used?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: EINTR in ftruncate()

2022-07-01 Thread Thomas Munro
On Sat, Jul 2, 2022 at 9:06 AM Andres Freund  wrote:
> On 2022-07-01 13:29:44 -0700, Andres Freund wrote:
> > On 2022-07-01 19:55:16 +0200, Alvaro Herrera wrote:
> > > Allow DSM allocation to be interrupted.
> > >
> > > Chris Travers reported that the startup process can repeatedly try to
> > > cancel a backend that is in a posix_fallocate()/EINTR loop and cause 
> > > it
> > > to loop forever.  Teach the retry loop to give up if an interrupt is
> > > pending.  Don't actually check for interrupts in that loop though,
> > > because a non-local exit would skip some clean-up code in the caller.
> >
> > That whole approach seems quite wrong to me. At the absolute very least the
> > code needs to check if interrupts are being processed in the current context
> > before just giving up due to ProcDiePending || QueryCancelPending.
> >
> > I'm very unconvinced this ought to be fixed in dsm_impl_posix_resize(), 
> > rather
> > than the startup process signalling.

I agree it's not great.  It was a back-patchable bandaid in need of a
better solution.

> Chris, do you have any additional details about the machine that lead to this
> change? OS version, whether it might have been swapping, etc?
>
> I wonder if what happened is that posix_fallocate() used glibc's fallback
> implementation because the kernel was old enough to not support fallocate()
> for tmpfs.  Looks like support for fallocate() for tmpfs was added in 3.5
> ([1]). So e.g. a rhel 6 wouldn't have had that.

With a quick test program on my Linux 5.10 kernel I see that an
SA_RESTART signal handler definitely causes posix_fallocate() to
return EINTR (can post trivial program).
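
A minimal version of such a program (a sketch, not the exact one: 1GB
allocation on tmpfs with 1ms SIGALRM ticks) would be:

	#include <errno.h>
	#include <fcntl.h>
	#include <signal.h>
	#include <stdio.h>
	#include <sys/time.h>
	#include <unistd.h>

	static void
	on_alarm(int signo)
	{
		(void) signo;			/* exists only to interrupt syscalls */
	}

	int
	main(void)
	{
		struct sigaction sa = {0};
		struct itimerval tv = {{0, 1000}, {0, 1000}};	/* 1ms ticks */
		int			fd,
					rc;

		sa.sa_handler = on_alarm;
		sa.sa_flags = SA_RESTART;	/* "restartable", in theory */
		sigaction(SIGALRM, &sa, NULL);
		setitimer(ITIMER_REAL, &tv, NULL);

		fd = open("/dev/shm/eintr_test", O_RDWR | O_CREAT, 0600);
		if (fd < 0)
			return 1;

		/* large enough that it can't finish between two 1ms signals */
		rc = posix_fallocate(fd, 0, (off_t) 1024 * 1024 * 1024);
		if (rc == EINTR)
			printf("EINTR despite SA_RESTART\n");

		unlink("/dev/shm/eintr_test");
		return 0;
	}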

A drive-by look at the current/modern kernel source supports this:
shmem_fallocate returns -EINTR directly (not -ERESTARTSYS, which seems
to be the Linux-y way to say you want EINTR or restart as
appropriate?), and it also undoes all partial progress too (not too
surprising), which would explain why a perfectly timed machine gun
stream of signals from our recovery conflict system can make an
fallocate retry loop never terminate, for large enough sizes.




Re: pg_checkpointer is not a verb or verb phrase

2022-07-01 Thread Nathan Bossart
On Fri, Jul 01, 2022 at 03:36:48PM +0200, Magnus Hagander wrote:
> +1 for pg_checkpoint on that -- let's not make it longer than necessary.
> 
> And yes, +1 for actually changing it. It's a lot cheaper to change it now
> than it will be in the future.  Yes, it would've been even cheaper to have
> already changed it, but we can't go back in time...

Yeah, pg_checkpoint seems like the obvious alternative to pg_checkpointer.
I didn't see a patch in this thread yet, so I've put one together.  I did
not include the catversion bump.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 2a514b16238a55b9929029ce8f641e51178077f1 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 1 Jul 2022 14:44:15 -0700
Subject: [PATCH v1 1/1] Rename pg_checkpointer to pg_checkpoint.

---
 doc/src/sgml/ref/checkpoint.sgml  | 2 +-
 doc/src/sgml/user-manag.sgml  | 2 +-
 src/backend/po/de.po  | 2 +-
 src/backend/po/ja.po  | 4 ++--
 src/backend/po/sv.po  | 2 +-
 src/backend/tcop/utility.c| 4 ++--
 src/include/catalog/pg_authid.dat | 4 ++--
 7 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/ref/checkpoint.sgml b/doc/src/sgml/ref/checkpoint.sgml
index 1cebc03d15..28a1d717b8 100644
--- a/doc/src/sgml/ref/checkpoint.sgml
+++ b/doc/src/sgml/ref/checkpoint.sgml
@@ -53,7 +53,7 @@ CHECKPOINT
 
   
Only superusers or users with the privileges of
-   the pg_checkpointer
+   the pg_checkpoint
role can call CHECKPOINT.
   
  
diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
index 9067be1d9c..6e36b8 100644
--- a/doc/src/sgml/user-manag.sgml
+++ b/doc/src/sgml/user-manag.sgml
@@ -583,7 +583,7 @@ DROP ROLE doomed_role;
COPY and other functions which allow executing a server-side program.
   
   
-   pg_checkpointer
+   pg_checkpoint
Allow executing
the CHECKPOINT
command.
diff --git a/src/backend/po/de.po b/src/backend/po/de.po
index f665817458..999f2c2839 100644
--- a/src/backend/po/de.po
+++ b/src/backend/po/de.po
@@ -22711,7 +22711,7 @@ msgstr "%s kann nicht in einem Hintergrundprozess ausgeführt werden"
 #: tcop/utility.c:953
 #, fuzzy, c-format
 #| msgid "must be superuser to do CHECKPOINT"
-msgid "must be superuser or have privileges of pg_checkpointer to do CHECKPOINT"
+msgid "must be superuser or have privileges of pg_checkpoint to do CHECKPOINT"
 msgstr "nur Superuser können CHECKPOINT ausführen"
 
 #: tsearch/dict_ispell.c:52 tsearch/dict_thesaurus.c:615
diff --git a/src/backend/po/ja.po b/src/backend/po/ja.po
index 0925465d22..8eca82d8cb 100644
--- a/src/backend/po/ja.po
+++ b/src/backend/po/ja.po
@@ -22010,8 +22010,8 @@ msgstr "バックグラウンドプロセス内で%sを実行することはで
 
 #: tcop/utility.c:953
 #, c-format
-msgid "must be superuser or have privileges of pg_checkpointer to do CHECKPOINT"
-msgstr "CHECKPOINTを実行するにはスーパーユーザーであるか、またはpg_checkpointerの権限を持つ必要があります"
+msgid "must be superuser or have privileges of pg_checkpoint to do CHECKPOINT"
+msgstr "CHECKPOINTを実行するにはスーパーユーザーであるか、またはpg_checkpointの権限を持つ必要があります"
 
 #: tsearch/dict_ispell.c:52 tsearch/dict_thesaurus.c:615
 #, c-format
diff --git a/src/backend/po/sv.po b/src/backend/po/sv.po
index 05f8184506..095eb4e4ea 100644
--- a/src/backend/po/sv.po
+++ b/src/backend/po/sv.po
@@ -22794,7 +22794,7 @@ msgstr "kan inte köra %s i en bakgrundsprocess"
 #: tcop/utility.c:953
 #, fuzzy, c-format
 #| msgid "must be superuser to do CHECKPOINT"
-msgid "must be superuser or have privileges of pg_checkpointer to do CHECKPOINT"
+msgid "must be superuser or have privileges of pg_checkpoint to do CHECKPOINT"
 msgstr "måste vara superuser för att göra CHECKPOINT"
 
 #: tsearch/dict_ispell.c:52 tsearch/dict_thesaurus.c:615
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 6a5bcded55..6b0a865262 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -947,10 +947,10 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			break;
 
 		case T_CheckPointStmt:
-			if (!has_privs_of_role(GetUserId(), ROLE_PG_CHECKPOINTER))
+			if (!has_privs_of_role(GetUserId(), ROLE_PG_CHECKPOINT))
 ereport(ERROR,
 		(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-		 errmsg("must be superuser or have privileges of pg_checkpointer to do CHECKPOINT")));
+		 errmsg("must be superuser or have privileges of pg_checkpoint to do CHECKPOINT")));
 
 			RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT |
 			  (RecoveryInProgress() ? 0 : CHECKPOINT_FORCE));
diff --git a/src/include/catalog/pg_authid.dat b/src/include/catalog/pg_authid.dat
index 6c28119fa1..3343a69ddb 100644
--- a/src/include/catalog/pg_authid.dat
+++ b/src/include/catalog/pg_authid.dat
@@ -79,8 +79,8 @@
   rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
   rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
   rolpassword => '_null_', rolvaliduntil => '_null_' },
-{ oid => '4544', oid_symbol => 'ROLE_PG_CHECKPOINTER',
-  

Re: refactor some protocol message sending in walsender and basebackup

2022-07-01 Thread Nathan Bossart
On Thu, Jun 23, 2022 at 04:36:36PM +0200, Peter Eisentraut wrote:
> Some places in walsender.c and basebackup_copy.c open-code the sending of
> RowDescription and DataRow protocol messages.  But there are already more
> compact and robust solutions available for this, using DestRemoteSimple and
> associated machinery, already in use in walsender.c.
> 
> The attached patches 0001 and 0002 are tiny bug fixes I found during this.
> 
> Patches 0003 and 0004 are the main refactorings.  They should probably be
> combined into one patch eventually, but this way the treatment of
> RowDescription and DataRow is presented separately.

All 4 patches look reasonable to me.
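
For anyone skimming: the "more compact" machinery referred to is the
tuple-output helpers walsender.c already uses for IDENTIFY_SYSTEM etc.
Roughly (a sketch from memory, not the patch itself):

	DestReceiver *dest = CreateDestReceiver(DestRemoteSimple);
	TupOutputState *tstate;

	/* tupdesc built with TupleDescInitBuiltinEntry() per column */
	tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
	do_tup_output(tstate, values, nulls);	/* emits the DataRow */
	end_tup_output(tstate);					/* shuts down the receiver */

RowDescription goes out when the receiver starts up, so callers never
hand-assemble protocol messages.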

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Use outerPlanState macro instead of referring to lefttree

2022-07-01 Thread Tom Lane
Richard Guo  writes:
> In the executor code, we mix use of the outerPlanState macro with direct
> references to lefttree. Commit 40f42d2a tried to keep the code consistent by
> replacing references to lefttree with the outerPlanState macro, but there are
> still some outliers. This patch tries to clean them up.

Seems generally reasonable, but what about righttree?  I find a few
of those too with "grep".

Backing up a little bit, one thing not to like about the outerPlanState
and innerPlanState macros is that they lose all semblance of type
safety:

#define innerPlanState(node)(((PlanState *)(node))->righttree)
#define outerPlanState(node)(((PlanState *)(node))->lefttree)

You can pass any pointer you want, and the compiler will not complain.
I wonder if there's any trick (even a gcc-only one) that could improve
on that.  In the absence of such a check, people might feel that
increasing our reliance on these macros isn't such a hot idea.
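
One conceivable trick (plain C, sketched here under the assumption that
callers pass a concrete FooState whose embedded PlanState member is named
"ps", which is not true of every call site today):

	static inline PlanState **
	outer_plan_state_ptr(PlanState *node)
	{
		return &node->lefttree;
	}

	/* now the argument must really contain a PlanState named ps */
	#define outerPlanState(node) (*outer_plan_state_ptr(&(node)->ps))

It remains usable as an lvalue, but generic code holding a bare
PlanState * would need a different spelling, so it's at best a partial
answer.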

Now, the typical coding pattern you've used:

 ExecReScanHash(HashState *node)
 {
+   PlanState  *outerPlan = outerPlanState(node);

is probably reasonably secure against wrong-pointer slip-ups.  But
I'm less convinced about that for in-line usages in the midst of
a function, particularly in the common case that the function has
a variable pointing to its Plan node as well as PlanState node.
Would it make sense to try to use the local-variable style everywhere?

regards, tom lane




Re: replacing role-level NOINHERIT with a grant-level option

2022-07-01 Thread Nathan Bossart
On Fri, Jul 01, 2022 at 02:59:53PM -0400, Robert Haas wrote:
> And here is a patch that makes it work that way. In this version, you
> can GRANT foo TO bar WITH INHERIT { TRUE | FALSE | DEFAULT }, and
> DEFAULT means that role-level [NO]INHERIT property is controlling.
> Otherwise, the grant-level option overrides the role-level option.

I haven't spent a ton of time reviewing the patch yet, but I did read
through it and thought it looked pretty good.

> I have some hesitations about this approach. On the positive side, I
> believe it's fully backward compatible, and that's something to like.
> On the negative side, I've always felt that the role-level properties
> - [NO]INHERIT, [NO]CREATEROLE, [NO]SUPERUSER, [NO]CREATEDB were hacky
> kludges, and I'd be happier they all went away in favor of more
> principled ways of controlling those behaviors. I think that the
> previous design moves us in the direction of being able to eventually
> remove [NO]INHERIT, whereas this, if anything, entrenches it.

+1.  As mentioned upthread [0], I think attributes like CREATEROLE,
SUPERUSER, and CREATEDB could be replaced with predefined roles.  However,
since role attributes aren't inherited, that would result in a behavior
change.  I'm curious what you have in mind.  It might be worth spinning off
a new thread for this sooner than later.

> It might even encourage people to add *more* role-level Boolean flags.
> For instance, say we add another grant-level property that controls
> whether or not you can SET ROLE to the granted role. Well, it somebody
> going to then say that, for symmetry with CREATE ROLE .. [NO]INHERIT
> we need CREATE ROLE .. [NO]SETROLE? I think such additions would be
> actively harmful, not only because they add complexity, but also
> because I think they are fundamentally the wrong way to think about
> these kinds of properties.

Perhaps there should just be a policy against adding new role-level
options.  Instead, new functionality should be provided via predefined
roles or grant-level options.  There is some precedent for this.  For
example, VACUUM has several options that are only usable via the
parenthesized syntax, and the unparenthesized syntax is marked deprecated
in the documentation.  (Speaking of which, is it finally time to remove the
unparenthesized syntax or add WARNINGs when it is used?)  For [NO]INHERIT
and WITH INHERIT DEFAULT, presumably we could do something similar.  Down
the road, those would be removed in favor of only using grant-level
options.

[0] https://postgr.es/m/20220602180755.GA2402942%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: EINTR in ftruncate()

2022-07-01 Thread Andres Freund
Hi Chris,

On 2022-07-01 13:29:44 -0700, Andres Freund wrote:
> On 2022-07-01 19:55:16 +0200, Alvaro Herrera wrote:
> > On 2022-Jul-01, Andres Freund wrote:
> > 
> > > On 2022-07-01 17:41:05 +0200, Alvaro Herrera wrote:
> > > > Nicola Contu reported two years ago to pgsql-general[1] that they were
> > > > having sporadic query failures, because EINTR is reported on some system
> > > > call.  I have been told that the problem persists, though it is very
> > > > infrequent.  I propose the attached patch.  Kyotaro proposed a slightly
> > > > different patch which also protects write(), but I think that's not
> > > > necessary.
> > > 
> > > What is the reason for the || ProcDiePending || QueryCancelPending bit? 
> > > What
> > > if there's dsm operations intentionally done while QueryCancelPending?
> > 
> > That mirrors the test for the other block in that function, which was
> > added by 63efab4ca139, whose commit message explains:
> > 
> > Allow DSM allocation to be interrupted.
> > 
> > Chris Travers reported that the startup process can repeatedly try to
> > cancel a backend that is in a posix_fallocate()/EINTR loop and cause it
> > to loop forever.  Teach the retry loop to give up if an interrupt is
> > pending.  Don't actually check for interrupts in that loop though,
> > because a non-local exit would skip some clean-up code in the caller.
> 
> That whole approach seems quite wrong to me. At the absolute very least the
> code needs to check if interrupts are being processed in the current context
> before just giving up due to ProcDiePending || QueryCancelPending.
> 
> I'm very unconvinced this ought to be fixed in dsm_impl_posix_resize(), rather
> than the startup process signalling.
> 
> There is an argument for allowing more things to be cancelled, but we'd need a
> retry loop for the !INTERRUPTS_CAN_BE_PROCESSED() case.

Chris, do you have any additional details about the machine that led to this
change? OS version, whether it might have been swapping, etc?

I wonder if what happened is that posix_fallocate() used glibc's fallback
implementation because the kernel was old enough to not support fallocate()
for tmpfs.  Looks like support for fallocate() for tmpfs was added in 3.5
([1]). So e.g. a rhel 6 wouldn't have had that.
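
For illustration, the loop in question (a minimal sketch of the pattern added
by 63efab4ca139, reconstructed from the commit message quoted above rather
than copied from dsm_impl.c, so details may differ; the function name is made
up):

#include <errno.h>
#include <fcntl.h>
#include <signal.h>

/* sketch: in a real backend these interrupt flags come from miscadmin.h */
extern volatile sig_atomic_t ProcDiePending;
extern volatile sig_atomic_t QueryCancelPending;

static int
fallocate_with_retry(int fd, off_t size)
{
	int			rc;

	/*
	 * posix_fallocate() reports errors via its return value rather than
	 * errno.  Retry on EINTR, but give up once an interrupt is pending so
	 * that a process repeatedly signalling us cannot pin us in this loop.
	 */
	do
	{
		rc = posix_fallocate(fd, 0, size);
	} while (rc == EINTR && !(ProcDiePending || QueryCancelPending));

	return rc;
}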

Greetings,

Andres Freund

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e2d12e22c59ce714008aa5266d769f8568d74eac




Re: [PATCH] Log details for client certificate failures

2022-07-01 Thread Jacob Champion
On Thu, Jun 30, 2022 at 2:54 AM Graham Leggett  wrote:
>
> I added this to httpd a while back:
>
> SSL_CLIENT_CERT_RFC4523_CEA
>
> It would be good to interoperate.

What kind of interoperation did you have in mind? Are there existing
tools that want to scrape this information for observability?

I think the CEA syntax might not be a good fit for this particular
patch: first, we haven't actually verified the certificate, so no one
should be using it to assert certificate equality (and I'm truncating
the Issuer anyway, to avoid letting someone flood the logs). Second,
this is designed to be human-readable rather than machine-readable.

Thanks,
--Jacob




Re: [PATCH] Log details for client certificate failures

2022-07-01 Thread Jacob Champion
On Thu, Jun 30, 2022 at 2:43 AM Peter Eisentraut
 wrote:
>
> On 13.05.22 00:36, Jacob Champion wrote:
> > v2 limits the maximum subject length and adds the serial number to the
> > logs.
>
> I wrote that pg_stat_ssl uses the *issuer* plus serial number to
> identify a certificate.  What your patch shows is the subject and the
> serial number, which isn't the same thing.  Let's get that sorted out
> one way or the other.

Sorry for the misunderstanding! v3 adds the Issuer to the logs as well.

I wanted to clarify that this "issuer" has not actually been verified,
but all I could come up with was "purported issuer" which doesn't read
well to me. "Claimed issuer"? "Alleged issuer"? Thoughts?

> Another point, your patch produces
>
>  LOG:  connection received: host=localhost port=44120
>  LOG:  client certificate verification failed at depth 1: ...
>  DETAIL:  failed certificate had subject ...
>  LOG:  could not accept SSL connection: certificate verify failed
>
> I guess what we really would like is
>
>  LOG:  connection received: host=localhost port=44120
>  LOG:  could not accept SSL connection: certificate verify failed
>  DETAIL:  client certificate verification failed at depth 1: ...
>  failed certificate had subject ...
>
> But I suppose that would be very cumbersome to produce with the callback
> structure provided by OpenSSL?

I was about to say "yes, very cumbersome", but I actually think we
might be able to do that without bubbling the error up through
multiple callback layers, using SSL_set_ex_data() and friends. I'll
take a closer look.
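
For the archives, a rough sketch of that idea (the names here are
hypothetical: verify_cb_sketch, verify_detail_index, and the
build_failure_detail() helper that stands in for whatever formats the
depth/subject/issuer text; error handling is omitted):

#include <openssl/ssl.h>

extern char *build_failure_detail(X509_STORE_CTX *ctx);	/* hypothetical */

/* obtained once at startup via SSL_get_ex_new_index(0, NULL, NULL, NULL, NULL) */
static int	verify_detail_index = -1;

static int
verify_cb_sketch(int ok, X509_STORE_CTX *ctx)
{
	if (!ok)
	{
		/* recover the SSL handle that owns this verification run */
		SSL		   *ssl = X509_STORE_CTX_get_ex_data(ctx,
								SSL_get_ex_data_X509_STORE_CTX_idx());

		/* stash the detail text for use after SSL_accept() fails */
		SSL_set_ex_data(ssl, verify_detail_index, build_failure_detail(ctx));
	}
	return ok;
}

/*
 * Then the error path after SSL_accept() could fetch the stashed string with
 * SSL_get_ex_data(ssl, verify_detail_index) and attach it as the DETAIL of
 * "could not accept SSL connection".
 */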

Thanks!
--Jacob
commit dfe12fa0c3b93cf98169da2095e53edd79c936e6
Author: Jacob Champion 
Date:   Fri Jul 1 13:39:34 2022 -0700

squash! Log details for client certificate failures

Add the Issuer to the log.

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 3ccc23ba62..80b361b105 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -1076,6 +1076,40 @@ dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata)
 	return 0;
 }
 
+/*
+ * Examines the provided certificate name, and if it's too long to log,
+ * modifies and truncates it. The return value is NULL if no truncation was
+ * needed; it otherwise points into the middle of the input string, and
+ * should not be freed.
+ */
+static char *
+truncate_cert_name(char *name)
+{
+	size_t		namelen = strlen(name);
+	char	   *truncated;
+
+	/*
+	 * Common Names are 64 chars max, so for a common case where the CN is
+	 * the last field, we can still print the longest possible CN with a
+	 * 7-character prefix (".../CN=[64 chars]"), for a reasonable limit of 71
+	 * characters.
+	 */
+#define MAXLEN 71
+
+	if (namelen <= MAXLEN)
+		return NULL;
+
+	/*
+	 * Keep the end of the name, not the beginning, since the most specific
+	 * field is likely to give users the most information.
+	 */
+	truncated = name + namelen - MAXLEN;
+	truncated[0] = truncated[1] = truncated[2] = '.';
+
+#undef MAXLEN
+
+	return truncated;
+}
+
 /*
  * Certificate verification callback
  *
@@ -1093,8 +1127,8 @@ verify_cb(int ok, X509_STORE_CTX *ctx)
 	int			errcode;
 	const char *errstring;
 	X509	   *cert;
-	char	   *certname = NULL;
-	char	   *truncated = NULL;
+	char	   *subject = NULL, *issuer = NULL;
+	char	   *sub_truncated = NULL, *iss_truncated = NULL;
 	char	   *serialno = NULL;
 
 	if (ok)
@@ -1111,33 +1145,18 @@ verify_cb(int ok, X509_STORE_CTX *ctx)
 	cert = X509_STORE_CTX_get_current_cert(ctx);
 	if (cert)
 	{
-		size_t		namelen;
 		ASN1_INTEGER *sn;
 		BIGNUM	   *b;
 
 		/*
-		 * Get the Subject for logging, but don't let maliciously huge certs
-		 * flood the logs.
-		 *
-		 * Common Names are 64 chars max, so for a common case where the CN is
-		 * the last field, we can still print the longest possible CN with a
-		 * 7-character prefix (".../CN=[64 chars]"), for a reasonable limit of
-		 * 71 characters.
+		 * Get the Subject and Issuer for logging, but don't let maliciously
+		 * huge certs flood the logs.
 		 */
-#define MAXLEN 71
-		certname = X509_NAME_to_cstring(X509_get_subject_name(cert));
-		namelen = strlen(certname);
+		subject = X509_NAME_to_cstring(X509_get_subject_name(cert));
+		sub_truncated = truncate_cert_name(subject);
 
-		if (namelen > MAXLEN)
-		{
-			/*
-			 * Keep the end of the Subject, not the beginning, since the most
-

Re: postgres_fdw versus regconfig and similar constants

2022-07-01 Thread Robert Haas
On Mon, May 16, 2022 at 1:33 PM Tom Lane  wrote:
> 0001 deals with the lack-of-schema-qualification issue by forcing
> search_path to be just "pg_catalog" while we're deparsing constants.
> This seems straightforward, if annoyingly expensive, and it's enough
> to fix the bug as presented.

Yeah, that does seem like the thing to do. I doubt it will be the last
problem setting we need to add to that list, either. It's kind of
unfortunate that data type output formatting is context-dependent like
this, but I don't have a better idea.

> 0002 tightens deparse.c's rules to only consider an OID alias constant
> as shippable if the object it refers to is shippable.  This seems
> obvious in hindsight; I wonder how come we've not realized it before?
> However, this has one rather nasty problem for regconfig in particular:
> with our standard shippability rules none of the built-in text search
> configurations would be treated as shippable, because initdb gives them
> non-fixed OIDs above 9999.  That seems like a performance hit we don't
> want to take.  In the attached, I hacked around that by making a special
> exception for OIDs up to 16383, but that seems like a fairly ugly kluge.
> Anybody have a better idea?

No. It feels to me like there are not likely to be any really
satisfying answers here. We have a way of mapping a given local table
to a given foreign table, but to the best of my knowledge we have no
similar mechanism for any other type of object. So it's just crude
guesswork. Who is to say whether the fact that we have a local text
search configuration means that there is a remote text search
configuration with the same name, and even if yes, that it has the
same semantics? And similarly for any other object types? Every
release adds and occasionally removes SQL objects from the system
catalogs, and depending on the object type, it can also vary by
operating system. There are multiple forks of PostgreSQL, too.

> While using find_expr_references() as a reference for writing the new code
> in 0002, I was dismayed to find that it omitted handling regcollation;
> and a quick search showed that other places that specially process REG*
> types hadn't been updated for that addition either.  0003 closes those
> oversights.

Makes sense.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Making Vars outer-join aware

2022-07-01 Thread Tom Lane
"Finnerty, Jim"  writes:
> Given that views are represented in a parsed representation, does 
> anything need to happen to the Vars inside a view when that view is 
> outer-joined to?  

No.  The markings only refer to what is in the same Query tree as the Var
itself.

Subquery flattening during planning does deal with this: if we pull up a
subquery (possibly inserted from a view) that was underneath an outer
join, the nullingrel marks on the upper-level Vars referring to subquery
outputs will get merged into what is pulled up, either by unioning the
varnullingrel bitmaps if what is pulled up is just a Var, or if what is
pulled up isn't a Var, by wrapping it in a PlaceHolderVar that carries
the old outer Var's markings.  We had essentially this same behavior
with PlaceHolderVars before, but I think this way makes it a lot more
principled and intelligible (and I suspect there are now cases where we
manage to avoid inserting unnecessary PlaceHolderVars that the old code
couldn't avoid).

> If an outer join is converted to an inner join, must this information get 
> propagated to all the affected Vars, potentially across query block levels?

Yes.  The code is there in the patch to run around and remove nullingrel
bits from affected Vars.

One thing that doesn't happen (and didn't before, so this is not a
regression) is that if we strength-reduce a FULL JOIN USING to an outer
or plain join, it'd be nice if the "COALESCE" hack we represent the
merged USING column with could be replaced with the same lower-relation
Var that the parser would have used if the join weren't FULL to begin
with.  Without that, we're leaving optimization opportunities on the
table.  I'm hesitant to try to do that though as long as the COALESCE
structures look exactly like something a user could write.  It'd be
safer if we used some bespoke node structure for this purpose ...
but nobody's bothered to invent that.

regards, tom lane




Re: EINTR in ftruncate()

2022-07-01 Thread Andres Freund
Hi,

On 2022-07-01 19:55:16 +0200, Alvaro Herrera wrote:
> On 2022-Jul-01, Andres Freund wrote:
> 
> > On 2022-07-01 17:41:05 +0200, Alvaro Herrera wrote:
> > > Nicola Contu reported two years ago to pgsql-general[1] that they were
> > > having sporadic query failures, because EINTR is reported on some system
> > > call.  I have been told that the problem persists, though it is very
> > > infrequent.  I propose the attached patch.  Kyotaro proposed a slightly
> > > different patch which also protects write(), but I think that's not
> > > necessary.
> > 
> > What is the reason for the || ProcDiePending || QueryCancelPending bit? What
> > if there's dsm operations intentionally done while QueryCancelPending?
> 
> That mirrors the test for the other block in that function, which was
> added by 63efab4ca139, whose commit message explains:
> 
> Allow DSM allocation to be interrupted.
> 
> Chris Travers reported that the startup process can repeatedly try to
> cancel a backend that is in a posix_fallocate()/EINTR loop and cause it
> to loop forever.  Teach the retry loop to give up if an interrupt is
> pending.  Don't actually check for interrupts in that loop though,
> because a non-local exit would skip some clean-up code in the caller.

That whole approach seems quite wrong to me. At the absolute very least the
code needs to check if interrupts are being processed in the current context
before just giving up due to ProcDiePending || QueryCancelPending.

I'm very unconvinced this ought to be fixed in dsm_impl_posix_resize(), rather
than the startup process signalling.

There is an argument for allowing more things to be cancelled, but we'd need a
retry loop for the !INTERRUPTS_CAN_BE_PROCESSED() case.

Greetings,

Andres Freund




Re: Making Vars outer-join aware

2022-07-01 Thread Finnerty, Jim
Tom, two quick questions before attempting to read the patch:

Given that views are represented in a parsed representation, does anything 
need to happen to the Vars inside a view when that view is outer-joined to?  

If an outer join is converted to an inner join, must this information get 
propagated to all the affected Vars, potentially across query block levels?




Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

2022-07-01 Thread Andres Freund
Hi,

On 2022-06-21 19:33:01 -0700, Andres Freund wrote:
> On 2022-06-21 17:22:05 +1200, Thomas Munro wrote:
> > Problem: I saw 031_recovery_conflict.pl time out while waiting for a
> > buffer pin conflict, but so far once only, on CI:
> >
> > https://cirrus-ci.com/task/5956804860444672
> >
> > timed out waiting for match: (?^:User was holding shared buffer pin
> > for too long) at t/031_recovery_conflict.pl line 367.
> >
> > Hrmph.  Still trying to reproduce that, which may be a bug in this
> > patch, a bug in the test or a pre-existing problem.  Note that
> > recovery didn't say something like:
> >
> > 2022-06-21 17:05:40.931 NZST [57674] LOG:  recovery still waiting
> > after 11.197 ms: recovery conflict on buffer pin
> >
> > (That's what I'd expect to see in
> > https://api.cirrus-ci.com/v1/artifact/task/5956804860444672/log/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log
> > if the startup process had decided to send the signal).
> >
> > ... so it seems like the problem in that run is upstream of the interrupt 
> > stuff.
>
> Odd. The only theory I have so far is that the manual vacuum on the primary
> somehow decided to skip the page, and thus didn't trigger a conflict. Because
> clearly replay progressed past the records of the VACUUM. Perhaps we should
> use VACUUM VERBOSE? In contrast to pg_regress tests that should be
> unproblematic?

I saw a couple failures of 031 on CI for the meson patch - which obviously
doesn't change anything around this. However it adds a lot more distributions,
and the added ones run in docker containers on a shared host, presumably
adding a lot of noise.  I saw this more frequently when I accidentally had the
test runs at the number of CPUs in the host, rather than the allotted CPUs in
the container.

That made me look more into these issues. I played with adding a pg_usleep()
to RecoveryConflictInterrupt() to simulate slow signal delivery.

Found a couple things:

- the pg_usleep(5000) in ResolveRecoveryConflictWithVirtualXIDs() can
  completely swamp the target(s) on a busy system. This surely is exacerbated
  by the usleep I added to RecoveryConflictInterrupt(), but a 5ms signalling pace
  does seem like a bad idea.

- we process the same recovery conflict (not a single signal, but a single
  "real conflict") multiple times in the target of a conflict, presumably
  while handling the error. That includes handling the same interrupt once as
  an ERROR and once as FATAL.

  E.g.

2022-07-01 12:19:14.428 PDT [2000572] LOG:  recovery still waiting after 10.032 ms: recovery conflict on buffer pin
2022-07-01 12:19:14.428 PDT [2000572] CONTEXT:  WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl ERROR:  canceling statement due to conflict with recovery at character 15
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl DETAIL:  User transaction caused buffer deadlock with recovery.
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl STATEMENT:  SELECT * FROM test_recovery_conflict_table2;
2022-07-01 12:19:54.778 PDT [2000572] LOG:  recovery finished waiting after 40359.937 ms: recovery conflict on buffer pin
2022-07-01 12:19:54.778 PDT [2000572] CONTEXT:  WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl FATAL:  terminating connection due to conflict with recovery
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl DETAIL:  User transaction caused buffer deadlock with recovery.
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-07-01 12:19:54.837 PDT [2001389] 031_recovery_conflict.pl LOG:  statement: SELECT 1;

  note that the startup process considers the conflict resolved by the time
  the backend handles the interrupt.

  I also see cases where a FATAL is repeated:

2022-07-01 12:43:18.190 PDT [2054721] LOG:  recovery still waiting after 15.410 ms: recovery conflict on snapshot
2022-07-01 12:43:18.190 PDT [2054721] DETAIL:  Conflicting process: 2054753.
2022-07-01 12:43:18.190 PDT [2054721] CONTEXT:  WAL redo at 0/344AB90 for Heap2/PRUNE: latestRemovedXid 732 nredirected 18 ndead 0; blkref #0: rel 1663/16385/>
2054753: reporting recovery conflict 9
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl FATAL:  terminating connection due to conflict with recovery
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl DETAIL:  User query might have needed to see row versions that must be removed.
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl HINT:  In a moment you should be able to reconnect to the database and repeat your command.
...
2054753: reporting recovery conflict 9
2022-07-01 12:43:19.068 PDT [2054753] 

Re: Minimal logical decoding on standbys

2022-07-01 Thread Ibrar Ahmed
On Thu, Jun 30, 2022 at 1:49 PM Drouvot, Bertrand 
wrote:

> Hi,
>
> On 2/25/22 10:34 AM, Drouvot, Bertrand wrote:
> > Hi,
> >
> > On 10/28/21 11:07 PM, Andres Freund wrote:
> >> Hi,
> >>
> >> On 2021-10-28 16:24:22 -0400, Robert Haas wrote:
> >>> On Wed, Oct 27, 2021 at 2:56 AM Drouvot, Bertrand
> >>>  wrote:
>  So you have in mind to check for XLogLogicalInfoActive() first, and
>  if true, then open the relation and call
>  RelationIsAccessibleInLogicalDecoding()?
> >>> I think 0001 is utterly unacceptable. We cannot add calls to
> >>> table_open() in low-level functions like this. Suppose for example
> >>> that _bt_getbuf() calls _bt_log_reuse_page() which with 0001 applied
> >>> would call get_rel_logical_catalog(). _bt_getbuf() will have acquired
> >>> a buffer lock on the page. The idea that it's safe to call
> >>> table_open() while holding a buffer lock cannot be taken seriously.
> >> Yes - that's pretty clearly a deadlock hazard. It shouldn't too hard
> >> to fix, I
> >> think. Possibly a bit more verbose than nice, but...
> >>
> >> Alternatively we could propagate the information whether a relcache
> >> entry is
> >> for a catalog from the table to the index. Then we'd not need to
> >> change the
> >> btree code to pass the table down.
> >
> > +1 for the idea of propagating to the index. If that sounds good to
> > you too, I can try to have a look at it.
> >
> > Thanks Robert and Andres for the feedbacks you have done on the
> > various sub-patches.
> >
> > I've now in mind to work sub patch by sub patch (starting with 0001
> > then) and move to the next one once we agree that the current one is
> > "ready".
> >
> > I think that could help us to get this new feature moving forward more
> > "easily", what do you think?
> >
> > Thanks
> >
> > Bertrand
> >
> I'm going to re-create a CF entry for it, as:
>
> - It seems there is a clear interest for the feature (given the time
> already spend on it and the number of people that worked on)
>
> - I've in mind to resume working on it
>
> I have already done some research on that, I can definitely look at it.


> - It would give more visibility in case others want to jump in
>
> Hope that makes sense,
>
> Thanks,
>
> Bertrand
>
>

-- 
Ibrar Ahmed


Re: ccache, MSVC, and meson

2022-07-01 Thread Justin Pryzby
On Tue, May 24, 2022 at 01:30:39PM -0700, Andres Freund wrote:
> On 2022-05-24 14:52:02 -0500, Justin Pryzby wrote:
> > > The spurious message should be fixed, of course. I suspect you don't need a
> > > wrapper, you can just set CC='ccache cl.exe' or similar? Afaics it's not
> > > meaningful to do 'CC=ccache.exe' alone, because then it'll interpret 
> > > arguments
> > > as ccache options, rather than compiler options.
> > 
> > if meson didn't crash CC=ccache.exe might have worked, because I had set
> > CCACHE_COMPILER.
> 
> Did you report the issue? Should be simple enough to fix.
> 
> I seriously doubt it's a good idea to use CCACHE_COMPILER - there's no way
> meson (or autoconf or ..) can rely on the results of compiler tests that way,
> since CCACHE_COMPILER can change at any time.

This updated patch doesn't use CCACHE_COMPILER.

Cache misses are several times slower (12 minute build time vs 2:30 with ninja,
without ccache), so it's possible that ccache can be *slower* if the hit ratio
is inadequate.  ninja on cirrus builds 3x faster with ccache, but msbuild is
only ~2x faster, so I recommend using ccache only with ninja.

There are still warts: this requires using buildtype "plain" with /Z7 /MDd.
From 1c3823095eb9300fd0aae1fb569a3cc09b1346cc Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 22 May 2022 11:55:02 -0500
Subject: [PATCH] cirrus/windows: ccache

https://community.chocolatey.org/packages/ccache

https://www.postgresql.org/message-id/flat/20220522232606.GZ19626%40telsasoft.com
20220321001747.5o2zfoeqxvbjm...@alap3.anarazel.de
20211012083721.hvixq4pnh2pixr3j%40alap3.anarazel.de
CA%2BhUKGLs6ENKL4w1o%2B1RpcS4VLyLXSUMWRYJVaiLBOJtarW25g%40mail.gmail.com

https://cirrus-ci.com/task/5456827583299584
https://cirrus-ci.com/task/6019777536720896

ci-os-only: xindows, windows-meson-ninja, xindows-meson-msbuild
---
 .cirrus.yml | 34 +++---
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index bf20f85f8ce..ed9e86557d7 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -814,6 +814,9 @@ windows_template: &windows_template
 # currently have a tool for that...
 CIRRUS_ESCAPING_PROCESSES: 1
 
+CCACHE_DIR: $CIRRUS_WORKING_DIR/.ccache
+CCACHE_LOGFILE: ccache.log
+
   only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*windows.*'
 
   windows_container:
@@ -828,12 +831,14 @@ windows_template: &windows_template
 set
 
   setup_additional_packages_script: |
-REM choco install -y --no-progress ...
-
+choco install -y --no-progress ccache --version 4.6.1
+cp c:\ProgramData\chocolatey\lib\ccache\tools\ccache-4.6.1-windows-x86_64\ccache.exe \BuildTools\VC\Auxiliary\Build
+cp c:\ProgramData\chocolatey\lib\ccache\tools\ccache-4.6.1-windows-x86_64\ccache.exe \ProgramData\chocolatey\bin\cl.exe
 
 task:
   <<: *windows_template
   name: Windows - Server 2019, VS 2019 - Homegrown
+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*windows-mkvcbuild'
 
   configure_script:
 # copy errors out when using forward slashes
@@ -901,14 +906,22 @@ task:
 - mkdir subprojects
 - meson wrap install lz4
 - meson wrap install zlib
-- meson setup --buildtype debug --backend ninja  -Dcassert=true -Db_pch=true -Dssl=openssl -Dlz4=enabled -Dzlib=enabled -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include -DTAR=c:/windows/system32/tar.exe build
+- set CC=ccache.exe cl.exe
+- meson setup --buildtype plain --backend ninja  -Dcassert=true -Db_pch=false -Dssl=openssl -Dlz4=enabled -Dzlib=enabled -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include -DTAR=c:/windows/system32/tar.exe -Dc_args="/Z7 /MDd" build
+
+  ccache_cache:
+folder: $CCACHE_DIR
 
   build_script:
+- ccache.exe --zero-stats
 - vcvarsall x64
 - ninja -C build
+- ccache.exe --show-stats
+
+  always:
+upload_caches: ccache
 
   check_world_script:
-- vcvarsall x64
 - meson test --no-rebuild -C build
 
   on_failure:
@@ -934,14 +947,21 @@ task:
 - mkdir subprojects
 - meson wrap install lz4
 - meson wrap install zlib
-- meson setup --buildtype debug --backend vs -Dcassert=true -Db_pch=true -Dssl=openssl -Dlz4=enabled -Dzlib=enabled -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include -DTAR=c:/windows/system32/tar.exe build
+- meson setup --buildtype plain --backend vs -Dcassert=true -Db_pch=false -Dssl=openssl -Dlz4=enabled -Dzlib=enabled -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include -DTAR=c:/windows/system32/tar.exe -Dc_args=/Z7 build
+
+  ccache_cache:
+folder: $CCACHE_DIR
 
   build_script:
+- ccache.exe --zero-stats
 - vcvarsall x64
-- msbuild /p:UseMultiToolTask=true %MSBFLAGS% build\postgresql.sln
+- msbuild /p:UseMultiToolTask=true /p:CLToolPath=c:\ProgramData\chocolatey\bin %MSBFLAGS% 

Re: [RFC] building postgres with meson -v9

2022-07-01 Thread Andres Freund
Hi,

On 2022-07-01 14:01:11 -0500, Justin Pryzby wrote:
> vcvarsall isn't needed in cirrus' "check_world" scripts.

E.g. for the ecpg tests it isn't, except that we don't currently build ecpg on
windows. But I plan to fix that.


> I'm missing any way to re/run cirrus only for msbuild OR ninja OR homegrown
> with something more granular than "ci-os-only: windows".  (The same thing
> applies to the mingw patch).

Not sure that's really worth adding - I don't foresee merging all those
tasks. But I'm open to proposals.


> I'll mail shortly about ccache.

There's a meson PR that should fix some of the issues, need to test it...

Greetings,

Andres Freund




Re: [RFC] building postgres with meson -v9

2022-07-01 Thread Justin Pryzby
vcvarsall isn't needed in cirrus' "check_world" scripts.

I'm missing any way to re/run cirrus only for msbuild OR ninja OR homegrown
with something more granular than "ci-os-only: windows".  (The same thing
applies to the mingw patch).

I'll mail shortly about ccache.

-- 
Justin




Re: replacing role-level NOINHERIT with a grant-level option

2022-07-01 Thread Robert Haas
On Fri, Jul 1, 2022 at 9:05 AM Robert Haas  wrote:
> > So now A has implicit inherited privs for t1 but not for t2.
>
> Yeah, I anticipate that this would work in the way that you postulate here.

And here is a patch that makes it work that way. In this version, you
can GRANT foo TO bar WITH INHERIT { TRUE | FALSE | DEFAULT }, and
DEFAULT means that role-level [NO]INHERIT property is controlling.
Otherwise, the grant-level option overrides the role-level option.

I have some hesitations about this approach. On the positive side, I
believe it's fully backward compatible, and that's something to like.
On the negative side, I've always felt that the role-level properties
- [NO]INHERIT, [NO]CREATEROLE, [NO]SUPERUSER, [NO]CREATEDB were hacky
kludges, and I'd be happier if they all went away in favor of more
principled ways of controlling those behaviors. I think that the
previous design moves us in the direction of being able to eventually
remove [NO]INHERIT, whereas this, if anything, entrenches it.

It might even encourage people to add *more* role-level Boolean flags.
For instance, say we add another grant-level property that controls
whether or not you can SET ROLE to the granted role. Well, is somebody
going to then say that, for symmetry with CREATE ROLE .. [NO]INHERIT
we need CREATE ROLE .. [NO]SETROLE? I think such additions would be
actively harmful, not only because they add complexity, but also
because I think they are fundamentally the wrong way to think about
these kinds of properties.

To me, setting a default for whether or not role grants have INHERIT
behavior by default is a bit like setting a default for the size of
your future credit card transactions. "If I use my credit card and
don't say how much I want to charge, make it $6.82!" This is obviously
nonsense, at least in normal scenarios, because the amount of a credit
card transaction depends fundamentally on the purpose of the credit
card transaction, and you cannot know what the right amount to charge
in future transactions will be unless you already know the purpose of
those transactions. So here: we probably cannot say anything about the
purpose of a future GRANT based on the identity of the role that is
receiving the privileges. You could have a special purpose role that
receives many grants but always with the same general purpose, just as
you could have a credit card holder that only ever buys things that
cost $6.82, and in either case a default could, I guess, be useful. But
those seem to be corner cases, and even then the benefit of being able
to set the default seems to be modest.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0001-Add-a-grant-level-INHERIT-option-to-override-the-.patch
Description: Binary data


Re: oat_post_create expected behavior

2022-07-01 Thread Jeff Davis
On Mon, 2022-06-06 at 17:11 -0400, Tom Lane wrote:
> Right, same thing I'm saying.  I also think we should discourage
> people from doing cowboy CCIs inside their OAT hooks, because that
> makes the testability problem even worse.  Maybe that means we
> need to uniformly move the CREATE hooks to after a system-provided
> CCI, but I've not thought hard about the implications of that.

Uniformly moving the post-create hooks after CCI might not be as
convenient as I thought at first. Many extensions using post-create
hooks will also want to use post-alter hooks, and it would be difficult
to reuse extension code between those two hooks. It's probably better
to just always specify the snapshot unless you're sure you won't need a
post-alter hook.

It would be nice if it was easier to enforce that these hooks do the
right thing, though.
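
To make the "specify the snapshot" advice concrete, a hypothetical
post-create hook could read the not-yet-CCI'd catalog row like this (a sketch
only; my_object_access and the hook chaining are the usual extension
boilerplate, and error handling is omitted):

#include "postgres.h"

#include "access/genam.h"
#include "access/table.h"
#include "catalog/objectaccess.h"
#include "catalog/pg_class.h"
#include "utils/fmgroids.h"
#include "utils/snapmgr.h"

static object_access_hook_type prev_object_access_hook = NULL;

static void
my_object_access(ObjectAccessType access, Oid classId, Oid objectId,
				 int subId, void *arg)
{
	if (prev_object_access_hook)
		prev_object_access_hook(access, classId, objectId, subId, arg);

	if (access == OAT_POST_CREATE && classId == RelationRelationId)
	{
		Relation	rel = table_open(RelationRelationId, AccessShareLock);
		ScanKeyData key;
		SysScanDesc scan;
		HeapTuple	tup;

		ScanKeyInit(&key, Anum_pg_class_oid, BTEqualStrategyNumber,
					F_OIDEQ, ObjectIdGetDatum(objectId));

		/*
		 * No CommandCounterIncrement() has run yet, so an MVCC snapshot
		 * would not see the new row; SnapshotSelf does.
		 */
		scan = systable_beginscan(rel, ClassOidIndexId, true,
								  SnapshotSelf, 1, &key);
		tup = systable_getnext(scan);
		if (HeapTupleIsValid(tup))
		{
			/* ... inspect the just-created pg_class row here ... */
		}
		systable_endscan(scan);
		table_close(rel, AccessShareLock);
	}
}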

Regards,
Jeff Davis






Re: First draft of the PG 15 release notes

2022-07-01 Thread Bruce Momjian
On Wed, Jun 29, 2022 at 10:08:08PM -0700, Noah Misch wrote:
> On Tue, Jun 28, 2022 at 04:35:45PM -0400, Bruce Momjian wrote:
> > > If you dump/reload an unmodified v14 template1 (as pg_dumpall and pg_upgrade
> > > do), your v15 template1 will have a v14 ACL on its public schema.  At that
> > > point, the fate of "newly-created databases in existing clusters" depends on
> > > whether you clone template1 or template0.  Does any of that detail belong
> > > here, or does the existing text suffice?
> > 
> > I think it is very confusing to have template0 have one value and
> > template1 have a different one, but as I understand it template0 will
> > only be used for pg_dump comparison, and that will keep template1 with
> > the same permissions, so I guess it is okay.
> 
> It's an emergent property of two decisions.  In the interest of backward
> compatibility, I decided to have v15 pg_dump emit GRANT for the public schema
> even when the source is an unmodified v14- database.  When that combines with
> the ancient decision that a pg_dumpall or pg_upgrade covers template1 but not
> template0, one gets the above consequences.  I don't see a way to improve on
> this outcome.

Thanks for the summary.

> > > >   permissions on the public schema has not
> > > >   been changed.  Databases restored from previous Postgres releases
> > > >   will be restored with their current permissions.  Users wishing
> > > >   to have the old permissions on new objects will need to grant
> > > 
> > > The phrase "old permissions on new objects" doesn't sound right to me, 
> > > but I'm
> > > not sure why.  I think you're aiming for the fact that this is just a 
> > > default;
> > > one can still change the ACL to anything, including to the old default.  
> > > If
> > > these notes are going to mention the old default like they do so far, I 
> > > think
> > > they should also urge readers to understand
> > > https://www.postgresql.org/docs/devel/ddl-schemas.html#DDL-SCHEMAS-PATTERNS
> > > before returning to the old default.  What do you think?
> > 
> > Agreed, the new text is:
> > 
> > Users wishing to have the former permissions will need to grant
> > CREATE permission for PUBLIC on
> > the public schema; this change can be made on
> > template1 to cause all new databases to have these
> > permissions.
> 
> What do you think about the "should also urge readers ..." part of my message?

I see your point, that there is no indication of why you might not want
to restore the old permissions.  I created the attached patch which
makes two additions to clarify this.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/release-15.sgml b/doc/src/sgml/release-15.sgml
index ed7cb89b03..47ac329e79 100644
--- a/doc/src/sgml/release-15.sgml
+++ b/doc/src/sgml/release-15.sgml
@@ -63,12 +63,11 @@ Author: Noah Misch 
   permissions on the public schema has not
   been changed.  Databases restored from previous Postgres releases
   will be restored with their current permissions.  Users wishing
-  to have the former more-open permissions will need to grant
+  to have the former permissions will need to grant
   CREATE permission for PUBLIC
   on the public schema; this change can be made
   on template1 to cause all new databases
-  to have these permissions.  This change was made to increase
-  security.
+  to have these permissions.
  
 
 


Re: EINTR in ftruncate()

2022-07-01 Thread Alvaro Herrera
On 2022-Jul-01, Andres Freund wrote:

> On 2022-07-01 17:41:05 +0200, Alvaro Herrera wrote:
> > Nicola Contu reported two years ago to pgsql-general[1] that they were
> > having sporadic query failures, because EINTR is reported on some system
> > call.  I have been told that the problem persists, though it is very
> > infrequent.  I propose the attached patch.  Kyotaro proposed a slightly
> > different patch which also protects write(), but I think that's not
> > necessary.
> 
> What is the reason for the || ProcDiePending || QueryCancelPending bit? What
> if there's dsm operations intentionally done while QueryCancelPending?

That mirrors the test for the other block in that function, which was
added by 63efab4ca139, whose commit message explains:

Allow DSM allocation to be interrupted.

Chris Travers reported that the startup process can repeatedly try to
cancel a backend that is in a posix_fallocate()/EINTR loop and cause it
to loop forever.  Teach the retry loop to give up if an interrupt is
pending.  Don't actually check for interrupts in that loop though,
because a non-local exit would skip some clean-up code in the caller.

Thanks for looking!

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Issue with pg_stat_subscription_stats

2022-07-01 Thread Andres Freund
Hi,

On 2022-07-01 16:08:48 +0900, Masahiko Sawada wrote:
> Yes, my point is that it may be misleading that the subscription stats
> are created when a subscription is created.

I think it's important to create stats at that time, because otherwise it's
basically impossible to ensure that stats are dropped when a transaction rolls
back. If some / all columns should return something else before stats are
reported, that can be addressed easily by tracking that in a separate field.


> On the other hand, I'm not sure we agreed that the behavior that
> Melanie reported is not a problem. The user might get confused since
> the subscription stats works differently than other stats when a
> reset. Previously, the primary reason why I hesitated to create the
> subscription stats when creating a subscription is that CREATE
> SUBSCRIPTION (with create_slot = false) can be rolled back. But with
> the shmem stats, we can easily resolve it by using
> pgstat_create_transactional().

Yep.


> > > The second problem is that the following code in DropSubscription()
> > > should be updated:
> > >
> > > /*
> > >  * Tell the cumulative stats system that the subscription is getting
> > >  * dropped. We can safely report dropping the subscription statistics 
> > > here
> > >  * if the subscription is associated with a replication slot since we
> > >  * cannot run DROP SUBSCRIPTION inside a transaction block.  
> > > Subscription
> > >  * statistics will be removed later by (auto)vacuum either if it's not
> > >  * associated with a replication slot or if the message for dropping 
> > > the
> > >  * subscription gets lost.
> > >  */
> > > if (slotname)
> > > pgstat_drop_subscription(subid);
> > >
> > > I think we can call pgstat_drop_subscription() even if slotname is
> > > NULL and need to update the comment.
> > >
> >
> > +1.
> >
> > > IIUC autovacuum is no longer
> > > responsible for garbage collection.
> > >
> >
> > Right, this is my understanding as well.
> 
> Thank you for the confirmation.

Want to propose a patch?

Greetings,

Andres Freund




Re: Issue with pg_stat_subscription_stats

2022-07-01 Thread Andres Freund
Hi,

On 2022-07-01 10:41:55 +0900, Masahiko Sawada wrote:
> While looking at this issue again, I realized there seems to be two
> problems with subscription stats on shmem stats:
>
> Firstly, we call pgstat_create_subscription() when creating a
> subscription but the subscription stats are reported by apply workers.

Why is it relevant where the stats are reported?


> And pgstat_create_subscription() just calls
> pgstat_create_transactional():
>
> void
> pgstat_create_subscription(Oid subid)
> {
> pgstat_create_transactional(PGSTAT_KIND_SUBSCRIPTION,
> InvalidOid, subid);
> }
>
> I guess calling pgstat_create_subscription() is not necessary for the
> current usage.

It ensures that the stats are dropped if the subscription fails to be created
partway through / the transaction is aborted. There's probably no way for that
to happen today, but it still seems the right thing.


> On the other hand, if we create the subscription stats
> there we can resolve the issue Melanie reported in this thread.

I am confused what the place of creation addresses?


> The second problem is that the following code in DropSubscription()
> should be updated:
>
> /*
>  * Tell the cumulative stats system that the subscription is getting
>  * dropped. We can safely report dropping the subscription statistics here
>  * if the subscription is associated with a replication slot since we
>  * cannot run DROP SUBSCRIPTION inside a transaction block.  Subscription
>  * statistics will be removed later by (auto)vacuum either if it's not
>  * associated with a replication slot or if the message for dropping the
>  * subscription gets lost.
>  */
> if (slotname)
> pgstat_drop_subscription(subid);
>
> I think we can call pgstat_drop_subscription() even if slotname is
> NULL and need to update the comment. IIUC autovacuum is no longer
> responsible for garbage collection.

Yep, that needs to be updated.

Greetings,

Andres Freund




Re: EINTR in ftruncate()

2022-07-01 Thread Andres Freund
Hi,

On 2022-07-01 17:41:05 +0200, Alvaro Herrera wrote:
> Nicola Contu reported two years ago to pgsql-general[1] that they were
> having sporadic query failures, because EINTR is reported on some system
> call.  I have been told that the problem persists, though it is very
> infrequent.  I propose the attached patch.  Kyotaro proposed a slightly
> different patch which also protects write(), but I think that's not
> necessary.

What is the reason for the || ProcDiePending || QueryCancelPending bit? What
if there's dsm operations intentionally done while QueryCancelPending?

Greetings,

Andres Freund




Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2022-07-01 Thread Andres Freund
Hi,

On 2022-07-01 01:23:01 -0700, Lukas Fittl wrote:
> On Fri, Jun 12, 2020 at 4:28 PM Andres Freund  wrote:
> 
> > The attached patches are really just a prototype. I'm also not really
> > planning to work on getting this into a "production ready" patchset
> > anytime soon. I developed it primarily because I found the overhead
> > made it too hard to nail down in which part of a query tree performance
> > changed.  If somebody else wants to continue from here...
> >
> > I do think it'd be a pretty significant improvement if we could reduce
> > the timing overhead of EXPLAIN ANALYZE by this much. Even if it requires a
> > bunch of low-level code.
> >
> 
> Based on an off-list conversation with Andres, I decided to dust off this old
> patch for using rdtsc directly. The significant EXPLAIN ANALYZE performance
> improvements (especially when using rdtsc instead of rdtsc*p*) seem to warrant
> giving this a more thorough look.
> 
> See attached an updated patch (adding it to the July commitfest), with a few
> changes:
> 
> - Keep using clock_gettime() as a fallback if we decide to not use rdtsc

Yep.


> - Fallback to /proc/cpuinfo for clock frequency, if cpuid(0x16) doesn't work

I suspect that this might not be needed anymore. Seems like it'd be ok to just
fall back to clock_gettime() in that case.


> - In an abundance of caution, for now I've decided to only enable this if we
>   are on Linux/x86, and the current kernel clocksource is TSC (the kernel has
>   quite sophisticated logic around making this decision, see [1])

I think our requirements are a bit lower than the kernel's - we're not
tracking wall clock over an extended period...


> Note that if we implemented the decision logic ourselves (instead of relying
> on the Linux kernel), I'd be most worried about older virtualization
> technology. In my understanding getting this right is notably more complicated
> than just checking cpuid, see [2].


> Known WIP problems with this patch version:
> 
> * There appears to be a timing discrepancy I haven't yet worked out, where
>   the \timing data reported by psql doesn't match what EXPLAIN ANALYZE is
>   reporting. With Andres' earlier test case, I'm seeing a consistent ~700ms
>   higher for \timing than for the EXPLAIN ANALYZE time reported on the server
>   side, only when rdtsc measurement is used -- it's likely there is a problem
>   somewhere with how we perform the cycles to time conversion

Could you explain a bit more what you're seeing? I just tested your patches
and didn't see that here.


> * Possibly related, the floating point handling for the cycles_to_sec variable
>   is problematic in terms of precision (see FIXME, taken over from Andres' POC)

And probably also performance...
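
(One possible direction for that FIXME -- a sketch only, with a hypothetical
tsc_freq_hz input, and not what the patch currently does: keep the ratio as
32.32 fixed point so the hot path avoids floating point entirely.)

#include <stdint.h>

/* (10^9 << 32) / tsc_freq_hz, computed once at startup */
static uint64_t ns_per_cycle_shifted;

static void
init_cycle_conversion(uint64_t tsc_freq_hz)
{
	ns_per_cycle_shifted = (UINT64_C(1000000000) << 32) / tsc_freq_hz;
}

static inline uint64_t
cycles_to_ns(uint64_t cycles)
{
	/* 64x64 -> 128-bit multiply, keep the top half (gcc/clang extension) */
	return (uint64_t) (((unsigned __int128) cycles * ns_per_cycle_shifted) >> 32);
}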


> Open questions from me:
> 
> 1) Do we need to account for different TSC offsets on different CPUs in SMP
>systems? (the Linux kernel certainly has logic to that extent, but [3]
>suggests this is no longer a problem on Nehalem and newer chips, i.e.
> those
>having an invariant TSC)

I don't think we should cater to systems where we need that.


> 2) Should we have a setting "--with-tsc" for configure? (instead of always
>enabling it when on Linux/x86 with a TSC clocksource)

Probably not worth it.


> 3) Are there cases where we actually want to use rdtsc*p*? (i.e. wait for
>    current instructions to finish -- the prior discussion seemed to suggest
>    we don't want it for node instruction measurements, but possibly we do want
>    this in other cases?)

I was wondering about that too... Perhaps we should add a
INSTR_TIME_SET_CURRENT_BARRIER() or such?
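
Something along these lines, perhaps (an untested sketch reusing the patch's
use_rdtsc flag and clock_gettime() fallback; the function name is made up,
and __rdtscp() from <x86intrin.h> waits for prior instructions to retire
before reading the counter):

#include <x86intrin.h>

static inline instr_time
pg_clock_gettime_ref_cycles_barrier(void)
{
#if defined(__x86_64__) && defined(__linux__)
	if (use_rdtsc)
	{
		unsigned int aux;		/* receives IA32_TSC_AUX; discarded */

		return __rdtscp(&aux);
	}
#endif
	/* fall back to the ordinary (non-serializing) path */
	return pg_clock_gettime_ref_cycles();
}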


> 4) Should we support using the "mrs" instruction on ARM? (which is similar to
>    rdtsc, see [4])

I'd leave that for later personally.



>  #define NS_PER_S INT64CONST(1000000000)
>  #define US_PER_S INT64CONST(1000000)
>  #define MS_PER_S INT64CONST(1000)
> @@ -95,17 +104,37 @@ typedef int64 instr_time;
>  
>  #define INSTR_TIME_SET_ZERO(t)   ((t) = 0)
>  
> -static inline instr_time pg_clock_gettime_ns(void)
> +extern double cycles_to_sec;
> +
> +bool use_rdtsc;

This should be extern and inside the ifdef below.


> +#if defined(__x86_64__) && defined(__linux__)
> +extern void pg_clock_gettime_initialize_rdtsc(void);
> +#endif
> +
> +static inline instr_time pg_clock_gettime_ref_cycles(void)
>  {
>   struct timespec tmp;
>  
> +#if defined(__x86_64__) && defined(__linux__)
> + if (use_rdtsc)
> + return __rdtsc();
> +#endif
> +
>   clock_gettime(PG_INSTR_CLOCK, &tmp);
>  
>   return tmp.tv_sec * NS_PER_S + tmp.tv_nsec;
>  }
>  

Greetings,

Andres Freund




Re: real/float example for testlibpq3

2022-07-01 Thread Mark Wong
On Thu, Jun 16, 2022 at 03:41:50PM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Tue, Jun 14, 2022 at 1:40 PM Mark Wong  wrote:
> >> I've created a function for each data type with the idea that an example
> >> for handling a specific data type can be more easily reviewed by looking
> >> in a single place.
> >> I've added examples for REAL, TIMESTAMP WITHOUT TIME ZONE, and BOOLEAN
> >> to try to illustrate how testlibpq3.sql and testlibpq3.c will grow if
> >> this is a good way to go.
> 
> > I'm not sure that we want to let these test programs grow very large.
> > In particular, I don't think we should let ourselves get sucked into
> > adding an example for every data type under the sun -- if that's
> > wanted, the solution is perhaps to add documentation for the binary
> > formats, not hide impromptu documentation inside a test program.  But
> > doing this much seems OK to me.
> 
> Yeah, "hiding impromptu documentation inside a test program" is what
> this looks like, and I'm not sure that's a reasonable way to go.
> 
> (1) Who's going to think to look in src/test/examples/testlibpq3.c for
> documentation of binary formats?
> 
> (2) The useful details are likely to get buried in notational and
> portability concerns, as I think your build failure illustrates.
> 
> (3) I bet few if any packagers install these files, so that the new
> info would be unavailable to many people.
> 
> I think some new appendix in the main SGML docs would be the appropriate
> place if we want to provide real documentation.

Just wanted to do another quick check in before I go too far.  How do
does this start look?  Extending the libpq section with a code snippet
and description per data type?

Regards,
Mark
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 37ec3cb4e5..44c60223ee 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -8874,6 +8874,109 @@ int PQisthreadsafe();
   
  
 
+ 
+  Binary Format
+  
+Various libpq functions support sending or
+receiving data in binary format.  Data in binary format requires the
+application to handle any differences between the system and the network
+byte order.
+  
+
+  
+ demonstrates how to write a complete
+program that uses libpq functions that request
+data in binary format.  This example shows how to handle only a few data
+types.  This section will detail the handling of as many of the data types
+as possible.  See  for descriptions of each of the
+built-in data types.
+  
+
+  
+   boolean
+
+   A boolean is transmitted as a single byte that, when cast to
+   an int, will be 0 for
+   false and 1 for
+   true.
+
+
+
+
+  
+
+  
+   real
+
+  A real is composed of 4 bytes and needs to be handled
+  correctly for byte order.
+
+
+
+
+  
+
+  
+   timestamp without time zone
+
+  A timestamp without time zone is a 64-bit data type
+  representing the number of microseconds since January 1, 2000.  It can be
+  converted into a broken-down time representation by converting the time
+  into seconds and saving the microseconds elsewhere.
+
+
+  Note that in C time is counted from January 1, 1970, so this discrepancy
+  needs to be accounted for in addition to handling the network byte order.
+
+
+
+
+  
+
+ 
 
  
   Building libpq Programs
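
To make the three descriptions above concrete, a client-side decoder could
look like the sketch below (illustrative only, not part of the patch; it
assumes the query was run with resultFormat = 1, that the columns are
boolean, real, timestamp in that order, and a server built with 64-bit
integer timestamps, as all supported releases are):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <arpa/inet.h>			/* ntohl() */
#include <libpq-fe.h>

/* seconds from the Unix epoch (1970-01-01) to the PostgreSQL epoch (2000-01-01) */
#define PG_EPOCH_OFFSET_SECS 946684800

static uint64_t
ntoh64(const char *buf)
{
	uint32_t	hi,
				lo;

	memcpy(&hi, buf, 4);
	memcpy(&lo, buf + 4, 4);
	return ((uint64_t) ntohl(hi) << 32) | ntohl(lo);
}

static void
print_binary_row(const PGresult *res, int row)
{
	int			b = *PQgetvalue(res, row, 0);	/* boolean: one byte, 0 or 1 */
	uint32_t	fbits;
	float		f;
	int64_t		usecs;
	time_t		secs;

	/* real: 4 bytes in network order, reinterpreted as an IEEE float */
	memcpy(&fbits, PQgetvalue(res, row, 1), 4);
	fbits = ntohl(fbits);
	memcpy(&f, &fbits, sizeof(f));

	/* timestamp: int64 microseconds since 2000-01-01, network order */
	usecs = (int64_t) ntoh64(PQgetvalue(res, row, 2));
	secs = (time_t) (usecs / 1000000) + PG_EPOCH_OFFSET_SECS;

	printf("bool=%d real=%f ts=%s", b, f, ctime(&secs));
}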


Re: [PATCH] Fix pg_upgrade test from v10

2022-07-01 Thread Justin Pryzby
Would you add this to to the (next) CF ?

It's silly to say that v9.2 will be supported potentially for a handful more
years, but that the upgrade-testing script itself doesn't support that, so
developers each have to reinvent its fixups.

See also 20220122183749.go23...@telsasoft.com, where I proposed some of the
same things.

-- 
Justin




Re: First draft of the PG 15 release notes

2022-07-01 Thread Peter Geoghegan
On Fri, Jul 1, 2022 at 9:41 AM Bruce Momjian  wrote:
> > It might be worth explaining the shift directly in the release notes.
> > The new approach is simpler and makes a lot more sense -- why should
> > the relfrozenxid be closely tied to freezing? We don't necessarily
>
> I don't think this is an appropriate detail for the release notes.

Okay. What about saying something about relminmxid advancement where
the database consumes lots of multixacts?

-- 
Peter Geoghegan




Re: Support logical replication of DDLs

2022-07-01 Thread vignesh C
On Wed, Jun 29, 2022 at 3:25 PM houzj.f...@fujitsu.com
 wrote:
>
> On Wednesday, June 29, 2022 11:07 AM Amit Kapila  
> wrote:
> >
> > On Tue, Jun 28, 2022 at 5:43 PM Amit Kapila 
> > wrote:
> > >
> > > 5.
> > > +static ObjTree *
> > > +deparse_CreateStmt(Oid objectId, Node *parsetree)
> > > {
> > > ...
> > > + tmp = new_objtree_VA("TABLESPACE %{tablespace}I", 0); if
> > > + (node->tablespacename) append_string_object(tmp, "tablespace",
> > > + node->tablespacename); else { append_null_object(tmp, "tablespace");
> > > + append_bool_object(tmp, "present", false); }
> > > + append_object_object(createStmt, "tablespace", tmp);
> > > ...
> > > }
> > >
> > > Why do we need to append the objects (tablespace, with clause, etc.)
> > > when they are not present in the actual CREATE TABLE statement? The
> > > reason to ask this is that this makes the string that we want to send
> > > downstream much longer than the actual statement given by the user on
> > > the publisher.
> > >
> >
> > After thinking some more on this, it seems the user may want to optionally
> > change some of these attributes, for example, on the subscriber, it may 
> > want to
> > associate the table with a different tablespace. I think to address that we 
> > can
> > append these additional attributes optionally, say via an additional 
> > parameter
> > (append_all_options/append_all_attributes or something like that) in exposed
> > APIs like deparse_utility_command().
>
> I agree and will research this part.
>
> And here is the new version patch set.
> Most of the changes are in the deparser, which include:
>
> support CREATE PARTITIONED TABLE
> support ALTER ATTACH/DETACH PARTITION
> support CREATE/ALTER TABLE with ACCESS METHOD
> support CREATE TABLE OF
> support CREATE/ALTER TABLE with GENERATED COLUMN
> support CREATE/ALTER TABLE with IDENTITY COLUMN
> support CREATE/ALTER TABLE with COMPRESSION METHOD
> support ALTER COLUMN numofcol SET STATISTICS (mentioned by sawada-san [1])
> support ALTER SCHEMA
> support CREATE/DROP INDEX
>
> Note that, for ATTACH/DETACH PARTITION, I haven't added extra logic on
> subscriber to handle the case where the table on publisher is a PARTITIONED
> TABLE while the target table on subscriber side is NORMAL table. We will
> research this more and improve this later.
>
> Besides, the new version event trigger won't WAL log the DDL whose target
> table is a temporary table so that the problem reported by Vignesh[2] is
> fixed.
>
> About the recent comment from Amit[3] and Vignesh[4], I will investigate the
> comments and address them in next version.
>
> [1] 
> https://www.postgresql.org/message-id/CAD21AoBVCoPPRKvU_5-%3DwEXsa92GsNJFJOcYyXzvoSEJCx5dKw%40mail.gmail.com
> [2] 
> https://www.postgresql.org/message-id/CALDaNm33W35pcBE3zOpJhwnYBdBoZDpKxssemAN21NwVhJuong%40mail.gmail.com
> [3] 
> https://www.postgresql.org/message-id/CAA4eK1K88SMoBq%3DDRA4XU-F3FG6qyzCjGMMKsPpcRBPRcrELrw%40mail.gmail.com
> [4] 
> https://www.postgresql.org/message-id/CALDaNm3rEA_zmnDMOCT7NqK4aAffhAgooLf8rXjUN%3DYwA8ASFw%40mail.gmail.com

Thanks for the updated patch.
Few comments on 0002 patch:
1) When we create a subscription for a publication with the existing
default PUBLISH parameter having default value as
'insert,update,delete,truncate', we do an initial table sync to get
the initial table data from the publisher to the subscriber. But in
case of a publication created with 'ddl', the subscription expects the
existing initial tables present in the publisher to be created
beforehand in the subscriber. Should this be the default behavior?
Should we do a ddl dump for all the tables and restore the ddl to the
subscription while creating the subscription? Or is this planned as an
option for the later version. If we could do this as part of ddl
logical replication, it will help in reducing the steps further for
logical replication setup. If this will not be supported in this patch
series, probably we could document the behavior and add comments for
this at an appropriate place.

 2) When a publication is created with ddl enabled, event triggers
will be created implicitly. Currently these event triggers are also
getting dumped, so when the publication is restored, the implicit
event trigger is created and the explicitly dumped event triggers are
also created, resulting in duplicate event triggers. The event
triggers should not be dumped.
 @@ -4016,6 +4026,15 @@ dumpPublication(Archive *fout, const
PublicationInfo *pubinfo)
first = false;
}

+   if (pubinfo->pubddl)
+   {
+   if (!first)
+   appendPQExpBufferStr(query, ", ");
+
+   appendPQExpBufferStr(query, "ddl");
+   first = false;
+   }
+

3) SYNTAX Support:
Currently creation of "FOR TABLE" publication with ddl is supported.
Should we allow support of ddl for "FOR TABLE" publication? The
creation of the subscription will fail saying the table does 

Re: First draft of the PG 15 release notes

2022-07-01 Thread Bruce Momjian
On Tue, Jun 28, 2022 at 05:32:26PM -0700, Peter Geoghegan wrote:
> The main enhancement to VACUUM for Postgres 15 was item 1, which
> taught VACUUM to dynamically track the oldest remaining XID (and the
> oldest remaining MXID) that will remain in the table at the end of the
> same VACUUM operation. These final/oldest XID/MXID values are what we
> now use to set relfrozenxid and relminmxid in pg_class. Previously we
> just set relfrozenxid/relminmxid to whatever XID/MXID value was used
> to determine which XIDs/MXIDs needed to be frozen. These values often
> indicated more about VACUUM implementation details (like the
> vacuum_freeze_min_age GUC's value) than the actual true contents of the
> table at the end of the most recent VACUUM.
> 
> It might be worth explaining the shift directly in the release notes.
> The new approach is simpler and makes a lot more sense -- why should
> the relfrozenxid be closely tied to freezing? We don't necessarily

I don't think this is an appropriate detail for the release notes.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-07-01 Thread Ranier Vilela
Em qui., 30 de jun. de 2022 às 19:37, Peter Eisentraut <
peter.eisentr...@enterprisedb.com> escreveu:

> I have also committed a patch that gets rid of MemSet() calls where the
> value is a constant not-0, because that just falls back to memset() anyway.
>
Peter, there are some missing paths in this commit.

Although I've included the attached patch, there is no need to credit me
as the author; treat it just as a report.

regards,
Ranier Vilela


001-avoid-unecessary-MemSet-calls.patch
Description: Binary data


EINTR in ftruncate()

2022-07-01 Thread Alvaro Herrera
Nicola Contu reported two years ago to pgsql-general[1] that they were
having sporadic query failures, because EINTR is reported on some system
call.  I have been told that the problem persists, though it is very
infrequent.  I propose the attached patch.  Kyotaro proposed a slightly
different patch which also protects write(), but I think that's not
necessary.

Thomas M. produced some more obscure theories for other things that
could fail, but I think we should patch this problem first, which seems
the most obvious one, and deal with others if and when they are
reported.

[1] 
https://www.postgresql.org/message-id/CAMTZZh2V%2B0wJVgSqTVvXUAVMduF57Uxubvvw58%3DkbOae%2B53%2BQQ%40mail.gmail.com

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Use it up, wear it out, make it do, or do without"
From 466ed63b9b399c2914aa44f56f56e044341a77f5 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 1 Jul 2022 17:16:33 +0200
Subject: [PATCH v2] retry ftruncate

---
 src/backend/storage/ipc/dsm_impl.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 873867e856..4f35e7e3d1 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -362,8 +362,21 @@ dsm_impl_posix_resize(int fd, off_t size)
 {
 	int			rc;
 
-	/* Truncate (or extend) the file to the requested size. */
-	rc = ftruncate(fd, size);
+	/*
+	 * Truncate (or extend) the file to the requested size.  We may get
+	 * interrupted.  If so, just retry unless there is an interrupt pending.
+	 * This avoids the possibility of looping forever if another backend is
+	 * repeatedly trying to interrupt us.
+	 */
+	for (;;)
+	{
+		errno = 0;
+		rc = ftruncate(fd, size);
+		if (rc == 0)
+			break;
+		if (errno != EINTR || ProcDiePending || QueryCancelPending)
+			return rc;
+	}
 
 	/*
 	 * On Linux, a shm_open fd is backed by a tmpfs file.  After resizing with
-- 
2.30.2



Re: [PoC/RFC] Multiple passwords, interval expirations

2022-07-01 Thread Stephen Frost
Greetings,

On Fri, Jul 1, 2022 at 10:51 Brindle, Joshua  wrote:

>
> On 6/30/22 8:20 PM, Stephen Frost wrote:
> > * Gurjeet Singh (gurj...@singh.im) wrote:
> >> I am planning on picking it up next week; right now picking up steam,
> >> and reviewing a different, smaller patch.
> > Great!  Glad that others are interested in this.
> >
> >> At his behest, I had a conversation with Joshua (OP), and have his
> >> support to pick up and continue working on this patch. I have a some
> >> ideas of my own, on what this patch should do, but since I haven't
> >> fully reviewed the (bulky) patch, I'll reserve my proposals until I
> >> wrap my head around it.
> > I'd be curious as to your thought as to what the patch should be doing.
> > Joshua and I had discussed it at some length as he was working on it.
>
>
> Adding myself to the CC list here /waves


Hi!

I gave Gurjeet a bit of a brain dump on what I had planned (and what
we'd talked about), though he's free to take it in a different direction
if he wants.


Perhaps, though I would certainly like this patch to be useful for the
use-cases that we had discussed, naturally. :)

>> Please expect some activity on this patch towards the end of next week.
> > I've gone ahead and updated it, cleaned up a couple things, and made it
> > so that check-world actually passes with it.  Attached is an updated
> > version and I'll add it to the July commitfest.
>
> Ah, thanks. Hopefully it wasn't too horrible of a rebase.


Wasn't too bad... it needs more clean-up; there were some whitespace issues
and some simple rebase stuff, but the support for the "md5" pg_hba option
was broken for users with SCRAM passwords, because we weren't checking if
there was a SCRAM pw stored and upgrading to SCRAM in that case.  That's
the main case that I fixed.  We will need to document this though, of
course.  The patch I submitted should basically do:

pg_hba md5 + md5-only pws -> md5 auth used
pg_hba md5 + scram-only pws -> scram
pg_hba md5 + md5 and scram pws -> scram
pg_hba scram -> scram
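
Restated as a toy C decision function (hypothetical names, merely encoding
the table above; the real decision happens during authentication, based on
which secrets are stored for the role):

#include <stdbool.h>

typedef enum { AUTH_MD5, AUTH_SCRAM } AuthMethod;

static AuthMethod
choose_auth_method(bool hba_says_md5, bool has_md5_pw, bool has_scram_pw)
{
	/* "md5" in pg_hba.conf is upgraded to SCRAM whenever the role has
	 * any SCRAM secret stored; everything else ends up as SCRAM */
	if (hba_says_md5 && has_md5_pw && !has_scram_pw)
		return AUTH_MD5;
	return AUTH_SCRAM;
}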

I'm not sure if we need to try to make it possible to have pg_hba md5 +
mixed pws and have md5 used, but it's tricky, as we would have to know
early on, on the server side, that that's what we want to do.  We could
maybe add an option to md5 to say "only do md5", but I'm also inclined to
not bother and tell people to just get moved to SCRAM already.

For my 2c, I'd also like to move to having a separate column for the PW
type, distinct from the actual secret, but that's largely an independent
change.

Thanks!

Stephen

>


Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT

2022-07-01 Thread David G. Johnston
On Fri, Jul 1, 2022 at 7:58 AM Peter Geoghegan  wrote:

> On Fri, Jul 1, 2022 at 6:01 AM Robert Haas  wrote:
> > What would probably help more is adding something like this to the
> > error message:
> >
> > HINT: column "b" could refer to any of these relations: "foo", "excluded"
> >
> > That could also help people who encounter this error in other
> > situations. I'm not 100% sure this is a good idea, but I feel like it
> > would have a much better chance of helping someone in this situation
> > than the proposed doc patch.
>
> I agree with everything you've said here, though I am 100% sure that
> something like your proposed HINT would be a real usability win.
>
> The "Perhaps you meant to reference the column" HINT displayed when
> the user misspells a column is a surprisingly popular feature. I'm
> aware of quite a few people singing its praises on Twitter, for
> example. That hardly ever happens, even with features that we think of
> as high impact major features. So evidently users really value these
> kinds of quality of life improvements.
>
>

+1 to this better approach to address the specific confusion regarding
ambiguity.

Without any other changes being made, I'm content with the status quo:
calling excluded a table is a reasonable approximation that hasn't been
shown to confuse our readers.

That said, I still think that the current wording should be tweaked with
respect to row vs. rows (especially if we continue to call it a table):

Current:
"The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the
existing row using the table's name (or an alias), and to [rows] proposed
for insertion using the special excluded table."

Change [rows] to:

"the row"


I'm undecided whether "FROM excluded" should be something that works - but
I also don't think it would actually be used in any case.

David J.


Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2022-07-01 Thread Matthias van de Meent
On Tue, 21 Jun 2022 at 03:45, Michael Paquier  wrote:
> +   /*
> +* Ensure that xlogreader.c can read the record by ensuring that the
> +* data section of the WAL record can be allocated.
> +*/
> +   if (unlikely(!AllocSizeIsValid(total_len)))
> +   XLogErrorDataLimitExceeded();
>
> By the way, while skimming through the patch, the WAL reader seems to
> be a bit more pessimistic than this estimation, calculating the amount
> to allocate as of DecodeXLogRecordRequiredSpace(), based on the
> xl_tot_len given by a record.

I see, thanks for notifying me about that.

PFA a correction for that issue. It does copy the value of
MaxAllocSize from memutils.h into xlogreader.h, because we need that
value in FRONTEND builds too, and memutils.h can't be included in
FRONTEND builds. One file is suffixed with .backpatch; it doesn't
include the function signature changes, but it is not optimized for
any particular stable branch [15].

-Matthias

PS. I'm not amused by the double copy we do in the xlogreader, as I
had expected we'd just read the record and point into that single
xl_tot_len-sized buffer. Apparently that's not how it works...

[15] it should apply to stable branches all the way back to
REL_15_STABLE and still work as expected. Anything older than that I
haven't tested, but it probably only requires some updates for
XLogRecMaxLength in xlogreader.h.
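
For illustration, the cap might end up looking roughly like this in
xlogreader.h and the record-header validation (a sketch only; the
assumption is that the macro's value mirrors MaxAllocSize, since frontend
code can't include memutils.h):

/* keep in sync with MaxAllocSize in utils/memutils.h */
#define XLogRecMaxLength	((uint32) 0x3fffffff)

	/* e.g., while validating a record's xl_tot_len in xlogreader.c: */
	if (record->xl_tot_len > XLogRecMaxLength)
	{
		report_invalid_record(state, "record length %u is too long",
							  record->xl_tot_len);
		return false;
	}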


v5-0001-Add-protections-in-xlog-record-APIs-against-large.patch
Description: Binary data


v5-0001-Add-protections-in-xlog-record-APIs-against-large.patch.backpatch
Description: Binary data


[Commitfest 2022-07] Begins Now

2022-07-01 Thread Jacob Champion
Hello!

It's been July everywhere on Earth for a few hours, so the July
commitfest is now in progress:

https://commitfest.postgresql.org/38/

New patches may be registered for the next commitfest in September.
Pick some patches to review and have fun!

Happy hacking,
--Jacob




Re: Refactor construct_array() and deconstruct_array() for built-in types

2022-07-01 Thread Peter Eisentraut

On 01.07.22 15:37, Tom Lane wrote:

Perhaps a good compromise could be to turn the duplicated code into
a macro that's instantiated in both places?  But I don't actually
see anything much wrong with the code as Peter has it.


There are opportunities to refine this further.  For example, there is 
similar code in TupleDescInitBuiltinEntry(), and bootstrap.c also 
contains hardcoded info on built-in types, and GetCCHashEqFuncs() is 
also loosely related.  As I mentioned earlier in the thread, one could 
have genbki.pl generate support code for this.





Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT

2022-07-01 Thread Peter Geoghegan
On Fri, Jul 1, 2022 at 6:01 AM Robert Haas  wrote:
> What would probably help more is adding something like this to the
> error message:
>
> HINT: column "b" could refer to any of these relations: "foo", "excluded"
>
> That could also help people who encounter this error in other
> situations. I'm not 100% sure this is a good idea, but I feel like it
> would have a much better chance of helping someone in this situation
> than the proposed doc patch.

I agree with everything you've said here, though I am 100% sure that
something like your proposed HINT would be a real usability win.

The "Perhaps you meant to reference the column" HINT displayed when
the user misspells a column is a surprisingly popular feature. I'm
aware of quite a few people singing its praises on Twitter, for
example. That hardly ever happens, even with features that we think of
as high impact major features. So evidently users really value these
kinds of quality of life improvements.


--
Peter Geoghegan




Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT

2022-07-01 Thread Peter Geoghegan
On Fri, Jul 1, 2022 at 6:40 AM Tom Lane  wrote:
> Robert Haas  writes:
> > What would probably help more is adding something like this to the
> > error message:
> > HINT: column "b" could refer to any of these relations: "foo", "excluded"
>
> +1, that seems like it could be handy across the board.

The user *will* get a similar HINT if they happen to *also* spell the
would-be ambiguous column name slightly incorrectly:

ERROR:  column "barr" does not exist
LINE 1: ...lict (bar) do update set bar = excluded.bar where barr != 5;
 ^
HINT:  Perhaps you meant to reference the column "foo.bar" or the
column "excluded.bar".

-- 
Peter Geoghegan




Re: drop support for v9.3 ?

2022-07-01 Thread Tom Lane
Justin Pryzby  writes:
> Is it time to drop support for the oldest release?
> As in cf0cab868  30e7c175b  e469f0aaf  c03b7f526  492046fa9

I'm not really in favor of moving that goalpost forward when
there's no concrete reason to do so.  The amount of work involved
in a sweep for "what code can be removed" is more or less constant,
so that doing this in (say) three years will be much less work
than doing it every year because the calendar says to.

The reason we pushed up the minimum to 9.2 was that we found that
versions older than that don't build readily on modern toolchains.
I've not yet heard of similar bit-rot in 9.2 ... and when it does
happen it'll probably affect a number of branches at once.

regards, tom lane




Re: generate_series for timestamptz and time zone problem

2022-07-01 Thread Przemysław Sztoch

Gurjeet Singh wrote on 01.07.2022 06:35:

On Tue, Jun 21, 2022 at 7:56 AM Przemysław Sztoch  wrote:

Please give me feedback on how to properly store the timezone name in the 
function context structure. I can't finish my work without it.

The way I see it, I don't think you need to store the tz-name in the
function context structure, like you're currently doing. I think you
can remove the additional member from the
generate_series_timestamptz_fctx struct, and refactor your code in
generate_series_timestamptz() to work without it; you seem to be
using the tzname member almost as a boolean flag, because the actual
value you pass to DirectFunctionCall3() can be calculated without first
storing anything in the struct.
Do I understand correctly that functions that return a SET are executed
multiple times?

Is access to the arguments available all the time?
I thought PG_GETARG_xxx could only be used when SRF_IS_FIRSTCALL() is true
- was I right or wrong?
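
For reference, the value-per-call convention from funcapi.h looks roughly
like this (a minimal sketch with a toy function, not anything from this
patch): the arguments in fcinfo normally remain accessible on every call,
but pass-by-reference state that must survive across calls should be
copied into funcctx->multi_call_memory_ctx during the first call.

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(my_int_series);

Datum
my_int_series(PG_FUNCTION_ARGS)
{
	FuncCallContext *funcctx;

	if (SRF_IS_FIRSTCALL())
	{
		/* one-time setup; runs only on the first of the repeated calls */
		funcctx = SRF_FIRSTCALL_INIT();
		funcctx->max_calls = PG_GETARG_INT32(0);
	}

	/* runs on every call, including the first */
	funcctx = SRF_PERCALL_SETUP();

	if (funcctx->call_cntr < funcctx->max_calls)
		SRF_RETURN_NEXT(funcctx,
						Int32GetDatum((int32) funcctx->call_cntr + 1));
	else
		SRF_RETURN_DONE(funcctx);
}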

Can you please explain why you chose to remove the provolatile
attribute from the existing entry of date_trunc in pg_proc.dat.

I believe it was a mistake in the PG code.
All timestamptz functions must be STABLE, as they depend on the current
timezone setting (SHOW timezone).
If new functions are created that pass the zone as a parameter, they
become IMMUTABLE.
The first date_trunc implementation was without a time zone parameter,
and whoever added the second variant (with the timezone as a parameter)
copied the definition without removing the STABLE flag.

It seems like you've picked/reused code from neighboring functions
(e.g. from timestamptz_trunc_zone()). Can you please see if you can
turn such code into a function, and  call the function, instead of
copying it.

Ok. Changed.

Also, according to the comment at the top of pg_proc.dat,

  # Once upon a time these entries were ordered by OID.  Lately it's often
  # been the custom to insert new entries adjacent to related older entries.

So instead of adding your entries at the bottom of the file, please add
each entry closer to an existing entry that's relevant to it.

Ok. Changed.

Some regression tests have been added.

I have a problem with this:
-- Considering only built-in procs (prolang = 12), look for multiple uses
-- of the same internal function (ie, matching prosrc fields).  It's OK to
-- have several entries with different pronames for the same internal
-- function, but conflicts in the number of arguments and other critical
-- items should be complained of.  (We don't check data types here; see
-- next query.)
-- Note: ignore aggregate functions here, since they all point to the same
-- dummy built-in function.
SELECT p1.oid, p1.proname, p2.oid, p2.proname (...):
 oid  | proname | oid  | proname
--+-+--+-
 1189 | timestamptz_pl_interval | 8800 | date_add
  939 | generate_series | 8801 | generate_series
(2 rows)

--
Przemysław Sztoch | Mobile +48 509 99 00 66
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index f70f829d83..415d91dfce 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -69,6 +69,7 @@ typedef struct
 	TimestampTz finish;
 	Interval	step;
 	int			step_sign;
+	char		tzname[TZ_STRLEN_MAX + 1];
 } generate_series_timestamptz_fctx;
 
 
@@ -547,6 +548,48 @@ parse_sane_timezone(struct pg_tm *tm, text *zone)
return tz;
 }
 
+static pg_tz *
+lookup_timezone(text *zone)
+{
+	char		tzname[TZ_STRLEN_MAX + 1];
+	char	   *lowzone;
+	int			type,
+				val;
+	pg_tz	   *tzp;
+
+	/*
+	 * Look up the requested timezone (see notes in timestamptz_zone()).
+	 */
+	text_to_cstring_buffer(zone, tzname, sizeof(tzname));
+
+	/* DecodeTimezoneAbbrev requires lowercase input */
+	lowzone = downcase_truncate_identifier(tzname,
+										   strlen(tzname),
+										   false);
+
+	type = DecodeTimezoneAbbrev(0, lowzone, &val, &tzp);
+
+	if (type == TZ || type == DTZ)
+	{
+		/* fixed-offset abbreviation, get a pg_tz descriptor for that */
+		tzp = pg_tzset_offset(-val);
+	}
+	else if (type == DYNTZ)
+	{
+		/* dynamic-offset abbreviation, use its referenced timezone */
+	}
+	else
+	{
+		/* try it as a full zone name */
+		tzp = pg_tzset(tzname);
+		if (!tzp)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("time zone \"%s\" not recognized", tzname)));
+	}
+
+	return tzp;
+}
+
 /*
  * make_timestamp_internal
  * workhorse for 

Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT

2022-07-01 Thread Tom Lane
Robert Haas  writes:
> I think that the issue here is simply that because both the updated
> table and the "excluded" pseudo-table are visible here, and have the
> same columns, an unqualified name is ambiguous. I don't really think
> that it's worth documenting. The error message you get if you fail to
> do it is actually pretty good:

> ERROR:  column reference "b" is ambiguous

> Now you could read that and not understand that the ambiguity is
> between the target table and the "excluded" pseudo-table, for sure.

Agreed.  It doesn't help that there's no explicit use of "excluded"
anywhere, as there is in more usual ambiguous-column cases.

> What would probably help more is adding something like this to the
> error message:
> HINT: column "b" could refer to any of these relations: "foo", "excluded"

+1, that seems like it could be handy across the board.

regards, tom lane




Re: Refactor construct_array() and deconstruct_array() for built-in types

2022-07-01 Thread Tom Lane
Dagfinn Ilmari Mannsåker  writes:
> Dagfinn Ilmari Mannsåker  writes:
>> I didn't pay much attention to this thread earlier, but I was struck by
>> the duplication of the switch statement determining the elemlen,
>> elembyval, and elemalign values between the construct and deconstruct
>> functions.  How about a common function they can both call?

I was wondering about that too while reading the committed version.
However, adding an additional function call would weaken the argument
that this adds just a tolerable amount of overhead, primarily because
you'd need to return a record or introduce pointers or the like.

> I just realised that this would require the error message to not include
> the function name (which isn't really that critical, since it's a
> developer-facing message), but an option would be to make it return false
> for unknown types, so each of the calling functions can emit their own
> error message.

Nah, because the point of that was just to direct people to where
to fix it when they need to.  So the message need only refer to
the common function, if we were to change it.

Perhaps a good compromise could be to turn the duplicated code into
a macro that's instantiated in both places?  But I don't actually
see anything much wrong with the code as Peter has it.

regards, tom lane




Re: pg_checkpointer is not a verb or verb phrase

2022-07-01 Thread Magnus Hagander
On Fri, Jul 1, 2022 at 3:18 PM Robert Haas  wrote:

> On Thu, Jun 30, 2022 at 9:22 PM Michael Paquier 
> wrote:
> > "checkpoint" is not a verb (right?), so would something like
> > "pg_perform_checkpoint" rather than "pg_checkpoint" fit better in the
> > larger picture?
>
> It's true that the dictionary describes checkpoint as a noun, but I
> think in a database context it is often used as a verb. One example is
> the CHECKPOINT command itself: command names are all verbs, and
> CHECKPOINT is a command name, so we're using it as a verb. Similarly I
> think you can talk about needing to checkpoint the database. Therefore
> I think pg_checkpoint is OK, and it has the advantage of being short.
>

+1 for pg_checkpoint on that -- let's not make it longer than necessary.

And yes, +1 for actually changing it. It's a lot cheaper to change it now
than it will be in the future.  Yes, it would've been even cheaper to have
already changed it, but we can't go back in time...

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Hardening PostgreSQL via (optional) ban on local file system access

2022-07-01 Thread Robert Haas
On Fri, Jul 1, 2022 at 5:15 AM Hannu Krosing  wrote:
> This is the eternal problem with security - more security always
> includes more inconvenience.

But the same amount of security can be more or less inconvenient, and
I don't think your proposal does very well there. More inconvenience
doesn't mean more security.

I actually think this whole line of attack is probably a dead end. My
preferred approach is to find ways of delegating a larger subset of
superuser privileges to non-superusers, or to prevent people from
assuming the superuser role in the first place. Trying to restrict
what superusers can do seems like a much more difficult path, and I
think it might be a dead end. But if such an approach has any hope of
success, I think it's going to have to try to create a situation where
most of the administration that you need to do can be done most of the
time with some sort of restricted superuser privileges, and only in
extreme scenarios do you need to change the cluster state to allow
full superuser access. There's no such nuance in your proposal. It's
just a great big switch that makes superuser mean either nothing, or
all the things it means today. I don't think that's really a
meaningful step forward.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




drop support for v9.3 ?

2022-07-01 Thread Justin Pryzby
Is it time to drop support for the oldest release?
As in cf0cab868  30e7c175b  e469f0aaf  c03b7f526  492046fa9

If we also did the same thing next year, it'd be possible to use ::regnamespace
with impunity...




Re: pg_checkpointer is not a verb or verb phrase

2022-07-01 Thread Robert Haas
On Thu, Jun 30, 2022 at 9:22 PM Michael Paquier  wrote:
> "checkpoint" is not a verb (right?), so would something like
> "pg_perform_checkpoint" rather than "pg_checkpoint" fit better in the
> larger picture?

It's true that the dictionary describes checkpoint as a noun, but I
think in a database context it is often used as a verb. One example is
the CHECKPOINT command itself: command names are all verbs, and
CHECKPOINT is a command name, so we're using it as a verb. Similarly I
think you can talk about needing to checkpoint the database. Therefore
I think pg_checkpoint is OK, and it has the advantage of being short.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: margay fails assertion in stats/dsa/dsm code

2022-07-01 Thread Robert Haas
On Thu, Jun 30, 2022 at 10:34 PM Thomas Munro  wrote:
> So... I think we should select a different default
> dynamic_shared_memory_type in initdb.c if defined(__sun__).  Which is
> the least terrible?  For sysv, it looks like all the relevant sysctls
> that used to be required to use sysv memory became obsolete/automatic
> in Sol 10 (note: Sol 9 is long EOL'd), so it should just work AFAICT,
> whereas for mmap mode your shared memory data is likely to cause file
> I/O because we put the temporary files in your data directory.  I'm
> thinking perhaps we should default to dynamic_shared_memory_type=sysv
> for 15+.  I don't really want to change it in the back branches, since
> nobody has actually complained about "posix" performance and it might
> upset someone if we change it for newly initdb'd DBs in a major
> release series.  But I'm not an expert or even user of this OS, I'm
> just trying to fix the build farm; better ideas welcome.

Boy, relying on DSM for critical stuff sure is a lot of fun! This is
exactly why I hate adding new facilities that have to be implemented
in OS-dependent ways.

Changing the default on certain platforms to 'posix' or 'sysv'
according to what works best on that platform seems reasonable to me.
I agree that defaulting to 'mmap' doesn't seem like a lot of fun,
although I think it could be a reasonable choice on a platform where
everything else is broken. You could alternatively try to fix 'posix'
by adding some kind of code to work around that platform's
deficiencies. Insert handwaving here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: replacing role-level NOINHERIT with a grant-level option

2022-07-01 Thread Robert Haas
On Fri, Jul 1, 2022 at 8:22 AM Joe Conway  wrote:
> Hmm, maybe I am misunderstanding something, but what I mean is something
> like:
>
> 8<
> CREATE TABLE t1(f1 int);
> CREATE TABLE t2(f1 int);
>
> CREATE USER A; --defaults to INHERIT
> CREATE USER B;
> CREATE USER C;
>
> GRANT select ON TABLE t1 TO B;
> GRANT select ON TABLE t2 TO C;
>
> GRANT B TO A;
> GRANT C TO A;
>
> SET SESSION AUTHORIZATION A;
>
> -- works
> SELECT * FROM t1;
> -- works
> SELECT * FROM t2;
>
> RESET SESSION AUTHORIZATION;
> REVOKE INHERIT OPTION FOR C FROM A;
> SET SESSION AUTHORIZATION A;
>
> -- works
> SELECT * FROM t1;
> -- fails
> SELECT * FROM t2;
> 8<
>
> So now A has implicit inherited privs for t1 but not for t2.

Yeah, I anticipate that this would work in the way that you postulate here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT

2022-07-01 Thread Robert Haas
On Thu, Jun 30, 2022 at 6:40 PM Peter Geoghegan  wrote:
> My impression from reading this transcript is that the user was
> confused as to why they needed to qualify the target table name in the
> ON CONFLICT DO UPDATE's WHERE clause -- they didn't have to qualify it
> in the targetlist that appears in "SET ... ", so why the need to do it
> in the WHERE clause? This isn't something that upsert statements need
> to do all that often, just because adding additional conditions to the
> WHERE clause isn't usually necessary. That much makes sense to me -- I
> *can* imagine how that could cause confusion.

+1.

I think that the issue here is simply that because both the updated
table and the "excluded" pseudo-table are visible here, and have the
same columns, an unqualified name is ambiguous. I don't really think
that it's worth documenting. The error message you get if you fail to
do it is actually pretty good:

rhaas=# insert into foo values (1, 'frob') on conflict (a) do update
set b = (select b || 'nitz');
ERROR:  column reference "b" is ambiguous
LINE 1: ...'frob') on conflict (a) do update set b = (select b || 'nitz...
 ^

Now you could read that and not understand that the ambiguity is
between the target table and the "excluded" pseudo-table, for sure.
But, would you think to check the documentation at that point? I'm not
sure that's what people would really do. And if they did, I think that
David's proposed patch would be unlikely to make them less confused.
What would probably help more is adding something like this to the
error message:

HINT: column "b" could refer to any of these relations: "foo", "excluded"

That could also help people who encounter this error in other
situations. I'm not 100% sure this is a good idea, but I feel like it
would have a much better chance of helping someone in this situation
than the proposed doc patch.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Hardening PostgreSQL via (optional) ban on local file system access

2022-07-01 Thread Joe Conway

On 7/1/22 05:14, Hannu Krosing wrote:

On Thu, Jun 30, 2022 at 7:25 PM Bruce Momjian  wrote:

On Thu, Jun 30, 2022 at 11:52:20AM -0400, Robert Haas wrote:
> I don't think this would be very convenient in most scenarios,


This is the eternal problem with security - more security always
includes more inconvenience.


yep


This one would be for cases where you want best multi-layer
protections also against unknown threats and are ready to trade some
convenience for security. Also it would not be that bad once you use
automated deployment pipelines which just need an extra line to unlock
superuser for deployment.


+1


and I think it would also be difficult to implement correctly. I
don't think you can get by with just having superuser() return
false sometimes despite pg_authid.rolsuper being true. There's a
lot of subtle assumptions in the code to the effect that the
properties of a session are basically stable unless some SQL is
executed which changes things.

A good barrier SQL for this could be SET ROLE=... .
Maybe just have a mode where a superuser can not log in _or_ SET ROLE
unless this is explicitly allowed in pg_superuser.conf


Agreed.

In fact, in a recent discussion with Joshua Brindle (CC'd), he wished for
a way to designate the current session "tainted". For example, if role joe,
with membership in postgres, should always be logging in from
192.168.42.0/24 when performing admin duties as postgres, but logs in
from elsewhere, their session should be marked tainted and escalating to
postgres should be denied.



> I think if we start injecting hacks like this it may seem to work in
> light testing but we'll never get to the end of the bug reports.


In this case it looks like each of these bug reports would mean an
avoided security breach which for me looks preferable.

Again, this would be all optional, opt-in, DBA-needs-to-set-it-up
feature for the professionally paranoid and not something we suddenly
force on people who would like to run all their databases using
user=postgres database=postgres with trust specified in the
pg_hba.conf "because the firewall takes care of security" :)


Yeah, seems it would have to be specified per-session, but how would you
specify a specific session before the session starts?


One often recommended way to do superuser'y things in a secure
production database is to have a non-privileged NOINHERIT user for
logging in and then do
SET ROLE=;
when needed, similar to using su/sudo in shell. This practice both
reduces the attack surface and also provides auditability by knowing
who logged in for superuser work.


+many

--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT

2022-07-01 Thread Robert Haas
On Thu, Jun 30, 2022 at 5:05 PM Peter Geoghegan  wrote:
> On Thu, Jun 30, 2022 at 1:43 PM Robert Haas  wrote:
> > rhaas=# insert into foo values (1, 'frob') on conflict (a) do update
> > set b = (select b || 'nitz' from excluded);
> > ERROR:  relation "excluded" does not exist
> > LINE 1: ...ct (a) do update set b = (select b || 'nitz' from excluded);
> >
> > I do find that a bit of a curious error message, because that relation
> > clearly DOES exist in the range table.
>
> Let's say that we supported this syntax. I see some problems with that
> (you may have thought of these already). Thinking of "excluded" as a
> separate table creates some very thorny questions, such as:
>
> How would the user be expected to handle the case where there was more
> than a single "row" in "excluded"? How could the implementation know
> what the contents of the "excluded table" were ahead of time in the
> multi-row-insert case? We'd have to know about *all* of the conflicts
> for all rows proposed for insertion to do this, which seems impossible
> in the general case -- because some of those conflicts won't have
> happened yet, and cannot be predicted.

I was assuming it would just behave like a 1-row table i.e. these
would do the same thing:

insert into foo values (1, 'frob') on conflict (a) do update set b =
(select excluded.b || 'nitz');
insert into foo values (1, 'frob') on conflict (a) do update set b =
(select b || 'nitz' from excluded);

I'm actually kinda unsure why that doesn't already work. There may
well be a very good reason, but my naive thought would be that if
excluded doesn't have a range table entry, the first one would fail
because excluded can't be looked up in the range table, and if it does
have a range-table entry, then the second one would work because the
from-clause reference would find it just like the qualified column
reference did.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pluggable toaster

2022-07-01 Thread Matthias van de Meent
On Thu, 30 Jun 2022 at 22:26, Nikita Malakhov  wrote:
>
> Hi hackers!
> Here is the patch set rebased onto current master (15 rel beta 2 with commit 
> from 29.06).

Thanks!

> Just to remind:
> With this patch set you will be able to develop and plug in your own TOAST 
> mechanics for table columns. Knowing internals and/or workflow and workload
> of data being TOASTed makes Custom Toasters much more efficient in 
> performance and storage.

The new toast API doesn't seem to be very well documented, nor are the
new features. Could you include a README or extend the comments on how
this is expected to work, and/or how you expect people to use (the
result of) `get_vtable`?

> Patch set consists of 9 incremental patches:
> [...]
> 0002_toaster_interface_v7.patch - TOAST API interface and SQL syntax allowing 
> creation of custom Toaster (CREATE TOASTER ...)
> and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE 
> EXTERNAL TOASTER bytea_toaster);)

This patch 0002 seems to include changes to log files (!) that don't
exist in current HEAD, but at the same time are not created by patch
0001. Could you please check and sanitize your patches to ensure that
the changes are actually accurate?

Like Robert Haas mentioned earlier[0], please create a branch in a git
repository that has a commit containing the changes for each patch,
and then use git format-patch to generate a single patchset, one that
shares a single version number. Keeping track of what patches are
needed to test this CF entry is already quite difficult due to the
amount of patches and their packaging (I'm having troubles managing
these seperate .patch.gz), and the different version tags definitely
don't help in finding the correct set of patches to apply once
downloaded.

Kind regards,

Matthias van de Meent

[0] 
https://www.postgresql.org/message-id/CA%2BTgmoZBgNipyKuQAJzNw2w7C9z%2B2SMC0SAHqCnc_dG1nSLNcw%40mail.gmail.com




Re: [PATCH] Add result_types column to pg_prepared_statements view

2022-07-01 Thread Dagfinn Ilmari Mannsåker
Peter Eisentraut  writes:

> On 19.05.22 17:34, Dagfinn Ilmari Mannsåker wrote:
>> Prompted by a question on IRC, here's a patch to add a result_types
>> column to the pg_prepared_statements view, so that one can see the types
>> of the columns returned by a prepared statement, not just the parameter
>> types.
>> I'm not quite sure about the column name, suggestions welcome.
>
> I think this patch is sensible.
>
> I see one issue: When you describe a prepared statement via the
> protocol, if a result field has a domain as its type, the RowDescription 
> message sends the underlying base type, not the domain type directly
> (see SendRowDescriptionMessage()).  But it doesn't do that for the 
> parameters (see exec_describe_statement_message()).  I don't know why
> that is; the protocol documentation doesn't mention it.  Might be worth 
> looking into, and checking whether the analogous information contained
> in this view should be made consistent.

A bit of git archaeology shows that the change was made by Tom (Cc-ed)
in 7.4:

commit d9b679c13a820eb7b464a1eeb1f177c3fea13ece
Author: Tom Lane 
Date:   2003-05-13 18:39:50 +

In RowDescription messages, report columns of domain datatypes as having
the type OID and typmod of the underlying base type.  Per discussions
a few weeks ago with Andreas Pflug and others.  Note that this behavioral
change affects both old- and new-protocol clients.

I can't find that discussion in the archive, but someone did complain
about it shortly after:

https://www.postgresql.org/message-id/flat/D71A1574-A772-11D7-913D-0030656EE7B2%40icx.net

I think in this case returning the domain type is more useful, since
it's easy to get from that to the base type, but not vice versa.

The arguments about client-side type-specific value handling for
RowDescription don't apply here IMO, since this view is more
user-facing.

- ilmari




Re: replacing role-level NOINHERIT with a grant-level option

2022-07-01 Thread Joe Conway

On 7/1/22 07:48, Robert Haas wrote:

On Fri, Jul 1, 2022 at 6:17 AM Joe Conway  wrote:

Would this allow for an explicit REVOKE to override a default INHERIT
along a specific path?


Can you give an example?

If you mean that A is granted to B which is granted to C which is
granted to D and you now want NOINHERIT behavior for the B->C link in
the chain, this would allow that. You could modify the existing grant
by saying either "REVOKE INHERIT OPTION FOR B FROM C" or "GRANT B TO C
WITH INHERIT FALSE".


Hmm, maybe I am misunderstanding something, but what I mean is something 
like:


8<
CREATE TABLE t1(f1 int);
CREATE TABLE t2(f1 int);

CREATE USER A; --defaults to INHERIT
CREATE USER B;
CREATE USER C;

GRANT select ON TABLE t1 TO B;
GRANT select ON TABLE t2 TO C;

GRANT B TO A;
GRANT C TO A;

SET SESSION AUTHORIZATION A;

-- works
SELECT * FROM t1;
-- works
SELECT * FROM t2;

RESET SESSION AUTHORIZATION;
REVOKE INHERIT OPTION FOR C FROM A;
SET SESSION AUTHORIZATION A;

-- works
SELECT * FROM t1;
-- fails
SELECT * FROM t2;
8<

So now A has implicit inherited privs for t1 but not for t2.

--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Pluggable toaster

2022-07-01 Thread Nikita Malakhov
Hi,
Alexander, thank you for your feedback!
Let me explain our thoughts:
We thought that refactoring the default TOAST mechanics via the TOAST API
in patch 0002 would be too much, because the API itself already introduces
a lot of changes, so we kept the re-implementation of the default Toaster
for a later patch.
0002 introduces custom TOAST pointers with a corresponding macro set
(postgres.h) and the Dummy toaster as an example for developers of how
the API should be used, but leaves the default TOAST as-is. As far as I
can see, there are not a lot of custom TAMs, despite the pluggable storage
interface having been introduced several years ago, so we thought that a
simple example of how to use the new API would be nice to have.
We've done the TOAST refactoring in 0003, but it has not replaced the
default TOAST; it just routes the default through the TOAST API. It is
still part of the core and is used for TOASTing by default (set in the
atttoaster column).

For performance testing we used a lot of manually corrected scripts. I
have to put them in order, but I will provide them later as an additional
side patch for this patch set.

Best regards,
--
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

On Fri, Jul 1, 2022 at 11:10 AM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi Nikita,
>
> > Here is the patch set rebased onto current master (15 rel beta 2 with
> commit from 29.06).
>
> Thanks for the rebased patchset.
>
> This is a huge effort though. I suggest splitting it into several CF
> entries as we previously did with other patches (64-bit XIDs to name
> one, which BTW is arguably much simpler, but still we had to split
> it). This will simplify the review, limit the scope of the discussion
> and simplify passing the CI. Cfbot is already not happy with the
> patchset.
>
> 0001 - is already in a separate thread [1], that's good. I suggest
> marking it in the patch description for clarity.
> 0002, 0003 - I suggest focusing on these two in this thread and keep
> the rest of the changes for later discussion. Please submit 0004,
> 0005... next time, when we finish with 0001-0003.
>
> The order of proposed changes IMO is wrong.
>
> 0002 should refactor the default TOASTer in a manner similar to a
> pluggable one. Nothing should change from the user's perspective. If
> you do benchmarks, I suggest not to reference the previous talks. I
> familiarized myself with all the related talks linked before (took me
> some time...) and found them useless for the discussion since they
> don't provide exact steps to reproduce. Please provide exact scripts
> and benchmarks results for 0002 in this thread.
>
> 0003 should add an interface that allows replacing the default TOASTer
> with an alternative one. There is no need for contrib/dummy_toaster
> similarly as there is no contrib/ for a dummy TableAM. The provided
> example doesn't do much anyway since all the heavy lifting should be
> done in the callbacks implementation. For test purposes please use
> src/test/regress/.
>
> [1]:
> https://www.postgresql.org/message-id/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru
>
> --
> Best regards,
> Aleksander Alekseev
>


Re: replacing role-level NOINHERIT with a grant-level option

2022-07-01 Thread Robert Haas
On Fri, Jul 1, 2022 at 6:17 AM Joe Conway  wrote:
> Would this allow for an explicit REVOKE to override a default INHERIT
> along a specific path?

Can you give an example?

If you mean that A is granted to B which is granted to C which is
granted to D and you now want NOINHERIT behavior for the B->C link in
the chain, this would allow that. You could modify the existing grant
by saying either "REVOKE INHERIT OPTION FOR B FROM C" or "GRANT B TO C
WITH INHERIT FALSE".

-- 
Robert Haas
EDB: http://www.enterprisedb.com




POC: Lock updated tuples in tuple_update() and tuple_delete()

2022-07-01 Thread Alexander Korotkov
Hackers,

When working in the read committed transaction isolation mode
(default), we have the following sequence of actions when
tuple_update() or tuple_delete() find concurrently updated tuple.

1. tuple_update()/tuple_delete() returns TM_Updated
2. tuple_lock()
3. Re-evaluate plan qual (recheck if we still need to update/delete
and calculate the new tuple for update)
4. tuple_update()/tuple_delete() (this time should be successful,
since we've previously locked the tuple).

I wonder if we should merge steps 1 and 2. We could reuse some of the
work already done during tuple_update()/tuple_delete() when locking the
tuple. In the heap table access method, we have to start tuple_lock() with
the first tuple in the chain, but tuple_update()/tuple_delete() has
already visited it. For undo-based table access methods,
tuple_update()/tuple_delete() should start from the last version, so why
not place the tuple lock immediately once a concurrent update is
detected? I think this patch should have some performance benefits under
high concurrency.
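
A self-contained toy model of that four-step dance (the table_* functions
here are stubs standing in for the real table AM callbacks, just to make
the control flow concrete):

#include <stdbool.h>

typedef enum { TM_Ok, TM_Updated } TM_Result;

/* stubs standing in for the table AM API */
static TM_Result table_tuple_delete(void *rel, void *tid) { (void) rel; (void) tid; return TM_Ok; }
static TM_Result table_tuple_lock(void *rel, void *tid) { (void) rel; (void) tid; return TM_Ok; }
static bool recheck_plan_qual(void *rel, void *tid) { (void) rel; (void) tid; return true; }

/* the current read-committed flow, much simplified */
static bool
exec_delete(void *rel, void *tid)
{
	TM_Result	r = table_tuple_delete(rel, tid);	/* step 1 */

	if (r == TM_Updated)
	{
		if (table_tuple_lock(rel, tid) != TM_Ok)	/* step 2 */
			return false;
		if (!recheck_plan_qual(rel, tid))			/* step 3 */
			return false;		/* row no longer qualifies */
		r = table_tuple_delete(rel, tid);			/* step 4: must succeed */
	}
	return (r == TM_Ok);
}

The proposal is essentially to let the step-1 AM call acquire the tuple
lock itself when it detects the concurrent update, collapsing steps 1 and
2 into a single call.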

Also, the patch simplifies code in nodeModifyTable.c by getting rid of
the nested case. I also got rid of the extra
table_tuple_fetch_row_version() in ExecUpdate. Why re-fetch the old
tuple when it should be exactly the same tuple we've just locked?

I'm going to check the performance impact. Thoughts and feedback are welcome.

--
Regards,
Alexander Korotkov


0001-Lock-updated-tuples-in-tuple_update-and-tuple_del-v1.patch
Description: Binary data


Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-07-01 Thread Ranier Vilela
On Thu, Jun 30, 2022 at 7:37 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 19.05.22 18:09, Ranier Vilela wrote:
> > Taking it a step further.
> > Created a new patch into commitfest, targeting 16 version.
> > https://commitfest.postgresql.org/38/3645/
> > 
>
> I have committed your 001 patch, which was clearly a (harmless) mistake.
>
Thank you.


>
> I have also committed a patch that gets rid of MemSet() calls where the
> value is a constant not-0, because that just falls back to memset() anyway.
>
> I'm on board with trying to get rid of MemSet(), but first I need to
> analyze all the performance numbers and arguments that were shown in
> this thread.
>
One good argument is that using memset() allows the compiler to
analyze the call and remove it completely if it can prove that this
is safe, which is not possible with MemSet().
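
A minimal illustration of the point (my own example, not from the
committed patch): the compiler treats memset() as a builtin with known
semantics, so it can delete zeroing it can prove is dead, which it
generally cannot do for MemSet()'s hand-unrolled loop:

#include <string.h>

typedef struct { int a; double b; } Foo;

Foo
make_foo(void)
{
	Foo		f;

	/* the compiler knows memset's semantics, sees both fields being
	 * overwritten below, and may elide the call entirely */
	memset(&f, 0, sizeof(f));
	f.a = 1;
	f.b = 2.0;
	return f;
}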

regards,
Ranier Vilela


Re: Refactor construct_array() and deconstruct_array() for built-in types

2022-07-01 Thread Dagfinn Ilmari Mannsåker
Dagfinn Ilmari Mannsåker  writes:

> I didn't pay much attention to this thread earlier, but I was struck by
> the duplication of the switch statement determining the elemlen,
> elembyval, and elemalign values between the construct and deconstruct
> functions.  How about a common function they can both call?  Something
> like:
>
> static void builtin_type_details(Oid elemtype,
>  int *elemlen,
>  bool *elembyval,
>  char *elemalign);

I just realised that this would require the error message to not include
the function name (which isn't really that critical, since it's a
developer-facing message), but an option would be to make it return false
for unknown types, so each of the calling functions can emit their own
error message.

> - ilmari




Re: Refactor construct_array() and deconstruct_array() for built-in types

2022-07-01 Thread Dagfinn Ilmari Mannsåker
Peter Eisentraut  writes:

> On 02.05.22 16:48, Tom Lane wrote:
>> Peter Eisentraut  writes:
>>> There are many calls to construct_array() and deconstruct_array() for
>>> built-in types, for example, when dealing with system catalog columns.
>>> These all hardcode the type attributes necessary to pass to these functions.
>> 
>>> To simplify this a bit, add construct_array_builtin(),
>>> deconstruct_array_builtin() as wrappers that centralize this hardcoded
>>> knowledge.  This simplifies many call sites and reduces the amount of
>>> hardcoded stuff that is spread around.
>> 
>>> I also considered having genbki.pl generate lookup tables for these
>>> hardcoded values, similar to schemapg.h, but that ultimately seemed
>>> excessive.
>> +1 --- the added overhead of the switch statements is probably a
>> reasonable price to pay for the notational simplification and
>> bug-proofing.
>> One minor coding gripe is that compilers that don't know that
>> elog(ERROR)
>> doesn't return will certainly generate "use of possibly-uninitialized
>> variable" complaints.  Suggest inserting "return NULL;" or similar into
>> the default: cases.  I'd also use more specific error wording to help
>> people find where they need to add code when they make use of a new type;
>> maybe like "type %u not supported by construct_array_builtin".
>
> I have pushed this with the improvements you had suggested.

I didn't pay much attention to this thread earlier, but I was struck by
the duplication of the switch statement determining the elemlen,
elembyval, and elemalign values between the construct and deconstruct
functions.  How about a common function they can both call?  Something
like:

static void builtin_type_details(Oid elemtype,
 int *elemlen,
 bool *elembyval,
 char *elemalign);
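
Filled in for a couple of types, such a helper might look like this (a
sketch of the suggestion in the style of the surrounding PostgreSQL code,
not committed code):

static void
builtin_type_details(Oid elemtype,
					 int *elemlen, bool *elembyval, char *elemalign)
{
	switch (elemtype)
	{
		case INT4OID:
			*elemlen = sizeof(int32);
			*elembyval = true;
			*elemalign = TYPALIGN_INT;
			break;
		case TEXTOID:
			*elemlen = -1;		/* varlena */
			*elembyval = false;
			*elemalign = TYPALIGN_INT;
			break;
			/* ... remaining supported built-in types ... */
		default:
			elog(ERROR, "type %u not supported by builtin_type_details",
				 elemtype);
	}
}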

- ilmari




Re: [PATCH] Add result_types column to pg_prepared_statements view

2022-07-01 Thread Peter Eisentraut

On 19.05.22 17:34, Dagfinn Ilmari Mannsåker wrote:

Prompted by a question on IRC, here's a patch to add a result_types
column to the pg_prepared_statements view, so that one can see the types
of the columns returned by a prepared statement, not just the parameter
types.

I'm not quite sure about the column name, suggestions welcome.


I think this patch is sensible.

I see one issue: When you describe a prepared statement via the 
protocol, if a result field has a domain as its type, the RowDescription 
message sends the underlying base type, not the domain type directly 
(see SendRowDescriptionMessage()).  But it doesn't do that for the 
parameters (see exec_describe_statement_message()).  I don't know why 
that is; the protocol documentation doesn't mention it.  Might be worth 
looking into, and checking whether the analogous information contained 
in this view should be made consistent.





Re: replacing role-level NOINHERIT with a grant-level option

2022-07-01 Thread Joe Conway

On 6/30/22 22:58, Nathan Bossart wrote:

On Thu, Jun 30, 2022 at 10:21:53PM -0400, Robert Haas wrote:

On Thu, Jun 30, 2022 at 7:29 PM Nathan Bossart  wrote:

IIUC you are suggesting that we'd leave rolinherit in pg_authid alone, but
we'd add the ability to specify a grant-level option that would always take
precedence.  The default (WITH INHERIT DEFAULT) would cause things to work
exactly as they do today (i.e., use rolinherit).  Does this sound right?


Yeah, that could be an alternative to the patch I proposed previously.
What do you (and others) think of that idea?


I like it.  If rolinherit is left in place, existing pg_dumpall scripts
will continue to work, and folks can continue to use the role-level option
exactly as they do today.  However, we'd be adding the ability to use a
grant-level option if desired, and it would be relatively easy to reason
about (i.e., the grant-level option always takes precedence over the
role-level option).  Also, AFAICT this strategy still provides the full set
of behavior that would be possible if only the grant-level option existed.


Would this allow for an explicit REVOKE to override a default INHERIT 
along a specific path?


--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: Data is copied twice when specifying both child and parent table in publication

2022-07-01 Thread wangw.f...@fujitsu.com
On Wed, May 18, 2022 4:51 PM I wrote:
> Attach the new patch.

Since there are some new commits in HEAD (0ff20288, fd0b9dc and 52b5c53) that
improve the functions pg_get_publication_tables and fetch_table_list, we cannot
apply the patch cleanly. Therefore, I rebased the patch based on the changes in
HEAD.

I also rebased the patch for REL14 because of commit 52d5ea9 in branch
REL_14_STABLE. BTW, I made a slight adjustment in the function
fetch_table_list to the SQL used to get the publisher-side table information
for version 14.

Since we have REL_15_STABLE now, I also attach the patch for version 15.

Attach the patches.

Regards,
Wang wei


HEAD_v6-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch
Description:  HEAD_v6-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch


REL15_v6-0001-Fix-data-replicated-twice-when-specifying-PUBLISH_patch
Description:  REL15_v6-0001-Fix-data-replicated-twice-when-specifying-PUBLISH_patch


REL14_v6-0001-Fix-data-replicated-twice-when-specifying-PUBLISH_patch
Description:  REL14_v6-0001-Fix-data-replicated-twice-when-specifying-PUBLISH_patch


Re: Perform streaming logical transactions by background workers and parallel apply

2022-07-01 Thread Amit Kapila
On Fri, Jul 1, 2022 at 12:13 PM Peter Smith  wrote:
>
> ==
>
> 1.2 doc/src/sgml/protocol.sgml - Protocol constants
>
> Previously I wrote that since there are protocol changes here,
> shouldn’t there also be some corresponding LOGICALREP_PROTO_XXX
> constants and special checking added in the worker.c?
>
> But you said [1 comment #6] you think it is OK because...
>
> IMO, I still disagree with the reply. The fact is that the protocol
> *has* been changed, so IIUC that is precisely the reason for having
> those protocol constants.
>
> e.g I am guessing you might assign the new one somewhere here:
> --
> server_version = walrcv_server_version(LogRepWorkerWalRcvConn);
> options.proto.logical.proto_version =
> server_version >= 15 ? LOGICALREP_PROTO_TWOPHASE_VERSION_NUM :
> server_version >= 14 ? LOGICALREP_PROTO_STREAM_VERSION_NUM :
> LOGICALREP_PROTO_VERSION_NUM;
> --
>
> And then later you would refer to this new protocol version (instead
> of the server version) when calling to the apply_handle_stream_abort
> function.
>
> ==
>

One point related to this that occurred to me is how it will behave if
the publisher is of version >=16 whereas the subscriber is of version
<=15. Won't the publisher in that case send the new fields while the
subscriber won't read them, which may cause some problems?

> ==
>
> 1.5 src/backend/commands/subscriptioncmds.c
>
> + /*
> + * If no parameter given, assume "true" is meant.
> + */
>
> Previously I suggested an update for this comment, but it was rejected
> [1 comment #12] saying you wanted consistency with defGetBoolean.
>
> Sure, that is one point of view. Another one is that "two wrongs don't
> make a right". IIUC that comment as it currently stands is incorrect
> because in this case there *is* a parameter given - it is just the
> parameter *value* that is missing.
>

You have a point, but if we look at this function alongside those in its
vicinity, the proposed comment also makes sense.

-- 
With Regards,
Amit Kapila.




Re: Refactor construct_array() and deconstruct_array() for built-in types

2022-07-01 Thread Peter Eisentraut

On 02.05.22 16:48, Tom Lane wrote:

Peter Eisentraut  writes:

There are many calls to construct_array() and deconstruct_array() for
built-in types, for example, when dealing with system catalog columns.
These all hardcode the type attributes necessary to pass to these functions.



To simplify this a bit, add construct_array_builtin(),
deconstruct_array_builtin() as wrappers that centralize this hardcoded
knowledge.  This simplifies many call sites and reduces the amount of
hardcoded stuff that is spread around.



I also considered having genbki.pl generate lookup tables for these
hardcoded values, similar to schemapg.h, but that ultimately seemed
excessive.


+1 --- the added overhead of the switch statements is probably a
reasonable price to pay for the notational simplification and
bug-proofing.

One minor coding gripe is that compilers that don't know that elog(ERROR)
doesn't return will certainly generate "use of possibly-uninitialized
variable" complaints.  Suggest inserting "return NULL;" or similar into
the default: cases.  I'd also use more specific error wording to help
people find where they need to add code when they make use of a new type;
maybe like "type %u not supported by construct_array_builtin".


I have pushed this with the improvements you had suggested.




Re: Hardening PostgreSQL via (optional) ban on local file system access

2022-07-01 Thread Hannu Krosing
And thanks to Robert and Bruce for bringing up good points about
potential pitfalls!

I think we do have a good discussion going on here :)

---
Hannu

On Fri, Jul 1, 2022 at 11:14 AM Hannu Krosing  wrote:
>
> On Thu, Jun 30, 2022 at 7:25 PM Bruce Momjian  wrote:
> >
> > On Thu, Jun 30, 2022 at 11:52:20AM -0400, Robert Haas wrote:
> > > I don't think this would be very convenient in most scenarios,
>
> This is the eternal problem with security - more security always
> includes more inconvenience.
>
> Unlocking your door when coming home is more inconvenient than not
> unlocking it, and the least inconvenient thing would be not having a
> door at all.
> Imagine coming to your door with a heavy shopping bag in each hand -
> at this moment almost anyone would prefer the door not being there :)
>
> This one would be for cases where you want best multi-layer
> protections also against unknown threats and are ready to trade some
> convenience for security. Also it would not be that bad once you use
> automated deployment pipelines which just need an extra line to unlock
> superuser for deployment.
>
> > > and I
> > > think it would also be difficult to implement correctly. I don't think
> > > you can get by with just having superuser() return false sometimes
> > > despite pg_authid.rolsuper being true. There's a lot of subtle
> > > assumptions in the code to the effect that the properties of a session
> > > are basically stable unless some SQL is executed which changes things.
>
> A good barrier SQL for this could be SET ROLE=... .
> Maybe just have a mode where a superuser can not log in _or_ SET ROLE
> unless this is explicitly allowed in pg_superuser.conf
>
> > > I think if we start injecting hacks like this it may seem to work in
> > > light testing but we'll never get to the end of the bug reports.
>
> In this case it looks like each of these bug reports would mean an
> avoided security breach which for me looks preferable.
>
> Again, this would be all optional, opt-in, DBA-needs-to-set-it-up
> feature for the professionally paranoid and not something we suddenly
> force on people who would like to run all their databases using
> user=postgres database=postgres with trust specified in the
> pg_hba.conf "because the firewall takes care of security" :)
>
> > Yeah, seems it would have to be specified per-session, but how would you
> > specify a specific session before the session starts?
>
> One often recommended way to do superuser'y things in a secure
> production database is to have a non-privileged NOINHERIT user for
> logging in and then do
> SET ROLE=;
> when needed, similar to using su/sudo in shell. This practice both
> reduces the attack surface and also provides auditability by knowing
> who logged in for superuser work.
>
> In this case one could easily get the pid and do the needed extra
> setup before escalating privileges to superuser.
>
> ---
> Hannu




Re: Hardening PostgreSQL via (optional) ban on local file system access

2022-07-01 Thread Hannu Krosing
On Thu, Jun 30, 2022 at 7:25 PM Bruce Momjian  wrote:
>
> On Thu, Jun 30, 2022 at 11:52:20AM -0400, Robert Haas wrote:
> > I don't think this would be very convenient in most scenarios,

This is the eternal problem with security - more security always
includes more inconvenience.

Unlocking your door when coming home is more inconvenient than not
unlocking it, and the least inconvenient thing would be not having a
door at all.
Imagine coming to your door with a heavy shopping bag in each hand -
at this moment almost anyone would prefer the door not being there :)

This one would be for cases where you want best multi-layer
protections also against unknown threats and are ready to trade some
convenience for security. Also it would not be that bad once you use
automated deployment pipelines which just need an extra line to unlock
superuser for deployment.

> > and I
> > think it would also be difficult to implement correctly. I don't think
> > you can get by with just having superuser() return false sometimes
> > despite pg_authid.rolsuper being true. There's a lot of subtle
> > assumptions in the code to the effect that the properties of a session
> > are basically stable unless some SQL is executed which changes things.

A good SQL barrier for this could be SET ROLE=... .
Maybe just have a mode where a superuser cannot log in _or_ SET ROLE
unless this is explicitly allowed in pg_superuser.conf.
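
For illustration, such a file might look like this (the file name,
syntax, and settings are all imaginary -- just a sketch of the idea,
nothing that exists today):

    # pg_superuser.conf (hypothetical)
    # superusers may never log in directly
    allow_superuser_login = off
    # SET ROLE to a superuser role is allowed only for this login role
    allow_superuser_set_role = deploy_bot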

> > I think if we start injecting hacks like this it may seem to work in
> > light testing but we'll never get to the end of the bug reports.

In this case it looks like each of these bug reports would mean an
avoided security breach, which to me looks preferable.

Again, this would be an all-optional, opt-in, DBA-needs-to-set-it-up
feature for the professionally paranoid, and not something we suddenly
force on people who would like to run all their databases using
user=postgres database=postgres with trust specified in the
pg_hba.conf "because the firewall takes care of security" :)

> Yeah, seems it would have to be specified per-session, but how would you
> specify a specific session before the session starts?

One often-recommended way to do superuser'y things in a secure
production database is to have a non-privileged NOINHERIT user for
logging in and then do
SET ROLE=;
when needed, similar to using su/sudo in a shell. This practice both
reduces the attack surface and provides auditability by recording
who logged in for superuser work.
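
A minimal sketch of that pattern (role names are made up for the example):

    -- login role: no privileges of its own, NOINHERIT so granted
    -- superuser rights are not picked up automatically
    CREATE ROLE dba_login LOGIN NOINHERIT PASSWORD 'secret';
    -- privileged role that can never log in directly
    CREATE ROLE dba_super SUPERUSER NOLOGIN;
    GRANT dba_super TO dba_login;

    -- after logging in as dba_login, escalate only when needed:
    SET ROLE dba_super;
    -- ... superuser work ...
    RESET ROLE;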

In this case one could easily get the pid and do the needed extra
setup before escalating privileges to superuser.

---
Hannu




Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2022-07-01 Thread Lukas Fittl
On Fri, Jun 12, 2020 at 4:28 PM Andres Freund  wrote:

> The attached patches are really just a prototype. I'm also not really
> planning to work on getting this into a "production ready" patchset
> anytime soon. I developed it primarily because I found it the overhead
> made it too hard to nail down in which part of a query tree performance
> changed.  If somebody else wants to continue from here...
>
> I do think it'd be a pretty significant improvement if we could reduce
> the timing overhead of EXPLAIN ANALYZE by this much. Even if it requires a
> bunch of low-level code.
>

Based on an off-list conversation with Andres, I decided to dust off this
old patch for using rdtsc directly. The significant EXPLAIN ANALYZE
performance improvements (especially when using rdtsc instead of rdtsc*p*)
seem to warrant giving this a more thorough look.

See attached an updated patch (adding it to the July commitfest), with a few
changes:

- Keep using clock_gettime() as a fallback if we decide to not use rdtsc
- Fallback to /proc/cpuinfo for clock frequency, if cpuid(0x16) doesn't work
- The decision to use rdtsc (or not) is made at runtime, in the new
  INSTR_TIME_INITIALIZE() -- we can't make this decision at compile time
  because this is dependent on the specific CPU in use, amongst other things
- In an abundance of caution, for now I've decided to only enable this if we
  are on Linux/x86, and the current kernel clocksource is TSC (the kernel
  has quite sophisticated logic around making this decision, see [1])
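
Checking the kernel's choice is cheap; a minimal sketch of how such a
check could look (illustrative only, not necessarily the patch's code):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* return true if Linux picked the TSC as its current clocksource */
    static bool
    kernel_clocksource_is_tsc(void)
    {
        FILE   *f;
        char    buf[32] = {0};
        bool    result = false;

        f = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");
        if (f == NULL)
            return false;
        if (fgets(buf, sizeof(buf), f) != NULL)
            result = (strncmp(buf, "tsc", 3) == 0);
        fclose(f);
        return result;
    }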

Note that if we implemented the decision logic ourselves (instead of relying
on the Linux kernel), I'd be most worried about older virtualization
technology. In my understanding getting this right is notably more
complicated than just checking cpuid, see [2].

Known WIP problems with this patch version:

* There appears to be a timing discrepancy I haven't yet worked out, where
  the \timing data reported by psql doesn't match what EXPLAIN ANALYZE is
  reporting. With Andres' earlier test case, I'm seeing \timing consistently
  ~700ms higher than the EXPLAIN ANALYZE time reported on the server side,
  only when rdtsc measurement is used -- it's likely there is a problem
  somewhere with how we perform the cycles-to-time conversion (see the
  sketch below)
* Possibly related, the floating point handling for the cycles_to_sec
  variable is problematic in terms of precision (see FIXME, taken over from
  Andres' POC)
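
To make that conversion concrete, here is the general technique as a
minimal sketch (the names and the frequency value are made up; this is
not the patch's actual code):

    #include <stdint.h>
    #include <x86intrin.h>          /* __rdtsc() on GCC/Clang */

    /* cycles per nanosecond, e.g. 3.0 for a 3 GHz invariant TSC; the
     * patch derives this from cpuid(0x16) or /proc/cpuinfo instead */
    static double cycles_per_ns = 3.0;

    static inline uint64_t
    read_tsc(void)
    {
        return __rdtsc();   /* unserialized read, cheaper than rdtscp */
    }

    static inline double
    cycles_to_ns(uint64_t start, uint64_t end)
    {
        return (double) (end - start) / cycles_per_ns;
    }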

Open questions from me:

1) Do we need to account for different TSC offsets on different CPUs in SMP
   systems? (the Linux kernel certainly has logic to that extent, but [3]
   suggests this is no longer a problem on Nehalem and newer chips, i.e.
   those having an invariant TSC)

2) Should we have a setting "--with-tsc" for configure? (instead of always
   enabling it when on Linux/x86 with a TSC clocksource)

3) Are there cases where we actually want to use rdtsc*p*? (i.e. wait for
   current instructions to finish -- the prior discussion seemed to suggest
   we don't want it for node instruction measurements, but possibly we do
   want this in other cases?)

4) Should we support using the "mrs" instruction on ARM? (which is similar
   to rdtsc, see [4])
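
(For reference, an untested, purely illustrative sketch of that read on
AArch64:

    static inline uint64_t
    read_cntvct(void)
    {
        uint64_t    val;

        /* CNTVCT_EL0: the generic timer's virtual counter, read via mrs */
        __asm__ volatile ("mrs %0, cntvct_el0" : "=r" (val));
        return val;
    }
)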

Thanks,
Lukas

[1] https://github.com/torvalds/linux/blob/master/arch/x86/kernel/tsc.c
[2] http://oliveryang.net/2015/09/pitfalls-of-TSC-usage/
[3] https://stackoverflow.com/a/11060619/1652607
[4] https://cpufun.substack.com/p/fun-with-timers-and-cpuid

-- 
Lukas Fittl


v2-0002-WIP-Use-cpu-reference-cycles-via-rdtsc-to-measure.patch
Description: Binary data


v2-0001-WIP-Change-instr_time-to-just-store-nanoseconds-t.patch
Description: Binary data


Re: Support for grabbing multiple consecutive values with nextval()

2022-07-01 Thread Jille Timmermans

On 2022-04-08 15:33, Greg Stark wrote:
> On Sun, 27 Feb 2022 at 07:09, Jille Timmermans  wrote:
>> First time PostgreSQL contributor here :)
>
> I wish I had noticed this patch during the CF. It seems like a nice
> self-contained feature that could have been easily reviewed and
> committed and it's always good to see first-time contributions.
> Hopefully it'll get committed early in the next cycle.


If anyone is looking for a small patch to review, here's one for you :)

(https://commitfest.postgresql.org/38/3577/)




Re: Pluggable toaster

2022-07-01 Thread Aleksander Alekseev
Hi Nikita,

> Here is the patch set rebased onto current master (15 rel beta 2 with commit 
> from 29.06).

Thanks for the rebased patchset.

This is a huge effort though. I suggest splitting it into several CF
entries as we previously did with other patches (64-bit XIDs to name
one, which BTW is arguably much simpler, but still we had to split
it). This will simplify the review, limit the scope of the discussion,
and make passing the CI easier. Cfbot is already not happy with the
patchset.

0001 - is already in a separate thread [1], that's good. I suggest
marking it in the patch description for clarity.
0002, 0003 - I suggest focusing on these two in this thread and keeping
the rest of the changes for later discussion. Please submit 0004,
0005... next time, when we finish with 0001-0003.

The order of proposed changes IMO is wrong.

0002 should refactor the default TOASTer in a manner similar to a
pluggable one. Nothing should change from the user's perspective. If
you do benchmarks, I suggest not referencing the previous talks. I
familiarized myself with all the related talks linked before (took me
some time...) and found them useless for the discussion since they
don't provide exact steps to reproduce. Please provide exact scripts
and benchmark results for 0002 in this thread.
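
Something along these lines, with exact timings attached, would be
enough (this is only a hypothetical illustration of the level of detail
meant, not a script from the talks):

    -- force TOASTed values: ~32 kB of text per row
    CREATE TABLE toast_bench (id int, payload text);
    INSERT INTO toast_bench
        SELECT g, repeat(md5(g::text), 1000)
        FROM generate_series(1, 10000) g;
    \timing on
    SELECT sum(length(payload)) FROM toast_bench;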

0003 should add an interface that allows replacing the default TOASTer
with an alternative one. There is no need for contrib/dummy_toaster,
just as there is no contrib/ module for a dummy TableAM. The provided
example doesn't do much anyway, since all the heavy lifting should be
done in the callback implementations. For test purposes please use
src/test/regress/.

[1]: 
https://www.postgresql.org/message-id/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru

-- 
Best regards,
Aleksander Alekseev




Re: Issue with pg_stat_subscription_stats

2022-07-01 Thread Masahiko Sawada
On Fri, Jul 1, 2022 at 3:01 PM Amit Kapila  wrote:
>
> On Fri, Jul 1, 2022 at 7:12 AM Masahiko Sawada  wrote:
> >
> > On Wed, Mar 16, 2022 at 11:34 PM Masahiko Sawada  
> > wrote:
> > >
> >
> > While looking at this issue again, I realized there seem to be two
> > problems with subscription stats on shmem stats:
> >
> > Firstly, we call pgstat_create_subscription() when creating a
> > subscription but the subscription stats are reported by apply workers.
> > And pgstat_create_subscription() just calls
> > pgstat_create_transactional():
> >
> > void
> > pgstat_create_subscription(Oid subid)
> > {
> > pgstat_create_transactional(PGSTAT_KIND_SUBSCRIPTION,
> > InvalidOid, subid);
> > }
> >
> > I guess calling pgstat_create_subscription() is not necessary for the
> > current usage. On the other hand, if we create the subscription stats
> > there we can resolve the issue Melanie reported in this thread.
> >
>
> It won't create the stats entry in the shared hash table, so the
> behavior should be the same as without shared stats.

Yes, my point is that it may be misleading that the subscription stats
are created when a subscription is created. The initial behavior is
technically the same as for pg_stat_database. That is, we don't create
the stats entry for them when creating the object. But we don’t call
pgstat_create_transactional when creating a database (we don’t have a
function like pgstat_create_database()) whereas we do for subscription
stats.

On the other hand, I'm not sure we agreed that the behavior that
Melanie reported is not a problem. The user might get confused since
the subscription stats work differently from other stats when reset.
Previously, the primary reason why I hesitated to create the
subscription stats when creating a subscription is that CREATE
SUBSCRIPTION (with create_slot = false) can be rolled back. But with
the shmem stats, we can easily resolve it by using
pgstat_create_transactional().

>
> > The second problem is that the following code in DropSubscription()
> > should be updated:
> >
> > /*
> >  * Tell the cumulative stats system that the subscription is getting
> >  * dropped. We can safely report dropping the subscription statistics 
> > here
> >  * if the subscription is associated with a replication slot since we
> >  * cannot run DROP SUBSCRIPTION inside a transaction block.  
> > Subscription
> >  * statistics will be removed later by (auto)vacuum either if it's not
> >  * associated with a replication slot or if the message for dropping the
> >  * subscription gets lost.
> >  */
> > if (slotname)
> > pgstat_drop_subscription(subid);
> >
> > I think we can call pgstat_drop_subscription() even if slotname is
> > NULL and need to update the comment.
> >
>
> +1.
>
> > IIUC autovacuum is no longer
> > responsible for garbage collection.
> >
>
> Right, this is my understanding as well.

Thank you for the confirmation.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Perform streaming logical transactions by background workers and parallel apply

2022-07-01 Thread Peter Smith
Below are some review comments for patches v14-0001 and v14-0002:


v14-0001


1.1 Commit message

For now, 'parallel' means the streaming will be applied
via a apply background worker if available. 'on' means the streaming
transaction will be spilled to disk.  By the way, we do not change the default
behaviour.

SUGGESTION (minor tweaks)
The parameter value 'parallel' means the streaming will be applied via
an apply background worker, if available. The parameter value 'on'
means the streaming transaction will be spilled to disk.  The default
value is 'off' (same as current behaviour).

==

1.2 doc/src/sgml/protocol.sgml - Protocol constants

Previously I wrote that since there are protocol changes here,
shouldn’t there also be some corresponding LOGICALREP_PROTO_XXX
constants and special checking added in the worker.c?

But you said [1 comment #6] you think it is OK because...

IMO, I still disagree with the reply. The fact is that the protocol
*has* been changed, so IIUC that is precisely the reason for having
those protocol constants.

e.g. I am guessing you might assign the new one somewhere here:
--
server_version = walrcv_server_version(LogRepWorkerWalRcvConn);
options.proto.logical.proto_version =
server_version >= 15 ? LOGICALREP_PROTO_TWOPHASE_VERSION_NUM :
server_version >= 14 ? LOGICALREP_PROTO_STREAM_VERSION_NUM :
LOGICALREP_PROTO_VERSION_NUM;
--

And then later you would refer to this new protocol version (instead
of the server version) when calling the apply_handle_stream_abort
function.
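
For example, just a sketch to show the shape of the suggestion (the new
constant name here is made up):
--
server_version = walrcv_server_version(LogRepWorkerWalRcvConn);
options.proto.logical.proto_version =
server_version >= 16 ? LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM :
server_version >= 15 ? LOGICALREP_PROTO_TWOPHASE_VERSION_NUM :
server_version >= 14 ? LOGICALREP_PROTO_STREAM_VERSION_NUM :
LOGICALREP_PROTO_VERSION_NUM;
--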

==

1.3 doc/src/sgml/ref/create_subscription.sgml

+ 
+  If set to on, the changes of transaction are
+  written to temporary files and then applied at once after the
+  transaction is committed on the publisher.
+ 

Previously I suggested changing some text but it was rejected [1
comment #8] because you said there may be *multiple*  files, not just
one. That is fair enough, but there were some other changes to that
suggested text unrelated to the number of files.

SUGGESTION #2
If set to on, the incoming changes are written to temporary files and
then applied only after the transaction is committed on the publisher.

~~~

1.4

+ 
+  If set to parallel, incoming changes are directly
+  applied via one of the apply background workers, if available. If no
+  background worker is free to handle streaming transaction then the
+  changes are written to a file and applied after the transaction is
+  committed. Note that if an error happens when applying changes in a
+  background worker, the finish LSN of the remote transaction might
+  not be reported in the server log.
  

Should this also say "written to temporary files" instead of "written
to a file"?

==

1.5 src/backend/commands/subscriptioncmds.c

+ /*
+ * If no parameter given, assume "true" is meant.
+ */

Previously I suggested an update for this comment, but it was rejected
[1 comment #12] saying you wanted consistency with defGetBoolean.

Sure, that is one point of view. Another one is that "two wrongs don't
make a right". IIUC that comment as it currently stands is incorrect
because in this case there *is* a parameter given - it is just the
parameter *value* that is missing. Maybe see what other people think?

==

1.6. src/backend/replication/logical/Makefile

It seems to me like these files were intended to be listed in
alphabetical order, so you should move this new file accordingly.

==

1.7 .../replication/logical/applybgworker.c

+/* queue size of DSM, 16 MB for now. */
+#define DSM_QUEUE_SIZE 16000

The comment should start uppercase.

~~~

1.8 .../replication/logical/applybgworker.c - apply_bgworker_can_start

Maybe this is just my opinion but it sounds a bit strange to over-use
"we" in all the comments.

1.8.a
+/*
+ * Confirm if we can try to start a new apply background worker.
+ */
+static bool
+apply_bgworker_can_start(TransactionId xid)

SUGGESTION
Check if starting a new apply background worker is allowed.

1.8.b
+ /*
+ * We don't start new background worker if we are not in streaming parallel
+ * mode.
+ */

SUGGESTION
Don't start a new background worker if not in streaming parallel mode.

1.8.c
+ /*
+ * We don't start new background worker if user has set skiplsn as it's
+ * possible that user want to skip the streaming transaction. For
+ * streaming transaction, we need to spill the transaction to disk so that
+ * we can get the last LSN of the transaction to judge whether to skip
+ * before starting to apply the change.
+ */

SUGGESTION
Don't start a new background worker if...

~~~

1.9 .../replication/logical/applybgworker.c - apply_bgworker_start

+/*
+ * Try to start worker inside ApplyWorkersHash for requested xid.
+ */
+ApplyBgworkerState *
+apply_bgworker_start(TransactionId xid)

The comment seems not quite right.

SUGGESTION
Try to 

Re: Backup command and functions can cause assertion failure and segmentation fault

2022-07-01 Thread Michael Paquier
On Fri, Jul 01, 2022 at 03:32:50PM +0900, Fujii Masao wrote:
> Sounds like a good idea to me. I updated the patch in that way. Attached.

Skimming quickly through the thread, this failure requires a
termination of a backend running BASE_BACKUP.  This is basically
something done by the TAP test added in 0475a97f with a WAL sender
killed, and MAX_RATE being used to make sure that we have enough time
to kill the WAL sender even on fast machines.  So you could add a
regression test, no?
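
Something along these lines, perhaps (a rough hypothetical sketch, not
the actual test from 0475a97f):

    use strict;
    use warnings;
    use PostgreSQL::Test::Cluster;
    use PostgreSQL::Test::Utils;
    use IPC::Run;
    use Test::More;

    my $node = PostgreSQL::Test::Cluster->new('main');
    $node->init(allows_streaming => 1);
    $node->start;

    # rate-limit the backup so it is still running when we kill its WAL
    # sender
    my $backupdir = PostgreSQL::Test::Utils::tempdir;
    my $backup = IPC::Run::start(
        [ 'pg_basebackup', '-D', $backupdir, '-d', $node->connstr,
          '--max-rate=32k' ]);

    # wait for the WAL sender to appear, then terminate it mid-backup
    $node->poll_query_until('postgres',
        "SELECT count(*) > 0 FROM pg_stat_activity WHERE backend_type = 'walsender'");
    $node->safe_psql('postgres',
        "SELECT pg_terminate_backend(pid) FROM pg_stat_activity"
        . " WHERE backend_type = 'walsender'");
    $backup->finish;

    # the server must survive without assertion failure or segfault
    is($node->safe_psql('postgres', 'SELECT 1'), '1',
        'server alive after WAL sender termination');
    done_testing();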
--
Michael


signature.asc
Description: PGP signature

