Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-09 Thread Alexander Korotkov
On Tue, Mar 8, 2022 at 2:05 AM Alexander Korotkov  wrote:
> On Sun, Mar 6, 2022 at 8:28 PM Tomas Vondra
>  wrote:
> > On 3/6/22 08:09, Alexander Korotkov wrote:
> > > Sorry for this terrible oversight by me.
> > >
> > > On Sat, Mar 5, 2022 at 10:13 AM Tomas Vondra
> > >  wrote:
> > >> On 3/4/22 23:09, Nikita Glukhov wrote:
> > >>> On 04.03.2022 23:28, Tom Lane wrote:
> > >>>
> >  Tomas Vondra  writes:
> > > On 3/4/22 20:29, Nikita Glukhov wrote:
> > >> So, we probably have corrupted indexes that were updated since such
> > >> "incomplete" upgrade of ltree.
> > > IIRC pg_upgrade is not expected to upgrade extensions - it keeps the
> > > installed version of the extension, and that's intentional.
> >  Yeah, exactly.  But this opens up an additional consideration we
> >  have to account for: whatever we do needs to work with either 1.1
> >  or 1.2 SQL-level versions of the extension.
> > 
> >   regards, tom lane
> > >>>
> > >>> It becomes clear that the ltree upgrade 1.1 => 1.2 is broken; the problem
> > >>> is not so much related to PG12 => PG13+ upgrades.
> > >
> > > So, it seems that ltree 1.1 in PG13+ is incompatible with ltree on
> > > PG12 and ltree 1.2 on PG13+.  And there are many scenarios involved.
> > >
> > > It seems too difficult to identify all the broken cases in the release
> > > notes.  What about applying a patch and asking all ltree users to
> > > reindex their indexes?
> > >
> >
> > Yeah. I think this is getting so complicated that there's little chance
> > we'd be able to clearly explain when to reindex.
>
> Good.  The revised patch is attached.  Instead of adding an argument to
> LTREE_GET_ASIGLEN(), it introduces separate LTREE_GET_SIGLEN() and
> LTREE_GET_ASIGLEN() macros.

No feedback yet.  I'm going to push this if there are no objections.
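For context, the remediation being proposed for affected ltree users would look roughly like the following (illustrative only; index and table names are placeholders, and the extension update applies only to installations still on 1.1):

```sql
-- Bring the extension to the fixed SQL-level version, if still on 1.1
ALTER EXTENSION ltree UPDATE TO '1.2';

-- Rebuild GiST indexes on ltree columns; names here are examples
REINDEX INDEX my_ltree_gist_idx;
-- or rebuild all indexes on the affected table
REINDEX TABLE my_table;
```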

--
Regards,
Alexander Korotkov




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-09 Thread Amit Kapila
On Wed, Mar 9, 2022 at 7:57 PM osumi.takami...@fujitsu.com
 wrote:
>
> Hi, attached v32 removed my additional code for maybe_reread_subscription.
>

Thanks, the patch looks good to me. I have made minor edits in the
attached. I am planning to commit this early next week (Monday) unless
there are any other major comments.
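As a sketch of how the feature in this patch would be used (the parameter name follows the patch title; connection string and object names are placeholders):

```sql
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=primary dbname=postgres'
    PUBLICATION mypub
    WITH (disable_on_error = true);

-- or on an existing subscription:
ALTER SUBSCRIPTION mysub SET (disable_on_error = true);
```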

-- 
With Regards,
Amit Kapila.


v33-0001-Optionally-disable-subscriptions-on-error.patch
Description: Binary data


Re: [Proposal] vacuumdb --schema only

2022-03-09 Thread Gilles Darold
Le 09/03/2022 à 22:10, Justin Pryzby a écrit :
> On Mon, Mar 07, 2022 at 08:38:04AM +0100, Gilles Darold wrote:
>>> Maybe it's clearer to write this with =ANY() / != ALL() ?
>>> See 002.
>> I have applied your changes and produced a new version v3 of the patch,
>> thanks for the improvements. The patch has been added to the commitfest
>> interface, see here https://commitfest.postgresql.org/38/3587/
> I wondered whether my patches were improvements, and it occurred to me that
> your patch didn't fail if the specified schema didn't exist.  That's arguably
> preferable, but failing is the pre-existing behavior for tables.  So I think the
> behavior of my patch is more consistent.
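For reference, the proposed usage would be along these lines (option name taken from the thread subject; schema and database names are placeholders):

```
vacuumdb --schema=myschema mydb
```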

+1

-- 
Gilles Darold


RE: Skipping logical replication transactions on subscriber side

2022-03-09 Thread osumi.takami...@fujitsu.com
On Wednesday, March 2, 2022 12:01 AM Masahiko Sawada  wrote:
> I've attached an updated patch along with two patches for cfbot tests since the
> main patch (0003) depends on the other two patches. Both
> 0001 and 0002 patches are the same ones I attached on another thread[2].
Hi, a few comments on
v12-0003-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch.


(1) doc/src/sgml/ref/alter_subscription.sgml


+SKIP ( skip_option = value )
...
+  After logical replication successfully skips the transaction or commits
+  non-empty transaction, the LSN (pg_subscription.subskiplsn)
+  is cleared.  See  for
+  the details of logical replication conflicts.
+ 
...
+lsn (pg_lsn)
+
+ 
+  Specifies the commit LSN of the remote transaction whose changes are to be
+  skipped by the logical replication worker.  Skipping individual
+  subtransactions is not supported.  Setting NONE
+  resets the LSN.


I think we'll extend the SKIP option choices in the future besides the 'lsn'
option.
Then, the sentence "After logical replication successfully skips the
transaction or commits non-empty transaction, the LSN .. is cleared" should be
moved to the explanation for the 'lsn' section, if we think this behavior of
resetting the LSN is unique to the 'lsn' option?
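For context, the command shape this documentation describes is (the LSN value is a placeholder):

```sql
ALTER SUBSCRIPTION sub SKIP (lsn = '0/14C0378');

-- and to reset the skip LSN:
ALTER SUBSCRIPTION sub SKIP (lsn = NONE);
```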


(2) doc/src/sgml/catalogs.sgml

+ 
+  
+   subskiplsn pg_lsn
+  
+  
+   Commit LSN of the transaction whose changes are to be skipped, if a valid
+   LSN; otherwise 0/0.
+  
+ 
+

We need to cover the PREPARE that keeps causing errors on the subscriber.
This would apply to the entire patch (e.g. the rename of skip_xact_commit_lsn).

(3) apply_handle_commit_internal comments

 /*
  * Helper function for apply_handle_commit and apply_handle_stream_commit.
+ * Return true if the transaction was committed, otherwise return false.
  */

If we want to make the newly added line aligned with other functions in worker.c,
should we insert one blank line before it?


(4) apply_worker_post_transaction

I'm not sure if the current refactoring is good or not.
For example, the current HEAD calls pgstat_report_stat(false)
for a commit case if we are in a transaction in apply_handle_commit_internal.
On the other hand, your refactoring calls pgstat_report_stat unconditionally
for the apply_handle_commit path. I'm not sure if there
are many cases that call apply_handle_commit without opening a transaction,
but is that acceptable?

Also, the name is a bit broad.
How about making a function only for stopping and resetting the LSN at this stage?


(5) comments for clear_subscription_skip_lsn

How about changing the comment like below?

From:
Clear subskiplsn of pg_subscription catalog
To:
Clear subskiplsn of pg_subscription catalog with origin state update


Best Regards,
Takamichi Osumi



Re: Adding CI to our tree

2022-03-09 Thread Thomas Munro
On Thu, Mar 10, 2022 at 4:33 PM Andres Freund  wrote:
> On 2022-03-10 15:43:16 +1300, Thomas Munro wrote:
> > I'm confused.
>
> The "terrible IO wait" thing was before we reduced the number of CPUs and
> concurrent jobs. It makes sense to me that with just two CPUs we're CPU bound,
> in which case -Og obviously can make a difference.

Oh, duh, yeah, that makes sense when you put it that way and
considering the CPU graph:

-O0: https://cirrus-ci.com/task/4578631912521728
-Og: https://cirrus-ci.com/task/5239486182326272




Re: Adding CI to our tree

2022-03-09 Thread Andres Freund
Hi,

On 2022-03-10 15:43:16 +1300, Thomas Munro wrote:
> Wow, I see the effect on Cirrus -- test_world ran in 8:55 instead of
> 12:43 when I tried (terrible absolute times, but fantastic
> improvement!).  Hmm, on my local FreeBSD 13 box I saw 5:07 -> 5:03
> with this change.  My working theory had been that there is something
> bad happening in the I/O stack under concurrency making FreeBSD on
> Cirrus/GCP very slow (ie patterns to stall on slow cloud I/O waits,
> something I hope to dig into when post-freeze round tuits present
> themselves), but that doesn't gel with this huge improvement from
> noodling with optimiser details, and I don't know why I don't see
> something similar locally.  I'm confused.

The "terrible IO wait" thing was before we reduced the number of CPUs and
concurrent jobs. It makes sense to me that with just two CPUs we're CPU bound,
in which case -Og obviously can make a difference.


> Just BTW it's kinda funny that we say -ggdb for macOS and then we use
> lldb to debug cores in cores_backtrace.sh.  I suppose it would be more
> correct to say -glldb, but doubt it matters much...

Yea. I used -ggdb because I didn't know -glldb existed :). And there's also
the advantage that -ggdb works both with gcc and clang, whereas -glldb doesn't
seem to be known to gcc.
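For reference, the build configurations being compared in this thread look roughly like the following (illustrative configure invocations, not the exact CI scripts):

```
# unoptimized build, as before
./configure CFLAGS='-O0 -ggdb'

# optimized-for-debugging build measured above
./configure CFLAGS='-Og -ggdb'
```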

Greetings,

Andres Freund




Re: Column Filtering in Logical Replication

2022-03-09 Thread Amit Kapila
On Wed, Mar 9, 2022 at 3:33 PM Amit Kapila  wrote:
>
> On Mon, Mar 7, 2022 at 8:48 PM Tomas Vondra
>  wrote:
>
> > OK, I reworked this to do the same thing as the row filtering patch.
> >
>
> Thanks, I'll check this.
>

Some assorted comments:
=
1. We don't need to send a column list for the old tuple in case of an
update (similar to delete). It is not required to apply a column
filter for those cases because we ensure that RI must be part of the
column list for updates and deletes.
2.
+ /*
+ * Check if all columns referenced in the column filter are part of
+ * the REPLICA IDENTITY index or not.

I think this comment is reversed. The rule we follow here is that
attributes that are part of RI must be there in a specified column
list. This is used at two places in the patch.
3. get_rel_sync_entry()
{
/* XXX is there a danger of memory leak here? beware */
+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+ for (int i = 0; i < nelems; i++)
...
}

Similar to the row filter, I think we need to use
entry->cache_expr_cxt to allocate this. There are other usages of
CacheMemoryContext in this part of the code but I think those need to
be also changed and we can do that as a separate patch. If we do the
suggested change then we don't need to separately free columns.
4. I think we don't need the DDL changes in AtExecDropColumn. Instead,
we can change the dependency of columns to NORMAL during publication
commands.
5. There is a reference to check_publication_columns but that function
is removed from the patch.
6.
/*
* If we know everything is replicated and the row filter is invalid
* for update and delete, there is no point to check for other
* publications.
*/
if (pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate &&
pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate &&
!pubdesc->rf_valid_for_update && !pubdesc->rf_valid_for_delete)
break;

/*
* If we know everything is replicated and the column filter is invalid
* for update and delete, there is no point to check for other
* publications.
*/
if (pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate &&
pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate &&
!pubdesc->cf_valid_for_update && !pubdesc->cf_valid_for_delete)
break;

Can we combine these two checks?

I feel this patch needs a more thorough review.

-- 
With Regards,
Amit Kapila.




Re: Adding CI to our tree

2022-03-09 Thread Thomas Munro
On Thu, Mar 10, 2022 at 9:37 AM Justin Pryzby  wrote:
> On Wed, Mar 09, 2022 at 10:12:54AM -0800, Andres Freund wrote:
> > On 2022-03-09 11:47:23 -0600, Justin Pryzby wrote:
> > > I'm curious what you think of this patch.
> > >
> > > It makes check-world on freebsd over 30% faster - saving 5min.
> >
> > That's nice! While -Og makes interactive debugging noticeably harder IME, it's
> > not likely to be a large enough difference just for backtraces etc.
>
> Yeah.  gcc(1) claims that -Og can improve debugging.

Wow, I see the effect on Cirrus -- test_world ran in 8:55 instead of
12:43 when I tried (terrible absolute times, but fantastic
improvement!).  Hmm, on my local FreeBSD 13 box I saw 5:07 -> 5:03
with this change.  My working theory had been that there is something
bad happening in the I/O stack under concurrency making FreeBSD on
Cirrus/GCP very slow (ie patterns to stall on slow cloud I/O waits,
something I hope to dig into when post-freeze round tuits present
themselves), but that doesn't gel with this huge improvement from
noodling with optimiser details, and I don't know why I don't see
something similar locally.  I'm confused.

Just BTW it's kinda funny that we say -ggdb for macOS and then we use
lldb to debug cores in cores_backtrace.sh.  I suppose it would be more
correct to say -glldb, but doubt it matters much...




RE: Data is copied twice when specifying both child and parent table in publication

2022-03-09 Thread houzj.f...@fujitsu.com
Hi,

When reviewing some logical replication related features. I noticed another
possible problem if the subscriber subscribes multiple publications which
publish parent and child table.

For example:

pub
create table t (a int, b int, c int) partition by range (a);
create table t_1 partition of t for values from (1) to (10);

create publication pub1 for table t
  with (PUBLISH_VIA_PARTITION_ROOT);
create publication pub2 for table t_1
  with (PUBLISH_VIA_PARTITION_ROOT);

sub
 prepare table t and t_1
CREATE SUBSCRIPTION sub CONNECTION 'port=1 dbname=postgres' PUBLICATION pub1, pub2;

select * from pg_subscription_rel ;
 srsubid | srrelid     | srsubstate | srsublsn
---------+-------------+------------+-----------
   16391 | 16385 (t)   | r          | 0/150D100
   16391 | 16388 (t_1) | r          | 0/150D138

If we subscribe to two publications, one of which publishes the parent table
with (pubviaroot=true) and the other publishes the child table, then both the
parent table and the child table will exist in pg_subscription_rel, which also
means we will do an initial copy for both tables.

But after the initial copy, we only publish changes with the schema of the
parent table (t). It looks a bit inconsistent.

Based on the documentation of the PUBLISH_VIA_PARTITION_ROOT option, I think the
expected behavior could be that we only store the topmost parent (table t) in
pg_subscription_rel and do the initial copy for it if pubviaroot is on. I haven't
thought about how to fix this and will investigate later.

Best regards,
Hou zj


Re: Add index scan progress to pg_stat_progress_vacuum

2022-03-09 Thread Imseih (AWS), Sami
> >   BTreeShmemInit();
> >   SyncScanShmemInit();
> >   AsyncShmemInit();
> >   +   vacuum_worker_init();

> >   Don't we also need to add the size of the hash table to
> >   CalculateShmemSize()?

> No, ShmemInitHash takes the min and max size of the hash and in turn calls
> ShmemInitStruct to set up the shared memory.

Sorry, I am wrong here. The size needs to be accounted for at startup.

 --
 Sami Imseih
 Amazon Web Services




Re: Add index scan progress to pg_stat_progress_vacuum

2022-03-09 Thread Imseih (AWS), Sami
>I took a look at the latest patch set.

>+  
>+   indexes_total bigint
>+  
>+  
>+   The number of indexes to be processed in the
>+   vacuuming indexes
>+   or cleaning up indexes phase. It is set to
>+   0 when vacuum is not in any of these phases.
>+  

>Could we avoid resetting it to 0 unless INDEX_CLEANUP was turned off or
>failsafe kicked in?  It might be nice to know how many indexes the vacuum
>intends to process.  I don't feel too strongly about this, so if it would
>add a lot of complexity, it might be okay as is.

Your suggestion is valid. With INDEX_CLEANUP off it is set to 0 from the start,
and when failsafe kicks in it will be reset to 0. I will remove the reset call
for the common index vacuum path.

 >   BTreeShmemInit();
 >   SyncScanShmemInit();
 >   AsyncShmemInit();
 >   +   vacuum_worker_init();

 >   Don't we also need to add the size of the hash table to
 >   CalculateShmemSize()?

No, ShmemInitHash takes the min and max size of the hash and in turn calls
ShmemInitStruct to set up the shared memory.

>+ * A command type can optionally define a callback function
>+ * which will derive Datum values rather than use values
>+ * directly from the backends progress array.

>I think this can be removed.

Good catch.

--
Sami Imseih
Amazon Web Services



Re: Add index scan progress to pg_stat_progress_vacuum

2022-03-09 Thread Nathan Bossart
I took a look at the latest patch set.

+  
+   indexes_total bigint
+  
+  
+   The number of indexes to be processed in the 
+   vacuuming indexes 
+   or cleaning up indexes phase. It is set to
+   0 when vacuum is not in any of these phases.
+  

Could we avoid resetting it to 0 unless INDEX_CLEANUP was turned off or
failsafe kicked in?  It might be nice to know how many indexes the vacuum
intends to process.  I don't feel too strongly about this, so if it would
add a lot of complexity, it might be okay as is.

BTreeShmemInit();
SyncScanShmemInit();
AsyncShmemInit();
+   vacuum_worker_init();

Don't we also need to add the size of the hash table to
CalculateShmemSize()?

+ * A command type can optionally define a callback function
+ * which will derive Datum values rather than use values
+ * directly from the backends progress array.

I think this can be removed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: support for MERGE

2022-03-09 Thread Zhihong Yu
On Wed, Mar 9, 2022 at 9:38 AM Alvaro Herrera 
wrote:

> I attach MERGE v14.  This includes a fix from Amit Langote for the
> problem I described previously, with EvalPlanQual not working correctly.
> (I had failed to short-circuit the cross-partition update correctly.)
> Now the test case is enabled again, and it passes with the correct
> output.
>
> 0001 is as before; the only change is that I've added a commit message.
> Since this just moves some code around, I intend to get it pushed very
> soon.
>
> 0002 is the ExecUpdate/ExecDelete split.  I cleaned up the struct
> definitions a bit more, but it's pretty much as in the previous version.
>
> 0003 is the actual MERGE implementation.  Many previous review comments
> have been handled, and several other things are cleaner than before.
> I haven't made any changes for work_mem handling in ModifyTable's
> transition tables, though, and haven't attempted to address concurrent
> INSERT.
>
> --
> Álvaro Herrera   39°49'30"S 73°17'W  —
> https://www.EnterpriseDB.com/


Hi,
For v14-0002-Split-ExecUpdate-and-ExecDelete-in-reusable-piec.patch :

+   TupleTableSlot *insertProjectedTuple;

With `insert` at the beginning of the variable name, someone may think it
is a verb. How about naming the variable projectedTupleFromInsert (or
something similar) ?

+   if (!ExecBRDeleteTriggers(context->estate, context->epqstate,
+ resultRelInfo, tupleid, oldtuple,
+ epqreturnslot))
+   /* some trigger made the delete a no-op; let caller know */
+   return false;

It seems you can directly return the value returned
from ExecBRDeleteTriggers().

+   if (!ExecBRUpdateTriggers(context->estate, context->epqstate,
+ resultRelInfo, tupleid, oldtuple, slot))
+   /* some trigger made the update a no-op; let caller know */
+   return false;

Similar comment as above (the comment is good and should be kept).

Cheers


Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-09 Thread David Steele

On 3/9/22 18:06, Nathan Bossart wrote:

On Wed, Mar 09, 2022 at 06:11:19PM -0500, Chapman Flack wrote:

I think the listitem

   In the same connection as before, issue the command:
   SELECT * FROM pg_backup_stop(true);

would be clearer if it used named-parameter form, wait_for_archive => true.

This is not strictly necessary, of course, for a function with a single
IN parameter, but it's good documentation (and also could save us headaches
like these if there is ever another future need to give it more parameters).

That listitem doesn't say anything about what the parameter means, which
is a little weird, but probably ok because the next listitem does go into
it in some detail. I don't think a larger reorg is needed to bring that
language one listitem earlier. Just naming the parameter is probably
enough to make it less puzzling (or adding in that listitem, at most,
"the effect of the wait_for_archive parameter is explained below").

For consistency (and the same futureproofing benefit), I'd go to
fast => false in the earlier pg_backup_start as well.

I'm more ambivalent about label => 'label'. It would be consistent,
but should we just agree for conciseness that there will always be
a label and it will always be first?

You can pretty much tell in a call what's a label; it's those anonymous
trues and falses that are easier to read with named notation.


Done.  I went ahead and added "label => 'label'" for consistency.


+1 from me.

Regards,
-David





Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-09 Thread Nathan Bossart
On Wed, Mar 09, 2022 at 06:11:19PM -0500, Chapman Flack wrote:
> I think the listitem
> 
>   In the same connection as before, issue the command:
>   SELECT * FROM pg_backup_stop(true);
> 
> would be clearer if it used named-parameter form, wait_for_archive => true.
> 
> This is not strictly necessary, of course, for a function with a single
> IN parameter, but it's good documentation (and also could save us headaches
> like these if there is ever another future need to give it more parameters).
> 
> That listitem doesn't say anything about what the parameter means, which
> is a little weird, but probably ok because the next listitem does go into
> it in some detail. I don't think a larger reorg is needed to bring that
> language one listitem earlier. Just naming the parameter is probably
> enough to make it less puzzling (or adding in that listitem, at most,
> "the effect of the wait_for_archive parameter is explained below").
> 
> For consistency (and the same futureproofing benefit), I'd go to
> fast => false in the earlier pg_backup_start as well.
> 
> I'm more ambivalent about label => 'label'. It would be consistent,
> but should we just agree for conciseness that there will always be
> a label and it will always be first?
> 
> You can pretty much tell in a call what's a label; it's those anonymous
> trues and falses that are easier to read with named notation.

Done.  I went ahead and added "label => 'label'" for consistency.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 904d5b9f77262fe0ea740de86ca41c7dbaf210ef Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 1 Dec 2021 23:50:49 +
Subject: [PATCH v6 1/1] remove exclusive backup mode

---
 doc/src/sgml/backup.sgml  | 222 +---
 doc/src/sgml/func.sgml|  99 +---
 doc/src/sgml/high-availability.sgml   |   6 +-
 doc/src/sgml/monitoring.sgml  |   4 +-
 doc/src/sgml/ref/pgupgrade.sgml   |   2 +-
 src/backend/access/transam/xlog.c | 493 ++
 src/backend/access/transam/xlogfuncs.c| 253 ++---
 src/backend/access/transam/xlogrecovery.c |   2 +-
 src/backend/catalog/system_functions.sql  |  18 +-
 src/backend/postmaster/postmaster.c   |  46 +-
 src/backend/replication/basebackup.c  |  20 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl  |   4 +
 src/bin/pg_ctl/pg_ctl.c   |   4 +-
 src/bin/pg_rewind/filemap.c   |   6 +-
 src/include/access/xlog.h |   7 +-
 src/include/catalog/pg_control.h  |   2 +-
 src/include/catalog/pg_proc.dat   |  28 +-
 src/include/miscadmin.h   |   4 -
 src/test/perl/PostgreSQL/Test/Cluster.pm  |  56 +-
 .../t/010_logical_decoding_timelines.pl   |   4 +-
 20 files changed, 189 insertions(+), 1091 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 0d69851bb1..c8b914c1aa 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -857,16 +857,8 @@ test ! -f /mnt/server/archivedir/000100A90065  cp pg_wal/0
 sequence, and that the success of a step is verified before
 proceeding to the next step.

-   
-Low level base backups can be made in a non-exclusive or an exclusive
-way. The non-exclusive method is recommended and the exclusive one is
-deprecated and will eventually be removed.
-   
-
-   
-Making a Non-Exclusive Low-Level Backup
 
- A non-exclusive low level backup is one that allows other
+ A low level backup allows other
  concurrent backups to be running (both those started using
  the same backup API and those started using
  ).
@@ -881,19 +873,19 @@ test ! -f /mnt/server/archivedir/000100A90065  cp pg_wal/0

 
  Connect to the server (it does not matter which database) as a user with
- rights to run pg_start_backup (superuser, or a user who has been granted
+ rights to run pg_backup_start (superuser, or a user who has been granted
  EXECUTE on the function) and issue the command:
 
-SELECT pg_start_backup('label', false, false);
+SELECT pg_backup_start(label => 'label', fast => false);
 
  where label is any string you want to use to uniquely
  identify this backup operation. The connection
- calling pg_start_backup must be maintained until the end of
+ calling pg_backup_start must be maintained until the end of
  the backup, or the backup will be automatically aborted.
 
 
 
- By default, pg_start_backup can take a long time to finish.
+ By default, pg_backup_start can take a long time to finish.
  This is because it performs a checkpoint, and the I/O
  required for the checkpoint will be spread out over a significant
  period of time, by default half your inter-checkpoint interval
@@ -905,10 +897,6 @@ SELECT pg_start_backup('label', false, false);

Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-09 Thread Chapman Flack
On 03/09/22 17:21, Nathan Bossart wrote:
> Great.  Is there any additional feedback on this patch?  Should we add an
> example of using pg_basebackup in the "Standalone Hot Backups" section, or
> should we leave all documentation additions like this for Chap's new
> thread?

I'm composing something longer for the new thread, but on the way
I noticed something we might fit into this one.

I think the listitem

  In the same connection as before, issue the command:
  SELECT * FROM pg_backup_stop(true);

would be clearer if it used named-parameter form, wait_for_archive => true.

This is not strictly necessary, of course, for a function with a single
IN parameter, but it's good documentation (and also could save us headaches
like these if there is ever another future need to give it more parameters).

That listitem doesn't say anything about what the parameter means, which
is a little weird, but probably ok because the next listitem does go into
it in some detail. I don't think a larger reorg is needed to bring that
language one listitem earlier. Just naming the parameter is probably
enough to make it less puzzling (or adding in that listitem, at most,
"the effect of the wait_for_archive parameter is explained below").

For consistency (and the same futureproofing benefit), I'd go to
fast => false in the earlier pg_backup_start as well.

I'm more ambivalent about label => 'label'. It would be consistent,
but should we just agree for conciseness that there will always be
a label and it will always be first?

You can pretty much tell in a call what's a label; it's those anonymous
trues and falses that are easier to read with named notation.
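Putting these suggestions together, the documented call sequence would read roughly as follows (the label value is a placeholder):

```sql
SELECT pg_backup_start(label => 'label', fast => false);

-- ... perform the file system backup ...

SELECT * FROM pg_backup_stop(wait_for_archive => true);
```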

Regards,
-Chap




Re: Add index scan progress to pg_stat_progress_vacuum

2022-03-09 Thread Imseih (AWS), Sami
>I think if it's a better approach we can do that including adding a
>new infrastructure for it.

+1 This is a beneficial idea, especially to other progress reporting, but I see
this as a separate thread targeting the next major version.





Re: role self-revocation

2022-03-09 Thread Tom Lane
"David G. Johnston"  writes:
> So CREATE ROLE will assign ownership of AND membership in the newly created
> role to the session_user

I would NOT have it automatically assign membership in the new role,
even though the SQL spec says so.  We've not done that historically
and it doesn't seem desirable.  In particular, it's *really* not
desirable for a user (role with LOGIN).

> I'm fine with this.  It does introduce an OWNER concept to roles and so at
> minimum we need to add:
> ALTER ROLE foo OWNER TO { new_owner | CURRENT_ROLE | CURRENT_USER |
> SESSION_USER }

Agreed.

regards, tom lane




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-09 Thread Nathan Bossart
On Wed, Mar 09, 2022 at 02:32:24PM -0500, Chapman Flack wrote:
> On 03/09/22 12:19, Stephen Frost wrote:
>> Let's avoid hijacking this thread, which is about this
>> patch, for an independent debate about what our documentation should or
>> shouldn't include.
> 
> Agreed. New thread here:
> 
> https://www.postgresql.org/message-id/6228FFE4.3050309%40anastigmatix.net

Great.  Is there any additional feedback on this patch?  Should we add an
example of using pg_basebackup in the "Standalone Hot Backups" section, or
should we leave all documentation additions like this for Chap's new
thread?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: parse/analyze API refactoring

2022-03-09 Thread Nathan Bossart
On Wed, Mar 09, 2022 at 11:35:32AM +0100, Peter Eisentraut wrote:
> I have committed my original patches.  I'll leave the above-mentioned topic
> as ideas for the future.

Sounds good.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: role self-revocation

2022-03-09 Thread David G. Johnston
On Wed, Mar 9, 2022 at 2:31 PM Tom Lane  wrote:

> Robert Haas  writes:
> > Well, the problem is that as far as I can see, the admin option is an
> > optional feature of membership. You can grant someone membership
> > without admin option, or with admin option, but you can't grant them
> > the admin option without membership, just like you can't purchase an
> > upgrade to first class without the underlying plane ticket. What would
> > the syntax look even like for this? GRANT foo TO bar WITH ADMIN OPTION
> > BUT WITHOUT MEMBERSHIP? Yikes.
>
> I don't think we need syntax to describe it.  As I just said in my
> other reply, we have a perfectly good precedent for this already
> in ordinary object permissions.  That is: an object owner always,
> implicitly, has GRANT OPTION for all the object's privileges, even
> if she revoked the corresponding plain privilege from herself.
>

So CREATE ROLE will assign ownership of AND membership in the newly created
role to the session_user UNLESS the OWNER clause is present in which case
the named role, so long as the session_user can SET ROLE to the named role,
becomes the owner & member.  Subsequent to that the owner can issue: REVOKE
new_role FROM role_name where role_name is again the session_user role or
one that can be SET ROLE to.


> Yeah, this does mean that we're effectively deciding that the creator
> of a role is its owner.  What's the problem with that?
>

I'm fine with this.  It does introduce an OWNER concept to roles and so at
minimum we need to add:

ALTER ROLE foo OWNER TO { new_owner | CURRENT_ROLE | CURRENT_USER |
SESSION_USER }

And similar for CREATE ROLE
And keep the USER alias commands in sync.
GROUP commands are only present for backward compatibility and so don't get
updated with new features by design.

Obviously a superuser can change ownership.

Playing with table ownership I find this behavior:
-- superuser
CREATE ROLE tblowner;
CREATE TABLE tblowner_test (id serial primary key);
ALTER TABLE tblowner_test OWNER TO tblowner;

CREATE ROLE boss;
GRANT boss TO tblowner;

SET SESSION AUTHORIZATION tblowner;
ALTER TABLE tblowner_test OWNER TO boss; --works

So tblowner can push their ownership attribute to any group they are a
member of.  Is that the behavior we want for roles as well?

David J.


Re: Commitfest 2022-03 One Week in. 3 Commits 213 Patches Remaining

2022-03-09 Thread Greg Stark
On Wed, 9 Mar 2022 at 16:46, Greg Stark  wrote:
> Many of them seem to mostly have gotten
> feedback from committers already and the type of feedback that leads
> me to think it's ready for commit.

Er. I meant *not* the type of feedback that leads me to think it's
ready for commit. I mostly see patches with early design feedback or
even feedback questioning whether they're good ideas at all.

-- 
greg




Re: Commitfest 2022-03 One Week in. 3 Commits 213 Patches Remaining

2022-03-09 Thread Greg Stark
On Wed, 9 Mar 2022 at 15:44, David Steele  wrote:
>
> On 3/9/22 13:38, Greg Stark wrote:
> Should I do a round-robin style assignment for any of these?
>
> I don't think this is a good idea. Committers pick the patches they are
> going to commit.
>
> What I prefer to do is bump any committers that have been involved in a
> patch thread to see if they are willing to commit it.

Well yes, I suppose that's probably what I had in mind despite calling it that.

But I've been skimming the set of "Ready for Committer" patches and
I'm a bit down on them. Many of them seem to mostly have gotten
feedback from committers already and the type of feedback that leads
me to think it's ready for commit.

I suppose people mark patches "Ready for Committer" when the level of
feedback they require is more in depth or more design feedback that
they think requires a committer even if it's not ready for commit.

So I'm going to go through the patches and ask the committers who have
already commented if they think the patch is on track to be committed
this release or should be pushed to the next commitfest.

-- 
greg




Re: role self-revocation

2022-03-09 Thread Tom Lane
Robert Haas  writes:
> Well, the problem is that as far as I can see, the admin option is an
> optional feature of membership. You can grant someone membership
> without admin option, or with admin option, but you can't grant them
> the admin option without membership, just like you can't purchase an
> upgrade to first class without the underlying plane ticket. What would
> the syntax look even like for this? GRANT foo TO bar WITH ADMIN OPTION
> BUT WITHOUT MEMBERSHIP? Yikes.

I don't think we need syntax to describe it.  As I just said in my
other reply, we have a perfectly good precedent for this already
in ordinary object permissions.  That is: an object owner always,
implicitly, has GRANT OPTION for all the object's privileges, even
if she revoked the corresponding plain privilege from herself.

Yeah, this does mean that we're effectively deciding that the creator
of a role is its owner.  What's the problem with that?

> But do we really have to solve this problem before we can clean up
> this session exception?

I think we need a plan for where we're going.  I don't see "clean up
the session exception" as an end in itself; it's part of re-examining
how all of this ought to work.  I don't say that we have to have a
complete patch right away, only that we need a coherent end goal.

regards, tom lane




Re: role self-revocation

2022-03-09 Thread Tom Lane
I wrote:
> This seems like a reasonable answer to me too: the creating role has admin
> option implicitly, and can then choose to grant that to other roles.
> Obviously some work needs to be done to make that happen (and we should
> see whether the SQL spec has some different idea).

Ah, here we go: it's buried under CREATE ROLE.  SQL:2021 12.4 <role
definition> saith that when role A executes CREATE ROLE <role name>, then

1) A grantable role authorization descriptor is created whose role name
is <role name>, whose grantor is "_SYSTEM", and whose grantee is A.

Since nobody is _SYSTEM, this grant can't be deleted except by dropping
the new role (or, maybe, dropping A?).  So that has nearly the same
end result as "the creating role has admin option implicitly".  The main
difference I can see is that it also means the creating role is a *member*
implicitly, which is something I'd argue we don't want to enforce.  This
is analogous to the way we let an object owner revoke her own ordinary
permissions, which the SQL model doesn't allow since those permissions
were granted to her by _SYSTEM.
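
To illustrate, the net effect of the spec text is roughly this implicit
grant (a sketch only; "GRANTED BY _SYSTEM" is not syntax we accept):

-- role a executes:
CREATE ROLE alice;
-- per SQL:2021, recorded as if by:
-- GRANT alice TO a WITH ADMIN OPTION GRANTED BY "_SYSTEM";
-- and since nobody is _SYSTEM, neither a nor alice can revoke it.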

regards, tom lane




Re: role self-revocation

2022-03-09 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Mar 9, 2022 at 4:01 PM Tom Lane  wrote:
> > > In my opinion, the right to
> > > administer a role - regardless of whether or not it is a login role -
> > > most naturally vests in the role that created it, or something in that
> > > direction at least, if not that exact thing.
> >
> > This seems like a reasonable answer to me too: the creating role has admin
> > option implicitly, and can then choose to grant that to other roles.
> > Obviously some work needs to be done to make that happen (and we should
> > see whether the SQL spec has some different idea).
> 
> Well, the problem is that as far as I can see, the admin option is an
> optional feature of membership. You can grant someone membership
> without admin option, or with admin option, but you can't grant them
> the admin option without membership, just like you can't purchase an
> upgrade to first class without the underlying plane ticket. What would
> the syntax look even like for this? GRANT foo TO bar WITH ADMIN OPTION
> BUT WITHOUT MEMBERSHIP? Yikes.

I've been meaning to reply to your other email regarding this, but I
don't really agree that the syntax ends up being so terrible or
difficult to deal with, considering we have these same general things
for ALTER ROLE already and there hasn't been all that much complaining.
That is, we have LOGIN and NOLOGIN, CREATEROLE and NOCREATEROLE, and we
could have MEMBERSHIP and NOMEMBERSHIP pretty easily here if we wanted
to.
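
For instance, something along these lines (purely hypothetical syntax,
none of which exists today):

GRANT foo TO bar WITH ADMIN OPTION MEMBERSHIP;    -- today's WITH ADMIN OPTION
GRANT foo TO bar WITH ADMIN OPTION NOMEMBERSHIP;  -- admin rights, no membership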

> But do we really have to solve this problem before we can clean up
> this session exception? I hope not, because I think that's a much
> bigger can of worms than this is.

I do believe we can deal with the above independently and at a later
time and go ahead and clean up the session exception bit without dealing
with the above at the same time.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: role self-revocation

2022-03-09 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > On Mar 7, 2022, at 12:16 PM, Tom Lane  wrote:
> > tgl> Having said that, one thing that I find fishy is that it's not clear
> > tgl> where the admin privilege for a role originates.  After "CREATE ROLE
> > tgl> alice", alice has no members, therefore none that have admin privilege,
> > tgl> therefore the only way that the first member could be added is via
> > tgl> superuser deus ex machina.  This does not seem clean.
> 
> > I agree with that, but I don't think it's a sufficient reason for
> > keeping the self-admin exception, because the same problem exists for
> > non-login roles. I don't even think it's the right idea conceptually
> > to suppose that the power to administer a role originates from the
> > role itself.
> 
> Actually, that's the same thing I was trying to say.  But if it doesn't
> originate from the role itself, where does it originate from?
> 
> > In my opinion, the right to
> > administer a role - regardless of whether or not it is a login role -
> > most naturally vests in the role that created it, or something in that
> > direction at least, if not that exact thing.
> 
> This seems like a reasonable answer to me too: the creating role has admin
> option implicitly, and can then choose to grant that to other roles.

I agree that this has some appeal, but it's not desirable in all cases
and so I wouldn't want it to be fully baked into the system ala the role
'owner' concept.

> Obviously some work needs to be done to make that happen (and we should
> see whether the SQL spec has some different idea).

Agreed on this, though I don't recall it having much to say on it.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: role self-revocation

2022-03-09 Thread Robert Haas
On Wed, Mar 9, 2022 at 4:01 PM Tom Lane  wrote:
> > In my opinion, the right to
> > administer a role - regardless of whether or not it is a login role -
> > most naturally vests in the role that created it, or something in that
> > direction at least, if not that exact thing.
>
> This seems like a reasonable answer to me too: the creating role has admin
> option implicitly, and can then choose to grant that to other roles.
> Obviously some work needs to be done to make that happen (and we should
> see whether the SQL spec has some different idea).

Well, the problem is that as far as I can see, the admin option is an
optional feature of membership. You can grant someone membership
without admin option, or with admin option, but you can't grant them
the admin option without membership, just like you can't purchase an
upgrade to first class without the underlying plane ticket. What would
the syntax look even like for this? GRANT foo TO bar WITH ADMIN OPTION
BUT WITHOUT MEMBERSHIP? Yikes.

But do we really have to solve this problem before we can clean up
this session exception? I hope not, because I think that's a much
bigger can of worms than this is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Proposal] vacuumdb --schema only

2022-03-09 Thread Justin Pryzby
On Mon, Mar 07, 2022 at 08:38:04AM +0100, Gilles Darold wrote:
> > Maybe it's clearer to write this with =ANY() / != ALL() ?
> > See 002.
> 
> I have applied your changes and produced a new version v3 of the patch,
> thanks for the improvements. The patch has been added to the commitfest
> interface, see here https://commitfest.postgresql.org/38/3587/

I wondered whether my patches were improvements, and it occurred to me that
your patch didn't fail if the specified schema didn't exist.  Not failing is
arguably preferable, but failing is the pre-existing behavior for tables.  So
I think the behavior of my patch is more consistent.

$ ./src/bin/scripts/vacuumdb -h /tmp -d postgres --table foo
vacuumdb: vacuuming database "postgres"
2022-03-09 15:04:06.922 CST client backend[25540] vacuumdb ERROR:  relation "foo" does not exist at character 60

$ ./src/bin/scripts/vacuumdb -h /tmp -d postgres --schema foo
vacuumdb: vacuuming database "postgres"
2022-03-09 15:02:59.926 CST client backend[23516] vacuumdb ERROR:  schema "foo" does not exist at character 335




Re: role self-revocation

2022-03-09 Thread Tom Lane
Robert Haas  writes:
> On Mar 7, 2022, at 12:16 PM, Tom Lane  wrote:
> tgl> Having said that, one thing that I find fishy is that it's not clear
> tgl> where the admin privilege for a role originates.  After "CREATE ROLE
> tgl> alice", alice has no members, therefore none that have admin privilege,
> tgl> therefore the only way that the first member could be added is via
> tgl> superuser deus ex machina.  This does not seem clean.

> I agree with that, but I don't think it's a sufficient reason for
> keeping the self-admin exception, because the same problem exists for
> non-login roles. I don't even think it's the right idea conceptually
> to suppose that the power to administer a role originates from the
> role itself.

Actually, that's the same thing I was trying to say.  But if it doesn't
originate from the role itself, where does it originate from?

> In my opinion, the right to
> administer a role - regardless of whether or not it is a login role -
> most naturally vests in the role that created it, or something in that
> direction at least, if not that exact thing.

This seems like a reasonable answer to me too: the creating role has admin
option implicitly, and can then choose to grant that to other roles.
Obviously some work needs to be done to make that happen (and we should
see whether the SQL spec has some different idea).

regards, tom lane




Re: role self-revocation

2022-03-09 Thread Robert Haas
On Mon, Mar 7, 2022 at 11:14 PM Mark Dilger
 wrote:
> > On Mar 7, 2022, at 12:16 PM, Tom Lane  wrote:
> > What would be
> > lost if we drop it?
>
> I looked into this a bit.  Removing that bit of code, the only regression 
> test changes for "check-world" are the expected ones, with nothing else 
> breaking.  Running installcheck+pg_upgrade to the patched version of HEAD 
> from each of versions 11, 12, 13 and 14 doesn't turn up anything untoward.

I looked into this a bit, too. I attach a draft patch for removing the
self-admin exception.

I found that having is_admin_of_role() return true matters in three
ways: (1) It lets you grant membership in the role to some other role.
(2) It lets you revoke membership in the role from some other role.
(3) It changes the return value of pg_role_aclcheck(), which is used
in the implementation of various SQL-callable functions all invoked
via the name pg_has_role(). We've mostly been discussing (2) as an
issue, but (1) and (3) are pretty interesting too. Regarding (3),
there is a comment in the code indicating that Noah considered the
self-admin exception something of a wart as far as pg_has_role() is
concerned. As to (1), I discovered that today you can do this:

rhaas=# create user foo;
CREATE ROLE
rhaas=# create user bar;
CREATE ROLE
rhaas=# \q
[rhaas ~]$ psql -U foo rhaas
psql (15devel)
Type "help" for help.

rhaas=> grant foo to bar with admin option;
GRANT ROLE

I don't know why I didn't realize that before. It's a natural result
of treating the logged-in user as if they had admin option. But it's
weird that you can't even be granted WITH ADMIN OPTION on your own
login role, but at the same time without having it you can grant it to
someone else!
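
Point (3) can be observed directly.  In a session logged in as foo, with
no explicit grants at all:

rhaas=> SELECT pg_has_role('foo', 'foo', 'ADMIN');
-- presumably true today because of the self-admin exception, and false
-- once the exception is removed.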

I believe there are three other points worth some consideration here.

First, in the course of my investigation I re-discovered what Tom
already did a good job articulating:

tgl> Having said that, one thing that I find fishy is that it's not clear
tgl> where the admin privilege for a role originates.  After "CREATE ROLE
tgl> alice", alice has no members, therefore none that have admin privilege,
tgl> therefore the only way that the first member could be added is via
tgl> superuser deus ex machina.  This does not seem clean.

I agree with that, but I don't think it's a sufficient reason for
keeping the self-admin exception, because the same problem exists for
non-login roles. I don't even think it's the right idea conceptually
to suppose that the power to administer a role originates from the
role itself. If that were so, then it would be inherited by all
members of the role along with all the rest of the role's privileges,
which is so clearly not right that we've already prohibited a role
from having WITH ADMIN OPTION on itself. In my opinion, the right to
administer a role - regardless of whether or not it is a login role -
most naturally vests in the role that created it, or something in that
direction at least, if not that exact thing. Today, that means the
superuser or a CREATEROLE user who could hack superuser if they
wished. In the future, I hope for other alternatives, as recently
argued on other threads. But we need not resolve the question of how
that should work exactly in order to agree (as I hope we do) that
doubling down on the self-administration exception is not the answer.

Second, it occured to me to wonder what implications a change like
this might have for dump and restore. If privilege restoration somehow
relied on this behavior, then we'd have a problem. But I don't think
it does, because (a) pg_dump can SET ROLE but can't change the session
user without reconnecting, so it's unclear how we could be relying on
it; (b) it wouldn't work for non-login roles, and it's unlikely that
we would treat login and non-login roles different in terms of
restoring privileges, and (c) when I execute the example shown above
and then run pg_dump, there's no attempt to change the current user,
it just dumps "GRANT foo TO bar WITH ADMIN OPTION GRANTED BY foo".

Third, it occurred to me to wonder whether some users might be using
and relying upon this behavior. It's certainly possible, and it does
suck that we'd be removing it without providing a workable substitute.
But it's probably not a LOT of users because most people who have
commented on this topic on this mailing list seem to find granting
membership in a login role a super-weird thing to do, because a lot of
people really seem to want every role to be a user or a group, and a
login role with members feels like it's blurring that line. I'm
inclined to think that the small number of people who may be unhappy
is an acceptable price to pay for removing this wart, but it's a
judgement call and if someone has information to suggest that I'm
wrong, it'd be good to hear about that.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com


remove-self-admin.patch
Description: Binary data


Re: Commitfest 2022-03 One Week in. 3 Commits 213 Patches Remaining

2022-03-09 Thread David Steele

On 3/9/22 13:38, Greg Stark wrote:


Is there anything I can do to get committers assigned to these
patches? Should I do a round-robin style assignment for any of these?


I don't think this is a good idea. Committers pick the patches they are 
going to commit.


What I prefer to do is bump any committers that have been involved in a
patch thread to see if they are willing to commit it.


Regards,
-David





Re: Document what is essential and undocumented in pg_basebackup

2022-03-09 Thread David Steele

On 3/9/22 13:46, Stephen Frost wrote:



I don't think it is as reasonable to say, effectively, that you learn
what the irreducibly essential steps of an online base backup are by
reading the source of pg_basebackup, and then intuiting which of the
details you find there are the essential ones and which are outgrowths
of its particular design choices.


Documenting absolutely everything needed to write a good backup tool for
PG strikes me as unlikely to end up actually being useful.  Those who
write backup tools for PG are reading the source for PG and likely
wouldn't find such documentation helpful as not everything needed would
be included even if we did try to document everything, making such an
effort a waste of time.  The idea that we could document everything
needed and that someone could then write a simple shell script or even a
simple perl script (as pgbackrest started out as ...) from that
documentation that did everything necessary is a fiction that we need to
accept as such and move on from.


I would argue that the "Making a Non-Exclusive Low-Level Backup" and 
"Backing Up the Data Directory" sections do contain the minimal 
information you need to create a valid backup. I (and others) work hard 
to keep these sections up to date.


Arguably it is a bit confusing that "Backing Up the Data Directory" is a 
separate section, but that's because we have two backup methods and it 
needs to be kept separate. But since it is linked in the appropriate 
part of "Making a Non-Exclusive Low-Level Backup" I don't think it is 
too big a deal.


If you see something missing then let's add it. But I agree with Stephen 
that it is not a good idea to include a simplistic pseudo-solution to a 
problem that is anything but simple.


Regards,
-David




Re: Adding CI to our tree

2022-03-09 Thread Justin Pryzby
On Wed, Mar 09, 2022 at 10:12:54AM -0800, Andres Freund wrote:
> On 2022-03-09 11:47:23 -0600, Justin Pryzby wrote:
> > I'm curious what you think of this patch.
> > 
> > It makes check-world on freebsd over 30% faster - saving 5min.
> 
> That's nice! While -Og makes interactive debugging noticeably harder IME, it's
> not likely to be a large enough difference just for backtraces etc.

Yeah.  gcc(1) claims that -Og can improve debugging.

I should've mentioned that this seems to mitigate the performance effect of
--coverage on linux, too.

> I'm far less convinced that using "MaxSpeed" for the msvc build is a good
> idea. I've looked at one or two backtraces of optimized msvc builds and
> backtraces were quite a bit worse - and they're not great to start with.  What
> was the win there?

Did you compare FULL optimization or "favor speed/size" or "default"
optimization?

It's worth trading some build time (especially with a compiler cache) for
test time (especially with alltaptests).  But I didn't check backtraces, and I
didn't compare the various optimization options.  The argument may not be as
strong for Windows, since it has no build cache (and it has no -Og).  We'd save
a bit more when also running the other tap tests.

CI runs are probably not very consistent, but I've just run
https://cirrus-ci.com/task/5236145167532032
master is the average of 4 patches at the top of cfbot.

             / master / patched / change
subscription / 197s   / 195s    / +2s
recovery     / 234s   / 212s    / -22s
bin          / 383s   / 373s    / -10s

-- 
Justin




Re: [Proposal] vacuumdb --schema only

2022-03-09 Thread Gilles Darold
Hi,

New version v4 of the patch to fix a typo in a comment.

-- 
Gilles Darold
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..378328afb3 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+	   
+	
+	 
+	  -n
+	  --schema
+	 
+	 schema
+	
+	   
+
+	   
+	
+	 
+	  -N
+	  --exclude-schema
+	 
+	 schema
+	
+	   
+ 
+
+   
+
+   dbname
+  
+
   
vacuumdb
connection-option
@@ -244,6 +278,28 @@ PostgreSQL documentation
   
  
 
+ 
+  -n schema
+  --schema=schema
+  
+   
+Clean or analyze all tables in schema only.
+Multiple schemas can be vacuumed by writing multiple -n switches.
+   
+  
+ 
+
+ 
+  -N schema
+  --exclude-schema=schema
+  
+   
+Clean or analyze all tables NOT in schema.
+Multiple schemas can be excluded from the vacuum by writing multiple -N switches.
+   
+  
+ 
+
  
   --no-index-cleanup
   
@@ -619,6 +675,14 @@ PostgreSQL documentation
 $ vacuumdb --analyze --verbose --table='foo(bar)' xyzzy
 
 
+   
+To clean all tables in the Foo and bar schemas
+only in a database named xyzzy:
+
+$ vacuumdb --schema='"Foo"' --schema='bar' xyzzy
+
+
+
  
 
  
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..4c4f47e32a 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
   CREATE TABLE funcidx (x int);
   INSERT INTO funcidx VALUES (0),(1),(2),(3);
   CREATE INDEX i0 ON funcidx ((f1(x)));
+  CREATE SCHEMA "Foo";
+  CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,12 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+	qr/VACUUM "Foo".*/,
+	'vacuumdb --schema schema only');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-t pg_class', 'postgres' ],
+	'cannot vacuum all tables in schema(s) and specific table(s) at the same time');
 
 done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 1dcf411767..b122c995b1 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -15,5 +15,8 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '-a' ],
 	qr/statement: VACUUM.*statement: VACUUM/s,
 	'vacuum all databases');
+$node->command_fails(
+	[ 'vacuumdb', '-a',  '-n', 'pg_catalog' ],
+	'cannot vacuum specific schema(s) in all databases');
 
 done_testing();
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 4f6917fd39..d94f7459d5 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -46,10 +46,12 @@ typedef struct vacuumingOptions
 	bool		process_toast;
 } vacuumingOptions;
 
+enum trivalue schema_is_exclude = TRI_DEFAULT;
 
 static void vacuum_one_database(ConnParams *cparams,
 vacuumingOptions *vacopts,
 int stage,
+SimpleStringList *schemas,
 SimpleStringList *tables,
 int concurrentCons,
 const char *progname, bool echo, bool quiet);
@@ -94,6 +96,8 @@ main(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"jobs", required_argument, NULL, 'j'},
 		{"parallel", required_argument, NULL, 'P'},
+		{"schema", required_argument, NULL, 'n'},
+		{"exclude-schema", required_argument, NULL, 'N'},
 		{"maintenance-db", required_argument, NULL, 2},
 		{"analyze-in-stages", no_argument, NULL, 3},
 		{"disable-page-skipping", no_argument, NULL, 4},
@@ -125,6 +129,7 @@ main(int argc, char *argv[])
 	SimpleStringList tables = {NULL, NULL};
 	int			concurrentCons = 1;
 	int			tbl_count = 0;
+	SimpleStringList schemas = {NULL, NULL};
 
 	/* initialize options */
 	memset(&vacopts, 0, sizeof(vacopts));
@@ -140,7 +145,7 @@ main(int argc, char *argv[])
 
 	handle_help_version_opts(argc, argv, "vacuumdb", help);
 
-	while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:P:", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:P:n:N:", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -202,6 +207,26 @@ main(int argc, char *argv[])
 	  &vacopts.parallel_workers))
 	exit(1);
 break;
+			case 'n':   /* include schema(s) */
+if (schema_is_exclude == TRI_YES)
+{
+	pg_log_error("cannot vacuum all tables in schema(s) and exclude specific schema(s) at the same time");
+	exit(1);
+}
+
+simple_string_list_append(&schemas, optarg);
+schema_is_exclude = TRI_NO;
+

Re: Document what is essential and undocumented in pg_basebackup

2022-03-09 Thread Stephen Frost
Greetings,

* Chapman Flack (c...@anastigmatix.net) wrote:
> On 03/09/22 12:19, Stephen Frost wrote:
> > Let's avoid hijacking [thread about other patch] [1]
> > for an independent debate about what our documentation should or
> > shouldn't include.
> 
> Agreed. New thread here.

Thanks.

> Stephen wrote:
> > Documenting everything that pg_basebackup does to make sure that the
> > backup is viable might be something to work on if someone is really
> > excited about this, but it's not 'dead-simple' and it's darn close to
> > the bare minimum,
> 
> I wrote:
> > if the claim is that an admin who relies on pg_basebackup is relying
> > on essential things pg_basebackup does that have not been enumerated
> > in our documentation yet, I would argue they should be.
> 
> Magnus wrote:
> > For the people who want to drive their backups from a shellscript and
> > for some reason *don't* want to use pg_basebackup, we need to come up
> > with a different API or a different set of tools. That is not a
> > documentation task. That is a "start from a list of which things
> > pg_basebackup cannot do that are still simple, or that tools like
> > pgbackrest cannot do if they're complicated".  And then design an API
> > that's actually safe and easy to use *for that usecase*.
> 
> I wrote:
> > That might also be a good thing, but I don't see it as a substitute
> > for documenting the present reality of what the irreducibly essential
> > behaviors of pg_basebackup (or of third-party tools like pgbackrest)
> > are, and why they are so.
> 
> Stephen wrote:
> > I disagree.  If we provided a tool then we'd document that tool and how
> > users can use it, not every single step that it does (see also:
> > pg_basebackup).
> 
> 
> I could grant, arguendo, that for most cases where we've "provided a tool"
> that's enough, and still distinguish pg_basebackup from those. In no
> particular order:
> 
> - pg_basebackup comes late to the party. It appears in 9.1 as a tool that
>   conveniently automates a process (performing an online base backup)
>   that has already been documented since 8.0 six and a half years earlier.
>   (While, yes, it streams the file contents over a newly-introduced
>   protocol, I don't think anyone has called that one of its irreducibly
>   essential behaviors, or claimed that any other way of reliably copying
>   those contents during the backup window would be inherently flawed.)
> 
> - By the release where pg_basebackup appears, anyone who is doing
>   online backup and PITR is already using some other tooling (third-party
>   or locally developed) to do so. There may be benefits and costs in
>   migrating those procedures to pg_basebackup. If one of the benefits is
>   "your current procedures may be missing essential steps we built into
>   pg_basebackup but left out of our documentation" then that is important
>   to know for an admin who is making that decision. Even better, knowing
>   what those essential steps are will allow that admin to make an informed
>   assessment of whether the existing procedures are broken or not.
> 
> - Typical tools are easy for an admin to judge the fitness of.
>   The tool does a thing, and you can tell right away if it did the thing
>   you needed or not. pg_basebackup, like any backup tool, does a thing,
>   and you don't find out if that was the thing you needed until later,
>   when failure isn't an option. That's a less-typical kind of a tool,
>   for which it's less ok to be a black box.
> 
> - Ultimately, an admin's job isn't "use pg_basebackup" (or "use pgbackrest"
>   or "use barman"). The job is "be certain that this cluster is recoverably
>   backed up, and for any tool you may be using to do it, that you have the
>   same grasp of what the tool has done as if you had done it yourself."
> 
> 
> In view of all that, I would think it perfectly reasonable to present
> pg_basebackup as one convenient and included reference implementation
> of the irreducibly essential steps of an online base backup, which we
> separately document.

... except that pg_basebackup isn't quite that, it just happens to do
the things that *it* needs to do to give some level of confidence that
the backup it took will be useable later.

> I don't think it is as reasonable to say, effectively, that you learn
> what the irreducibly essential steps of an online base backup are by
> reading the source of pg_basebackup, and then intuiting which of the
> details you find there are the essential ones and which are outgrowths
> of its particular design choices.

While reading the pg_basebackup source would be helpful to someone
developing a new backup tool for PG, it's not the only source you'd need
to read- you also need to read the PG source for things like what return
codes from archive_command and restore_command mean to PG or how a
promoted system finds a new timeline or what .partial or .backup files
mean.  Further, you'd need to understand that it's essential that all of
the files from the 

Re: Commitfest 2022-03 One Week in. 3 Commits 213 Patches Remaining

2022-03-09 Thread Greg Stark
So it's 8 days into the commitfest. So far 3 patches have been committed:

* parse/analyze API refactoring by Peter Eisentraut
* FUNCAPI tuplestore helper function by Melanie Plagemen committed by
Michael Paquier
* Typo in pgbench messages by Kawamoto Masay committed by Tatsuo Ishii

(There was also a flurry of five commits from Tom on Feb 28 including
one that needed to be backpatched subsequently)

In addition, 5 patches have been moved to a future commitfest,
withdrawn, or rejected.

So that has removed 8 patches from the commitfest. There are now 189
"Needs Review" and 24 "Ready for Committer" patches.

Of those 24 "Ready for Committer" patches only 3 actually have
committers assigned and only 6 have had any emails since the beginning
of the commitfest.

Is there anything I can do to get committers assigned to these
patches? Should I do a round-robin style assignment for any of these?




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-09 Thread Chapman Flack
On 03/09/22 12:19, Stephen Frost wrote:
> Let's avoid hijacking this thread, which is about this
> patch, for an independent debate about what our documentation should or
> shouldn't include.

Agreed. New thread here:

https://www.postgresql.org/message-id/6228FFE4.3050309%40anastigmatix.net

Regards,
-Chap




Document what is essential and undocumented in pg_basebackup

2022-03-09 Thread Chapman Flack
On 03/09/22 12:19, Stephen Frost wrote:
> Let's avoid hijacking [thread about other patch] [1]
> for an independent debate about what our documentation should or
> shouldn't include.

Agreed. New thread here.

Stephen wrote:
> Documenting everything that pg_basebackup does to make sure that the
> backup is viable might be something to work on if someone is really
> excited about this, but it's not 'dead-simple' and it's darn close to
> the bare minimum,

I wrote:
> if the claim is that an admin who relies on pg_basebackup is relying
> on essential things pg_basebackup does that have not been enumerated
> in our documentation yet, I would argue they should be.

Magnus wrote:
> For the people who want to drive their backups from a shellscript and
> for some reason *don't* want to use pg_basebackup, we need to come up
> with a different API or a different set of tools. That is not a
> documentation task. That is a "start from a list of which things
> pg_basebackup cannot do that are still simple, or that tools like
> pgbackrest cannot do if they're complicated".  And then design an API
> that's actually safe and easy to use *for that usecase*.

I wrote:
> That might also be a good thing, but I don't see it as a substitute
> for documenting the present reality of what the irreducibly essential
> behaviors of pg_basebackup (or of third-party tools like pgbackrest)
> are, and why they are so.

Stephen wrote:
> I disagree.  If we provided a tool then we'd document that tool and how
> users can use it, not every single step that it does (see also:
> pg_basebackup).


I could grant, arguendo, that for most cases where we've "provided a tool"
that's enough, and still distinguish pg_basebackup from those. In no
particular order:

- pg_basebackup comes late to the party. It appears in 9.1 as a tool that
  conveniently automates a process (performing an online base backup)
  that has already been documented since 8.0 six and a half years earlier.
  (While, yes, it streams the file contents over a newly-introduced
  protocol, I don't think anyone has called that one of its irreducibly
  essential behaviors, or claimed that any other way of reliably copying
  those contents during the backup window would be inherently flawed.)

- By the release where pg_basebackup appears, anyone who is doing
  online backup and PITR is already using some other tooling (third-party
  or locally developed) to do so. There may be benefits and costs in
  migrating those procedures to pg_basebackup. If one of the benefits is
  "your current procedures may be missing essential steps we built into
  pg_basebackup but left out of our documentation" then that is important
  to know for an admin who is making that decision. Even better, knowing
  what those essential steps are will allow that admin to make an informed
  assessment of whether the existing procedures are broken or not.

- Typical tools are easy for an admin to judge the fitness of.
  The tool does a thing, and you can tell right away if it did the thing
  you needed or not. pg_basebackup, like any backup tool, does a thing,
  and you don't find out if that was the thing you needed until later,
  when failure isn't an option. That's a less-typical kind of a tool,
  for which it's less ok to be a black box.

- Ultimately, an admin's job isn't "use pg_basebackup" (or "use pgbackrest"
  or "use barman"). The job is "be certain that this cluster is recoverably
  backed up, and for any tool you may be using to do it, that you have the
  same grasp of what the tool has done as if you had done it yourself."


In view of all that, I would think it perfectly reasonable to present
pg_basebackup as one convenient and included reference implementation
of the irreducibly essential steps of an online base backup, which we
separately document.

I don't think it is as reasonable to say, effectively, that you learn
what the irreducibly essential steps of an online base backup are by
reading the source of pg_basebackup, and then intuiting which of the
details you find there are the essential ones and which are outgrowths
of its particular design choices.

Regards,
-Chap


[1] https://www.postgresql.org/message-id/20220221172306.GA3698472%40nathanxps13




Re: [RFC] building postgres with meson

2022-03-09 Thread Andres Freund
Hi,

On 2021-10-22 11:55:05 -0400, John Naylor wrote:
> On Thu, Oct 21, 2021 at 5:48 PM Andres Freund  wrote:
> 
> > However, update-unicode is a bit harder.  Partially not directly because
> of
> > meson, but because update-unicode as-is afaict doesn't support VPATH
> builds,
> > and meson enforces those.
> 
> > make update-unicode
> > ...
> > make -C src/common/unicode update-unicode
> > '/usr/bin/perl' generate-unicode_norm_table.pl
> > Can't open perl script "generate-unicode_norm_table.pl": No such file or
> directory
> >
> > It's not too hard to fix. See attached for the minimal stuff that I
> > immediately found to be needed.
> 
> Thanks for doing that, it works well enough for demonstration. With your
> patch, and using an autoconf VPATH build, the unicode tables work fine, but
> it complains of a permission error in generate_unaccent_rules.py. That
> seems to be because the script is invoked directly rather than as an
> argument to the python interpreter.
>

> Yeah. I encountered a further issue: With autoconf on HEAD, with a source
> tree build executed in contrib/unaccent:

This seems to be the same issue as above?


> $ touch generate_unaccent_rules.py
> $ make update-unicode
> generate_unaccent_rules.py --unicode-data-file
> ../../src/common/unicode/UnicodeData.txt --latin-ascii-file Latin-ASCII.xml
> >unaccent.rules
> /bin/sh: generate_unaccent_rules.py: command not found
> make: *** [unaccent.rules] Error 127
> make: *** Deleting file `unaccent.rules'

This looks more like you're building without --with-python and you don't have
a 'python' binary (but a python3 binary)?

Independent of my changes the invocation of generate_unaccent_rules looks like

# Allow running this even without --with-python
PYTHON ?= python
...
$(PYTHON) $< --unicode-data-file $(word 2,$^) --latin-ascii-file $(word 3,$^) >$@

so your failure should only happen if PYTHON somehow is empty, otherwise I'd
expect python in front of the failing line?
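The failure mode Andres describes — the recipe only degenerates to invoking the script directly when PYTHON is set but empty — can be sketched by emulating make's `?=` semantics (a hypothetical illustration, not PostgreSQL code; `?=` assigns a default only when the variable is not already set, so an empty-but-set PYTHON survives):

```python
# Emulate make's "PYTHON ?= python" followed by "$(PYTHON) $<".
def expand_recipe(env, script="generate_unaccent_rules.py"):
    python = env.get("PYTHON")
    if python is None:          # variable unset -> "?=" default applies
        python = "python"
    # make expands the recipe line; the shell then runs it
    return f"{python} {script}".strip()

print(expand_recipe({}))              # -> "python generate_unaccent_rules.py"
print(expand_recipe({"PYTHON": ""}))  # -> "generate_unaccent_rules.py"
                                      #    which /bin/sh cannot find -> Error 127
```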


> Anyway, this can be put off until the very end, since it's not run often.
> You've demonstrated how these targets would work, and that's good enough
> for now.

I'd like to get this stuff out of the patch series, so I'm planning to get
this committable...

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-03-09 Thread Andres Freund
Hi,

On 2022-03-09 11:47:23 -0600, Justin Pryzby wrote:
> I'm curious what you think of this patch.
> 
> It makes check-world on freebsd over 30% faster - saving 5min.

That's nice! While -Og makes interactive debugging noticeably harder IME, it's
not likely to be a large enough difference just for backtraces etc.

I'm far less convinced that using "MaxSpeed" for the msvc build is a good
idea. I've looked at one or two backtraces of optimized msvc builds and
backtraces were quite a bit worse - and they're not great to start with.  What
was the win there?

Greetings,

Andres Freund




Re: Adding CI to our tree

2022-03-09 Thread Justin Pryzby
I'm curious what you think of this patch.

It makes check-world on freebsd over 30% faster - saving 5min.

Apparently gcc -Og was added in gcc 4.8 (c. 2013).

On Wed, Mar 02, 2022 at 02:50:58PM -0600, Justin Pryzby wrote:
> From d180953d273c221a30c5e9ad8d74b1b4dfc60bd1 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Sun, 27 Feb 2022 15:17:50 -0600
> Subject: [PATCH 7/7] cirrus: compile with -Og..
> 
> To improve performance of check-world, and improve debugging, without
> significantly slower builds (they're cached anyway).
> 
> This makes freebsd check-world run in 8.5 minutes rather than 15 minutes.
> ---
>  .cirrus.yml  | 12 +++-
>  src/tools/msvc/MSBuildProject.pm |  4 ++--
>  2 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/.cirrus.yml b/.cirrus.yml
> index 6f05d420c85..8b673bf58cf 100644
> --- a/.cirrus.yml
> +++ b/.cirrus.yml
> @@ -113,7 +113,7 @@ task:
>  \
>  CC="ccache cc" \
>  CXX="ccache c++" \
> -CFLAGS="-O0 -ggdb"
> +CFLAGS="-Og -ggdb"
>  EOF
>build_script: su postgres -c "gmake -s -j${BUILD_JOBS} world-bin"
>upload_caches: ccache
> @@ -208,8 +208,8 @@ task:
>  CC="ccache gcc" \
>  CXX="ccache g++" \
>  CLANG="ccache clang" \
> -CFLAGS="-O0 -ggdb" \
> -CXXFLAGS="-O0 -ggdb"
> +CFLAGS="-Og -ggdb" \
> +CXXFLAGS="-Og -ggdb"
>  EOF
>build_script: su postgres -c "make -s -j${BUILD_JOBS} world-bin"
>upload_caches: ccache
> @@ -329,8 +329,8 @@ task:
>CC="ccache cc" \
>CXX="ccache c++" \
>CLANG="ccache ${brewpath}/llvm/bin/ccache" \
> -  CFLAGS="-O0 -ggdb" \
> -  CXXFLAGS="-O0 -ggdb" \
> +  CFLAGS="-Og -ggdb" \
> +  CXXFLAGS="-Og -ggdb" \
>\
>LLVM_CONFIG=${brewpath}/llvm/bin/llvm-config \
>PYTHON=python3
> @@ -383,6 +383,8 @@ task:
>  # -fileLoggerParameters1: write to msbuild.warn.log.
>  MSBFLAGS: -m -verbosity:minimal 
> "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false 
> -nologo -fileLoggerParameters1:warningsonly;logfile=msbuild.warn.log
>  
> +MSBUILD_OPTIMIZE: MaxSpeed
> +
>  # If tests hang forever, cirrus eventually times out. In that case log
>  # output etc is not uploaded, making the problem hard to debug. Of course
>  # tests internally should have shorter timeouts, but that's proven to not
> diff --git a/src/tools/msvc/MSBuildProject.pm 
> b/src/tools/msvc/MSBuildProject.pm
> index 5e312d232e9..05e0c41eb5c 100644
> --- a/src/tools/msvc/MSBuildProject.pm
> +++ b/src/tools/msvc/MSBuildProject.pm
> @@ -85,7 +85,7 @@ EOF
>   $f, 'Debug',
>   {
>   defs=> "_DEBUG;DEBUG=1",
> - opt => 'Disabled',
> + opt => $ENV{MSBUILD_OPTIMIZE} || 'Disabled',
>   strpool => 'false',
>   runtime => 'MultiThreadedDebugDLL'
>   });
> @@ -94,7 +94,7 @@ EOF
>   'Release',
>   {
>   defs=> "",
> - opt => 'Full',
> + opt => $ENV{MSBUILD_OPTIMIZE} || 'Full',
>   strpool => 'true',
>   runtime => 'MultiThreadedDLL'
>   });
> -- 
> 2.17.1
> 




Re: support for MERGE

2022-03-09 Thread Álvaro Herrera
On 2022-Mar-07, Zhihong Yu wrote:

> For v13-0003-MERGE-SQL-Command-following-SQL-2016.patch :
> 
> +* Reset per-tuple memory context to free any expression evaluation
> +* storage allocated in the previous cycle.
> +*/
> +   ResetExprContext(econtext);
> 
> Why is the memory cleanup done in the next cycle ? Can the cleanup be done
> at the end of the current cycle ?

I have removed that, because Andres had already pointed out that it was
redundant with the reset done in the caller.

> +* XXX Should this explain why MERGE has the same logic as UPDATE?
> 
> I think explanation should be given.

Actually, the routine in question is only handling insert, not UPDATE,
so MERGE is not relevant to the function.  I have removed the comment.
This was probably a leftover from work prior to 86dc90056dfd; that
commit made it all irrelevant.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)




Re: Proposal: Support custom authentication methods using hooks

2022-03-09 Thread Magnus Hagander
On Tue, Mar 8, 2022 at 9:28 PM Stephen Frost  wrote:
>
> Greetings,
>
> * Jeff Davis (pg...@j-davis.com) wrote:
> > On Wed, 2022-03-02 at 10:54 -0500, Stephen Frost wrote:
> > > It's our decision what we want to support and maintain in the code
> > > base
> > > and what we don't.
> >
> > That might be an argument in favor of custom auth methods, because we
> > could move built-in methods that we don't like into a contrib module
> > that implements it as a custom auth method.
>
> Feel like I already answered this but just to be clear- I don't view
> that as actually addressing the issue since we'd still be maintaining
> and distributing insecure auth methods.

+1.

And contrib, in particular, is already a mix of very important, stable
and useful things, and things that are just pure testing or examples
that nobody in their right mind should use. Putting something security
related there seems like a terrible idea on its own, independent from
shipping things that are known insecure. (Yes, I know sepgsql is
there. Which certainly doesn't help tell people if it's something that
could be relied on or not.)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-09 Thread Stephen Frost
Greetings,

* Chapman Flack (c...@anastigmatix.net) wrote:
> On 03/09/22 11:22, Magnus Hagander wrote:
> >> It's more than just too confusing, it's actively bad because people will
> >> actually use it and then end up with backups that don't work.
> > 
> > +1.
> > 
> > Or even worse, backups that sometimes work, but not reliably and not
> > every time.
> > ...
> > Pretending something is simple when it's not, is not doing anybody a favor.
> 
> Okay, I bow to this reasoning, for the purpose of this patch. Let's
> just lose the example.

Great.

> >> Documenting everything that pg_basebackup does to make sure that the
> >> backup is viable might be something to work on if someone is really
> >> excited about this, but it's not 'dead-simple' and it's darn close to
> >> the bare minimum, something that none of these simple scripts will come
> >> anywhere close to being and instead they'll be far less than the
> >> minimum.
> > 
> > Yeah, having the full set of steps required documented certainly
> > wouldn't be a bad thing.
> 
> I'd say that qualifies as an understatement. While it certainly doesn't
> have to be part of this patch, if the claim is that an admin who relies
> on pg_basebackup is relying on essential things pg_basebackup does that
> have not been enumerated in our documentation yet, I would argue they
> should be.

It doesn't have to be part of this patch and we should move forward with
this patch.  Let's avoid hijacking this thread, which is about this
patch, for an independent debate about what our documentation should or
shouldn't include.

> > with a different API or a different set of tools. That is not a
> > documentation task. That is a "start from a list of which things
> > pg_basebackup cannot do that are still simple, or that tools like
> > pgbackrest cannot do if they're complicated".  And then design an API
> > that's actually safe and easy to use *for that usecase*.
> 
> That might also be a good thing, but I don't see it as a substitute
> for documenting the present reality of what the irreducibly essential
> behaviors of pg_basebackup (or of third-party tools like pgbackrest)
> are, and why they are so.

I disagree.  If we provided a tool then we'd document that tool and how
users can use it, not every single step that it does (see also:
pg_basebackup).

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-09 Thread Chapman Flack
On 03/09/22 11:22, Magnus Hagander wrote:
>> It's more than just too confusing, it's actively bad because people will
>> actually use it and then end up with backups that don't work.
> 
> +1.
> 
> Or even worse, backups that sometimes work, but not reliably and not
> every time.
> ...
> Pretending something is simple when it's not, is not doing anybody a favor.

Okay, I bow to this reasoning, for the purpose of this patch. Let's
just lose the example.

>> Documenting everything that pg_basebackup does to make sure that the
>> backup is viable might be something to work on if someone is really
>> excited about this, but it's not 'dead-simple' and it's darn close to
>> the bare minimum, something that none of these simple scripts will come
>> anywhere close to being and instead they'll be far less than the
>> minimum.
> 
> Yeah, having the full set of steps required documented certainly
> wouldn't be a bad thing.

I'd say that qualifies as an understatement. While it certainly doesn't
have to be part of this patch, if the claim is that an admin who relies
on pg_basebackup is relying on essential things pg_basebackup does that
have not been enumerated in our documentation yet, I would argue they
should be.

> with a different API or a different set of tools. That is not a
> documentation task. That is a "start from a list of which things
> pg_basebackup cannot do that are still simple, or that tools like
> pgbackrest cannot do if they're complicated".  And then design an API
> that's actually safe and easy to use *for that usecase*.

That might also be a good thing, but I don't see it as a substitute
for documenting the present reality of what the irreducibly essential
behaviors of pg_basebackup (or of third-party tools like pgbackrest)
are, and why they are so.

Regards,
-Chap




Re: New developer papercut - Makefile references INSTALL

2022-03-09 Thread Tom Lane
Magnus Hagander  writes:
> On Mon, Mar 7, 2022 at 11:51 PM Tom Lane  wrote:
>> Doesn't really help people working from git, I think, because the
>> master branch is always going to claim to be "devel" even when you
>> rewind it to some old state.  Maybe we can assume people doing
>> such a thing have even more clue ... but on the whole I'd rather
>> not add the additional complication.

> Well it could per major version couldn't it? When we start working on
> v16, we stamp master as that, and we could use that for the links. It
> will work "for the past", but if will of course not be able to track
> how the docs changes between the individual commits -- since our
> website only has the latest release for each one. If we need that it
> needs to be in the source tree -- but is that actually a requirement?

I think that adds more complication than usefulness.  ISTM having the
master branch not identify itself more specifically than "devel" is
actually a good thing in this context, for precisely the reason that
the corresponding docs are likely to be in flux.  Seeing "v16" seems
likely to lull people into a false sense of certainty that whatever
they find on the web matches the code they actually have.

So I'm coming to the position that the README file ought not contain
any link more specific than https://www.postgresql.org/docs/
and that it should then tell you to look at the installation chapters
in the appropriate version's docs.  (Considering we have multiple
installation chapters nowadays, we couldn't provide an exact URL
anyway.)

regards, tom lane




Re: [RFC] building postgres with meson

2022-03-09 Thread Andres Freund
Hi,

One thing that's pretty cool with ninja based builds is that it contains a
dependency log of "discovered" dependencies as well as information about
dependencies "encoded" in the build specification. LLVM contains a script that
uses that dependency information to see whether the build specification is
missing any dependencies - this helped me find several "build bugs".

The script is at:
https://github.com/llvm/llvm-project/blob/main/llvm/utils/check_ninja_deps.py

It just needs to be invoked in the build directory after a build.

If I e.g. remove the dependency from libpgcommon_srv.a to the lwlocknames.h
generation (*) it complains with:
error: src/common/libpgcommon_srv.a.p/cryptohash_openssl.c.o requires src/include/storage/lwlocknames.h but has no dependency on it
error: src/common/libpgcommon_srv.a.p/hmac_openssl.c.o requires src/include/storage/lwlocknames.h but has no dependency on it


I wonder if it's worth having a build target invoking it? But how to get the
path to the script? We could just copy it into src/tools? It's permissively
licensed...
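The comparison the LLVM script performs can be sketched in a few lines (a hypothetical simplified version — the real script reads ninja's logs directly; here we parse the text output of `ninja -t deps`, then flag discovered includes that the build specification never declared):

```python
# Parse "ninja -t deps"-style output: an unindented "target: ..." line
# followed by indented dependency paths, entries separated by blank lines.
def parse_deps(text):
    deps, target = {}, None
    for line in text.splitlines():
        if not line.strip():
            target = None                       # blank line ends an entry
        elif not line.startswith((" ", "\t")):
            target = line.split(":", 1)[0]      # new target header
            deps[target] = []
        elif target is not None:
            deps[target].append(line.strip())   # discovered dependency
    return deps

sample = """\
common/file.o: #deps 2, deps mtime 123 (VALID)
    src/common/file.c
    src/include/storage/lwlocknames.h
"""
discovered = parse_deps(sample)
declared = {"common/file.o": {"src/common/file.c"}}  # what build.ninja encodes
missing = {t: [d for d in ds if d not in declared.get(t, set())]
           for t, ds in discovered.items()}
# missing flags lwlocknames.h: used by the compiler, unknown to the build spec
```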

Greetings,

Andres Freund


(*) It seems architecturally pretty darn ugly for pgcommon to acquire lwlocks.




Re: New developer papercut - Makefile references INSTALL

2022-03-09 Thread Magnus Hagander
On Mon, Mar 7, 2022 at 11:51 PM Tom Lane  wrote:
>
> Magnus Hagander  writes:
> > But taking a step back, who is the actual audience for this? Do we
> > *need* a link pointing directly there, or is it enough to just point
> > to "use the docs on the web"? We can't link to the incorrect version,
> > but can we just link to /docs/ and leave it at that?
>
> Well, it's people compiling from source, so I guess we can assume some
> amount of cluefulness?  I think perhaps it'd be okay to say "go here
> and then navigate to the proper sub-page for your version".
>
> > If not, could we make the change of URL a part of the branching step?
> > Branch to a stable release would the include modifying README, and be
> > mad ea step of version_stamp.pl?
>
> Doesn't really help people working from git, I think, because the
> master branch is always going to claim to be "devel" even when you
> rewind it to some old state.  Maybe we can assume people doing
> such a thing have even more clue ... but on the whole I'd rather
> not add the additional complication.

Well it could per major version couldn't it? When we start working on
v16, we stamp master as that, and we could use that for the links. It
will work "for the past", but it will of course not be able to track
how the docs changes between the individual commits -- since our
website only has the latest release for each one. If we need that it
needs to be in the source tree -- but is that actually a requirement?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2022-03-09 Thread Justin Pryzby
Rebased over 9e9858389 (Michael may want to look at the tuplestore part?).

Fixing a comment typo.

I also changed pg_ls_dir_recurse() to handle concurrent removal of a dir, which
I noticed caused an infrequent failure on CI.  However I'm not including that
here, since the 2nd half of the patch set isn't ready due to lstat() on
windows.
>From 47dde043c27840ef43c037f968b268270dc3206d Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 16 Mar 2020 14:12:55 -0500
Subject: [PATCH v34 01/15] Document historic behavior of links to
 directories..

Backpatch to 9.5: pg_stat_file
---
 doc/src/sgml/func.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8a802fb2253..0be4743e3ed 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27618,7 +27618,7 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 Returns a record containing the file's size, last access time stamp,
 last modification time stamp, last file status change time stamp (Unix
 platforms only), file creation time stamp (Windows only), and a flag
-indicating if it is a directory.
+indicating if it is a directory (or a symbolic link to a directory).


 This function is restricted to superusers by default, but other users
-- 
2.17.1

>From d263f772faa6d76fdf301664bc0b436ec417b2cb Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 17 Mar 2020 13:16:24 -0500
Subject: [PATCH v34 02/15] Add tests before changing pg_ls_*

---
 src/test/regress/expected/misc_functions.out | 59 
 src/test/regress/sql/misc_functions.sql  | 15 +
 2 files changed, 74 insertions(+)

diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 8567fcc2b45..b08e7c4a6d0 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -393,6 +393,65 @@ select count(*) > 0 from
  t
 (1 row)
 
+select * from (select pg_ls_dir('.', false, true) as name) as ls where ls.name='.'; -- include_dot_dirs=true
+ name 
+--
+ .
+(1 row)
+
+select * from (select pg_ls_dir('.', false, false) as name) as ls where ls.name='.'; -- include_dot_dirs=false
+ name 
+--
+(0 rows)
+
+select pg_ls_dir('does not exist', true, false); -- ok with missingok=true
+ pg_ls_dir 
+---
+(0 rows)
+
+select pg_ls_dir('does not exist'); -- fails with missingok=false
+ERROR:  could not open directory "does not exist": No such file or directory
+-- Check that expected columns are present
+select * from pg_ls_archive_statusdir() limit 0;
+ name | size | modification 
+--+--+--
+(0 rows)
+
+select * from pg_ls_logdir() limit 0;
+ name | size | modification 
+--+--+--
+(0 rows)
+
+select * from pg_ls_logicalmapdir() limit 0;
+ name | size | modification 
+--+--+--
+(0 rows)
+
+select * from pg_ls_logicalsnapdir() limit 0;
+ name | size | modification 
+--+--+--
+(0 rows)
+
+select * from pg_ls_replslotdir('') limit 0;
+ name | size | modification 
+--+--+--
+(0 rows)
+
+select * from pg_ls_tmpdir() limit 0;
+ name | size | modification 
+--+--+--
+(0 rows)
+
+select * from pg_ls_waldir() limit 0;
+ name | size | modification 
+--+--+--
+(0 rows)
+
+select * from pg_stat_file('.') limit 0;
+ size | access | modification | change | creation | isdir 
+--++--++--+---
+(0 rows)
+
 --
 -- Test replication slot directory functions
 --
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index 3db3f8bade2..ebb0352ac68 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -132,6 +132,21 @@ select count(*) > 0 from
where spcname = 'pg_default') pts
   join pg_database db on pts.pts = db.oid;
 
+select * from (select pg_ls_dir('.', false, true) as name) as ls where ls.name='.'; -- include_dot_dirs=true
+select * from (select pg_ls_dir('.', false, false) as name) as ls where ls.name='.'; -- include_dot_dirs=false
+select pg_ls_dir('does not exist', true, false); -- ok with missingok=true
+select pg_ls_dir('does not exist'); -- fails with missingok=false
+
+-- Check that expected columns are present
+select * from pg_ls_archive_statusdir() limit 0;
+select * from pg_ls_logdir() limit 0;
+select * from pg_ls_logicalmapdir() limit 0;
+select * from pg_ls_logicalsnapdir() limit 0;
+select * from pg_ls_replslotdir('') limit 0;
+select * from pg_ls_tmpdir() limit 0;
+select * from pg_ls_waldir() limit 0;
+select * from pg_stat_file('.') limit 0;
+
 --
 -- Test replication slot directory functions
 --
-- 
2.17.1

>From 1c60ac0976f898eacd24881348b5f04b06dc7e1f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 9 Mar 2020 22:40:24 -0500
Subject: 

Re: [RFC] building postgres with meson -v7

2022-03-09 Thread Andres Freund
Hi,

On 2022-03-09 13:37:23 +0100, Peter Eisentraut wrote:
> I looked at this today mainly to consider whether some of the prereq
> work is ready for adoption now.

Thanks!


> A lot of the work has to do with
> making various scripts write the output to other directories.  I
> suspect this has something to do with how meson handles separate build
> directories and how we have so far handled files created in the
> distribution tarball.  But the whole picture isn't clear to me.

A big part of it is that when building with ninja tools are invoked in the
top-level build directory, but right now a bunch of our scripts put their
output in CWD.


> More generally, I don't see a distprep target in the meson build
> files.  I wonder what your plan for that is, or whether that would
> even work under meson.  In [0], I argued for getting rid of the
> distprep step.  Perhaps it is time to reconsider that now.
> 
> [0]: 
> https://www.postgresql.org/message-id/flat/cf0bec33-d965-664d-e0ec-fb15290f2273%402ndquadrant.com

I think it should be doable to add something roughly like the current distprep.
The cleanest way would be to use
https://mesonbuild.com/Reference-manual_builtin_meson.html#mesonadd_dist_script
to copy the files into the generated tarball.

Of course not adding it would be even easier ;)


> For the short term, I think the patches 0002, 0008, 0010, and 0011
> could be adopted, if they are finished as described.

Cool.


> Patch 0007 seems unrelated, or at least independently significant, and
> should be discussed separately.

It's related - it saves us from doing a lot of extra complexity on
windows. I've brought it up as a separate thread too:
https://postgr.es/m/20211101020311.av6hphdl6xbjbuif%40alap3.anarazel.de

I'm currently a bit stuck implementing this properly for the configure / make
system, as outlined in:
https://www.postgresql.org/message-id/20220111025328.iq5g6uck53j5qtin%40alap3.anarazel.de


> The rest is really all part of the same put-things-in-the-right-directory
> issue.
> 
> For the overall patch set, I did a quick test with
> 
> meson setup build
> cd build
> ninja
> 
> which failed with
> 
> Undefined symbols for architecture x86_64:
>   "_bbsink_zstd_new", referenced from:
>   _SendBaseBackup in replication_basebackup.c.o
> 
> So maybe your patch set is not up to date with this new zstd build
> option.

Yep, I posted it before "7cf085f077d - Add support for zstd base backup
compression."  went in, but after 6c417bbcc8f. So the meson build knew about
the zstd dependency, but didn't yet specify it for postgres /
pg_basebackup. So all that's needed was / is adding the dependency to those
two places.

Updated patches attached. This just contains the fix for this issue, doesn't
yet address review comments.

FWIW, I'd already pushed those fixes out to the git tree. There's frequent
enough small changes that reposting every time seems too noisy.


> v6-0001-meson-prereq-output-and-depencency-tracking-work.patch.gz
> 
> This all looks kind of reasonable, but lacks explanation in some
> cases, so I can't fully judge it.

I'll try to clean it up.


> v6-0007-meson-prereq-Can-we-get-away-with-not-export-all-.patch.gz
> 
> This is a separate discussion.  It's not clear to me why this is part
> of this patch series.

See above.


> v6-0008-meson-prereq-Handle-DLSUFFIX-in-msvc-builds-simil.patch.gz
> 
> Part of this was already done in 0001, so check if these patches are
> split correctly.
> 
> I think the right way here is actually to go the other way around:
> Move DLSUFFIX into header files for all platforms.  Move the DLSUFFIX
> assignment from src/makefiles/ to src/templates/, have configure read
> it, and then substitute it into Makefile.global and pg_config.h.
> 
> Then we also don't have to patch the Windows build code a bunch of
> times to add the DLSUFFIX define everywhere.
> 
> There is code in configure already that would benefit from this, which
> currently says
> 
> # We don't know the platform DLSUFFIX here, so check 'em all.

I'll try it out.


> v6-0009-prereq-make-unicode-targets-work-in-vpath-builds.patch.gz
> 
> Another directory issue

I think it's a tad different, in that it's fixing something that's currently
broken in VPATH builds.


> v6-0010-ldap-tests-Don-t-run-on-unsupported-operating-sys.patch.gz
> 
> Not sure what this is supposed to do, but it looks independent of this
> patch series.  Does it currently not work on "unsupported" operating
> systems?

Right now if you run the ldap tests on windows, openbsd, ... the tests
fail. The only reason it doesn't cause trouble on the buildfarm is that we
currently don't run those tests by default...


> v6-0011-ldap-tests-Add-paths-for-openbsd.patch.gz
> 
> The more the merrier, although I'm a little bit worried about pointing
> to a /usr/local/share/examples/ directory.

It's where the files are in the package :/.


> v6-0012-wip-split-TESTDIR-into-two.patch.gz
> v6-0013-meson-Add-meson-based-buildsystem.patch.gz
> 

Re: PROXY protocol support

2022-03-09 Thread Magnus Hagander
On Wed, Mar 9, 2022 at 5:23 PM Peter Eisentraut
 wrote:
>
> A general question on this feature: AFAICT, you can only send the proxy
> header once at the beginning of the connection.  So this wouldn't be of
> use for PostgreSQL-protocol connection poolers (pgbouncer, pgpool),
> where the same server connection can be used for clients from different
> source addresses.  Do I understand that correctly?

Correct. It's only sent at connection startup, so if you're re-using
the connection it would also re-use the IP address of the first one.

For reusing the connection for multiple clients, you'd want something
different, like a "privileged mode in the tunnel" that the pooler can
handle.
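The structural reason is easy to see from the header itself. Per the PROXY protocol v1 spec (the addresses below are made up), it is a single text line sent once, immediately after TCP connect and before the PostgreSQL startup packet — so a pooled server connection can only ever carry the first client's address:

```python
# Sketch of a PROXY protocol v1 header line, as a proxy would construct it.
def proxy_v1_header(src, dst, sport, dport):
    # "PROXY" <inet proto> <src addr> <dst addr> <src port> <dst port> CRLF
    return f"PROXY TCP4 {src} {dst} {sport} {dport}\r\n".encode("ascii")

hdr = proxy_v1_header("10.0.0.5", "10.0.0.10", 52311, 5432)
# The proxy writes hdr on the socket first, then relays client bytes
# unchanged; there is no way to send a second header for a new client.
```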

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: PROXY protocol support

2022-03-09 Thread Peter Eisentraut
A general question on this feature: AFAICT, you can only send the proxy 
header once at the beginning of the connection.  So this wouldn't be of 
use for PostgreSQL-protocol connection poolers (pgbouncer, pgpool), 
where the same server connection can be used for clients from different 
source addresses.  Do I understand that correctly?





Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-09 Thread Magnus Hagander
On Wed, Mar 9, 2022 at 4:42 PM Stephen Frost  wrote:
>
> Greetings,
>
> * Chapman Flack (c...@anastigmatix.net) wrote:
> > On 03/08/22 17:12, Nathan Bossart wrote:
> > > I spent some time trying to come up with a workable script to replace the
> > > existing one.  I think the main problem is that you need to write out both
> > > the backup label file and the tablespace map file, but I didn't find an
> > > easy way to write the different output columns of pg_backup_stop() to
> > > separate files via psql.
>
> Let's not confuse ourselves here- the existing script *doesn't* work in
> any reasonable way when we're talking about everything that needs to be
> done to perform a backup.  That a lot of people are using it because
> it's in the documentation is an actively bad thing.
>
> The same goes for the archive command example.
>
> > Something like this might work:
> >
> > SELECT * FROM pg_backup_stop(true) \gset
> >
> > \out /tmp/backup_label   \qecho :labelfile
> > \out /tmp/tablespace_map \qecho :spcmapfile
> > \out
> > \! ... tar command adding /tmp/{backup_label,tablespace_map} to the tarball
>
> ... this doesn't do what's needed either.  We could try to write down
> some minimum set of things that are needed for a backup tool to do but
> it's not something that a 3 line script is going to cover.  Indeed, it's
> a lot more like pg_basebackup and if we want to give folks a script to
> use, it should be "run pg_basebackup".
>
> > I notice the \qecho adds a final newline (and so if :spcmapfile is empty,
> > a file containing a single newline is made). In a quick test with a bogus
> > restore_command, I did not see any error messages specific to the format
> > of the backup_label or tablespace_map files, so maybe the final newline
> > isn't a problem.
> >
> > Assuming the newline isn't a problem, that might be simple enough to
> > use in an example, and maybe it's not a bad thing that it highlights a few
> > psql capabilities the reader might not have stumbled on before. Or, maybe
> > it is just too confusing to bother.
>
> It's more than just too confusing, it's actively bad because people will
> actually use it and then end up with backups that don't work.

+1.

Or even worse, backups that sometimes work, but not reliably and not every time.

> > While agreeing that pg_basebackup is the production-ready thing that
> > does it all for you (with tests for likely errors and so on), I think
> > there is also some value in a dead-simple example that concretely
> > shows you what "it" is, what the basic steps are that happen beneath
> > pg_basebackup's chrome.

I agree that having a dead simple script would be good.

The *only* dead simple script that's going to be possible is one that
calls pg_basebackup.

The current APIs don't make it *possible* to drive them directly with
a dead simple script.

Pretending something is simple when it's not, is not doing anybody a favor.


> Documenting everything that pg_basebackup does to make sure that the
> backup is viable might be something to work on if someone is really
> excited about this, but it's not 'dead-simple' and it's darn close to
> the bare minimum, something that none of these simple scripts will come
> anywhere close to being and instead they'll be far less than the
> minimum.

Yeah, having the full set of steps required documented certainly
wouldn't be a bad thing. But it's a very *different* thing.


> > If the added newline is a problem, I haven't thought of a way to exclude
> > it that doesn't take the example out of the realm of dead-simple.
>
> I disagree that there's really a way to provide 'dead-simple' backups
> with what's built into core without using pg_basebackup.  If we want a
> 'dead-simple' solution in core then we'd need to write an appropriate
> backup tool that does all the basic things and include and maintain
> that.  Writing a shell script isn't enough and we shouldn't encourage
> our users to do exactly that by having it in our documentation because
> then they'll think it's enough.

+1.


We need to accept that the current APIs are far too low level to be
driven by a shellscript. No amount of documentation we write is going
to change that fact.

For the people who want to drive their backups from a shellscript and
for some reason *don't* want to use pg_basebackup, we need to come up
with a different API or a different set of tools. That is not a
documentation task. That is a "start from a list of which things
pg_basebackup cannot do that are still simple, or that tools like
pgbackrest cannot do if they're complicated".  And then design an API
that's actually safe and easy to use *for that usecase*.

For example, if the use case is "I want to use filesystem or SAN
snapshots for my backups", we shouldn't try to write workarounds using
bash coprocs or whatever. Instead, we could write a tool that
interacts with the current api to start the backup, then launches a
shellscript that interacts with the snapshot system, and then 

Re: Printing backtrace of postgres processes

2022-03-09 Thread Justin Pryzby
rebased to appease cfbot.

+ couple of little fixes as 0002.

-- 
Justin
>From 4be93f2bab460682a0f5af9e1e3f4970709b3517 Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Tue, 25 Jan 2022 08:21:22 +0530
Subject: [PATCH 1/2] Add function to log the backtrace of the specified
 postgres process.

ci-os-only: html

This commit adds a pg_log_backtrace() function that requests to log
the backtrace of the specified backend or auxiliary process, except
the logger and statistics collector.

Only superusers are allowed to request to log the backtrace,
because allowing any user to issue this request at an unbounded rate
would cause lots of log messages, which could lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its backtrace at LOG_SERVER_ONLY level,
so that the backtrace will appear in the server log but not
be sent to the client.

Bump catalog version.

Authors: Vignesh C, Bharath Rupireddy
---
 doc/src/sgml/func.sgml| 62 +++
 src/backend/catalog/system_functions.sql  |  2 +
 src/backend/postmaster/autovacuum.c   |  4 ++
 src/backend/postmaster/checkpointer.c |  4 ++
 src/backend/postmaster/interrupt.c|  4 ++
 src/backend/postmaster/pgarch.c   | 10 +++-
 src/backend/postmaster/startup.c  |  4 ++
 src/backend/postmaster/walwriter.c|  4 ++
 src/backend/storage/ipc/procarray.c   | 56 +
 src/backend/storage/ipc/procsignal.c  | 42 +
 src/backend/storage/ipc/signalfuncs.c | 73 ---
 src/backend/tcop/postgres.c   |  4 ++
 src/backend/utils/adt/mcxtfuncs.c | 40 +++--
 src/backend/utils/error/elog.c| 19 --
 src/backend/utils/init/globals.c  |  1 +
 src/include/catalog/pg_proc.dat   |  5 ++
 src/include/miscadmin.h   |  1 +
 src/include/storage/procarray.h   |  2 +
 src/include/storage/procsignal.h  |  3 +-
 src/include/utils/elog.h  |  2 +
 src/test/regress/expected/backtrace.out   | 49 +++
 src/test/regress/expected/backtrace_1.out | 55 +
 src/test/regress/parallel_schedule|  2 +-
 src/test/regress/sql/backtrace.sql| 33 ++
 24 files changed, 416 insertions(+), 65 deletions(-)
 create mode 100644 src/test/regress/expected/backtrace.out
 create mode 100644 src/test/regress/expected/backtrace_1.out
 create mode 100644 src/test/regress/sql/backtrace.sql

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b727c3423aa..9f321fb019a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25355,6 +25355,30 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_backtrace
+
+pg_log_backtrace ( pid integer )
+boolean
+   
+   
+Requests to log the backtrace of the backend with the
+specified process ID.  This function can send the request to
+backends and auxiliary processes except logger and statistics
+collector.  These backtraces will be logged at LOG
+message level. They will appear in the server log based on the log
+configuration set (See  for
+more information), but will not be sent to the client regardless of
+. A backtrace identifies
+which function a process is currently executing and it may be useful
+for developers to diagnose stuck processes and other problems. This
+function is supported only if PostgreSQL was built with the ability to
+capture backtraces, otherwise it will emit a warning.
+   
+  
+
   

 
@@ -25574,6 +25598,44 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.

 
+   
+pg_log_backtrace can be used to log the backtrace of
+a backend process. For example:
+
+postgres=# select pg_log_backtrace(pg_backend_pid());
+ pg_log_backtrace
+--
+ t
+(1 row)
+
+The backtrace will be logged as specified by the logging configuration.
+For example:
+
+2021-01-27 11:33:50.247 IST [111735] LOG:  current backtrace:
+postgres: postgresdba postgres [local] SELECT(set_backtrace+0x38) [0xae06c5]
+postgres: postgresdba postgres [local] SELECT(ProcessInterrupts+0x788) [0x950c34]
+postgres: postgresdba postgres [local] SELECT() [0x761e89]
+postgres: postgresdba postgres [local] SELECT() [0x71bbda]
+postgres: postgresdba postgres [local] SELECT() [0x71e380]
+postgres: postgresdba postgres [local] SELECT(standard_ExecutorRun+0x1d6) [0x71c1fe]
+postgres: postgresdba postgres [local] SELECT(ExecutorRun+0x55) [0x71c026]
+postgres: postgresdba postgres [local] SELECT() [0x953fc5]
+postgres: postgresdba postgres [local] SELECT(PortalRun+0x262) [0x953c7e]
+ 
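The request-and-log-at-a-safe-point mechanism shown above has a rough analogue in pure Python; a hedged sketch of formatting the current call stack and sending it to a server-side-only log sink (illustrative names, not PostgreSQL code):

```python
import traceback

def log_current_backtrace(logger=print):
    # Format the current call stack, the way a CHECK_FOR_INTERRUPTS-style
    # handler would emit it to the server log only.
    stack = "".join(traceback.format_stack())
    logger("current backtrace:\n" + stack)
    return True

def inner():
    return log_current_backtrace()

def outer():
    return inner()

outer()  # the logged stack includes outer -> inner -> log_current_backtrace
```

The key property mirrored here is that the process logs its own stack at a point where it is safe to do so, rather than the requester unwinding it externally.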

Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-09 Thread Stephen Frost
Greetings,

* Chapman Flack (c...@anastigmatix.net) wrote:
> On 03/08/22 17:12, Nathan Bossart wrote:
> > I spent some time trying to come up with a workable script to replace the
> > existing one.  I think the main problem is that you need to write out both
> > the backup label file and the tablespace map file, but I didn't find an
> > easy way to write the different output columns of pg_backup_stop() to
> > separate files via psql.

Let's not confuse ourselves here- the existing script *doesn't* work in
any reasonable way when we're talking about everything that needs to be
done to perform a backup.  That a lot of people are using it because
it's in the documentation is an actively bad thing.

The same goes for the archive command example.

> Something like this might work:
> 
> SELECT * FROM pg_backup_stop(true) \gset
> 
> \out /tmp/backup_label   \qecho :labelfile
> \out /tmp/tablespace_map \qecho :spcmapfile
> \out
> \! ... tar command adding /tmp/{backup_label,tablespace_map} to the tarball

... this doesn't do what's needed either.  We could try to write down
some minimum set of things that are needed for a backup tool to do but
it's not something that a 3 line script is going to cover.  Indeed, it's
a lot more like pg_basebackup and if we want to give folks a script to
use, it should be "run pg_basebackup".

> I notice the \qecho adds a final newline (and so if :spcmapfile is empty,
> a file containing a single newline is made). In a quick test with a bogus
> restore_command, I did not see any error messages specific to the format
> of the backup_label or tablespace_map files, so maybe the final newline
> isn't a problem.
> 
> Assuming the newline isn't a problem, that might be simple enough to
> use in an example, and maybe it's not a bad thing that it highlights a few
> psql capabilities the reader might not have stumbled on before. Or, maybe
> it is just too confusing to bother.

It's more than just too confusing, it's actively bad because people will
actually use it and then end up with backups that don't work.

> While agreeing that pg_basebackup is the production-ready thing that
> does it all for you (with tests for likely errors and so on), I think
> there is also some value in a dead-simple example that concretely
> shows you what "it" is, what the basic steps are that happen beneath
> pg_basebackup's chrome.

Documenting everything that pg_basebackup does to make sure that the
backup is viable might be something to work on if someone is really
excited about this, but it's not 'dead-simple' and it's darn close to
the bare minimum, something that none of these simple scripts will come
anywhere close to being and instead they'll be far less than the
minimum.

> If the added newline is a problem, I haven't thought of a way to exclude
> it that doesn't take the example out of the realm of dead-simple.

I disagree that there's really a way to provide 'dead-simple' backups
with what's built into core without using pg_basebackup.  If we want a
'dead-simple' solution in core then we'd need to write an appropriate
backup tool that does all the basic things and include and maintain
that.  Writing a shell script isn't enough and we shouldn't encourage
our users to do exactly that by having it in our documentation because
then they'll think it's enough.

Thanks,

Stephen




Re: [PATCH] Add reloption for views to enable RLS

2022-03-09 Thread Laurenz Albe
On Tue, 2022-03-08 at 18:17 +0100, Christoph Heiss wrote:
> Since there don't seem to be any more objections to "security_invoker" I 
> attached v10 renaming it again.
> 
> I've tried to better clarify the whole invoker vs. definer thing in the 
> CREATE VIEW documentation by explicitly mentioning that 
> "security_invoker=false" is _not_ the same as "security definer", based 
> on the earlier discussions.
> 
> This should hopefully avoid any implicit associations.

I have only some minor comments:

> --- a/doc/src/sgml/ref/create_view.sgml
> +++ b/doc/src/sgml/ref/create_view.sgml
> @@ -387,10 +430,17 @@ CREATE VIEW vista AS SELECT text 'Hello World' AS hello;
> 
>  Note that the user performing the insert, update or delete on the view
>  must have the corresponding insert, update or delete privilege on the
> -view.  In addition the view's owner must have the relevant privileges on
> -the underlying base relations, but the user performing the update does
> -not need any permissions on the underlying base relations (see
> -).
> +view.
> +   
> +
> +   
> +Additionally, by default the view's owner must have the relevant 
> privileges
> +on the underlying base relations, but the user performing the update does
> +not need any permissions on the underlying base relations. (see
> +)  If the view has the
> +security_invoker property is set to
> +true, the invoking user will need to have the relevant
> +privileges rather than the view owner.
> 
>
>   

This paragraph contains a couple of grammatical errors.
How about

  
   Note that the user performing the insert, update or delete on the view
   must have the corresponding insert, update or delete privilege on the
   view.  Unless security_invoker is set to
   true, the view's owner must additionally have the
   relevant privileges on the underlying base relations, but the user
   performing the update does not need any permissions on the underlying
   base relations (see ).
   If security_invoker is set to true,
   it is the invoking user rather than the view owner that must have the
   relevant privileges on the underlying base relations.
  

Also, this:

> --- a/src/backend/utils/cache/relcache.c
> +++ b/src/backend/utils/cache/relcache.c
> @@ -838,8 +846,18 @@ RelationBuildRuleLock(Relation relation)
>  * the rule tree during load is relatively cheap (compared to
>  * constructing it in the first place), so we do it here.
>  */
> -   setRuleCheckAsUser((Node *) rule->actions, 
> relation->rd_rel->relowner);
> -   setRuleCheckAsUser(rule->qual, relation->rd_rel->relowner);
> +   if (rule->event == CMD_SELECT
> +   && relation->rd_rel->relkind == RELKIND_VIEW
> +   && RelationHasSecurityInvoker(relation))
> +   {
> +   setRuleCheckAsUser((Node *) rule->actions, InvalidOid);
> +   setRuleCheckAsUser(rule->qual, InvalidOid);
> +   }
> +   else
> +   {
> +   setRuleCheckAsUser((Node *) rule->actions, 
> relation->rd_rel->relowner);
> +   setRuleCheckAsUser(rule->qual, relation->rd_rel->relowner);
> +   }

could be written like this (introducing a new variable):

  if (rule->event == CMD_SELECT
  && relation->rd_rel->relkind == RELKIND_VIEW
  && RelationHasSecurityInvoker(relation))
  user_for_check = InvalidOid;
  else
  user_for_check = relation->rd_rel->relowner;

  setRuleCheckAsUser((Node *) rule->actions, user_for_check);
  setRuleCheckAsUser(rule->qual, user_for_check);

This might be easier to read.


Yours,
Laurenz Albe





RE: Optionally automatically disable logical replication subscriptions on error

2022-03-09 Thread osumi.takami...@fujitsu.com
On Wednesday, March 9, 2022 8:22 PM Amit Kapila  wrote:
> On Wed, Mar 9, 2022 at 2:21 PM Masahiko Sawada
>  wrote:
> >
> > On Wed, Mar 9, 2022 at 4:33 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > On Tuesday, March 8, 2022 10:23 PM Amit Kapila
>  wrote:
> > > > On Tue, Mar 8, 2022 at 1:37 PM osumi.takami...@fujitsu.com
> > > >  wrote:
> > > > >
> > >
> > >
> > > > 2. Is there a reason the patch doesn't allow workers to restart
> > > > via
> > > > maybe_reread_subscription() when this new option is changed, if
> > > > so, then let's add a comment for the same? We currently seem to be
> > > > restarting the worker on any change via Alter Subscription. If we
> > > > decide to change it for this option as well then I think we need
> > > > to accordingly update the current comment: "Exit if any parameter
> > > > that affects the remote connection was changed." to something like
> > > > "Exit if any parameter that affects the remote connection or a
> subscription option was changed..."
> > > I thought it's ok without the change at the beginning, but I was wrong.
> > > To make this new option aligned with others, I should add one check
> > > for this feature. Fixed.
> >
> > Why do we need to restart the apply worker when disable_on_error is
> > changed? It doesn't affect the remote connection at all. I think it
> > can be changed without restarting like synchronous_commit option.
> >
> 
> Oh right. I was wondering how we would update its value in MySubscription
> after a change, but as we re-read the pg_subscription table for the current
> subscription and update MySubscription, I feel we don't need to restart it.
> I haven't tested it, but it should work without a restart.
Hi, the attached v32 removes my additional code for maybe_reread_subscription.

Also, I judged that we don't need to add a comment for this feature in this
patch, because the conclusion can be inferred from the existing comments and
code:
(1) "Reread subscription info if needed. Most changes will be exit." already
implies that there are some cases where we don't exit.
(2) From "Exit if any parameter that affects the remote connection was
changed.", readers can understand that a disable_on_error change does not
match any of the exit cases.
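For context, the behavior the disable_on_error option is meant to provide (stop retrying a failing apply loop and disable the subscription instead) can be sketched like this; all names are hypothetical, the actual patch implements this in the C apply worker:

```python
# Hedged sketch of "disable_on_error" semantics: instead of retrying a
# failing apply loop forever, record the error and disable the subscription.
class Subscription:
    def __init__(self, disable_on_error):
        self.enabled = True
        self.disable_on_error = disable_on_error
        self.last_error = None

def apply_changes(sub, changes, apply_one):
    for change in changes:
        try:
            apply_one(change)
        except Exception as e:
            sub.last_error = str(e)
            if sub.disable_on_error:
                sub.enabled = False  # ALTER SUBSCRIPTION ... DISABLE, in effect
                return False
            raise                    # default: error out; worker restarts later
    return True

sub = Subscription(disable_on_error=True)
apply_changes(sub, [1, 2, 0], lambda c: 1 / c)  # third change fails
print(sub.enabled)  # False
```

Without the option, the worker would keep erroring out and restarting; with it, the failure is recorded once and the subscription stays disabled until a user intervenes.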

Kindly review the v32.

Best Regards,
Takamichi Osumi



v32-0001-Optionally-disable-subscriptions-on-error.patch
Description: v32-0001-Optionally-disable-subscriptions-on-error.patch


Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-03-09 Thread Julien Rouhaud
On Tue, Mar 08, 2022 at 08:57:23PM +0530, Nitin Jadhav wrote:
> 
> I just wanted to avoid extra calculations just to show the progress in
> the view. Since it's a good metric, I have added an additional field
> named 'next_flags' to the view which holds all possible flag values of
> the next checkpoint.

I still don't think that's ok.  IIUC the only way to know if the current
checkpoint is throttled or not is to be aware that the "next_flags" can apply
to the current checkpoint too, look for it, and see if that changes the
semantics of what the view says the current checkpoint is.  Most users will
get it wrong.

> This gives more information than just saying
> whether the new checkpoint is requested or not with the same memory.

So that next_flags will be empty most of the time?  It seems confusing.

Again, I would just display a bool flag saying whether a new checkpoint has
been explicitly requested or not; that seems enough.

If you're interested in that next checkpoint, you probably want a quick
completion of the current checkpoint first (and thus need to know if it's
throttled or not).  And then you will have to keep monitoring that view for the
next checkpoint anyway, and at that point the view will show the relevant
information.




Re: Changing "Hot Standby" to "hot standby"

2022-03-09 Thread Daniel Westermann (DWE)
>>Hmm.  Outside the title that had better use upper-case characters for
>>the first letter of each word, I can see references to the pattern you
>>are trying to eliminate in amcheck.sgml (1), config.sgml (3),
>>protocol.sgml (3) and mvcc.sgml (1).  Shouldn't you refresh these as
>>well if the point is to make the full set of docs consistent?

>>As of the full tree, I can see that:
>>
>>$ git grep "hot standby" | wc -l
>>259

>>$ git grep "Hot Standby" | wc -l
>>73

>>So there is a trend for one of the two.

>Thanks for looking at it. Yes, I am aware there are other places which would
>need to be changed and I think I mentioned that in an earlier email. Are you
>suggesting to change all at once? I wanted to start with the documentation and
>then continue with the other places.

Attached a new version which also modifies amcheck.sgml, config.sgml, 
protocol.sgml, and mvcc.sgml accordingly.

Regards
Daniel

diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 11d1eb5af2..5d61a33936 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -174,7 +174,7 @@ ORDER BY c.relpages DESC LIMIT 10;
   hypothetically, undiscovered bugs in the underlying B-Tree index
   access method code.  Note that
   bt_index_parent_check cannot be used when
-  Hot Standby mode is enabled (i.e., on read-only physical
+  hot standby mode is enabled (i.e., on read-only physical
   replicas), unlike bt_index_check.
  
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7ed8c82a9d..e2db2a789f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4550,7 +4550,7 @@ ANY num_sync ( .
@@ -4582,7 +4582,7 @@ ANY num_sync ( .
@@ -10887,7 +10887,7 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
 Enables logging of recovery-related debugging output that otherwise
 would not be logged. This parameter allows the user to override the
 normal setting of , but only for
-specific messages. This is intended for use in debugging Hot Standby.
+specific messages. This is intended for use in debugging hot standby.
 Valid values are DEBUG5, DEBUG4,
 DEBUG3, DEBUG2, DEBUG1, and
 LOG.  The default, LOG, does not affect
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b5b6042104..81fa26f985 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -548,8 +548,8 @@ protocol to make nodes agree on a serializable transactional order.
rollforward will take considerably longer, so that technique only
offers a solution for disaster recovery, not high availability.
A standby server can also be used for read-only queries, in which case
-   it is called a Hot Standby server. See  for
-   more information.
+   it is called a hot standby server. See 
+for more information.
   
 
   
@@ -1032,7 +1032,7 @@ primary_slot_name = 'node_a_slot'

 

-Hot Standby feedback propagates upstream, whatever the cascaded arrangement.
+Hot standby feedback propagates upstream, whatever the cascaded arrangement.

 

@@ -1499,16 +1499,16 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
   Hot Standby
 
   
-   Hot Standby
+   hot standby
   
 

-Hot Standby is the term used to describe the ability to connect to
+Hot standby is the term used to describe the ability to connect to
 the server and run read-only queries while the server is in archive
 recovery or standby mode. This
 is useful both for replication purposes and for restoring a backup
 to a desired state with great precision.
-The term Hot Standby also refers to the ability of the server to move
+The term hot standby also refers to the ability of the server to move
 from recovery through to normal operation while users continue running
 queries and/or keep their connections open.

@@ -1623,7 +1623,7 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
being executed during recovery.  This restriction applies even to
temporary tables, because table rows cannot be read or written without
assigning a transaction ID, which is currently not possible in a
-   Hot Standby environment.
+   hot standby environment.
   
  
  
@@ -1703,7 +1703,7 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'

 In normal operation, read-only transactions are allowed to
 use LISTEN and NOTIFY,
-so Hot Standby sessions operate under slightly tighter
+so hot standby sessions operate under slightly tighter
 restrictions than ordinary read-only sessions.  It is possible that some
 of these restrictions might be loosened in a future release.

@@ -1746,7 +1746,7 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'

 

-There are also additional types of conflict that can occur with Hot Standby.
+There are 

Re: Column Filtering in Logical Replication

2022-03-09 Thread Tomas Vondra
On 3/9/22 11:03, Amit Kapila wrote:
> ...
>> Hmm, yeah. That seems like a genuine problem - it should not depend on
>> the order of publications in the subscription, I guess.
>>
>> But is it an issue in the patch? Isn't that a pre-existing issue? AFAICS
>> the problem is that we initialize publish_as_relid=relid before the loop
>> over publications, and then just update it. So the first iteration
>> starts with relid, but the second iteration ends with whatever value is
>> set by the first iteration (e.g. the root).
>>
>> So with the example you posted, we start with
>>
>>   publish_as_relid = relid = test_part1
>>
>> but then if the first publication is pubviaroot=true, we update it to
>> parent. And in the second iteration, we fail to find the column filter,
>> because "parent" (publish_as_relid) is not part of the pub2.
>>
>> If we do it in the other order, we leave the publish_as_relid value as
>> is (and find the filter), and then update it in the second iteration
>> (and find the column filter too).
>>
>> Now, this can be resolved by re-calculating the publish_as_relid from
>> scratch in each iteration (start with relid, then maybe update it). But
>> that's just half the story - the issue is there even without column
>> filters. Consider this example:
>>
>> create table t (a int, b int, c int) partition by range (a);
>>
>> create table t_1 partition of t for values from (1) to (10)
>>partition by range (a);
>>
>> create table t_2 partition of t_1 for values from (1) to (10);
>>
>> create publication pub1 for table t(a)
>>   with (PUBLISH_VIA_PARTITION_ROOT);
>>
>> create publication pub2 for table t_1(a)
>>   with (PUBLISH_VIA_PARTITION_ROOT);
>>
>>
>> Now, is you change subscribe to "pub1, pub2" and "pub2, pub1", we'll end
>> up with different publish_as_relid values (t or t_1). Which seems like
>> the same ambiguity issue.
>>
> 
> I think we should fix this existing problem by always using the
> top-most table as publish_as_relid. Basically, we can check, if the
> existing publish_as_relid is an ancestor of a new rel that is going to
> replace it then we shouldn't replace it.

Right, using the topmost relid from all publications seems like the
correct solution.

> However, I think even if we
> fix the existing problem, we will still have the order problem for the
> column filter patch, and to avoid that instead of fetching column
> filters in the publication loop, we should use the final
> publish_as_relid. I think it will have another problem as well if we
> don't use final publish_as_relid which is that sometimes when we
> should not use any filter (say when pubviaroot is true and that
> publication has root partitioned table which has no filter) as per our
> rule of filters for a partitioned table, it can still use some filter
> from the non-root table.
> 

Yeah, the current behavior is just a consequence of how we determine
publish_as_relid now. If we rework that, we should first determine the
relid and then fetch the filter only for that single rel.
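The order-independent rule discussed here, i.e. keep the topmost ancestor across all publications and only then fetch filters for that single relation, can be sketched as follows (hypothetical data structures, not the actual C code):

```python
# Hedged sketch: pick publish_as_relid as the topmost ancestor seen across
# all publications, so the result does not depend on publication order.
# `ancestors` maps each relation to its parent (None at the root).
def get_publish_as_relid(rel, publications, ancestors):
    def depth(r):
        d = 0
        while ancestors.get(r) is not None:
            r = ancestors[r]
            d += 1
        return d

    result = rel
    for pub in publications:
        candidate = pub.get(rel, rel)  # relid this publication publishes rel as
        if depth(candidate) < depth(result):  # higher in the hierarchy wins
            result = candidate
    return result

# t_2 -> t_1 -> t; pub1 publishes via root t, pub2 via t_1
ancestors = {"t_2": "t_1", "t_1": "t", "t": None}
pub1 = {"t_2": "t"}
pub2 = {"t_2": "t_1"}
print(get_publish_as_relid("t_2", [pub1, pub2], ancestors))  # t
print(get_publish_as_relid("t_2", [pub2, pub1], ancestors))  # t
```

With "keep the topmost ancestor" the subscription's publication order no longer matters, and the column (or row) filter lookup can be done once, against the final relid.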

>>
>>> *
>>> Fetching column filter info in tablesync.c is quite expensive. It
>>> seems to be using four round-trips to get the complete info whereas
>>> for row-filter we use just one round trip. I think we should try to
>>> get both row filter and column filter info in just one round trip.
>>>
>>
>> Maybe, but I really don't think this is an issue.
>>
> 
> I am not sure but it might matter for small tables. Leaving aside the
> performance issue, I think the current way will get the wrong column
> list in many cases: (a) The ALL TABLES IN SCHEMA case handling won't
> work for partitioned tables when the partitioned table is part of one
> schema and partition table is part of another schema. (b) The handling
> of partition tables in other cases will fetch incorrect lists as it
> tries to fetch the column list of all the partitions in the hierarchy.
> 
> One of my colleagues has even tested these cases both for column
> filters and row filters and we find the behavior of row filter is okay
> whereas for column filter it uses the wrong column list. We will share
> the tests and results with you in a later email. We are trying to
> unify the column filter queries with row filter to make their behavior
> the same and will share the findings once it is done. I hope if we are
> able to achieve this that we will reduce the chances of bugs in this
> area.
> 

OK, I'll take a look at that email.

> Note: I think the first two patches for tests are not required after
> commit ceb57afd3c.
> 

Right. Will remove.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-03-09 Thread Ashutosh Sharma
On Tue, Mar 8, 2022 at 8:31 PM Nitin Jadhav
 wrote:
>
> > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint;
> > > -[ RECORD 1 ]-+-
> > > pid   | 22043
> > > type  | checkpoint
> > > kind  | immediate force wait requested time
> > >
> > > I think the output in the kind column can be displayed as {immediate,
> > > force, wait, requested, time}. By the way these are all checkpoint
> > > flags so it is better to display it as checkpoint flags instead of
> > > checkpoint kind as mentioned in one of my previous comments.
> >
> > I will update in the next patch.
>
> The current format matches with the server log message for the
> checkpoint start in LogCheckpointStart(). Just to be consistent, I
> have not changed the code.
>

See below how flags are shown in other SQL functions, for example:

ashu@postgres=# select * from heap_tuple_infomask_flags(2304, 1);
raw_flags| combined_flags
-+
 {HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID} | {}
(1 row)

This looks more readable and is easier for end users to understand.
Further, comparing the way log messages are displayed with the way SQL
functions display their output doesn't look like the right comparison
to me. Obviously both should show matching data, but the way it is
shown doesn't need to be the same. In fact it is not, in most cases.
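The set-style presentation suggested here is easy to produce from a flags bitmask; a sketch with illustrative flag names and bit values (the real checkpoint flag bits are defined in PostgreSQL's C headers and may differ):

```python
# Hedged sketch: render a flags bitmask the way heap_tuple_infomask_flags
# does, as a {FLAG_A,FLAG_B} set, rather than a space-separated log string.
CHECKPOINT_FLAGS = [  # illustrative names/values only
    (0x0004, "CHECKPOINT_IMMEDIATE"),
    (0x0008, "CHECKPOINT_FORCE"),
    (0x0020, "CHECKPOINT_WAIT"),
    (0x0080, "CHECKPOINT_CAUSE_XLOG"),
    (0x0100, "CHECKPOINT_CAUSE_TIME"),
]

def format_flags(mask):
    names = [name for bit, name in CHECKPOINT_FLAGS if mask & bit]
    return "{" + ",".join(names) + "}"

print(format_flags(0x0004 | 0x0020))  # {CHECKPOINT_IMMEDIATE,CHECKPOINT_WAIT}
print(format_flags(0))                # {}
```

An empty mask rendering as `{}` also makes the "no flags set" case unambiguous, which the space-separated log format does not.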

> I have taken care of the rest of the comments in v5 patch for which
> there was clarity.
>

Thank you very much. Will take a look at it later.

--
With Regards,
Ashutosh Sharma.




Re: logical decoding and replication of sequences

2022-03-09 Thread Tomas Vondra
On 3/9/22 12:41, Amit Kapila wrote:
> On Wed, Mar 9, 2022 at 4:14 AM Tomas Vondra
>  wrote:
>>
>> On 3/7/22 22:11, Tomas Vondra wrote:
>>>
>>> I've pushed this simple fix. Not sure it'll fix the assert failures on
>>> skink/locust, though. Given the lack of information it'll be difficult
>>> to verify. So let's wait a bit.
>>>
>>
>> I've done about 5000 runs of 'make check' in test_decoding, on two rpi
>> machines (one armv7, one aarch64). Not a single assert failure :-(
>>
>> How come skink/locust hit that in just a couple runs?
>>
> 
> Did it fail after you pushed the fix? I don't think so, or am I missing
> something? I feel even if it doesn't occur again it would have been
> better if we had some theory on how it occurred in the first place,
> because that would make us feel more confident that we won't have any
> related problems left.
> 

I don't think it failed yet - we have to wait a bit longer to make any
conclusions, though. On skink it failed only twice over 1 month. I agree
it'd be nice to have some theory, but I really don't have one.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-09 Thread Robert Haas
On Wed, Mar 9, 2022 at 6:07 AM Dilip Kumar  wrote:
> Yeah, you are correct about the local map, but I am not sure whether we
> can rely on not updating the shared map in the startup process.
> How can we guarantee that, now or in the future, the startup process
> will never look into the map?  I agree that it is not connected to the
> database, so it doesn't make sense to look into the local map, but how
> are we going to ensure that for the shared map?  That said, I think
> there are only 3 functions which look at these maps directly:
> RelationMapOidToFilenode(), RelationMapFilenodeToOid() and
> RelationMapUpdateMap(), and these functions are called from very few
> places; I don't think they should be called during recovery.  So
> probably we can add an elog saying they should never be called during
> recovery?

Yeah, that seems reasonable.

> Right, infact now also if you see the logic, the
> write_relmap_file_internal() is taking care of the actual update and
> the write_relmap_file() is doing update + setting the global
> variables.  So yeah we can rename as you suggested in 0001 and here
> also we can change the logic as you suggested that the recovery and
> createdb will only call the first function which is just doing the
> update.

But I think we want the path construction to be managed by the
function rather than the caller, too.

> I'd expect to
> > find it in storage.c, I think. And I think I'd be surprised to find
> > out that it doesn't actually know anything about copying; it's
> > basically just a loop over the forks to which you can supply your own
> > copy-function.
>
> Yeah but it eventually expects a function pointer to copy storage so
> we can not completely deny that it knows nothing about the copy?

Sure, I guess. It's just not obvious why the argument would actually
need to be a function that copies storage, or why there's more than
one way to copy storage. I'd rather keep all the code paths unified,
if we can, and set behavior via flags or something, maybe. I'm not
sure whether that's realistic, though.

> And the fact that it's got an argument of type
> > copy_relation_storage and the argument name is copy_storage and the
> > value is sometimes RelationCopyStorage is a terminological muddle, too.
> > So I feel like perhaps this needs more thought.
>
> One option is that we can duplicate this loop in dbcommand.c as well
> like we are having it already in tablecmds.c and heapam_handler.c?

Yeah, I think this is also worth considering.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: wal_compression=zstd

2022-03-09 Thread Justin Pryzby
On Fri, Mar 04, 2022 at 05:44:06AM -0600, Justin Pryzby wrote:
> On Fri, Mar 04, 2022 at 04:19:32PM +0900, Michael Paquier wrote:
> > On Tue, Feb 22, 2022 at 05:19:48PM -0600, Justin Pryzby wrote:
> > 
> > > As writen, this patch uses zstd level=1 (whereas the ZSTD's default 
> > > compress
> > > level is 6).
> > 
> > Why?  ZSTD using this default has its reasons, no?  And it would be
> > consistent to do the same for ZSTD as for the other two methods.
> 
> In my 1-off test, it gets 610/633 = 96% of the benefit at 209/273 = 77% of the
> cost.

Actually, my test used zstd-6, rather than the correct default of 3.

The comparison should have been:

postgres=# SET wal_compression='zstd-1';
postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; 
SELECT pg_stat_reset_shared('wal'); begin; CREATE INDEX ON t(a); rollback; 
SELECT * FROM pg_stat_wal;
Time: 2074.046 ms (00:02.074)
 wal_records | wal_fpi | wal_bytes | wal_buffers_full | wal_write | wal_sync | 
wal_write_time | wal_sync_time |  stats_reset
-+-+---+--+---+--++---+---
2763 |2758 |   6343591 |0 | 5 |5 |  
0 | 0 | 2022-03-05 05:04:08.599867-06


vs

postgres=# SET wal_compression='zstd-3';
postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; 
SELECT pg_stat_reset_shared('wal'); begin; CREATE INDEX ON t(a); rollback; 
SELECT * FROM pg_stat_wal;
Time: 2471.552 ms (00:02.472)
 wal_records | wal_fpi | wal_bytes | wal_buffers_full | wal_write | wal_sync | 
wal_write_time | wal_sync_time |  stats_reset
-+-+---+--+---+--++---+---
2762 |2746 |   6396890 |  274 |   274 |0 |  
0 | 0 | 2022-03-05 05:04:31.283432-06

=> zstd-1 actually wrote less than zstd-3 (which is odd) but by an
insignificant amount.  It's no surprise that zstd-1 is faster than zstd-3, but
(of course) by a smaller amount than zstd-6.

Anyway there's no compelling reason to not use the default.  If we were to use
a non-default default, we'd have to choose between 1 and 2 (or some negative
compression level).  My thinking was that zstd-1 would give the lowest-hanging
fruits for zstd, while minimizing performance tradeoff, since WAL affects
interactivity.  But choosing between 1 and 2 seems like bikeshedding.




Re: role self-revocation

2022-03-09 Thread Robert Haas
On Wed, Mar 9, 2022 at 7:55 AM Peter Eisentraut
 wrote:
> Do we have subtractive permissions today?

Not in the GRANT/REVOKE sense, I think, but you can put a user in a
group and then mention that group in pg_hba.conf. And that line might
be "reject" or whatever.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Synchronizing slots from primary to standby

2022-03-09 Thread Ashutosh Sharma
Hi,

I have spent a little time trying to understand the concern raised by
Andres, and while doing so I could think of a couple of issues which I
would like to share here. Although I'm not quite sure how well these
line up with the problems seen by Andres.

1) Firstly, what if we come across a situation where the failover
occurs when the confirmed flush lsn has been updated on primary, but
is yet to be updated on the standby? I believe this may very well be
the case especially considering that standby sends sql queries to the
primary to synchronize the replication slots at regular intervals and
if the primary dies just after updating the confirmed flush lsn of its
logical subscribers then the standby may not be able to get this
information/update from the primary which means we'll probably end up
having a broken logical replication slot on the new primary.

2) Secondly, if the standby goes down, the logical subscribers will
stop receiving new changes from the primary as per the design of this
patch OR if standby lags behind the primary for whatever reason, it
will have a direct impact on logical subscribers as well.

--
With Regards,
Ashutosh Sharma.

On Sat, Feb 19, 2022 at 3:53 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-02-11 15:28:19 +0100, Peter Eisentraut wrote:
> > On 05.02.22 20:59, Andres Freund wrote:
> > > On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote:
> > > >  From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001
> > > > From: Peter Eisentraut
> > > > Date: Mon, 3 Jan 2022 14:43:36 +0100
> > > > Subject: [PATCH v3] Synchronize logical replication slots from primary 
> > > > to
> > > >   standby
> > > I've just skimmed the patch and the related threads. As far as I can tell 
> > > this
> > > cannot be safely used without the conflict handling in [1], is that 
> > > correct?
> >
> > This or similar questions have been asked a few times about this or similar
> > patches, but they always come with some doubt.
>
> I'm certain it's a problem - the only reason I couched it was that there could
> have been something clever in the patch preventing problems that I missed
> because I just skimmed it.
>
>
> > If we think so, it would be
> > useful perhaps if we could come up with test cases that would demonstrate
> > why that other patch/feature is necessary.  (I'm not questioning it
> > personally, I'm just throwing out ideas here.)
>
> The patch as-is just breaks one of the fundamental guarantees necessary for
> logical decoding, that no rows versions can be removed that are still required
> for logical decoding (signalled via catalog_xmin). So there needs to be an
> explicit mechanism upholding that guarantee, but there is not right now from
> what I can see.
>
> One piece of the referenced patchset is that it adds information about removed
> catalog rows to a few WAL records, and then verifies during replay that no
> record can be replayed that removes resources that are still needed. If such a
> conflict exists it's dealt with as a recovery conflict.
>
> That itself doesn't provide prevention against removal of required, but it
> provides detection. The prevention against removal can then be done using a
> physical replication slot with hot standby feedback or some other mechanism
> (e.g. slot syncing mechanism could maintain a "placeholder" slot on the
> primary for all sync targets or something like that).
>
> Even if that infrastructure existed / was merged, the slot sync stuff would
> still need some very careful logic to protect against problems due to
> concurrent WAL replay and "synchronized slot" creation. But that's doable.
>
> Greetings,
>
> Andres Freund
>
>
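The invariant Andres describes (no catalog row versions may be removed while a slot's catalog_xmin still needs them) reduces to an xid comparison at replay time. A deliberately simplified sketch, ignoring xid wraparound and using a plain integer comparison rather than PostgreSQL's TransactionIdFollowsOrEquals():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Simplified sketch: if WAL replay is about to apply a record that
 * removed catalog rows up to removed_xid, and a synchronized slot still
 * advertises a catalog_xmin at or below that xid, decoding from the slot
 * would be broken; the replay must be treated as a recovery conflict. */
static bool
catalog_removal_conflicts(TransactionId removed_xid,
                          TransactionId slot_catalog_xmin)
{
    return removed_xid >= slot_catalog_xmin;
}
```

Detection like this is only half the story; as noted above, something (hot standby feedback, placeholder slots, etc.) must also prevent the removal in the first place.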




Re: role self-revocation

2022-03-09 Thread Peter Eisentraut

On 07.03.22 19:18, Robert Haas wrote:

That all said, permissions SHOULD BE strictly additive.  If boss doesn't want 
to be a member of pg_read_all_files allowing them to revoke themself from that 
role seems like it should be acceptable.  If there is fear in allowing someone 
to revoke (not add) themselves as a member of a different role that suggests we 
have a design issue in another feature of the system.  Today, they neither 
grant nor revoke, and the self-revocation doesn't seem that important to add.

I disagree with this on principle, and I also think that's not how it
works today. On the general principle, I do not see a compelling
reason why we should have two systems for maintaining groups of users,
one of which is used for additive things and one of which is used for
subtractive things.


Do we have subtractive permissions today?





Re: Time to drop plpython2?

2022-03-09 Thread Andrew Dunstan


On 3/8/22 15:02, Andres Freund wrote:
> Hi,
>
> On 2022-03-08 10:42:31 -0800, Andres Freund wrote:
>>> crake also failed. Looks like plpy_plpymodule.h needs to include 
>>> plpython.h. A
>>> pre-existing issue that just didn't happen to cause problems...
>> Fixed that.
> Hm. Now crake failed in XversionUpgrade-REL9_2_STABLE-HEAD:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2022-03-08%2018%3A47%3A22
>
> except that the log doesn't actually indicate any problem? Andrew, any hint?
>


That was a snafu on my part, as I was adding extension upgrade / amcheck
testing. It's fixed now, so please disregard this one.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [RFC] building postgres with meson -v6

2022-03-09 Thread Peter Eisentraut

On 08.03.22 03:56, Andres Freund wrote:

Attached is v6 of the meson patchset. There are a lots of changes since the
last version posted. These include:
- python2 removal is now committed, so not needed in here anymore
- CI changed to be based on the CI now merged into postgres
- CI also tests suse, rhel, fedora (Nazir Bilal Yavuz). Found several bugs. I
   don't think we'd merge all of those, but while working on the meson branch,
   it's really useful.
- all dependencies, except for pl/tcl (should be done soon)
- several missing options added (segsize, extra_{lib,include}_dirs, 
enable-tap-tests)
- several portability fixes, builds on net/openbsd without changes now
- improvements to a number of "configure" tests
- lots of ongoing rebasing changes
- ...


I looked at this today mainly to consider whether some of the prereq
work is ready for adoption now.  A lot of the work has to do with
making various scripts write the output to other directories.  I
suspect this has something to do with how meson handles separate build
directories and how we have so far handled files created in the
distribution tarball.  But the whole picture isn't clear to me.

More generally, I don't see a distprep target in the meson build
files.  I wonder what your plan for that is, or whether that would
even work under meson.  In [0], I argued for getting rid of the
distprep step.  Perhaps it is time to reconsider that now.

[0]: 
https://www.postgresql.org/message-id/flat/cf0bec33-d965-664d-e0ec-fb15290f2273%402ndquadrant.com


For the short term, I think the patches 0002, 0008, 0010, and 0011
could be adopted, if they are finished as described.

Patch 0007 seems unrelated, or at least independently significant, and
should be discussed separately.

The rest is really all part of the same
put-things-in-the-right-directory issue.

For the overall patch set, I did a quick test with

meson setup build
cd build
ninja

which failed with

Undefined symbols for architecture x86_64:
  "_bbsink_zstd_new", referenced from:
  _SendBaseBackup in replication_basebackup.c.o

So maybe your patch set is not up to date with this new zstd build
option.


Details:

v6-0001-meson-prereq-output-and-depencency-tracking-work.patch.gz

This all looks kind of reasonable, but lacks explanation in some
cases, so I can't fully judge it.

v6-0002-meson-prereq-move-snowball_create.sql-creation-in.patch.gz

Looks like a reasonable direction, would be good to deduplicate with
Install.pm.

v6-0003-meson-prereq-add-output-path-arg-in-generate-lwlo.patch.gz

Ok.  Similar to 0001.  (But unlike 0001, nothing in this patch
actually uses the new output dir option.  That only comes in 0013.)

v6-0004-meson-prereq-add-src-tools-gen_versioning_script..patch.gz

This isn't used until 0013, and there it is patched again, so I'm not
sure if this is in the right position of this patch series.

v6-0005-meson-prereq-generate-errcodes.pl-accept-output-f.patch.gz

Also similar to 0001.

v6-0006-meson-prereq-remove-unhelpful-chattiness-in-snowb.patch.gz

Might as well include this into 0002.

v6-0007-meson-prereq-Can-we-get-away-with-not-export-all-.patch.gz

This is a separate discussion.  It's not clear to me why this is part
of this patch series.

v6-0008-meson-prereq-Handle-DLSUFFIX-in-msvc-builds-simil.patch.gz

Part of this was already done in 0001, so check if these patches are
split correctly.

I think the right way here is actually to go the other way around:
Move DLSUFFIX into header files for all platforms.  Move the DLSUFFIX
assignment from src/makefiles/ to src/templates/, have configure read
it, and then substitute it into Makefile.global and pg_config.h.

Then we also don't have to patch the Windows build code a bunch of
times to add the DLSUFFIX define everywhere.

There is code in configure already that would benefit from this, which
currently says

# We don't know the platform DLSUFFIX here, so check 'em all.

v6-0009-prereq-make-unicode-targets-work-in-vpath-builds.patch.gz

Another directory issue

v6-0010-ldap-tests-Don-t-run-on-unsupported-operating-sys.patch.gz

Not sure what this is supposed to do, but it looks independent of this
patch series.  Does it currently not work on "unsupported" operating
systems?

v6-0011-ldap-tests-Add-paths-for-openbsd.patch.gz

The more the merrier, although I'm a little bit worried about pointing
to a /usr/local/share/examples/ directory.

v6-0012-wip-split-TESTDIR-into-two.patch.gz
v6-0013-meson-Add-meson-based-buildsystem.patch.gz
v6-0014-meson-ci-Build-both-with-meson-and-as-before.patch.gz

I suggest in the interim to add a README.meson to show how to invoke
this.  Eventually, of course, we'd rewrite the installation
instructions.




Re: logical decoding and replication of sequences

2022-03-09 Thread Amit Kapila
On Wed, Mar 9, 2022 at 4:14 AM Tomas Vondra
 wrote:
>
> On 3/7/22 22:11, Tomas Vondra wrote:
> >
> > I've pushed this simple fix. Not sure it'll fix the assert failures on
> > skink/locust, though. Given the lack of information it'll be difficult
> > to verify. So let's wait a bit.
> >
>
> I've done about 5000 runs of 'make check' in test_decoding, on two rpi
> machines (one armv7, one aarch64). Not a single assert failure :-(
>
> How come skink/locust hit that in just a couple runs?
>

Did it fail after you pushed the fix? I don't think so, or am I missing
something? I feel even if it doesn't occur again it would have been
better if we had some theory on how it occurred in the first place,
because that would make us feel more confident that we won't have any
related problems left.

-- 
With Regards,
Amit Kapila.




Re: Reducing power consumption on idle servers

2022-03-09 Thread Simon Riggs
On Sat, 26 Feb 2022 at 17:44, Tom Lane  wrote:
>
> Magnus Hagander  writes:
> >> Deprecating explicit file-based promotion is possible and simple, so
> >> that is the approach in the latest version of the patch.
>
> > Is there any actual use-case for this other than backwards
> > compatibility?
>
> The fundamental problem with signal-based promotion is that it's
> an edge-triggered rather than level-triggered condition, ie if you
> miss the single notification event you're screwed.  I'm not sure
> why we'd want to go there, considering all the problems we're having
> with edge-triggered APIs in nearby threads.
>
> We could combine the two approaches, perhaps: have "pg_ctl promote"
> create a trigger file and then send a signal.  The trigger file
> would record the existence of the promotion request, so that if the
> postmaster isn't running right now or misses the signal, it'd still
> promote when restarted or otherwise at the next opportunity.
> But with an approach like this, we could expect quick response to
> the signal in normal conditions, without need for constant wakeups
> to check for the file.  Also, this route would allow overloading
> of the signal, since it would just mean "wake up, I asked you to
> do something" rather than needing to carry the full semantics of
> what is to be done.

The above is how this works now, luckily, but few comments explain why.

This current state evolved because I first added file-based promotion,
for the reasons you give, but it was slow, so the signal approach was
added later, with new API.

What we are discussing deprecating is the ability to create the
trigger file directly (the original API). Once that is gone the
mechanism would be to request promotion via pg_ctl promote (the new
API), which then creates the trigger file and sends a signal. So in
both APIs the trigger file is still used.

In this patch, if you create the trigger file
directly/explicitly/yourself (the old API) then it will still trigger
when hibernating, but it will be slower. The new API is unaffected by
this change.
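The combined design described above (a persistent trigger file as the level-triggered record of the request, plus a signal that serves only as a wakeup) can be sketched as follows. The file name and function names here are illustrative stand-ins; PostgreSQL's actual trigger file is promote.signal in the data directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Demo trigger-file name (hypothetical; not PostgreSQL's real path). */
#define TRIGGER_FILE "promote_demo.signal"

/* Level-triggered side: the request is durable on disk, so a missed
 * signal is recovered at the next check, e.g. after postmaster restart. */
static bool
promote_requested(void)
{
    return access(TRIGGER_FILE, F_OK) == 0;
}

/* pg_ctl-side sketch: record the request in the file, then (in the real
 * system) send a signal purely as an edge-triggered wakeup. */
static bool
request_promotion(void)
{
    FILE       *f = fopen(TRIGGER_FILE, "w");

    if (f == NULL)
        return false;
    fclose(f);
    /* kill(postmaster_pid, SIGUSR1) would go here as the wakeup */
    return true;
}

/* Server-side sketch: acknowledge the request by removing the file. */
static void
consume_promotion_request(void)
{
    unlink(TRIGGER_FILE);
}
```

Because the file carries the full meaning, the signal itself only needs to say "wake up and look", which is what lets the server hibernate without polling for the file.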

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Changing "Hot Standby" to "hot standby"

2022-03-09 Thread Daniel Westermann (DWE)
Hi Michael,

>On Wed, Mar 09, 2022 at 07:45:32AM +, Daniel Westermann (DWE) wrote:
>> Thanks for having a look. Done that way.

>Hmm.  Outside the title that had better use upper-case characters for
>the first letter of each word, I can see references to the pattern you
>are trying to eliminate in amcheck.sgml (1), config.sgml (3),
>protocol.sgml (3) and mvcc.sgml (1).  Shouldn't you refresh these as
>well if the point is to make the full set of docs consistent?

>As of the full tree, I can see that:
>
>$ git grep "hot standby" | wc -l
>259

>$ git grep "Hot Standby" | wc -l
>73

>So there is a trend for one of the two.

Thanks for looking at it. Yes, I am aware there are other places which would 
need to be changed, and I think I mentioned that in an earlier email. Are you 
suggesting changing all of them at once? I wanted to start with the 
documentation and then continue with the other places.

Regards
Daniel



Re: Optionally automatically disable logical replication subscriptions on error

2022-03-09 Thread Amit Kapila
On Wed, Mar 9, 2022 at 2:21 PM Masahiko Sawada  wrote:
>
> On Wed, Mar 9, 2022 at 4:33 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Tuesday, March 8, 2022 10:23 PM Amit Kapila  
> > wrote:
> > > On Tue, Mar 8, 2022 at 1:37 PM osumi.takami...@fujitsu.com
> > >  wrote:
> > > >
> >
> >
> > > 2. Is there a reason the patch doesn't allow workers to restart via
> > > maybe_reread_subscription() when this new option is changed, if so, then 
> > > let's
> > > add a comment for the same? We currently seem to be restarting the worker 
> > > on
> > > any change via Alter Subscription. If we decide to change it for this 
> > > option as
> > > well then I think we need to accordingly update the current comment: 
> > > "Exit if
> > > any parameter that affects the remote connection was changed." to 
> > > something
> > > like "Exit if any parameter that affects the remote connection or a 
> > > subscription
> > > option was changed..."
> > I thought it's ok without the change at the beginning, but I was wrong.
> > To make this new option aligned with others, I should add one check
> > for this feature. Fixed.
>
> Why do we need to restart the apply worker when disable_on_error is
> changed? It doesn't affect the remote connection at all. I think it
> can be changed without restarting like synchronous_commit option.
>

Oh right. I wondered how we would update its value in MySubscription
after a change, but since we re-read the pg_subscription table for the
current subscription and update MySubscription, I feel we don't need to
restart the worker. I haven't tested it, but it should work without a
restart.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Double declaration in pg_upgrade.h

2022-03-09 Thread Peter Eisentraut

On 09.03.22 11:29, Pavel Borisov wrote:

I've noticed that check_ok() in pg_upgrade.h has been declared two times.
Here's a one-line patch correcting this.


Fixed, thanks.




Re: Add pg_freespacemap extension sql test

2022-03-09 Thread Dong Wook Lee
On Wed, Mar 9, 2022 at 1:19 AM, Tom Lane wrote:

> Dong Wook Lee  writes:
> > [ 0001_add_test_pg_fsm.patch ]
>
> I think having some coverage here would be great, but I'm concerned that
> this patch doesn't look very portable.  Aren't the numbers liable to
> change on 32-bit machines, in particular?
>
> regards, tom lane
>

I agree with you, but I don't have a good idea of how to deal with it.
Could a Perl TAP test be a good way?
Thoughts?


Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-09 Thread Dilip Kumar
On Wed, Mar 9, 2022 at 3:12 AM Robert Haas  wrote:
>
Thanks for reviewing and valuable feedback.

> Reviewing 0001, the boundaries of the critical section move slightly,
> but only over a memcpy, which can't fail, so that seems fine. But this
> comment looks ominous:
>
>  * Note: we're cheating a little bit here by assuming that mapped files
>  * are either in pg_global or the database's default tablespace.
>
> It's not clear to me how the code that follows relies on this
> assumption, but the overall patch set would make that not true any
> more, so there's some kind of an issue to think about there.

I think the comments are w.r.t choosing the file path, because here we
assume either it is in the global tablespace or default tablespace of
the database.  Here also the comment is partially true because we also
assume that it will be in the default tablespace of the database
(because we do not need to worry about the shared relations).  But I
think this comments can move to the caller function where we are
creating the file path.

if (shared)
{
snprintf(mapfilename, sizeof(mapfilename), "global/%s",
RELMAPPER_FILENAME);
}
else
{
snprintf(mapfilename, sizeof(mapfilename), "%s/%s",
dbpath, RELMAPPER_FILENAME);
}

> It's a little asymmetric that load_relmap_file() gets a subroutine
> read_relmap_file() while write_relmap_file() gets a subroutine
> write_relmap_file_internal(). Perhaps we could call the functions
> {load,write}_named_relmap_file() or something of that sort.

Yeah this should be changed.

> Reviewing 0002, your comment updates in relmap_redo() are not
> complete. Note that there's an unmodified comment that says "Write out
> the new map and send sinval" just above where you modify the code to
> only conditionally send sinval. I'm somewhat uncomfortable with the
> shape of this logic, too. It looks weird to be sometimes calling
> write_relmap_file and sometimes write_relmap_file_internal. You'd
> expect functions with those names to be called at different
> abstraction levels, rather than at parallel call sites. The renaming I
> proposed would help with this but it's not just a cosmetic issue: the
> logic to construct mapfilename is in this function in one case, but in
> the called function in the other case. I can't help but think that the
> write_relmap_file()/write_relmap_file_internal() split isn't entirely
> the right thing.
>
> I think part of the confusion here is that, pre-patch,
> write_relmap_file() gets called during both recovery and normal
> running, and the updates to shared_map or local_map are actually
> nonsense during recovery, because the local map at least is local to
> whatever our database is, and we don't have a database connection if
> we're the startup process.

Yeah, you are correct about the local map, but I am not sure whether we
can rely on not updating the shared map in the startup process.
How can we guarantee that, now or in the future, the startup process
will never look into the map?  I agree that it is not connected to the
database, so it doesn't make sense to look into the local map, but how
are we going to ensure that for the shared map?  That said, I think
there are only 3 functions which look at these maps directly:
RelationMapOidToFilenode(), RelationMapFilenodeToOid() and
RelationMapUpdateMap(), and these functions are called from very few
places; I don't think they should be called during recovery.  So
probably we can add an elog saying they should never be called during
recovery?

> After your patch, we're still going through
> write_relmap_file in recovery in some cases, but really those map
> updates don't seem like things that should be happening at all. And on
> the other hand it's not clear to me why the CRC stuff isn't needed in
> all cases, but that's only going to happen when we go through the
> non-internal version of the function. You've probably spent more time
> looking at this code than I have, but I'm wondering if the division
> should be like this: we have one function that does the actual update,
> and another function that does the update plus sets global variables.
> Recovery always uses the first one, and outside of recovery we use the
> first one for the create-database case and the second one otherwise.
> Thoughts?

Right, infact now also if you see the logic, the
write_relmap_file_internal() is taking care of the actual update and
the write_relmap_file() is doing update + setting the global
variables.  So yeah we can rename as you suggested in 0001 and here
also we can change the logic as you suggested that the recovery and
createdb will only call the first function which is just doing the
update.
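The split being agreed on here (a lower-level function that only persists the map, called from recovery and CREATE DATABASE, and a higher-level one that also updates process-local state, called in normal operation) can be sketched with toy types. The names and the struct are illustrative only; the real code in relmapper.c also handles CRCs, fsync, and WAL-logging.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for the on-disk relation map file. */
typedef struct
{
    int magic;
    int num_mappings;
} RelMapFile;

/* Process-local cache, analogous to relmapper.c's shared_map. */
static RelMapFile cached_map;

/* Lower-level function: just persist the map at the given path.
 * Recovery and CREATE DATABASE would call only this, since their
 * process-local cache is irrelevant or unavailable. */
static int
write_relmap_file_only(const char *path, const RelMapFile *map)
{
    FILE       *f = fopen(path, "wb");

    if (f == NULL)
        return -1;
    if (fwrite(map, sizeof(*map), 1, f) != 1)
    {
        fclose(f);
        return -1;
    }
    return fclose(f);
}

/* Higher-level function: persist and update the process-local state.
 * Only normal (non-recovery) operation would call this. */
static int
write_relmap_file(const char *path, const RelMapFile *map)
{
    int rc = write_relmap_file_only(path, map);

    if (rc == 0)
        cached_map = *map;
    return rc;
}
```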


> Regarding 0003, my initial thought was to like the fact that you'd
> avoided duplicating code by using a function parameter, but as I look
> at it a bit more, it's not clear to me that it's enough code that we
> really care about not duplicating it. I would not expect to find a
> function called RelationCopyAllFork() in 

Re: parse/analyze API refactoring

2022-03-09 Thread Peter Eisentraut

On 28.02.22 19:51, Nathan Bossart wrote:

On Mon, Feb 28, 2022 at 07:46:40AM +0100, Peter Eisentraut wrote:

You set this commit fest entry to Waiting on Author, but there were no
reviews posted and the patch still applies and builds AFAICT, so I don't
know what you meant by that.


Apologies for the lack of clarity.  I believe my only feedback was around
deduplicating the pg_analyze_and_rewrite_*() functions.  Would you rather
handle that in a separate patch?


I have committed my original patches.  I'll leave the above-mentioned 
topic as ideas for the future.





[PATCH] Double declaration in pg_upgrade.h

2022-03-09 Thread Pavel Borisov
Hi, hackers!
I've noticed that check_ok() in pg_upgrade.h has been declared two times.
Here's a one-line patch correcting this.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


v1-0001-Fix-double-declaration-for-check_ok-in-pg_upgrade.patch
Description: Binary data


RE: Column Filtering in Logical Replication

2022-03-09 Thread houzj.f...@fujitsu.com
On Wednesday, March 9, 2022 6:04 PM Amit Kapila 
> On Mon, Mar 7, 2022 at 8:48 PM Tomas Vondra
>  wrote:
> >
> > On 3/4/22 11:42, Amit Kapila wrote:
> >
> > > *
> > > Fetching column filter info in tablesync.c is quite expensive. It
> > > seems to be using four round-trips to get the complete info whereas
> > > for row-filter we use just one round trip. I think we should try to
> > > get both row filter and column filter info in just one round trip.
> > >
> >
> > Maybe, but I really don't think this is an issue.
> >
> 
> I am not sure but it might matter for small tables. Leaving aside the
> performance issue, I think the current way will get the wrong column list in
> many cases: (a) The ALL TABLES IN SCHEMA case handling won't work for
> partitioned tables when the partitioned table is part of one schema and
> partition table is part of another schema. (b) The handling of partition 
> tables in
> other cases will fetch incorrect lists as it tries to fetch the column list 
> of all the
> partitions in the hierarchy.
> 
> One of my colleagues has even tested these cases both for column filters and
> row filters and we find the behavior of row filter is okay whereas for column
> filter it uses the wrong column list. We will share the tests and results 
> with you
> in a later email. We are trying to unify the column filter queries with row 
> filter to
> make their behavior the same and will share the findings once it is done. I 
> hope
> if we are able to achieve this that we will reduce the chances of bugs in 
> this area.
> 
> Note: I think the first two patches for tests are not required after commit
> ceb57afd3c.

Hi,

Here are some tests and results about the table sync query of
column filter patch and row filter.

1) multiple publications which publish schema of parent table and partition.
pub
create schema s1;
create table s1.t (a int, b int, c int) partition by range (a);
create table t_1 partition of s1.t for values from (1) to (10);
create publication pub1 for all tables in schema s1;
create publication pub2 for table t_1(b);

sub
- prepare tables
CREATE SUBSCRIPTION sub CONNECTION 'port=1 dbname=postgres' PUBLICATION 
pub1, pub2;

When doing table sync for 't_1', the column list will be (b). I think it should
be no filter because table t_1 is also published via an ALL TABLES IN SCHEMA
publication.

For Row Filter, it will use no filter for this case.


2) one publication publishes both parent and child
pub
create table t (a int, b int, c int) partition by range (a);
create table t_1 partition of t for values from (1) to (10)
   partition by range (a);
create table t_2 partition of t_1 for values from (1) to (10);

create publication pub2 for table t_1(a), t_2
  with (PUBLISH_VIA_PARTITION_ROOT);

sub
- prepare tables
CREATE SUBSCRIPTION sub CONNECTION 'port=1 dbname=postgres' PUBLICATION 
pub2;

When doing table sync for table 't_1', it has no column list. I think the
expected column list is (a).

For Row Filter, it will use the row filter of the top most parent table(t_1) in
this case.


3) one publication publishes both parent and child
pub
create table t (a int, b int, c int) partition by range (a);
create table t_1 partition of t for values from (1) to (10)
   partition by range (a);
create table t_2 partition of t_1 for values from (1) to (10);

create publication pub2 for table t_1(a), t_2(b)
  with (PUBLISH_VIA_PARTITION_ROOT);

sub
- prepare tables
CREATE SUBSCRIPTION sub CONNECTION 'port=1 dbname=postgres' PUBLICATION 
pub2;

When doing table sync for table 't_1', the column list would be (a, b). I think
the expected column list is (a).

For Row Filter, it will use the row filter of the top most parent table(t_1) in
this case.

Best regards,
Hou zj


Re: cpluspluscheck complains about use of register

2022-03-09 Thread Fabien COELHO




It seems we should just remove the use of register?


I have a vague idea that it was once important to say "register" if
you are going to use the variable in an asm snippet that requires it
to be in a register.  That might be wrong, or it might be obsolete
even if once true.  We could try taking these out and seeing if the
buildfarm complains.


We have several inline asm statements not using register despite using
variables in a register (e.g. pg_atomic_compare_exchange_u32_impl()), so I
wouldn't expect a problem with compilers we support.

Should we make configure test for -Wregister? There's at least one additional
use of register that we'd have to change (pg_regexec).


From a compilation perspective, "register" tells the compiler that you 
cannot take a pointer to the variable, i.e. it generates an error if someone 
adds something like:


   void * p = &the_variable;

Removing the "register" declaration means that such protection would be 
removed, and creating such a pointer could drastically reduce compiler 
optimization opportunities.


--
Fabien.




Re: RFC: Logging plan of the running query

2022-03-09 Thread torikoshia

On 2022-02-08 01:13, Fujii Masao wrote:

AbortSubTransaction() should reset ActiveQueryDesc to
save_ActiveQueryDesc that ExecutorRun() set, instead of NULL?
Otherwise ActiveQueryDesc of top-level statement will be unavailable
after subtransaction is aborted in the nested statements.


I once agreed with the above suggestion and made a v20 patch making 
save_ActiveQueryDesc a global variable, but it caused a segfault when 
calling pg_log_query_plan() after FreeQueryDesc().


OTOH, doing some kind of reset of ActiveQueryDesc seems necessary since 
it also caused a segfault when running pg_log_query_plan() during 
installcheck.


There may be a better way, but resetting ActiveQueryDesc to NULL seems 
safe and simple.
Of course it makes pg_log_query_plan() useless after a subtransaction is 
aborted.
However, if it is rare for people to want to know the plan of a 
running query whose subtransaction has been aborted, resetting 
ActiveQueryDesc to NULL would be acceptable.


Attached is a patch that sets ActiveQueryDesc to NULL when a 
subtransaction is aborted.


What do you think?

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
From 5be784278e8e7aeeeadf60a772afccda7b59e6e4 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Wed, 9 Mar 2022 18:18:06 +0900
Subject: [PATCH v21] Add function to log the plan of the query currently
 running on the backend.

Currently, we have to wait for the query execution to finish
to check its plan. This is not so convenient when
investigating long-running queries on production environments
where we cannot use debuggers.
To improve this situation, this patch adds
pg_log_query_plan() function that requests to log the
plan of the specified backend process.

By default, only superusers are allowed to request to log the
plans because allowing any users to issue this request at an
unbounded rate would cause lots of log messages, which could
lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its plan at LOG_SERVER_ONLY level, so
that these plans will appear in the server log but not be sent
to the client.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar,
Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro
Horiguchi, Robert Treat
---
 doc/src/sgml/func.sgml   |  49 +++
 src/backend/access/transam/xact.c|  13 ++
 src/backend/catalog/system_functions.sql |   2 +
 src/backend/commands/explain.c   | 140 ++-
 src/backend/executor/execMain.c  |  10 ++
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/storage/lmgr/lock.c  |   9 +-
 src/backend/tcop/postgres.c  |   4 +
 src/backend/utils/init/globals.c |   1 +
 src/include/catalog/pg_proc.dat  |   6 +
 src/include/commands/explain.h   |   3 +
 src/include/miscadmin.h  |   1 +
 src/include/storage/lock.h   |   2 -
 src/include/storage/procsignal.h |   1 +
 src/include/tcop/pquery.h|   2 +
 src/test/regress/expected/misc_functions.out |  54 +--
 src/test/regress/sql/misc_functions.sql  |  41 --
 17 files changed, 314 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8a802fb225..075056 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25461,6 +25461,25 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_query_plan
+
+pg_log_query_plan ( pid integer )
+boolean
+   
+   
+Requests to log the plan of the query currently running on the
+backend with specified process ID.
+It will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See 
+for more information), but will not be sent to the client
+regardless of .
+
+  
+
   

 
@@ -25574,6 +25593,36 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.

 
+   
+pg_log_query_plan can be used
+to log the plan of a backend process. For example:
+
+postgres=# SELECT pg_log_query_plan(201116);
+ pg_log_query_plan
+---
+ t
+(1 row)
+
+The format of the query plan is the same as when VERBOSE,
+COSTS, SETTINGS and
+FORMAT TEXT are used in the EXPLAIN
+command. For example:
+
+LOG:  query plan running on backend with PID 201116 is:
+Query Text: SELECT * FROM pgbench_accounts;
+Seq Scan on public.pgbench_accounts  (cost=0.00..52787.00 rows=200 width=97)
+  Output: aid, bid, abalance, filler
+Settings: work_mem = '1MB'
+
+Note that when statements are executed inside a function, only the
+plan of the most deeply nested query is logged.

Re: Column Filtering in Logical Replication

2022-03-09 Thread Amit Kapila
On Mon, Mar 7, 2022 at 8:48 PM Tomas Vondra
 wrote:
>
> On 3/4/22 11:42, Amit Kapila wrote:
> > On Wed, Mar 2, 2022 at 5:43 PM Tomas Vondra
> >  wrote:
> >>
> >> Attached is an updated patch, addressing most of the issues reported so
> >> far. There are various minor tweaks, but the main changes are:
> > ...
> >>
> >> 3) checks of column filter vs. publish_via_partition_root and replica
> >> identity, following the same logic as the row-filter patch (hopefully,
> >> it touches the same places, using the same logic, ...)
> >>
> >> That means - with "publish_via_partition_root=false" it's not allowed to
> >> specify column filters on partitioned tables, only for leaf partitions.
> >>
> >> And we check column filter vs. replica identity when adding tables to
> >> publications, or whenever we change the replica identity.
> >>
> >
> > This handling is different from row filter work and I see problems
> > with it.
>
> By different, I assume you mean I tried to enforce the rules in ALTER
> PUBLICATION and other ALTER commands, instead of when modifying the
> data?
>

Yes.

> OK, I reworked this to do the same thing as the row filtering patch.
>

Thanks, I'll check this.

> > The column list validation w.r.t primary key (default replica
> > identity) is missing. The handling of column list vs. partitions has
> > multiple problems: (a) In attach partition, the patch is just checking
> > ancestors for RI validation but what if the table being attached has
> > further subpartitions; (b) I think the current locking also seems to
> > have problems because it is quite possible that while it validates the
> > ancestors here, concurrently someone changes the column list. I think
> > it won't be enough to just change the locking mode because with the
> > current patch strategy during attach, we will be first taking locks
> > for child tables of current partition and then parent tables which can
> > pose deadlock hazards.
> > The column list validation also needs to be done when we change
> > publication action.
> >
> I believe those issues should be solved by adopting the same approach as
> the row-filtering patch, right?
>

Right.

>
> > Some other miscellaneous comments:
> > =
> > *
> > In get_rel_sync_entry(), the handling for partitioned tables doesn't
> > seem to be correct. It can publish a different set of columns based on
> > the order of publications specified in the subscription.
> >
> > For example:
> > 
> > create table parent (a int, b int, c int) partition by range (a);
> > create table test_part1 (like parent);
> > alter table parent attach partition test_part1 for values from (1) to (10);
> >
> > create publication pub for table parent(a) with 
> > (PUBLISH_VIA_PARTITION_ROOT);
> > create publication pub2 for table test_part1(b);
> > ---
> >
> > Now, depending on the order of publications in the list while defining
> > subscription, the column list will change
> > 
> > create subscription sub connection 'port=1 dbname=postgres'
> > publication pub, pub2;
> >
> > For the above, column list will be: (a)
> >
> > create subscription sub connection 'port=1 dbname=postgres'
> > publication pub2, pub;
> >
> > For this one, the column list will be: (a, b)
> > 
> >
> > To avoid this, the column list should be computed based on the final
> > publish_as_relid as we are doing for the row filter.
> >
>
> Hmm, yeah. That seems like a genuine problem - it should not depend on
> the order of publications in the subscription, I guess.
>
> But is it an issue in the patch? Isn't that a pre-existing issue? AFAICS
> the problem is that we initialize publish_as_relid=relid before the loop
> over publications, and then just update it. So the first iteration
> starts with relid, but the second iteration ends with whatever value is
> set by the first iteration (e.g. the root).
>
> So with the example you posted, we start with
>
>   publish_as_relid = relid = test_part1
>
> but then if the first publication is pubviaroot=true, we update it to
> parent. And in the second iteration, we fail to find the column filter,
> because "parent" (publish_as_relid) is not part of the pub2.
>
> If we do it in the other order, we leave the publish_as_relid value as
> is (and find the filter), and then update it in the second iteration
> (and find the column filter too).
>
> Now, this can be resolved by re-calculating the publish_as_relid from
> scratch in each iteration (start with relid, then maybe update it). But
> that's just half the story - the issue is there even without column
> filters. Consider this example:
>
> create table t (a int, b int, c int) partition by range (a);
>
> create table t_1 partition of t for values from (1) to (10)
>partition by range (a);
>
> create table t_2 partition of t_1 for values from (1) to (10);
>
> create publication pub1 for table t(a)
>   with (PUBLISH_VIA_PARTITION_ROOT);
>
> create publication pub2 for table t_1(a)
>   with 

Re: Time to drop plpython2?

2022-03-09 Thread Peter Eisentraut

On 08.03.22 20:03, Andres Freund wrote:

Hi,

On 2022-03-08 13:49:15 -0500, Tom Lane wrote:

Andres Freund  writes:

A bit depressing to have a 500 line alternative output file for a one line
diff :(.


Yeah.  How badly do we need that particular test case?


A bit hard to tell. The test was introduced in

commit 2bd78eb8d51cc9ee03ba0287b23ff4c266dcd9b9
Author: Peter Eisentraut 
Date:   2011-04-06 22:36:06 +0300

 Add traceback information to PL/Python errors


We would probably try to write this test differently today, but at this 
point I wouldn't bother and just wait for Python 3.5 to fall off the end 
of the conveyor belt.





Re: Column Filtering in Logical Replication

2022-03-09 Thread Peter Eisentraut


On 07.03.22 16:18, Tomas Vondra wrote:

AFAICS these issues should be resolved by the adoption of the row-filter
approach (i.e. it should fail the same way as for row filter).


The first two patches (additional testing for row filtering feature) 
look okay to me.


Attached is a fixup patch for your main feature patch (the third one).

It's a bit of code and documentation cleanup, but mainly I removed the 
term "column filter" from the patch.  Half the code was using "column 
list" or similar and half the code "column filter", which was confusing. 
 Also, there seemed to be a bit of copy-and-pasting from row-filter 
code going on, with some code comments not quite sensible, so I rewrote 
some of them.  Also some code used "rf" and "cf" symbols which were a 
bit hard to tell apart.  A few more letters can increase readability.


Note in publicationcmds.c OpenTableList() the wrong if condition was used.

I'm still confused about the intended replica identity handling.  This 
patch still checks whether the column list contains the replica identity 
at DDL time.  And then it also checks at execution time.  I thought the 
latest understanding was that the DDL-time checking would be removed.  I 
think it's basically useless now, since as the test cases show, you can 
subvert those checks by altering the replica identity later.
From d0e9df4674389cda9f891f5678f476d35095c618 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 8 Mar 2022 16:23:01 +0100
Subject: [PATCH] fixup! Allow specifying column filters for logical
 replication

---
 doc/src/sgml/catalogs.sgml  |   4 +-
 doc/src/sgml/protocol.sgml  |   5 +-
 doc/src/sgml/ref/alter_publication.sgml |   4 +
 src/backend/catalog/pg_publication.c|  20 ++-
 src/backend/commands/publicationcmds.c  |  80 ++--
 src/backend/executor/execReplication.c  |   8 +-
 src/backend/replication/logical/proto.c |  21 +--
 src/backend/replication/logical/tablesync.c |  16 +--
 src/backend/replication/pgoutput/pgoutput.c |   2 +-
 src/backend/utils/cache/relcache.c  |  25 ++--
 src/bin/psql/describe.c |   2 +-
 src/include/catalog/pg_publication.h|   7 +-
 src/include/commands/publicationcmds.h  |   4 +-
 src/test/regress/expected/publication.out   | 134 ++--
 src/test/regress/sql/publication.sql|  90 ++---
 src/test/subscription/t/029_column_list.pl  |  50 
 16 files changed, 235 insertions(+), 237 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 2b61f42b71..c043da37ae 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -4392,7 +4392,7 @@ pg_index Columns
   
   
This is an array of indnatts values that
-   indicate which table columns this index indexes.  For example a value
+   indicate which table columns this index indexes.  For example, a value
of 1 3 would mean that the first and the third table
columns make up the index entries.  Key columns come before non-key
(included) columns.  A zero in this array indicates that the
@@ -6271,7 +6271,7 @@ pg_publication_namespace 
Columns
   
   
This is an array of values that indicates which table columns are
-   part of the publication.  For example a value of 1 3
+   part of the publication.  For example, a value of 1 3
would mean that the first and the third table columns are published.
A null value indicates that all columns are published.
   
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 496593201b..6f4d76ef7f 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -7005,9 +7005,8 @@ Logical Replication Message Formats
 
 
 
-Next, the following message part appears for each column (except
-generated columns and other columns that don't appear in the column
-filter list, for tables that have one):
+Next, the following message part appears for each column included in
+the publication (except generated columns):
 
 
 
diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index aa6827c977..470d50a244 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -119,10 +119,14 @@ Parameters
   specified, the table and all its descendant tables (if any) are
   affected.  Optionally, * can be specified after the 
table
   name to explicitly indicate that descendant tables are included.
+ 
 
+ 
   Optionally, a column list can be specified.  See  for details.
+ 
 
+ 
   If the optional WHERE clause is specified, rows for
   which the expression
   evaluates to false or null will not be published. Note that parentheses
diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 

Re: Optionally automatically disable logical replication subscriptions on error

2022-03-09 Thread Masahiko Sawada
On Wed, Mar 9, 2022 at 4:33 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Tuesday, March 8, 2022 10:23 PM Amit Kapila  
> wrote:
> > On Tue, Mar 8, 2022 at 1:37 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
>
>
> > 2. Is there a reason the patch doesn't allow workers to restart via
> > maybe_reread_subscription() when this new option is changed, if so, then 
> > let's
> > add a comment for the same? We currently seem to be restarting the worker on
> > any change via Alter Subscription. If we decide to change it for this 
> > option as
> > well then I think we need to accordingly update the current comment: "Exit 
> > if
> > any parameter that affects the remote connection was changed." to something
> > like "Exit if any parameter that affects the remote connection or a 
> > subscription
> > option was changed..."
> I thought it's ok without the change at the beginning, but I was wrong.
> To make this new option aligned with others, I should add one check
> for this feature. Fixed.

Why do we need to restart the apply worker when disable_on_error is
changed? It doesn't affect the remote connection at all. I think it
can be changed without restarting like synchronous_commit option.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Changing "Hot Standby" to "hot standby"

2022-03-09 Thread Michael Paquier
On Wed, Mar 09, 2022 at 07:45:32AM +, Daniel Westermann (DWE) wrote:
> Thanks for having a look. Done that way.

Hmm.  Apart from titles, which had better use upper-case characters for
the first letter of each word, I can see references to the pattern you
are trying to eliminate in amcheck.sgml (1), config.sgml (3),
protocol.sgml (3) and mvcc.sgml (1).  Shouldn't you refresh these as
well if the point is to make the full set of docs consistent?

As of the full tree, I can see that:
$ git grep "hot standby" | wc -l
259
$ git grep "Hot Standby" | wc -l
73

So there is a trend for one of the two.
--
Michael


signature.asc
Description: PGP signature


Re: wal_compression=zstd

2022-03-09 Thread Michael Paquier
On Sat, Mar 05, 2022 at 07:26:39PM +0900, Michael Paquier wrote:
> Repeatability and randomness of data counts, we could have for example
> one case with a set of 5~7 int attributes, a second with text values
> that include random data, up to 10~12 bytes each to count on the tuple
> header to be able to compress some data, and a third with more
repeatable data, like one attribute with an int column populated 
> with generate_series().  Just to give an idea.

And that's what I did as of the attached set of test:
- Cluster on tmpfs.
- max_wal_size, min_wal_size at 2GB and shared_buffers at 1GB, aka
large enough to include the full data set in memory.
- Rather than using Justin's full patch set, I have just patched the
code in xloginsert.c to switch the level.
- One case with table that uses one int attribute, with rather
repetitive data worth 484MB.
- One case with table using (int, text), where the text data is made
of 10~11 bytes of random data, worth ~340MB.
- Use pg_prewarm to load the data into shared buffers.  With the
cluster mounted on a tmpfs that should not matter though.
- Both tables have a fillfactor at 50 to give room to the updates.

I have measured the CPU usage with a toy extension, also attached,
called pg_rusage() that is a simple wrapper to upstream's pg_rusage.c 
to initialize a rusage snapshot and print its data with two SQL
functions that are called just before and after the FPIs are generated
(aka the UPDATE query that rewrites the whole table in the script
attached).

The quickly-hacked test script and the results are in test.tar.gz, for
reference.  The toy extension is pg_rusage.tar.gz.

Here are the results I compiled, as of results_format.sql in the
tarball attached:
 descr | rel_size | fpi_size | time_s 
---+--+--+
 int column no compression | 429 MB   | 727 MB   |  13.15
 int column ztsd default level | 429 MB   | 523 MB   |  14.23
 int column zstd level 1   | 429 MB   | 524 MB   |  13.94
 int column zstd level 10  | 429 MB   | 523 MB   |  23.46
 int column zstd level 19  | 429 MB   | 523 MB   | 103.71
 int column lz4 default level  | 429 MB   | 575 MB   |  13.37
 int/text no compression   | 344 MB   | 558 MB   |  10.08
 int/text lz4 default level| 344 MB   | 463 MB   |  10.29
 int/text zstd default level   | 344 MB   | 415 MB   |  11.48
 int/text zstd level 1 | 344 MB   | 418 MB   |  11.25
 int/text zstd level 10| 344 MB   | 415 MB   |  20.59
 int/text zstd level 19| 344 MB   | 413 MB   |  62.64
(12 rows)

I did not expect zstd to be this slow at a level of 10~ actually.  The
runtime (elapsed CPU time) got severely impacted at level 19, that I
ran just for fun to see how that it would become compared to a level
of 10.  There is a slight difference between the default level and a
level of 1, and the compression size does not change much, nor does
the CPU usage really change.

Attached is an updated patch, while on it, that I have tweaked before
running my own tests.

At the end, I'd still like to think that we'd better stick with the
default level for this parameter, and that's the suggestion of
upstream.  So I would like to move on with that for this patch.
--
Michael


test.tar.gz
Description: application/gzip


pg_rusage.tar.gz
Description: application/gzip
From 254ddbf4223c35a7990e301e53d6ddbffcf210c0 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 18 Feb 2022 22:54:18 -0600
Subject: [PATCH v2] add wal_compression=zstd

---
 src/include/access/xlog.h |  3 +-
 src/include/access/xlogrecord.h   |  5 +++-
 src/backend/access/transam/xloginsert.c   | 30 ++-
 src/backend/access/transam/xlogreader.c   | 20 +
 src/backend/utils/misc/guc.c  |  3 ++
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/bin/pg_waldump/pg_waldump.c   |  2 ++
 doc/src/sgml/config.sgml  | 11 ---
 doc/src/sgml/installation.sgml|  8 +
 9 files changed, 76 insertions(+), 8 deletions(-)

diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 4b45ac64db..09f6464331 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -75,7 +75,8 @@ typedef enum WalCompression
 {
 	WAL_COMPRESSION_NONE = 0,
 	WAL_COMPRESSION_PGLZ,
-	WAL_COMPRESSION_LZ4
+	WAL_COMPRESSION_LZ4,
+	WAL_COMPRESSION_ZSTD
 } WalCompression;
 
 /* Recovery states */
diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h
index c1b1137aa7..052ac6817a 100644
--- a/src/include/access/xlogrecord.h
+++ b/src/include/access/xlogrecord.h
@@ -149,8 +149,11 @@ typedef struct XLogRecordBlockImageHeader
 /* compression methods supported */
 #define BKPIMAGE_COMPRESS_PGLZ	0x04
 #define BKPIMAGE_COMPRESS_LZ4	0x08
+#define BKPIMAGE_COMPRESS_ZSTD	0x10
+
 #define	BKPIMAGE_COMPRESSED(info) \
-	((info & (BKPIMAGE_COMPRESS_PGLZ |