RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-27 Thread kuroda.hay...@fujitsu.com
Dear Önder,

Thank you for updating the patch!
The documentation seems OK, and I could not find any other places that need
additions.

Following are my comments.


01 relation.c - general

Many header files are newly included.
I am not sure, but perhaps some of the planner-related code could be moved to
src/backend/optimizer/plan.
What do you and others think?

02 relation.c - FindLogicalRepUsableIndex

```
+/*
+ * Returns an index oid if we can use an index for the apply side. If not,
+ * returns InvalidOid.
+ */
+static Oid
+FindLogicalRepUsableIndex(Relation localrel, LogicalRepRelation *remoterel)
```

I grepped the sources, but I could not find the term "apply side" anywhere
else. How about "subscriber" instead?

03 relation.c - FindLogicalRepUsableIndex

```
+   /* Simple case, we already have an identity or pkey */
+   idxoid = GetRelationIdentityOrPK(localrel);
+   if (OidIsValid(idxoid))
+   return idxoid;
+
+   /*
+* If index scans are disabled, use a sequential scan.
+*
+* Note that we still allowed index scans above when there is a primary
+* key or unique index replica identity, but that is the legacy behaviour
+* (even when enable_indexscan is false), so we hesitate to move this
+* enable_indexscan check to be done earlier in this function.
+*/
+   if (!enable_indexscan)
+   return InvalidOid;
```

a. 
I think "identity or pkey" should be "replica identity key or primary key", or 
"RI or PK" for short.

b. 
The latter part of the comment should be placed near GetRelationIdentityOrPK.


04 relation.c - FindUsableIndexForReplicaIdentityFull

```
+   MemoryContext usableIndexContext;
...
+   usableIndexContext = AllocSetContextCreate(CurrentMemoryContext,
+                                              "usableIndexContext",
+                                              ALLOCSET_DEFAULT_SIZES);
```

I grepped other sources, and found that names like "tmpcxt" are commonly used 
for temporary MemoryContexts.

05 relation.c - SuitablePathsForRepIdentFull

```
+   indexRelation = index_open(idxoid, AccessShareLock);
+   indexInfo = BuildIndexInfo(indexRelation);
+   is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+   is_partial = (indexInfo->ii_Predicate != NIL);
+   is_only_on_expression = IndexOnlyOnExpression(indexInfo);
+   index_close(indexRelation, NoLock);
```

Why is the index closed with NoLock? AccessShareLock was acquired, so 
shouldn't the same lock be released?
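
That is, something like the following (a sketch only):

```
+   index_close(indexRelation, AccessShareLock);
```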


06 relation.c - GetCheapestReplicaIdentityFullPath

IIUC a query like "SELECT * FROM tbl WHERE attr1 = $1 AND attr2 = $2 ... AND 
attrN = $N" is emulated here, right?
You could state that explicitly in a comment.
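
For example, something along these lines (the wording is only a suggestion):

```
/*
 * Plan the scan as if the query were:
 *     SELECT * FROM tbl WHERE attr1 = $1 AND attr2 = $2 ... AND attrN = $N
 * with the parameter values taken from the remote tuple.
 */
```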

07 relation.c - GetCheapestReplicaIdentityFullPath

```
+   Path   *path = (Path *) lfirst(lc);
+   Oid idxoid = GetIndexOidFromPath(path);
+
+   if (!OidIsValid(idxoid))
+   {
+   /* Unrelated Path, skip */
+   suitableIndexList = lappend(suitableIndexList, path);
+   }
```

This part was not clear to me. IIUC in this function we want to extract the 
"good" scan paths, and based on that the cheapest one is chosen. 
GetIndexOidFromPath() seems to return InvalidOid when the input path is not an 
index scan, so why is such a path appended to the suitable list?


===
08 worker.c - usable_indexoid_internal

I think this is not an "internal" function; such a name should be reserved for 
pairs like "apply_handle_commit" - "apply_handle_commit_internal", or 
"apply_handle_insert" - "apply_handle_insert_internal".
How about "get_usable_index" or something similar?

09 worker.c - usable_indexoid_internal

```
+   Oid targetrelid = targetResultRelInfo->ri_RelationDesc->rd_rel->oid;
+   Oid localrelid = relinfo->ri_RelationDesc->rd_id;
+
+   if (targetrelid != localrelid)
```

I think these lines are quite confusing.
IIUC targetrelid corresponds to the "parent", and localrelid corresponds to 
the "child", right?
How about renaming them to "partitionedoid" and "leafoid" or something similar?

===
10 032_subscribe_use_index.pl

```
# create tables pub and sub
$node_publisher->safe_psql('postgres',
"CREATE TABLE test_replica_id_full (x int)");
$node_publisher->safe_psql('postgres',
"ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;");
$node_subscriber->safe_psql('postgres',
"CREATE TABLE test_replica_id_full (x int)");
$node_subscriber->safe_psql('postgres',
"CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");
```

In many places the same table is created, altered to "REPLICA IDENTITY FULL", 
and an index is created on it.
Could you combine these steps into a helper function?

11 032_subscribe_use_index.pl

```
# wait until the index is used on the 

Re: Extend win32 error codes to errno mapping in win32error.c

2022-09-27 Thread Michael Paquier
On Wed, Sep 28, 2022 at 11:14:53AM +0530, Bharath Rupireddy wrote:
> IMO, we can add mapping for just ERROR_INVALID_NAME which is an
> obvious error code and easy to hit, leaving others. There are actually
> many win32 error codes that may get hit in our code base and actually
> mapping everything isn't possible.

Yes.  I am fine with doing just that, as you have managed to hit it during
development.  The others may have more opinions to offer.
--
Michael




Re: [PATCH] Add peer authentication TAP test

2022-09-27 Thread Michael Paquier
On Fri, Aug 26, 2022 at 10:43:43AM +0200, Drouvot, Bertrand wrote:
> During the work in [1] we created a new TAP test to test the SYSTEM_USER
> behavior with peer authentication.
> 
> It turns out that there is currently no TAP test for the peer
> authentication, so we think (thanks Michael for the suggestion [2]) that
> it's better to split the work in [1] between "pure" SYSTEM_USER related work
> and the "pure" peer authentication TAP test work.
> 
> That's the reason of this new thread, please find attached a patch to add a
> new TAP test for the peer authentication.

+# Get the session_user to define the user name map test.
+my $session_user =
+  $node->safe_psql('postgres', 'select session_user');
[...]
+# Define a user name map.
+$node->append_conf('pg_ident.conf', qq{mypeermap $session_user 
testmap$session_user});
+
+# Set pg_hba.conf with the peer authentication and the user name map.
+reset_pg_hba($node, 'peer map=mypeermap');

A map consists of a "MAPNAME SYSTEM_USER PG_USER".  Why does this test
use a Postgres role (from session_user) as the system user for the
peer map?
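
(For reference, a pg_ident.conf line has this shape, with illustrative
names:

# MAPNAME       SYSTEM-USERNAME         PG-USERNAME
mypeermap       alice                   pgalice

so the second field is normally the OS user reported by peer
authentication, not a Postgres role.)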
--
Michael




Re: Fix some newly modified tab-complete changes

2022-09-27 Thread Kyotaro Horiguchi
At Wed, 28 Sep 2022 14:14:01 +1000, Peter Smith  wrote 
in 
> On Tue, Sep 27, 2022 at 8:28 PM shiy.f...@fujitsu.com
>  wrote:
> >
> > Hi hackers,
> >
> > I saw a problem when using tab-complete for "GRANT", "TABLES IN SCHEMA" 
> > should
> > be "ALL TABLES IN SCHEMA" in the following case.
> >
> > postgres=# grant all on
> > ALL FUNCTIONS IN SCHEMA   DATABASE  FUNCTION
> >   PARAMETER SCHEMATABLESPACE
> > ALL PROCEDURES IN SCHEMA  DOMAINinformation_schema. 
> >   PROCEDURE SEQUENCE  tbl
> > ALL ROUTINES IN SCHEMAFOREIGN DATA WRAPPER  LANGUAGE
> >   public.   TABLE TYPE
> > ALL SEQUENCES IN SCHEMA   FOREIGN SERVERLARGE OBJECT
> >   ROUTINE   TABLES IN SCHEMA
> >
> > I found that it is related to the recent commit 790bf615dd, and maybe it's
> > better to fix it. I also noticed that some comments should be modified 
> > according
> > to this new syntax. Attach a patch to fix them.
> >
> 
> Thanks for the patch! Below are my review comments.
> 
> The patch looks good to me but I did find some other tab-completion
> anomalies. IIUC these are unrelated to your work, but since I found
> them while testing your patch I am reporting them here.

Looks fine to me, too.

> Perhaps you want to fix them in the same patch, or just raise them
> again separately?
> 
> ==
> 
> 1. tab complete for CREATE PUBLICATION
> 
> I don’t think this is any new bug, but I found that it is possible to do 
> this...
> 
> test_pub=# create publication p for ALL TABLES IN SCHEMA 
> information_schema  pg_catalog  pg_toastpublic
>
> or, even this...
> 
> test_pub=# create publication p for XXX TABLES IN SCHEMA 
> information_schema  pg_catalog  pg_toastpublic

Completion is responding to "IN SCHEMA" in these cases.  However, I
cannot reach this state by completion alone because it doesn't suggest
"IN SCHEMA" after "TABLES" nor "ALL TABLES".  I don't see a reason to
change that behavior unless the fix adds no additional complexity.

> ==
> 
> 2. tab complete for GRANT
> 
> test_pub=# grant 
> ALLEXECUTE
> pg_execute_server_program  pg_read_server_files   postgres
>   TRIGGER
> ALTER SYSTEM   GRANT  pg_monitor
>   pg_signal_backend  REFERENCES
> TRUNCATE
> CONNECTINSERT pg_read_all_data
>   pg_stat_scan_tablesSELECT UPDATE
> CREATE pg_checkpoint
> pg_read_all_settings   pg_write_all_data  SET
>   USAGE
> DELETE pg_database_owner
> pg_read_all_stats  pg_write_server_files  TEMPORARY
> 
> 2a.
> grant "GRANT" ??

Yeah, for a moment I thought that might be a kind of admin option, but
there's no such privilege. REVOKE gets the same suggestion.

> 2b.
> grant "TEMPORARY" but not "TEMP" ??

TEMP is an alternative spelling so that's fine.


I found the following suggestion.

CREATE PUBLICATION p FOR TABLES  -> ["IN SCHEMA", "WITH ("]

I believe "WITH (" doesn't come there.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Thomas Munro
On Wed, Sep 28, 2022 at 6:15 PM Tom Lane  wrote:
> There may be a larger conversation to be had here about how
> much our CI infrastructure should be detecting.  There seems
> to be a depressingly large gap between what that found and
> what the buildfarm is finding --- not only in portability
> issues, but in things like cpluspluscheck failures, which
> I had supposed CI would find.

+1, Andres had some sanitizer flags in the works (stopped by a weird
problem to be resolved first), and 32 bit CI would clearly be good.
It also seems that ARM is now available to us via CI (either Amazon's
or a Mac), which IIRC is more SIGBUS-y about alignment than x86?

FTR CI reported that cpluspluscheck failure and more[1], so perhaps we
just need to get clearer agreement on the status of CI, ie a policy
that CI had better be passing before you get to the next stage.  It's
still pretty new...

[1] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3711




Re: Extend win32 error codes to errno mapping in win32error.c

2022-09-27 Thread Bharath Rupireddy
On Wed, Sep 28, 2022 at 10:10 AM Michael Paquier  wrote:
>
> One important thing, in my opinion, when it comes to updating this
> table, is that it could be better to report the original error number
> if errno can be somewhat confusing for the mapping.

Returning errno = e instead of EINVAL in _dosmaperr() may have an
impact on the callers that do a special handling for errno EINVAL. I
don't think it's a good idea.

> ERROR_INVALID_NAME => ENOENT
> Yeah, this mapping looks fine.

Hm.

> ERROR_HANDLE_DISK_FULL => ENOSPC
> This one maps to various Setup*Error(), as well as
> GetDiskFreeSpaceEx().  The former is not interesting, but I can buy
> the case of the former for extension code (I've played with that in
> the past on WIN32, actually).
>
> ERROR_OUTOFMEMORY => ENOMEM
> ERROR_NOACCESS => EACCES
> ERROR_INSUFFICIENT_BUFFER => EINVAL
> Hmm.  I have looked at our WIN32 system calls and the upstream docs,
> but these do not seem to be reachable in our code.

IMO, we can add mapping for just ERROR_INVALID_NAME which is an
obvious error code and easy to hit, leaving others. There are actually
many win32 error codes that may get hit in our code base and actually
mapping everything isn't possible.

Please see the v2 patch. I've also added a CF entry -
https://commitfest.postgresql.org/40/3914/ - so that the patch gets
tested across platforms.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v2-0001-Extend-win32-error-codes-to-errno-mapping-in-win3.patch
Description: Binary data


Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Tom Lane
Dilip Kumar  writes:
> wrasse is also failing with a bus error,

Yeah.  At this point I think it's time to call for this patch
to get reverted.  It should get tested *off line* on some
non-Intel, non-64-bit, alignment-picky architectures before
the rest of us have to deal with it any more.

There may be a larger conversation to be had here about how
much our CI infrastructure should be detecting.  There seems
to be a depressingly large gap between what that found and
what the buildfarm is finding --- not only in portability
issues, but in things like cpluspluscheck failures, which
I had supposed CI would find.

regards, tom lane




Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Dilip Kumar
wrasse is also failing with a bus error, but I cannot get the stack
trace.  So it seems it is hitting some alignment issues during startup
[1].  Is it possible to get the backtrace or lineno?

[1]
2022-09-28 03:19:26.228 CEST [180:4] LOG:  redo starts at 0/30FE9D8
2022-09-28 03:19:27.674 CEST [177:3] LOG:  startup process (PID 180)
was terminated by signal 10: Bus Error
2022-09-28 03:19:27.674 CEST [177:4] LOG:  terminating any other
active server processes
2022-09-28 03:19:27.677 CEST [177:5] LOG:  shutting down due to
startup process failure
2022-09-28 03:19:27.681 CEST [177:6] LOG:  database system is shut down

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Avoid memory leaks during base backups

2022-09-27 Thread Bharath Rupireddy
On Wed, Sep 28, 2022 at 10:19 AM Tom Lane  wrote:
>
> Bharath Rupireddy  writes:
> > I had the same opinion. Here's what I think - for backup functions, we
> > can have the new memory context child of TopMemoryContext and for
> > perform_base_backup(), we can have the memory context child of
> > CurrentMemoryContext. With PG_TRY()-PG_FINALLY()-PG_END_TRY(), we can
> > delete those memory contexts upon ERRORs. This approach works for us
> > since backup-related code doesn't have any FATALs.
>
> Not following your last point here?  A process exiting on FATAL
> does not especially need to clean up its memory allocations first.
> Which is good, because "backup-related code doesn't have any FATALs"
> seems like an assertion with a very short half-life.

You're right. My bad. For FATALs, we don't need to clean the memory as
the process itself exits.

 * Note: an ereport(FATAL) will not be caught by this construct; control will
 * exit straight through proc_exit().

/*
 * Perform error recovery action as specified by elevel.
 */
if (elevel == FATAL)
{

/*
 * Do normal process-exit cleanup, then return exit code 1 to indicate
 * FATAL termination.  The postmaster may or may not consider this
 * worthy of panic, depending on which subprocess returns it.
 */
proc_exit(1);
}

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Avoid memory leaks during base backups

2022-09-27 Thread Tom Lane
Bharath Rupireddy  writes:
> I had the same opinion. Here's what I think - for backup functions, we
> can have the new memory context child of TopMemoryContext and for
> perform_base_backup(), we can have the memory context child of
> CurrentMemoryContext. With PG_TRY()-PG_FINALLY()-PG_END_TRY(), we can
> delete those memory contexts upon ERRORs. This approach works for us
> since backup-related code doesn't have any FATALs.

Not following your last point here?  A process exiting on FATAL
does not especially need to clean up its memory allocations first.
Which is good, because "backup-related code doesn't have any FATALs"
seems like an assertion with a very short half-life.

regards, tom lane




RE: [small patch] Change datatype of ParallelMessagePending from "volatile bool" to "volatile sig_atomic_t"

2022-09-27 Thread kuroda.hay...@fujitsu.com
Dear Michael,

> Yeah, at least as of the cancel callback psql_cancel_callback() that
> handle_sigint() would call on SIGINT as this is set by psql.  So it
> does not seem right to use a boolean rather than a sig_atomic_t in
> this case, as well.

PSA a fix patch. Note that PromptInterruptContext.enabled was also fixed
because it is assigned from sigint_interrupt_enabled.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



0001-Mark-sigint_interrupt_enabled-as-sig_atomic_t.patch
Description: 0001-Mark-sigint_interrupt_enabled-as-sig_atomic_t.patch


Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

2022-09-27 Thread Bharath Rupireddy
On Wed, Sep 28, 2022 at 12:32 AM Thomas Munro  wrote:
>
> On Wed, Sep 28, 2022 at 1:03 AM Bharath Rupireddy
>  wrote:
> > On Tue, Sep 27, 2022 at 3:01 PM Thomas Munro  wrote:
> > > Something like the attached.
> >
> > Isn't it also better to add notes in win32pread.c and win32pwrite.c
> > about the callers doing lseek(SEEK_SET) if they wish to and Windows
> > implementations changing file position? We can also add a TODO item
> > about replacing pg_ versions with pread and friends someday when
> > Windows fixes the issue? Having it in the commit and include/port.h is
> > good, but the comments in win32pread.c and win32pwrite.c make life
> > easier IMO. Otherwise, the patch LGTM.
>
> Thanks, will do.

I'm looking forward to getting it in.

> FWIW I doubt the OS itself will change released
> behaviour like that, but I did complain about the straight-up
> misleading documentation and they agreed and fixed it[1].
>
> [1] https://github.com/MicrosoftDocs/sdk-api/pull/1309

Great! Is there any plan to request a change in WriteFile() so that it
does not alter the file position?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Extend win32 error codes to errno mapping in win32error.c

2022-09-27 Thread Michael Paquier
On Tue, Sep 27, 2022 at 03:23:04PM +0530, Bharath Rupireddy wrote:
> The failure occurs in dosmaperr() in win32error.c due to an unmapped
> errno for win32 error code. The error code 123 i.e. ERROR_INVALID_NAME
> says "The file name, directory name, or volume label syntax is
> incorrect." [2], the closest errno mapping would be ENOENT. I quickly
> looked around for the other win32 error codes [2] that don't have
> mapping.

> I filtered out some common error codes such as
> ERROR_OUTOFMEMORY, ERROR_HANDLE_DISK_FULL, ERROR_INSUFFICIENT_BUFFER,
> ERROR_NOACCESS. There may be many more, but these seemed common IMO.
> 
> Having a right errno mapping always helps recognize the errors correctly.

One important thing, in my opinion, when it comes to updating this
table, is that it could be better to report the original error number
if errno can be somewhat confusing for the mapping.  It is also less
interesting to increase the size of the table for errors that cannot
be reached, or related to system calls we don't use.

ERROR_INVALID_NAME => ENOENT
Yeah, this mapping looks fine.
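
Presumably that is just one more entry in win32error.c's doserrors[]
table, along the lines of:

	{ERROR_INVALID_NAME, ENOENT},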

ERROR_HANDLE_DISK_FULL => ENOSPC
This one maps to various Setup*Error(), as well as
GetDiskFreeSpaceEx().  The former is not interesting, but I can buy
the case of the latter for extension code (I've played with that in
the past on WIN32, actually).

ERROR_OUTOFMEMORY => ENOMEM
ERROR_NOACCESS => EACCES
ERROR_INSUFFICIENT_BUFFER => EINVAL
Hmm.  I have looked at our WIN32 system calls and the upstream docs,
but these do not seem to be reachable in our code.
--
Michael




Re: Avoid memory leaks during base backups

2022-09-27 Thread Bharath Rupireddy
On Wed, Sep 28, 2022 at 9:46 AM Michael Paquier  wrote:
>
> On Tue, Sep 27, 2022 at 05:32:26PM +0900, Kyotaro Horiguchi wrote:
> > At Tue, 27 Sep 2022 11:33:56 +0530, Bharath Rupireddy 
> >  wrote in
> > > On Mon, Sep 26, 2022 at 7:34 PM Tom Lane  wrote:
> > > > This ... seems like inventing your own shape of wheel.  The
> > > > normal mechanism for preventing this type of leak is to put the
> > > > allocations in a memory context that can be reset or deallocated
> > > > in mainline code at the end of the operation.
> > >
> > > Yes, that's the typical way and the patch attached does it for
> > > perform_base_backup(). What happens if we allocate some memory in the
> > > new memory context and error-out before reaching the end of operation?
> > > How do we deallocate such memory?
> >
> > Whoever directly or indirectly catches the exception can do that.  For
> > example, SendBaseBackup() seems to catch exceptions from
> > perform_base_backup(). bbsinc_cleanup() already resides there.
>
> Even with that, what's the benefit in using an extra memory context in
> basebackup.c?  backup_label and tablespace_map are mentioned upthread,
> but we have a tight control of these, and they should be allocated in
> the memory context created for replication commands (grep for
> "Replication command context") anyway.  Using a dedicated memory
> context for the SQL backup functions under TopMemoryContext could be
> interesting, on the other hand..

I had the same opinion. Here's what I think - for backup functions, we
can have the new memory context child of TopMemoryContext and for
perform_base_backup(), we can have the memory context child of
CurrentMemoryContext. With PG_TRY()-PG_FINALLY()-PG_END_TRY(), we can
delete those memory contexts upon ERRORs. This approach works for us
since backup-related code doesn't have any FATALs.
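
A minimal sketch of the pattern I have in mind (context and names are
illustrative only):

    MemoryContext backupContext;

    backupContext = AllocSetContextCreate(TopMemoryContext,
                                          "backup functions context",
                                          ALLOCSET_DEFAULT_SIZES);
    PG_TRY();
    {
        /* ... do the backup work, allocating in backupContext ... */
    }
    PG_FINALLY();
    {
        /* reached on both success and ERROR; frees everything in one go */
        MemoryContextDelete(backupContext);
    }
    PG_END_TRY();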

Thoughts?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Avoid memory leaks during base backups

2022-09-27 Thread Michael Paquier
On Tue, Sep 27, 2022 at 05:32:26PM +0900, Kyotaro Horiguchi wrote:
> At Tue, 27 Sep 2022 11:33:56 +0530, Bharath Rupireddy 
>  wrote in 
> > On Mon, Sep 26, 2022 at 7:34 PM Tom Lane  wrote:
> > > This ... seems like inventing your own shape of wheel.  The
> > > normal mechanism for preventing this type of leak is to put the
> > > allocations in a memory context that can be reset or deallocated
> > > in mainline code at the end of the operation.
> > 
> > Yes, that's the typical way and the patch attached does it for
> > perform_base_backup(). What happens if we allocate some memory in the
> > new memory context and error-out before reaching the end of operation?
> > How do we deallocate such memory?
> 
> Whoever directly or indirectly catches the exception can do that.  For
> example, SendBaseBackup() seems to catch exceptions from
> perform_base_backup(). bbsinc_cleanup() already resides there.

Even with that, what's the benefit in using an extra memory context in
basebackup.c?  backup_label and tablespace_map are mentioned upthread,
but we have a tight control of these, and they should be allocated in
the memory context created for replication commands (grep for
"Replication command context") anyway.  Using a dedicated memory
context for the SQL backup functions under TopMemoryContext could be
interesting, on the other hand..
--
Michael




Re: Fix some newly modified tab-complete changes

2022-09-27 Thread Peter Smith
On Tue, Sep 27, 2022 at 8:28 PM shiy.f...@fujitsu.com
 wrote:
>
> Hi hackers,
>
> I saw a problem when using tab-complete for "GRANT", "TABLES IN SCHEMA" should
> be "ALL TABLES IN SCHEMA" in the following case.
>
> postgres=# grant all on
> ALL FUNCTIONS IN SCHEMA   DATABASE  FUNCTION  
> PARAMETER SCHEMATABLESPACE
> ALL PROCEDURES IN SCHEMA  DOMAINinformation_schema.   
> PROCEDURE SEQUENCE  tbl
> ALL ROUTINES IN SCHEMAFOREIGN DATA WRAPPER  LANGUAGE  
> public.   TABLE TYPE
> ALL SEQUENCES IN SCHEMA   FOREIGN SERVERLARGE OBJECT  
> ROUTINE   TABLES IN SCHEMA
>
> I found that it is related to the recent commit 790bf615dd, and maybe it's
> better to fix it. I also noticed that some comments should be modified 
> according
> to this new syntax. Attach a patch to fix them.
>

Thanks for the patch! Below are my review comments.

The patch looks good to me but I did find some other tab-completion
anomalies. IIUC these are unrelated to your work, but since I found
them while testing your patch I am reporting them here.

Perhaps you want to fix them in the same patch, or just raise them
again separately?

==

1. tab complete for CREATE PUBLICATION

I don’t think this is any new bug, but I found that it is possible to do this...

test_pub=# create publication p for ALL TABLES IN SCHEMA 
information_schema  pg_catalog  pg_toastpublic

or, even this...

test_pub=# create publication p for XXX TABLES IN SCHEMA 
information_schema  pg_catalog  pg_toastpublic

==

2. tab complete for GRANT

test_pub=# grant 
ALLEXECUTE
pg_execute_server_program  pg_read_server_files   postgres
  TRIGGER
ALTER SYSTEM   GRANT  pg_monitor
  pg_signal_backend  REFERENCES
TRUNCATE
CONNECTINSERT pg_read_all_data
  pg_stat_scan_tablesSELECT UPDATE
CREATE pg_checkpoint
pg_read_all_settings   pg_write_all_data  SET
  USAGE
DELETE pg_database_owner
pg_read_all_stats  pg_write_server_files  TEMPORARY

2a.
grant "GRANT" ??

~

2b.
grant "TEMPORARY" but not "TEMP" ??

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Dilip Kumar
On Wed, Sep 28, 2022 at 2:59 AM Tom Lane  wrote:
>
> ... also, lapwing's not too happy [1].  The alter_table test
> expects this to yield zero rows, but it doesn't:

By looking at the regression diff shown below, it seems that we are
able to get the relfilenode from the Oid using
pg_relation_filenode(oid), but the reverse mapping
pg_filenode_relation(reltablespace, relfilenode) returned NULL.

I am not sure, but from looking at the code it seems somehow related to
alignment padding while computing the hash key size on 32-bit
machines in the function InitializeRelfilenumberMap().  I am still
looking into this and will provide updates.

+  oid  | mapped_oid | reltablespace | relfilenode |
 relname
+---++---+-+
+ 16385 || 0 |  10 | char_tbl
+ 16388 || 0 |  11 | float8_tbl
+ 16391 || 0 |  12 | int2_tbl
+ 16394 || 0 |  13 | int4_tbl


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)

2022-09-27 Thread Kyotaro Horiguchi
At Wed, 28 Sep 2022 10:09:39 +0900, Michael Paquier  wrote 
in 
> On Tue, Sep 27, 2022 at 03:11:54PM +0530, Bharath Rupireddy wrote:
> > On Tue, Sep 27, 2022 at 2:20 PM Kyotaro Horiguchi
> >  wrote:
> >> If this is still unacceptable, I propose to change the comment. (I
> >> found that the previous patch forgets about do_pg_backup_stop())
> >>
> >> - * It fills in backup_state with the information required for the backup,
> >> + * It fills in the parameter "state" with the information required for 
> >> the backup,
> > 
> > +1. There's another place that uses backup_state in the comments. I
> > modified that as well. Please see the attached patch.
> 
> Thanks, fixed the comments.  I have left the variable names as they are
> now in the code, as both are backup-related code paths, so it is IMO
> clear that the state is linked to a backup.

Thanks!  I'm fine with that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: making relfilenodes 56 bits

2022-09-27 Thread Thomas Munro
Hi Dilip,

I am very happy to see these commits.  Here's some belated review for
the tombstone-removal patch.

> v7-0004-Don-t-delay-removing-Tombstone-file-until-next.patch

More things you can remove:

 * sync_unlinkfiletag in struct SyncOps
 * the macro UNLINKS_PER_ABSORB
 * global variable pendingUnlinks

This comment after the question mark is obsolete:

* XXX should we CHECK_FOR_INTERRUPTS in this loop?  Escaping with an
* error in the case of SYNC_UNLINK_REQUEST would leave the
* no-longer-used file still present on disk, which would be bad, so
* I'm inclined to assume that the checkpointer will always empty the
* queue soon.

(I think if the answer to the question is now yes, then we should
replace the stupid sleep with a condition variable sleep, but there's
another thread about that somewhere).

In a couple of places in dbcommands.c you could now make this change:

/*
-* Force a checkpoint before starting the copy. This will force all dirty
-* buffers, including those of unlogged tables, out to disk, to ensure
-* source database is up-to-date on disk for the copy.
-* FlushDatabaseBuffers() would suffice for that, but we also want to
-* process any pending unlink requests. Otherwise, if a checkpoint
-* happened while we're copying files, a file might be deleted just when
-* we're about to copy it, causing the lstat() call in copydir() to fail
-* with ENOENT.
+* Force all dirty buffers, including those of unlogged tables, out to
+* disk, to ensure source database is up-to-date on disk for the copy.
 */
-   RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE |
-                     CHECKPOINT_WAIT | CHECKPOINT_FLUSH_ALL);
+   FlushDatabaseBuffers(src_dboid);

More obsolete comments you could change:

 * If we were copying database at block levels then drop pages for the
 * destination database that are in the shared buffer cache.  And tell
-->  * checkpointer to forget any pending fsync and unlink requests for files

--> * Tell checkpointer to forget any pending fsync and unlink requests for
 * files in the database; else the fsyncs will fail at next checkpoint, or
 * worse, it will delete file

In tablespace.c I think you could now make this change:

if (!destroy_tablespace_directories(tablespaceoid, false))
{
-   /*
-    * Not all files deleted?  However, there can be lingering empty files
-    * in the directories, left behind by for example DROP TABLE, that
-    * have been scheduled for deletion at next checkpoint (see comments
-    * in mdunlink() for details).  We could just delete them immediately,
-    * but we can't tell them apart from important data files that we
-    * mustn't delete.  So instead, we force a checkpoint which will clean
-    * out any lingering files, and try again.
-    */
-   RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
-
+#ifdef WIN32
    /*
     * On Windows, an unlinked file persists in the directory listing
     * until no process retains an open handle for the file.  The DDL
@@ -523,6 +513,7 @@ DropTableSpace(DropTableSpaceStmt *stmt)

    /* And now try again. */
    if (!destroy_tablespace_directories(tablespaceoid, false))
+#endif
    {
        /* Still not empty, the files must be important then */
        ereport(ERROR,




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-09-27 Thread Masahiko Sawada
On Fri, Sep 23, 2022 at 12:11 AM John Naylor
 wrote:
>
>
> On Thu, Sep 22, 2022 at 11:46 AM John Naylor  
> wrote:
> > One thing I want to try soon is storing fewer than 16/32 etc entries, so 
> > that the whole node fits comfortably inside a power-of-two allocation. That 
> > would allow us to use aset without wasting space for the smaller nodes, 
> > which would be faster and possibly would solve the fragmentation problem 
> > Andres referred to in
>
> > https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de
>
> While calculating node sizes that fit within a power-of-two size, I noticed 
> the current base node is a bit wasteful, taking up 8 bytes. The node kind 
> only has a small number of values, so it doesn't really make sense to use an 
> enum here in the struct (in fact, Andres' prototype used a uint8 for 
> node_kind). We could use a bitfield for the count and kind:
>
> uint16 -- kind and count bitfield
> uint8 shift;
> uint8 chunk;
>
> That's only 4 bytes. Plus, if the kind is ever encoded in a pointer tag, the 
> bitfield can just go back to being count only.

Good point, agreed.
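
Something like this for the base node, I suppose (a sketch; field widths
are illustrative):

typedef struct rt_node
{
    uint16      kind:3,         /* node kind, only a few distinct values */
                count:13;       /* number of children */
    uint8       shift;          /* key bits consumed above this node */
    uint8       chunk;          /* this node's key chunk within its parent */
} rt_node;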

>
> Here are the v6 node kinds:
>
> node4:   8 +   4 +(4)+   4*8 =   48 bytes
> node16:  8 +  16 +  16*8 =  152
> node32:  8 +  32 +  32*8 =  296
> node128: 8 + 256 + 128/8 + 128*8 = 1304
> node256: 8   + 256/8 + 256*8 = 2088
>
> And here are the possible ways we could optimize nodes for space using aset 
> allocation. Parentheses are padding bytes. Even if my math has mistakes, the 
> numbers shouldn't be too far off:
>
> node3:   4 +   3 +(1)+   3*8 =   32 bytes
> node6:   4 +   6 +(6)+   6*8 =   64
> node13:  4 +  13 +(7)+  13*8 =  128
> node28:  4 +  28 +  28*8 =  256
> node31:  4 + 256 +  32/8 +  31*8 =  512 (XXX not good)
> node94:  4 + 256 +  96/8 +  94*8 = 1024
> node220: 4 + 256 + 224/8 + 220*8 = 2048
> node256: = 4096
>
> The main disadvantage is that node256 would balloon in size.

Yeah, node31 and node256 are bloated.  We probably could use slab for
node256 independently. It's worth trying a benchmark to see how it
affects the performance and the tree size.

BTW We need to consider not only aset/slab but also DSA since we
allocate dead tuple TIDs on DSM in parallel vacuum cases. FYI DSA uses
the following size classes:

static const uint16 dsa_size_classes[] = {
sizeof(dsa_area_span), 0,   /* special size classes */
8, 16, 24, 32, 40, 48, 56, 64,  /* 8 classes separated by 8 bytes */
80, 96, 112, 128,   /* 4 classes separated by 16 bytes */
160, 192, 224, 256, /* 4 classes separated by 32 bytes */
320, 384, 448, 512, /* 4 classes separated by 64 bytes */
640, 768, 896, 1024,/* 4 classes separated by 128 bytes */
1280, 1560, 1816, 2048, /* 4 classes separated by ~256 bytes */
2616, 3120, 3640, 4096, /* 4 classes separated by ~512 bytes */
5456, 6552, 7280, 8192  /* 4 classes separated by ~1024 bytes */
};

node256 will be classed as 2616, which is still not good.

Anyway, I'll implement DSA support for radix tree.

Regards,

-- 
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Insertion Sort Improvements

2022-09-27 Thread John Naylor
On Tue, Sep 27, 2022 at 4:39 PM Benjamin Coutu  wrote:
>
> Getting back to improvements for small sort runs, it might make sense to
> consider using in-register based sorting via sorting networks for very
> small runs.
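
(For reference, a sorting network is a fixed, data-independent sequence of
compare-exchange steps. A plain scalar sketch for 4 elements, no SIMD:

static inline void
cswap(int *a, int *b)
{
    int     lo = (*a < *b) ? *a : *b;
    int     hi = (*a < *b) ? *b : *a;

    *a = lo;
    *b = hi;
}

static void
sort4(int v[4])
{
    cswap(&v[0], &v[1]);
    cswap(&v[2], &v[3]);
    cswap(&v[0], &v[2]);
    cswap(&v[1], &v[3]);
    cswap(&v[1], &v[2]);
}

The SIMD variants perform these compare-exchanges on whole registers at
once.)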

> It looks like some of the commercial DBMSs do this very successfully.

"Looks like"?

> They use 4 512bit registers (AVX-512) in this example, which could
> technically store 4 * 4 sort-elements (given that the sorting key is 64 bit
> and the tuple pointer is 64 bit). I wonder whether this could also be done
> with just SSE (instead of AVX), which the project now readily supports
> thanks to your recent efforts?

SortTuples are currently 24 bytes, and supported vector registers are 16
bytes, so not sure how you think that would work.

More broadly, the more invasive a change is, the more risk is involved, and
the more effort to test and evaluate. If you're serious about trying to
improve insertion sort performance, the simple idea we discussed at the
start of the thread is a much more modest step that has a good chance of
justifying the time put into it. That is not to say it's easy, however,
because testing is a non-trivial amount of work.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: A doubt about a newly added errdetail

2022-09-27 Thread Amit Kapila
On Tue, Sep 27, 2022 at 6:12 PM Alvaro Herrera  wrote:
>
> While reading this code, I noticed that function expr_allowed_in_node()
> has a very strange API: it doesn't have any return convention at all
> other than "if we didn't modify errdetail_str then all is good".  I was
> tempted to add an "Assert(*errdetail_msg == NULL)" at the start of it,
> just to make sure that it is not called if a message is already set.
>
> I think it would be much saner to inline the few lines of that function
> in its sole caller, as in the attached.
>

LGTM.

-- 
With Regards,
Amit Kapila.




Re: SYSTEM_USER reserved word implementation

2022-09-27 Thread Michael Paquier
On Tue, Sep 27, 2022 at 03:38:49PM -0700, Jacob Champion wrote:
> On 9/26/22 06:29, Drouvot, Bertrand wrote:
> Since there are only internal clients to the API, I'd argue this makes
> more sense as an Assert(authn_id != NULL), but I don't think it's a
> dealbreaker.

Using an assert() looks like a good idea from here.  If this is called
with a NULL authn, this could reflect a problem in the authentication
logic.

>> As far the assertion failure mentioned by Michael when moving the 
>> SVFOP_SYSTEM_USER from NAMEOID to TEXTOID: V4 is assuming that it is 
>> safe to force the collation to C_COLLATION_OID for SQLValueFunction 
>> having a TEXT type, but I would be happy to also hear your thoughts 
>> about it.
> 
> Unfortunately I don't have much to add here; I don't know enough about
> the underlying problems.

I have been looking at that, and after putting my hands on that this
comes down to the facility introduced in 40c24bf.  So, I think that
we'd better use COERCE_SQL_SYNTAX so as there is no need to worry
about the shortcuts this patch is trying to use with the collation
setup.  And there are a few tests for get_func_sql_syntax() in
create_view.sql.  Note that this makes the patch slightly shorter, and
simpler.

The docs still mention "name", not "text".

This brings in a second point.  40c24bf has refrained from removing
SQLValueFunction, but based the experience on this thread I see a
pretty good argument in doing the jump once and for all.  This
deserves a separate discussion, though.  I'll do that and create a new
thread.
--
Michael
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 8b72f8a215..e89703a3bf 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1508,6 +1508,9 @@
 { oid => '746', descr => 'session user name',
   proname => 'session_user', provolatile => 's', prorettype => 'name',
   proargtypes => '', prosrc => 'session_user' },
+{ oid => '9977', descr => 'system user name',
+  proname => 'system_user', provolatile => 's', prorettype => 'text',
+  proargtypes => '', prosrc => 'system_user' },
 
 { oid => '744',
   proname => 'array_eq', prorettype => 'bool',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index ee48e392ed..e7ebea4ff4 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -357,6 +357,9 @@ extern void InitializeSessionUserIdStandalone(void);
 extern void SetSessionAuthorization(Oid userid, bool is_superuser);
 extern Oid	GetCurrentRoleId(void);
 extern void SetCurrentRoleId(Oid roleid, bool is_superuser);
+extern void InitializeSystemUser(const char *authn_id,
+ const char *auth_method);
+extern const char *GetSystemUser(void);
 
 /* in utils/misc/superuser.c */
 extern bool superuser(void);	/* current user is superuser */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 9a7cc0c6bd..ccc927851c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -409,6 +409,7 @@ PG_KEYWORD("support", SUPPORT, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("symmetric", SYMMETRIC, RESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("sysid", SYSID, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("system", SYSTEM_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("system_user", SYSTEM_USER, RESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("table", TABLE, RESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("tables", TABLES, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("tablesample", TABLESAMPLE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 8cba888223..ee0985c7ed 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -1496,6 +1496,14 @@ ParallelWorkerMain(Datum main_arg)
 		 false);
 	RestoreClientConnectionInfo(clientconninfospace);
 
+	/*
+	 * Initialize SystemUser now that MyClientConnectionInfo is restored.
+	 * Also ensure that auth_method is actually valid, aka authn_id is not NULL.
+	 */
+	if (MyClientConnectionInfo.authn_id)
+		InitializeSystemUser(MyClientConnectionInfo.authn_id,
+			 hba_authname(MyClientConnectionInfo.auth_method));
+
 	/* Attach to the leader's serializable transaction, if SERIALIZABLE. */
 	AttachSerializableXact(fps->serializable_xact_handle);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0d8d292850..94d5142a4a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -743,7 +743,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	SERIALIZABLE SERVER SESSION SESSION_USER SET SETS SETOF SHARE SHOW
 	SIMILAR SIMPLE SKIP SMALLINT SNAPSHOT SOME SQL_P STABLE STANDALONE_P
 	START STATEMENT STATISTICS STDIN STDOUT STORAGE STORED STRICT_P STRIP_P
-	SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P
+	SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P SYSTEM_USER
 
 	TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE 

Re: Strip -mmacosx-version-min options from plperl build

2022-09-27 Thread Andres Freund
Hi,

On 2022-08-30 09:35:51 -0400, Andrew Dunstan wrote:
> On 2022-08-26 Fr 16:25, Andres Freund wrote:
> > On 2022-08-26 16:00:31 -0400, Tom Lane wrote:
> >> Andrew Dunstan  writes:
> >>> On 2022-08-26 Fr 12:11, Tom Lane wrote:
>  And if that doesn't help, try -Wl,--export-all-symbols
> >>> worked
> > Except that it's only happening for plperl, I'd wonder if it's possibly
> > related to our magic symbols being prefixed with _. I noticed that the
> > underscore prefix e.g. changes the behaviour of gcc's "collect2" on AIX, 
> > which
> > is responsible for exporting symbols etc.
> >
> >
> >> Hmph.  Hard to see how that isn't a linker bug.
> > Agreed, given that this is only happening with plperl, and not with any of 
> > the
> > other extensions...
> >
> >
> >> As a stopgap to get the farm green again, I propose adding something like
> >>
> >> ifeq ($(PORTNAME), cygwin)
> >> SHLIB_LINK += -Wl,--export-all-symbols
> >> endif
> >>
> >> to plperl's makefile.
> > :(
> >
>
> It doesn't make me very happy either, but nobody seems to have a better
> idea.

The plpython issue I was investigating in
https://postgr.es/m/20220928022724.erzuk5v4ai4b53do%40awork3.anarazel.de
feels eerily similar to the issue here.

I wonder if it's the same problem - __attribute__((visibility("default")))
works to export - unless another symbol uses __declspec (dllexport). In the
referenced thread that was PyInit_plpy(), here it could be some perl generated
one.

Is this issue resolved if you add
#define PGDLLEXPORT __declspec (dllexport)
to cygwin.h?  Without the -Wl,--export-all-symbols of course.

Greetings,

Andres Freund




Re: Assign TupleTableSlot->tts_tableOid duplicated in tale AM.

2022-09-27 Thread Peter Geoghegan
On Tue, Sep 20, 2022 at 11:51 PM Wenchao Zhang  wrote:
> We can see by reading the code that the two assigned values are the same.
> Maybe we should remove one?
>
> What's more, we think that assigning slot->tts_tableOid in an outer
> interface like table_scan_getnextslot may be better and more friendly when
> other pluggable storage formats are introduced.

I suppose that you're right; it really should happen in exactly one
place, based on some overarching theory about how tts_tableOid works
with the table AM interface. I just don't know what that theory is.

-- 
Peter Geoghegan




meson vs mingw, plpython, readline and other fun

2022-09-27 Thread Andres Freund
Hi,

Melih mentioned on IM that while he could build postgres with meson on windows
w/ mingw, the tests didn't run.

Issues:

- The bit computing PATH to the temporary installation for running tests only
  dealt with backward slashes in paths on windows, because that's what
  python.org python uses by default. But a msys ucrt python returns forward
  slashes. Trivial fix.

  I didn't encounter this because I'd used a meson from git, which thus didn't
  have msys's patch to return a different prefix.

  This made pg_regress/isolationtester tests other than the main regression
  tests pass.


- I'd only passed in a fake HOST_TUPLE when building pg_regress, oops.

  I don't think it makes sense to come up with a config.guess compatible name
  - they're quite random. And we can't rely on shell to work when targeting
  msvc. The attached patch does:

# Need to make up something roughly like x86_64-pc-mingw64. resultmap matches on
# patterns like ".*-.*-mingw.*". We probably can do better, but for now just
# replace 'gcc' with 'mingw' on windows.
host_tuple_cc = cc.get_id()
if host_system == 'windows' and host_tuple_cc == 'gcc'
  host_tuple_cc = 'mingw'
endif
host_tuple = '@0@-@1@-@2@'.format(host_cpu, host_system, host_tuple_cc)

  which I don't perfectly like (e.g. clang also kind of works on windows), but
  it seems ok enough for now. I suspect we'd need a bunch of other changes to
  make clang on windows work.

  This made the main pg_regress tests pass.


- I had not added the logic to not use existing getopt on mingw, causing tap
  tests to fail. Fixing that didn't immediately work because of duplicate
  symbols - because I hadn't copied over -Wl,--allow-multiple-definition.

  "Contrary" to the comment in src/template/win32 it doesn't appear to be
  needed for libpq and pgport - likely because for the meson build an export
  file is generated (needed for msvc anyway, I didn't think to disable it for
  mingw).  But since we need to be able to override getopt(), we obviously
  need --allow-multiple-definition anyway.


- This lead me to try to also add -Wl,--disable-auto-import. However, that
  caused two problems.

  1) plpython tests started to fail, due to not finding Pg_magic_func in
 plpython3.dll. This confounded me for quite a while. It worked for every
 other extension .dll? A lot of looking around led me to define
#define PGDLLEXPORT __declspec (dllexport)
 for mingw as well. For mingw we otherwise end up with
#define PGDLLEXPORT __attribute__((visibility("default")))
 which works.

 As far as I can tell __attribute__((visibility("default"))) works as long
 as there's no declspec(dllexport) symbol in the same dll. If there is, it
 stops working. Ugh.

 I don't see a reason not to define PGDLLEXPORT as __declspec(dllexport)
 for mingw as well?
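
 I.e. roughly this in the relevant port header (a sketch, exact location
 to be determined):

 #if defined(__MINGW32__) || defined(__MINGW64__)
 #define PGDLLEXPORT __declspec (dllexport)
 #endif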

 I suspect this is an issue for autoconf mingw build as well, but that
 fails to even configure - there's no coverage on the BF for it I think.

 This made plpython's test pass (again).


  2) psql failed to build due to readline. I hadn't implemented disabling it
 automatically. Somewhat luckily - turns out it actually works (as long as
 --disable-auto-import isn't used), including autocomplete!

 The issue afaict is that while readline has an import library, functions
 aren't "annotated" with __declspec(dllimport), thus without
 --enable-auto-import the references are assumed to be local, and thus
 linking fails.

 It's possible we could "fix" this by declaring the relevant symbols
 ourselves or such.  But for now just adding --enable-auto-import to the
 flags used to link to readline seems saner?

 I think it'd be very cool to finally have a working readline on windows.

 Unfortunately IO::Pty isn't installable on windows, it'd have been
 interesting to see how well that readline works.


- Before I updated mingw, interactive psql didn't show a prompt, making me
  think something was broken. That turned out to be the same in an autoconf
  build. When inside the msys terminal (mintty) isatty returned 0, because of
  some detail of how it emulates ttys. After updating mingw that problem is
  gone.

  I included this partially so I have something to find in email next time I
  search for mintty and isatty :)


With these things fixed, postgres built and ran tests successfully! With
nearly all "relevant" dependencies:
icu, libxml, libslt, lz4, nls, plperl, plpython, pltcl, readline, ssl, zlib,
zstd

Missing are gss and uuid. Both apparently work on windows, but they're not in
the msys repository, and I don't feel like compiling them myself.

Only 5 tests skipped:
- recovery/017_shm - probably not applicable
- recovery/022_crash_temp_files - I don't think it's ok that this test skips,
  but that's for another thread
- psql/010_tab_completion - IO::Pty can't be installed
- psql/010_cancel  - IO::Pty can't be installed
- ldap/001_auth - doesn't know how to find 

Re: GUC tables - use designated initializers

2022-09-27 Thread Peter Smith
On Wed, Sep 28, 2022 at 2:21 AM Tom Lane  wrote:
>
> Peter Smith  writes:
> > Enums index a number of the GUC tables. This all relies on the
> > elements being carefully arranged to be in the same order as those
> > enums. There are comments to say what enum index belongs to each table
> > element.
> > But why not use designated initializers to enforce what the comments
> > are hoping for?
>
> Interesting proposal, but it's not clear to me that this solution makes
> the code more bulletproof rather than less so.  Yeah, you removed the
> order dependency, but the other concern here is that the array gets
> updated at all when adding a new enum entry.  This method seems like
> it'd help disguise such an oversight.  In particular, the adjacent
> StaticAssertDecls about the array lengths are testing something different
> than they used to, and I fear they lost some of their power.

Thanks for the feedback!

The current code StaticAssertDecl asserts that the array length is the
same as the number of enums by using hardwired knowledge of what enum
is the "last" one (e.g. DEVELOPER_OPTIONS in the example below).

StaticAssertDecl(lengthof(config_group_names) == (DEVELOPER_OPTIONS + 2),
"array length mismatch");

Hmmm. I think maybe I found an example to justify your fear. It's a
bit subtle, and AFAIK the HEAD code would not suffer this problem ---
imagine the developer adds a new enum value before the "last" one (e.g.
ADD_NEW_BEFORE_LAST comes before DEVELOPER_OPTIONS) and at the same
time *forgets* to update the table elements. The designated
index [DEVELOPER_OPTIONS] will still ensure the table gets the
correct increased length (so the StaticAssertDecl will be ok), except
now there will be an undetected "hole" in the table at the forgotten
[ADD_NEW_BEFORE_LAST] index.
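
A minimal self-contained sketch of that scenario (hypothetical enum and
names):

typedef enum { FIRST_OPT, ADD_NEW_BEFORE_LAST, DEVELOPER_OPTIONS } ConfigGroup;

static const char *const config_group_names[] = {
    [FIRST_OPT] = "First Options",
    /* [ADD_NEW_BEFORE_LAST] forgotten: that element is silently NULL */
    [DEVELOPER_OPTIONS] = "Developer Options",
};

/* Still passes: the designated index grew the array to the right length. */
StaticAssertDecl(lengthof(config_group_names) == (DEVELOPER_OPTIONS + 1),
                 "array length mismatch");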

> Can we
> improve those checks so they'd catch a missing entry again?

Thinking...

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [small patch] Change datatype of ParallelMessagePending from "volatile bool" to "volatile sig_atomic_t"

2022-09-27 Thread Michael Paquier
On Tue, Sep 27, 2022 at 01:38:26AM +, kuroda.hay...@fujitsu.com wrote:
> Maybe you meant sigint_interrupt_enabled,
> and it also seems to be modified in the signal handler.

Yeah, at least as of the cancel callback psql_cancel_callback() that
handle_sigint() would call on SIGINT as this is set by psql.  So it
does not seem right to use a boolean rather than a sig_atomic_t in
this case, as well.
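
In a nutshell, the rule being applied here (generic names, not psql's):

#include <signal.h>

static volatile sig_atomic_t got_sigint = 0;

static void
handle_sigint(int signum)
{
	/* only writes to volatile sig_atomic_t are guaranteed safe here */
	got_sigint = 1;
}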
--
Michael




Re: [PATCH] Log details for client certificate failures

2022-09-27 Thread Masahiko Sawada
On Wed, Sep 28, 2022 at 4:44 AM Jacob Champion  wrote:
>
> On Tue, Sep 27, 2022 at 1:51 AM Masahiko Sawada  wrote:
> > I think we can fix it by the attached patch but I'd like to discuss
> > whether it's worth fixing it.
>
> Whoops. So every time it's changed, we leak a little postmaster memory?

No. Since cluster_name is PGC_POSTMASTER, we leak a little postmaster
memory only once when starting up. application_name is PGC_USERSET but
since we normally allocate memory in PortalMemoryContext we eventually
can free it. Since check_cluster_name and check_application_name are
similar, I changed both for consistency.

Regards,

-- 
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)

2022-09-27 Thread Michael Paquier
On Tue, Sep 27, 2022 at 03:11:54PM +0530, Bharath Rupireddy wrote:
> On Tue, Sep 27, 2022 at 2:20 PM Kyotaro Horiguchi
>  wrote:
>> If this is still unacceptable, I propose to change the comment. (I
>> found that the previous patch forgets about do_pg_backup_stop())
>>
>> - * It fills in backup_state with the information required for the backup,
>> + * It fills in the parameter "state" with the information required for the 
>> backup,
> 
> +1. There's another place that uses backup_state in the comments. I
> modified that as well. Please see the attached patch.

Thanks, fixed the comments.  I have left the variable names as they are
now in the code, as both are backup-related code paths, so it is IMO
clear that the state is linked to a backup.
--
Michael




Re: Adding a clang-format file

2022-09-27 Thread Peter Geoghegan
On Tue, Sep 27, 2022 at 5:35 AM Aleksander Alekseev
 wrote:
> Personally I don't have anything against the idea. TimescaleDB uses
> clang-format to mimic pgindent and it works quite well. One problem
> worth mentioning though is that the clang-format file is dependent on
> the particular version of clang-format.

I was hoping that something generic could work here. Something that we
could provide that didn't claim to be authoritative, that has a
reasonable degree of compatibility that allows most people to use the
file without much fuss. Kind of like our .editorconfig file. That
might not be a realistic goal, though, since the clang-format settings
are all quite complicated.

-- 
Peter Geoghegan




Re: GUC values - recommended way to declare the C variables?

2022-09-27 Thread Peter Smith
On Tue, Sep 27, 2022 at 11:07 AM Peter Smith  wrote:
>
...

> I will try to post a patch in the next few days to address (per your
> suggestions) some of the variables that I am more familiar with.
>

PSA a small patch to tidy a few of the GUC C variables - adding
comments and removing unnecessary declaration assignments.

make check-world passed OK.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.


v1-0001-Tidied-some-GUC-C-variable-declarations.patch
Description: Binary data


Re: Allow foreign keys to reference a superset of unique columns

2022-09-27 Thread Kaiting Chen
> So the broader point I'm trying to make is that, as I understand it,
> indexes backing foreign key constraints is an implementation detail.
> The SQL standard details the behavior of foreign key constraints
> regardless of implementation details like a backing index. That means
> that the behavior of two column foreign key constraints is defined in
> a single way whether or not there's a backing index at all or whether
> such a backing index, if present, contains one or two columns.

> I understand that for the use case you're describing this isn't the
> absolute most efficient way to implement the desired data semantics.
> But it would be incredibly confusing (and, I think, a violation of the
> SQL standard) to have one foreign key constraint work in a different
> way from another such constraint when both are indistinguishable at
> the constraint level (the backing index isn't an attribute of the
> constraint; it's merely an implementation detail).

It appears to me that the unique index backing a foreign key constraint *isn't*
an implementation detail in PostgreSQL; rather, the *unique constraint* is the
implementation detail. The reason I say this is that it's possible to create a
foreign key constraint where the uniqueness of the referenced columns is
guaranteed by only a unique index and where no such unique constraint exists.

Specifically, this line in the documentation:

  The referenced columns must be the columns of a non-deferrable unique or
  primary key constraint in the referenced table.

isn't true. In practice, the referenced columns must be the columns of a valid,
nondeferrable, nonfunctional, nonpartial, unique index. Whether or not a unique
constraint exists is immaterial to whether or not postgres will let you define
such a foreign key constraint.


Re: Allow foreign keys to reference a superset of unique columns

2022-09-27 Thread Kaiting Chen
>>> Another example is that I think the idea is only well-defined when
>>> the subset column(s) are a primary key, or at least all marked NOT NULL.
>>> Otherwise they're not as unique as you're claiming.  But then the FK
>>> constraint really has to be dependent on a PK constraint not just an
>>> index definition, since indexes in themselves don't enforce not-nullness.
>>> That gets back to Peter's complaint that referring to an index isn't
>>> good enough.

>> The unique index underlying foo.a guarantees that (foo.a, foo.b) is unique
>> if foo.a isn't NULL. That is, there can be multiple rows (NULL, 1) in foo.
>> However, such a row can't be the target of the foreign key constraint
>> anyway.

> You're ignoring the possibility of a MATCH PARTIAL FK constraint.
> Admittedly, we don't implement those today, and there hasn't been
> a lot of interest in doing so.  But they're in the SQL spec so we
> should fix that someday.

> I also wonder how this all interacts with the UNIQUE NULLS NOT
> DISTINCT feature that we just got done implementing for v15.
> I don't know if the spec says that an FK depending on such a
> constraint should treat nulls as ordinary unique values --- but
> it sure seems like that'd be a plausible user expectation.

I don't think that the UNIQUE NULLS DISTINCT/NOT DISTINCT patch will have any
impact on this proposal. Currently (and admittedly I haven't thought at all
about MATCH PARTIAL), a NULL in a referencing row precludes a reference at
all:

  * If the foreign key constraint is declared MATCH SIMPLE, then no
    referenced row exists for the referencing row.
  * If the foreign key constraint is declared MATCH FULL, then the
    referencing row must not have a NULL in any of its referencing columns.

UNIQUE NULLS DISTINCT is the current default behavior, and this proposal
shouldn't have a problem with it. In the case of UNIQUE NULLS NOT DISTINCT,
NULLs behave, from a uniqueness perspective, as a singleton value and thus
shouldn't cause any additional semantic difficulties in regards to this
proposal.
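
As a hedged illustration of the two v15 variants (using the documented
syntax; t1 and t2 are made-up names):

  CREATE TABLE t1 (a integer UNIQUE NULLS DISTINCT);      -- the default
  CREATE TABLE t2 (a integer UNIQUE NULLS NOT DISTINCT);

  INSERT INTO t1 VALUES (NULL), (NULL);  -- fine: NULLs are distinct
  INSERT INTO t2 VALUES (NULL);          -- fine the first time
  INSERT INTO t2 VALUES (NULL);          -- fails: NULL acts as a singleton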

I don't have access to a copy of the SQL specification and it doesn't look
like anyone implements MATCH PARTIAL. Based on what I can gather from the
internet, it appears that MATCH PARTIAL allows at most one referencing column
to be NULL, and guarantees that at least one row in the referenced table
matches the remaining columns; implicitly, multiple matches are allowed. If
these are the semantics of MATCH PARTIAL, then it seems to me that the
uniqueness of the referenced rows isn't very important.

What other semantics and edge cases regarding this proposal should I
consider?


Re: SYSTEM_USER reserved word implementation

2022-09-27 Thread Jacob Champion
On 9/26/22 06:29, Drouvot, Bertrand wrote:
> Please find attached V4 taking care of Jacob's previous comments.

> + /*
> +  * InitializeSystemUser should already be called once we are sure that
> +  * authn_id is not NULL (means auth_method is actually valid).
> +  * But keep the test here also for safety.
> +  */
> + if (authn_id)

Since there are only internal clients to the API, I'd argue this makes
more sense as an Assert(authn_id != NULL), but I don't think it's a
dealbreaker.

> As far the assertion failure mentioned by Michael when moving the 
> SVFOP_SYSTEM_USER from NAMEOID to TEXTOID: V4 is assuming that it is 
> safe to force the collation to C_COLLATION_OID for SQLValueFunction 
> having a TEXT type, but I would be happy to also hear your thoughts 
> about it.

Unfortunately I don't have much to add here; I don't know enough about
the underlying problems.

Thanks,
--Jacob




Re: Allow foreign keys to reference a superset of unique columns

2022-09-27 Thread Tom Lane
Kaiting Chen  writes:
>> Another example is that I think the idea is only well-defined when
>> the subset column(s) are a primary key, or at least all marked NOT NULL.
>> Otherwise they're not as unique as you're claiming.  But then the FK
>> constraint really has to be dependent on a PK constraint not just an
>> index definition, since indexes in themselves don't enforce not-nullness.
>> That gets back to Peter's complaint that referring to an index isn't
>> good enough.

> The unique index underlying foo.a guarantees that (foo.a, foo.b) is unique
> if foo.a isn't NULL. That is, there can be multiple rows (NULL, 1) in foo.
> However, such a row can't be the target of the foreign key constraint
> anyway.

You're ignoring the possibility of a MATCH PARTIAL FK constraint.
Admittedly, we don't implement those today, and there hasn't been
a lot of interest in doing so.  But they're in the SQL spec so we
should fix that someday.

I also wonder how this all interacts with the UNIQUE NULLS NOT
DISTINCT feature that we just got done implementing for v15.
I don't know if the spec says that an FK depending on such a
constraint should treat nulls as ordinary unique values --- but
it sure seems like that'd be a plausible user expectation.


The bottom line is there's zero chance you'll ever convince me that this
is a good idea.  I think the semantics are at best questionable, I think
it will break important optimizations, and I think the chances of
finding ourselves in conflict with some future SQL spec extension are
too high.  (Even if you can make the case that this isn't violating the
spec *today*, which I rather doubt so far as the information_schema is
concerned.  The fact that we've got legacy behaviors that are outside
the spec there isn't a great argument for adding more.)

Now, if you can persuade the SQL committee that this behavior should be
standardized, then two of those concerns would go away (since I don't
think you'll get squishy semantics past them).  But I think submitting
a patch now is way premature and mostly going to waste people's time.

regards, tom lane




Re: Allow foreign keys to reference a superset of unique columns

2022-09-27 Thread Kaiting Chen
> Could we create this additional unique constraint implicitly, when using
> FOREIGN KEY ... REFERENCES on a superset of unique columns? This would
> make it easier to use, but still give proper information_schema output.

Currently, a foreign key declared where the referenced columns have only a
unique index and not a unique constraint already populates the
constraint-related columns of information_schema.referential_constraints with
NULL. It doesn't seem like this change would require a major deviation from
the existing behavior in information_schema:

  CREATE TABLE foo (a integer, b integer);

  CREATE UNIQUE INDEX ON foo (a, b);

  CREATE TABLE bar (
x integer,
y integer,
FOREIGN KEY (x, y) REFERENCES foo(a, b)
  );

  # SELECT * FROM information_schema.referential_constraints
WHERE constraint_name = 'bar_x_y_fkey';

  -[ RECORD 1 ]-+--
  constraint_catalog| kaitingc
  constraint_schema | public
  constraint_name   | bar_x_y_fkey
  unique_constraint_catalog |
  unique_constraint_schema  |
  unique_constraint_name|
  match_option  | NONE
  update_rule   | NO ACTION
  delete_rule   | NO ACTION

The only change would be to information_schema.key_column_usage:

  # SELECT * FROM information_schema.key_column_usage
WHERE constraint_name = 'bar_x_y_fkey';

  -[ RECORD 173 ]---+--
  constraint_catalog| kaitingc
  constraint_schema | public
  constraint_name   | bar_x_y_fkey
  table_catalog | kaitingc
  table_schema  | public
  table_name| bar
  column_name   | x
  ordinal_position  | 1
  position_in_unique_constraint | 1
  -[ RECORD 174 ]---+--
  constraint_catalog| kaitingc
  constraint_schema | public
  constraint_name   | bar_x_y_fkey
  table_catalog | kaitingc
  table_schema  | public
  table_name| bar
  column_name   | y
  ordinal_position  | 2
  position_in_unique_constraint | 2

Where position_in_unique_constraint would have to be NULL for the referenced
columns that don't appear in the unique index. That column is already
nullable:

  For a foreign-key constraint, ordinal position of the referenced column
  within its unique constraint (count starts at 1); otherwise null

So it seems like this would be a minor documentation change at most. Also,
should that documentation be updated to mention that it's actually the
"ordinal position of the referenced column within its unique index" (since
it's a little confusing that in referential_constraints,
unique_constraint_name is NULL)?


Re: Allow foreign keys to reference a superset of unique columns

2022-09-27 Thread Kaiting Chen
> As I was reading through the email chain I had this thought: could you
> get the same benefit (or 90% of it anyway) by instead allowing the
> creation of a uniqueness constraint that contains more columns than
> the index backing it? So long as the index backing it still guaranteed
> the uniqueness on a subset of columns that would seem to be safe.

> After writing down that idea I noticed Wolfgang Walther had commented
> similarly, but it appears that that idea got lost (or at least not
> responded to).

Is it necessary to have the unique constraint at all? This currently works
in PostgreSQL:

  CREATE TABLE foo (a integer, b integer);

  CREATE UNIQUE INDEX ON foo (a, b);

  CREATE TABLE bar (
x integer,
y integer,
FOREIGN KEY (x, y) REFERENCES foo(a, b)
  );

Where no unique constraint exists on foo (a, b). Forcing the creation of a
unique constraint in this case seems more confusing to me, as a user, than
allowing it without the definition of the unique constraint, given the
existing behavior.

> I'd be happy to sign up to review an updated patch if you're
> interested in continuing this effort. If so, could you register the
> patch in the CF app (if not there already)?

The patch should already be registered! Though it's still in a state that
needs a lot of work.


Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Tom Lane
Robert Haas  writes:
> On Tue, Sep 27, 2022 at 4:50 PM Tom Lane  wrote:
>> There is a second problem that I am going to hold your feet to the
>> fire about:
>> (lldb) p sizeof(SharedInvalidationMessage)
>> (unsigned long) $1 = 24

> Also, I don't really know what problem you think it's going to cause
> if that structure gets bigger. If we increased the size from 16 bytes
> even all the way to 32 or 64 bytes, what negative impact do you think
> that would have?

Maybe it wouldn't have any great impact.  I don't know, but I don't
think it's incumbent on me to measure that.  You or the patch author
should have had a handle on that question *before* committing.

regards, tom lane




Re: [PoC] Federated Authn/z with OAUTHBEARER

2022-09-27 Thread Jacob Champion
On Mon, Sep 26, 2022 at 6:39 PM Andrey Chudnovsky
 wrote:
> For the providing token directly, that would be primarily used for
> scenarios where the same party controls both the server and the client
> side wrapper.
> I.e. The client knows how to get a token for a particular principal
> and doesn't need any additional information other than human readable
> messages.
> Please clarify the scenarios where you see this falling apart.

The most concrete example I can see is with the OAUTHBEARER error
response. If you want to eventually handle differing scopes per role,
or different error statuses (which the proof-of-concept currently
hardcodes as `invalid_token`), then the client can't assume it knows
what the server is going to say there. I think that's true even if you
control both sides and are hardcoding the provider.

How should we communicate those pieces to a custom client when it's
passing a token directly? The easiest way I can see is for the custom
client to speak the OAUTHBEARER protocol directly (e.g. SASL plugin).
If you had to parse the libpq error message, I don't think that'd be
particularly maintainable.

> I can provide an example in the cloud world. We (Azure) as well as
> other providers offer ways to obtain OAUTH tokens for
> Service-to-Service communication at IAAS / PAAS level.
> on Azure "Managed Identity" feature integrated in Compute VM allows a
> client to make a local http call to get a token. VM itself manages the
> certificate lifecycle, as well as implements the corresponding OAUTH
> flow.
> This capability is used by both our 1st party PAAS offerings, as well
> as 3rd party services deploying on VMs or managed K8S clusters.
> Here, the client doesn't need libpq assistance in obtaining the token.

Cool. To me that's the strongest argument yet for directly providing
tokens to libpq.

> My optimistic plan here would be to implement several core OAUTH flows
> in libpq core which would be generic enough to support major
> enterprise OAUTH providers:
> 1. Client Credentials flow (Client_id + Client_secret) for backend 
> applications.
> 2. Authorization Code Flow with PKCE and/or Device code flow for GUI
> applications.

As long as it's clear to DBAs when to use which flow (because existing
documentation for that is hit-and-miss), I think it's reasonable to
eventually support multiple flows. Personally my preference would be
to start with one or two core flows, and expand outward once we're
sure that we do those perfectly. Otherwise the explosion of knobs and
buttons might be overwhelming, both to users and devs.

Related to the question of flows is the client implementation library.
I've mentioned that I don't think iddawc is production-ready. As far
as I'm aware, there is only one certified OpenID relying party written
in C, and that's... an Apache server plugin. That leaves us either
choosing an untested library, scouring the web for a "tested" library
(and hoping we're right in our assessment), or implementing our own
(which is going to tamp down enthusiasm for supporting many flows,
though that has its own set of benefits). If you know of any reliable
implementations with a C API, please let me know.

> (2.) above would require a protocol between libpq and upstream clients
> to exchange several messages.
> Your patch includes a way for libpq to deliver to the client a message
> about the next authentication steps, so planned to build on top of
> that.

Specifically it delivers that message to an end user. If you want a
generic machine client to be able to use that, then we'll need to talk
about how.

> A little about scenarios, we look at.
> What we're trying to achieve here is an easy integration path for
> multiple players in the ecosystem:
> - Managed PaaS Postgres providers (both us and multi-cloud solutions)
> - SaaS providers deploying postgres on IaaS/PaaS providers' clouds
> - Tools - pg_admin, psql and other ones.
> - BI, ETL, Federation and other scenarios where postgres is used as
> the data source.
>
> If we can offer a provider agnostic solution for Backend <=> libpq <=>
> Upstreal client path, we can have all players above build support for
> OAUTH credentials, managed by the cloud provider of their choice.

Well... I don't quite understand why we'd go to the trouble of
providing a provider-agnostic communication solution only to have
everyone write their own provider-specific client support. Unless
you're saying Microsoft would provide an officially blessed plugin for
the *server* side only, and Google would provide one of their own, and
so on.

The server side authorization is the only place where I think it makes
sense to specialize by default. libpq should remain agnostic, with the
understanding that we'll need to make hard decisions when a major
provider decides not to follow a spec.

> For us, that would mean:
> - Better administrator experience with pg_admin / psql handling of the
> AAD (Azure Active Directory) authentication flows.
> - Path for integration 

Re: Allow foreign keys to reference a superset of unique columns

2022-09-27 Thread Kaiting Chen
> For one example of where the semantics get fuzzy, it's not
> very clear how the extra-baggage columns ought to participate in
> CASCADE updates.  Currently, if we have
>CREATE TABLE foo (a integer PRIMARY KEY, b integer);
> then an update that changes only foo.b doesn't need to update
> referencing tables, and I think we even have optimizations that
> assume that if no unique-key columns are touched then RI checks
> need not be made.  But if you did
>CREATE TABLE bar (x integer, y integer,
>  FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
> then perhaps you expect bar.y to be updated ... or maybe you don't?

I'd expect bar.y to be updated. In my mind, the FOREIGN KEY constraint should
behave the same, regardless of whether the underlying unique index on the
referenced side is an equivalent set to, or a strict subset of, the
referenced columns.

> Another example is that I think the idea is only well-defined when
> the subset column(s) are a primary key, or at least all marked NOT NULL.
> Otherwise they're not as unique as you're claiming.  But then the FK
> constraint really has to be dependent on a PK constraint not just an
> index definition, since indexes in themselves don't enforce not-nullness.
> That gets back to Peter's complaint that referring to an index isn't
> good enough.

I think that uniqueness should be guaranteed enough even if the subset
columns are nullable:

  CREATE TABLE foo (a integer UNIQUE, b integer);

  CREATE TABLE bar (
x integer,
y integer,
FOREIGN KEY (x, y) REFERENCES foo(a, b)
  );

The unique index underlying foo.a guarantees that (foo.a, foo.b) is unique if
foo.a isn't NULL. That is, there can be multiple rows (NULL, 1) in foo.
However, such a row can't be the target of the foreign key constraint anyway.
So, I'm fairly certain that, where it matters, a unique index on a nullable
subset of the referenced columns guarantees a distinct referenced row.
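
To make that concrete (a sketch reusing foo and bar from above; this is the
claim restated as SQL, not tested output):

  INSERT INTO foo VALUES (NULL, 1), (NULL, 1);  -- allowed: NULLs are distinct
  -- Under MATCH SIMPLE, a referencing row with x = NULL isn't checked at
  -- all, so the duplicate (NULL, 1) rows can never be a referenced target.
  INSERT INTO bar VALUES (NULL, 1);             -- accepted without a match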

> It's also unclear to me how this ought to interact with the
> information_schema views concerning foreign keys.  We generally
> feel that we don't want to present any non-SQL-compatible data
> in information_schema, for fear that it will confuse applications
> that expect to see SQL-spec behavior there.  So do we leave such
> FKs out of the views altogether, or show only the columns involving
> the associated unique constraint?  Neither answer seems pleasant.

Here's the information_schema output for this example:

  CREATE TABLE foo (a integer, b integer);

  CREATE UNIQUE INDEX ON foo (a, b);

  CREATE TABLE bar (
x integer,
y integer,
FOREIGN KEY (x, y) REFERENCES foo(a, b)
  );

  # SELECT * FROM information_schema.referential_constraints
WHERE constraint_name = 'bar_x_y_fkey';

  -[ RECORD 1 ]-+--
  constraint_catalog| kaitingc
  constraint_schema | public
  constraint_name   | bar_x_y_fkey
  unique_constraint_catalog |
  unique_constraint_schema  |
  unique_constraint_name|
  match_option  | NONE
  update_rule   | NO ACTION
  delete_rule   | NO ACTION

  # SELECT * FROM information_schema.key_column_usage
WHERE constraint_name = 'bar_x_y_fkey';

  -[ RECORD 173 ]---+--
  constraint_catalog| kaitingc
  constraint_schema | public
  constraint_name   | bar_x_y_fkey
  table_catalog | kaitingc
  table_schema  | public
  table_name| bar
  column_name   | x
  ordinal_position  | 1
  position_in_unique_constraint | 1
  -[ RECORD 174 ]---+--
  constraint_catalog| kaitingc
  constraint_schema | public
  constraint_name   | bar_x_y_fkey
  table_catalog | kaitingc
  table_schema  | public
  table_name| bar
  column_name   | y
  ordinal_position  | 2
  position_in_unique_constraint | 2

It appears that currently in PostgreSQL, the unique_constraint_catalog,
schema, and name are NULL in referential_constraints when a unique index
(without an associated unique constraint) underlies the referenced columns.
The behaviour I'm proposing would have the same behavior vis-a-vis
referential_constraints.

As for key_column_usage, I propose that position_in_unique_constraint be
NULL if the referenced column isn't indexed.


Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Tom Lane
... also, lapwing's not too happy [1].  The alter_table test
expects this to yield zero rows, but it doesn't:

 SELECT m.* FROM filenode_mapping m LEFT JOIN pg_class c ON c.oid = m.oid
 WHERE c.oid IS NOT NULL OR m.mapped_oid IS NOT NULL;

I've reproduced that symptom in a 32-bit FreeBSD VM building with clang,
so I suspect that it'll occur on any 32-bit build.  mamba is a couple
hours away from offering a confirmatory data point, though.

(BTW, is that test case sane at all?  I'm bemused by the symmetrical
NOT NULL tests on a fundamentally not-symmetrical left join; what
are those supposed to accomplish?  Also, the fact that it doesn't
deign to show any fields from "c" is sure making it hard to tell
what's wrong.)

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2022-09-27%2018%3A40%3A18




Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Robert Haas
On Tue, Sep 27, 2022 at 4:50 PM Tom Lane  wrote:
> I wrote:
> >   * frame #0: 0x00010a36af8c postgres`ParseCommitRecord(info='\x80', 
> > xlrec=0x7fa0678a8090, parsed=0x7ff7b5c50e78) at xactdesc.c:102:30
>
> Okay, so the problem is this: by widening RelFileNumber to 64 bits,
> you have increased the alignment requirement of struct RelFileLocator,
> and thereby also SharedInvalidationMessage, to 8 bytes where it had
> been 4.  longfin's alignment check is therefore expecting that
> xl_xact_twophase will likewise be 8-byte-aligned, but it isn't:

Yeah, I reached the same conclusion.

> There is a second problem that I am going to hold your feet to the
> fire about:
>
> (lldb) p sizeof(SharedInvalidationMessage)
> (unsigned long) $1 = 24
>
> We have sweated a good deal for a long time to keep that struct
> to 16 bytes.  I do not think 50% bloat is acceptable.

I noticed that problem, too.

The attached patch, which perhaps you can try out, fixes the alignment
issue and also reduces the size of SharedInvalidationMessage from 24
bytes back to 20 bytes. I do not really see a way to do better than
that. We use 1 type byte, 3 bytes for the backend ID, 4 bytes for the
database OID, and 4 bytes for the tablespace OID. Previously, we then
used 4 bytes for the relfilenode, but now we need 7, and there's no
place from which we can plausibly steal those bits, at least not as
far as I can see. If you have ideas, I'm all ears.
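
For what it's worth, the arithmetic above corresponds to a layout along these
lines (a sketch with hypothetical field names, not the committed struct):

  #include <stdint.h>

  typedef struct DemoSmgrInvalMsg
  {
      int8_t   id;            /* 1 type byte */
      uint8_t  backend_hi;    /* 3 bytes for the backend ID */
      uint16_t backend_lo;
      uint32_t dbOid;         /* 4 bytes: database OID */
      uint32_t tsOid;         /* 4 bytes: tablespace OID */
      uint32_t relnum_lo;     /* low 32 bits of the 56-bit RelFileNumber */
      uint8_t  relnum_hi[3];  /* high 24 bits: 7 bytes in total for it */
  } DemoSmgrInvalMsg;

  /* 19 payload bytes; 4-byte alignment pads this to 20 on typical ABIs */
  _Static_assert(sizeof(DemoSmgrInvalMsg) == 20, "expected a 20-byte message");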

Also, I don't really know what problem you think it's going to cause
if that structure gets bigger. If we increased the size from 16 bytes
even all the way to 32 or 64 bytes, what negative impact do you think
that would have? It would use a little bit more shared memory, but on
modern systems I doubt it would be enough to get excited about. The
bigger impact would probably be that it would make commit records a
bit bigger since those carry invalidations as payload. That is not
great, but I think it only affects commit records for transactions
that do DDL, so I'm struggling to see that as a big performance
problem.

--
Robert Haas
EDB: http://www.enterprisedb.com


v1-0001-tweak-SharedInvalSmgrMsg.patch
Description: Binary data


Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Tom Lane
I wrote:
>   * frame #0: 0x00010a36af8c postgres`ParseCommitRecord(info='\x80', 
> xlrec=0x7fa0678a8090, parsed=0x7ff7b5c50e78) at xactdesc.c:102:30

Okay, so the problem is this: by widening RelFileNumber to 64 bits,
you have increased the alignment requirement of struct RelFileLocator,
and thereby also SharedInvalidationMessage, to 8 bytes where it had
been 4.  longfin's alignment check is therefore expecting that
xl_xact_twophase will likewise be 8-byte-aligned, but it isn't:

(lldb) p data
(char *) $0 = 0x7fa06783a0a4 "\U0001"

I'm not sure whether the code that generates commit WAL records is
breaking a contract it should maintain, or xactdesc.c needs to be
taught to not assume that this data is adequately aligned.
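
The mechanism is easy to demonstrate in isolation (illustrative structs only,
not the real RelFileLocator definition):

  #include <stdio.h>
  #include <stdint.h>

  /* Widening one member from 4 to 8 bytes raises the whole struct's
   * alignment requirement, which is what the sanitizer is tripping on. */
  typedef struct OldLoc { uint32_t spc, db, rel; } OldLoc;
  typedef struct NewLoc { uint32_t spc, db; uint64_t rel; } NewLoc;

  int
  main(void)
  {
      /* prints "4 8" on typical 64-bit ABIs */
      printf("%zu %zu\n", _Alignof(OldLoc), _Alignof(NewLoc));
      return 0;
  }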

There is a second problem that I am going to hold your feet to the
fire about:

(lldb) p sizeof(SharedInvalidationMessage)
(unsigned long) $1 = 24

We have sweated a good deal for a long time to keep that struct
to 16 bytes.  I do not think 50% bloat is acceptable.

regards, tom lane




Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Tom Lane
Justin Pryzby  writes:
> On Tue, Sep 27, 2022 at 02:55:18PM -0400, Robert Haas wrote:
>> Both animals are running with -fsanitize=alignment and it's not
>> difficult to believe that the commit mentioned above could have
>> introduced an alignment problem where we didn't have one before, but
>> without a stack backtrace I don't know how to track it down. I tried
>> running those tests locally with -fsanitize=alignment and they passed.

> There's one here:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2022-09-27%2018%3A43%3A06

On longfin's host, the test_decoding run produces two core files.
One has a backtrace like this:

  * frame #0: 0x00010a36af8c postgres`ParseCommitRecord(info='\x80', 
xlrec=0x7fa0678a8090, parsed=0x7ff7b5c50e78) at xactdesc.c:102:30
frame #1: 0x00010a765f9e postgres`xact_decode(ctx=0x7fa0680d9118, 
buf=0x7ff7b5c51000) at decode.c:201:5 [opt]
frame #2: 0x00010a765d17 
postgres`LogicalDecodingProcessRecord(ctx=0x7fa0680d9118, 
record=) at decode.c:119:3 [opt]
frame #3: 0x00010a76d890 
postgres`pg_logical_slot_get_changes_guts(fcinfo=, confirm=true, 
binary=false) at logicalfuncs.c:271:5 [opt]
frame #4: 0x00010a76d320 
postgres`pg_logical_slot_get_changes(fcinfo=) at 
logicalfuncs.c:338:9 [opt]
frame #5: 0x00010a5a521d 
postgres`ExecMakeTableFunctionResult(setexpr=, 
econtext=0x7fa068098f50, argContext=, 
expectedDesc=0x7fa06701ba38, randomAccess=) at 
execSRF.c:234:13 [opt]
frame #6: 0x00010a5c405b postgres`FunctionNext(node=0x7fa068098d40) 
at nodeFunctionscan.c:95:5 [opt]
frame #7: 0x00010a5a61b9 postgres`ExecScan(node=0x7fa068098d40, 
accessMtd=(postgres`FunctionNext at nodeFunctionscan.c:61), 
recheckMtd=(postgres`FunctionRecheck at nodeFunctionscan.c:251)) at 
execScan.c:199:10 [opt]
frame #8: 0x00010a596ee0 postgres`standard_ExecutorRun [inlined] 
ExecProcNode(node=0x7fa068098d40) at executor.h:259:9 [opt]
frame #9: 0x00010a596eb8 postgres`standard_ExecutorRun [inlined] 
ExecutePlan(estate=, planstate=0x7fa068098d40, 
use_parallel_mode=, operation=CMD_SELECT, 
sendTuples=, numberTuples=0, direction=1745456112, 
dest=0x7fa067023848, execute_once=) at execMain.c:1636:10 [opt]
frame #10: 0x00010a596e2a 
postgres`standard_ExecutorRun(queryDesc=, direction=1745456112, 
count=0, execute_once=) at execMain.c:363:3 [opt]

and the other

  * frame #0: 0x00010a36af8c postgres`ParseCommitRecord(info='\x80', 
xlrec=0x7fa06783a090, parsed=0x7ff7b5c50040) at xactdesc.c:102:30
frame #1: 0x00010a3cd24d postgres`xact_redo(record=0x7fa0670096c8) 
at xact.c:6161:3
frame #2: 0x00010a41770d 
postgres`ApplyWalRecord(xlogreader=0x7fa0670096c8, 
record=0x7fa06783a060, replayTLI=0x7ff7b5c507f0) at 
xlogrecovery.c:1897:2
frame #3: 0x00010a4154be postgres`PerformWalRecovery at 
xlogrecovery.c:1728:4
frame #4: 0x00010a3e0dc7 postgres`StartupXLOG at xlog.c:5473:3
frame #5: 0x00010a7498a0 postgres`StartupProcessMain at startup.c:267:2 
[opt]
frame #6: 0x00010a73e2cb 
postgres`AuxiliaryProcessMain(auxtype=StartupProcess) at auxprocess.c:141:4 
[opt]
frame #7: 0x00010a745b97 
postgres`StartChildProcess(type=StartupProcess) at postmaster.c:5408:3 [opt]
frame #8: 0x00010a7487e2 postgres`PostmasterStateMachine at 
postmaster.c:4006:16 [opt]
frame #9: 0x00010a745804 
postgres`reaper(postgres_signal_arg=) at postmaster.c:3256:2 [opt]
frame #10: 0x7ff815b16dfd libsystem_platform.dylib`_sigtramp + 29
frame #11: 0x7ff815accd5b libsystem_kernel.dylib`__select + 11
frame #12: 0x00010a74689c postgres`ServerLoop at postmaster.c:1768:13 
[opt]
frame #13: 0x00010a743fbb postgres`PostmasterMain(argc=, 
argv=0x606480a0) at postmaster.c:1476:11 [opt]
frame #14: 0x00010a61c775 postgres`main(argc=8, argv=) at 
main.c:197:3 [opt]

Looks like it might be the same bug, but perhaps not.

I recompiled access/transam and access/rmgrdesc at -O0 to get the accurate
line numbers shown for those files.  Let me know if you need any more
info; I can add -O0 in more places, or poke around in the cores.

regards, tom lane




Re: Convert *GetDatum() and DatumGet*() macros to inline functions

2022-09-27 Thread Tom Lane
Peter Eisentraut  writes:
> On 26.09.22 19:34, Tom Lane wrote:
>> I think we can do this while still having reasonable type-safety
>> by adding AssertVariableIsOfTypeMacro() checks to the macros.

> (I had looked into AssertVariableIsOfTypeMacro() for an earlier variant 
> of this patch, before I had the idea with the inline functions.  It's in 
> general a bit too strict, such as with short vs int, and signed vs 
> unsigned, but it should work ok for this limited set of uses.)

Yeah.  I had sort of expected to need a UInt64GetDatumFast variant
that would accept uint64, but there doesn't appear to be anyplace
that wants that today.  We should be willing to add it if anyone
complains, though.

regards, tom lane




Re: longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Justin Pryzby
On Tue, Sep 27, 2022 at 02:55:18PM -0400, Robert Haas wrote:
> Both animals are running with -fsanitize=alignment and it's not
> difficult to believe that the commit mentioned above could have
> introduced an alignment problem where we didn't have one before, but
> without a stack backtrace I don't know how to track it down. I tried
> running those tests locally with -fsanitize=alignment and they passed.

There's one here:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2022-09-27%2018%3A43%3A06

/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/access/rmgrdesc/xactdesc.c:102:30:
 runtime error: member access within misaligned address 0x04125074 for type 
'xl_xact_invals' (aka 'struct xl_xact_invals'), which requires 8 byte alignment

#0 0x5b6702 in ParseCommitRecord 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/access/rmgrdesc/xactdesc.c:102:30
#1 0xb5264d in xact_decode 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/decode.c:201:5
#2 0xb521ac in LogicalDecodingProcessRecord 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/decode.c:119:3
#3 0xb5e868 in pg_logical_slot_get_changes_guts 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/logicalfuncs.c:271:5
#4 0xb5e25f in pg_logical_slot_get_changes 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/replication/logical/logicalfuncs.c:338:9
#5 0x896bba in ExecMakeTableFunctionResult 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/execSRF.c:234:13
#6 0x8c7660 in FunctionNext 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/nodeFunctionscan.c:95:5
#7 0x899048 in ExecScanFetch 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/execScan.c:133:9
#8 0x89896b in ExecScan 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/execScan.c:199:10
#9 0x8c6892 in ExecFunctionScan 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/nodeFunctionscan.c:270:9
#10 0x892f42 in ExecProcNodeFirst 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/execProcnode.c:464:9
#11 0x8802dd in ExecProcNode 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/include/executor/executor.h:259:9
#12 0x8802dd in ExecutePlan 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/execMain.c:1636:10
#13 0x8802dd in standard_ExecutorRun 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/execMain.c:363:3
#14 0x87ffbb in ExecutorRun 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/executor/execMain.c:307:3
#15 0xc36c07 in PortalRunSelect 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/tcop/pquery.c:924:4
#16 0xc364ca in PortalRun 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/tcop/pquery.c:768:18
#17 0xc34138 in exec_simple_query 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:1238:10
#18 0xc30953 in PostgresMain 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c
#19 0xb27e3f in BackendRun 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:4482:2
#20 0xb2738d in BackendStartup 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:4210:3
#21 0xb2738d in ServerLoop 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:1804:7
#22 0xb24312 in PostmasterMain 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:1476:11
#23 0x953694 in main 
/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/main/main.c:197:3
#24 0x7f834e39a209 in __libc_start_call_main 
csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#25 0x7f834e39a2bb in __libc_start_main csu/../csu/libc-start.c:389:3
#26 0x4a40a0 in _start 
(/mnt/resource/bf/build/kestrel/HEAD/pgsql.build/tmp_install/mnt/resource/bf/build/kestrel/HEAD/inst/bin/postgres+0x4a40a0)

Note that cfbot is warning for a different reason now:
https://cirrus-ci.com/task/5794615155490816




Re: [PATCH] Log details for client certificate failures

2022-09-27 Thread Jacob Champion
On Tue, Sep 27, 2022 at 1:51 AM Masahiko Sawada  wrote:
> I think we can fix it by the attached patch but I'd like to discuss
> whether it's worth fixing it.

Whoops. So every time it's changed, we leak a little postmaster memory?

Your patch looks good to me and I see no reason not to fix it.

Thanks,
--Jacob




Re: Convert *GetDatum() and DatumGet*() macros to inline functions

2022-09-27 Thread Peter Eisentraut

On 26.09.22 19:34, Tom Lane wrote:

I think we can do this while still having reasonable type-safety
by adding AssertVariableIsOfTypeMacro() checks to the macros.
An advantage of that solution is that we verify that the code
will be safe for a 32-bit build even in 64-bit builds.  (Of
course, it's just checking the variable's type not its lifespan,
but this is still a step forward.)

0001 attached is what you committed, 0002 is a proposed delta
to fix the Fast macros.


Thanks, I committed it like that.

(I had looked into AssertVariableIsOfTypeMacro() for an earlier variant 
of this patch, before I had the idea with the inline functions.  It's in 
general a bit too strict, such as with short vs int, and signed vs 
unsigned, but it should work ok for this limited set of uses.)






Re: pgsql: Increase width of RelFileNumbers from 32 bits to 56 bits.

2022-09-27 Thread Justin Pryzby
On Tue, Sep 27, 2022 at 03:12:56PM -0400, Robert Haas wrote:
> On Tue, Sep 27, 2022 at 2:51 PM Justin Pryzby  wrote:
> > This seems to be breaking cfbot:
> > https://cirrus-ci.com/github/postgresql-cfbot/postgresql
> >
> > For example:
> > https://cirrus-ci.com/task/6720256776339456
> 
> OK, so it looks like the pg_buffercache test is failing there. But it
> doesn't fail for me, and I don't see a regression.diffs file in the
> output that would enable me to see what is failing. If it's there, can
> you tell me how to find it?

It's here in the artifacts.
https://api.cirrus-ci.com/v1/artifact/task/5647133427630080/testrun/build/testrun/pg_buffercache/regress/regression.diffs

Actually, this worked under autoconf but failed under meson.

I think you just need to make the corresponding change in
contrib/pg_buffercache/meson.build that's in ./Makefile.




Re: pgsql: Increase width of RelFileNumbers from 32 bits to 56 bits.

2022-09-27 Thread Robert Haas
On Tue, Sep 27, 2022 at 2:51 PM Justin Pryzby  wrote:
> This seems to be breaking cfbot:
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql
>
> For example:
> https://cirrus-ci.com/task/6720256776339456

OK, so it looks like the pg_buffercache test is failing there. But it
doesn't fail for me, and I don't see a regression.diffs file in the
output that would enable me to see what is failing. If it's there, can
you tell me how to find it?

> Some other minor issues:

Will push fixes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

2022-09-27 Thread Thomas Munro
On Wed, Sep 28, 2022 at 1:03 AM Bharath Rupireddy
 wrote:
>
> On Tue, Sep 27, 2022 at 3:01 PM Thomas Munro  wrote:
> > Something like the attached.
>
> Isn't it also better to add notes in win32pread.c and win32pwrite.c
> about the callers doing lseek(SEEK_SET) if they wish to and Windows
> implementations changing file position? We can also add a TODO item
> about replacing pg_ versions with pread and friends someday when
> Windows fixes the issue? Having it in the commit and include/port.h is
> good, but the comments in win32pread.c and win32pwrite.c make life
> easier IMO. Otherwise, the patch LGTM.

Thanks, will do.  FWIW I doubt the OS itself will change released
behaviour like that, but I did complain about the straight-up
misleading documentation and they agreed and fixed it[1].

After some looking around, the only way I could find to avoid this
behaviour is to switch to async handles, which do behave as documented
in this respect, but then you can't use plain read/write at all unless
you write replacement wrappers for those too, and AFAICT that adds
more system calls (start write, wait for write to finish) and
complexity/weirdness without any real payoff so it seems like the
wrong direction, at least without more research that I'm not pursuing
currently.  (FWIW in AIO porting experiments a while back we used
async handles to get IOs running concurrently with CPU work and
possibly other IOs so it was actually worth doing, but that was only
for smgr and the main wal writing code where there's no intermixed
plain read/write calls as you have here, so the problem doesn't even
come up.)
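
To spell out the caller-side rule being documented (a hedged sketch:
pg_pread() is the src/port wrapper discussed above, and its exact signature
here is an assumption):

  #include <sys/types.h>
  #include <unistd.h>

  extern ssize_t pg_pread(int fd, void *buf, size_t nbytes, off_t offset);

  /*
   * On Windows the pg_pread() emulation advances the kernel file position
   * (POSIX pread() does not), so code that mixes positioned and sequential
   * I/O on the same descriptor must reseek before the next plain read().
   */
  static ssize_t
  read_at_then_continue(int fd, void *buf, size_t len, off_t offset,
                        off_t seqpos)
  {
      ssize_t nread = pg_pread(fd, buf, len, offset);

      if (nread < 0)
          return -1;
      if (lseek(fd, seqpos, SEEK_SET) < 0)
          return -1;
      return nread;
  }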

[1] https://github.com/MicrosoftDocs/sdk-api/pull/1309




longfin and tamandua aren't too happy but I'm not sure why

2022-09-27 Thread Robert Haas
Hi,

longfin and tamandua recently begun failing like this, quite possibly
as a result of 05d4cbf9b6ba708858984b01ca0fc56d59d4ec7c:

+++ regress check in contrib/test_decoding +++
test ddl  ... FAILED (test process exited with
exit code 2) 3276 ms
(all other tests in this suite also fail, probably because the server
crashed here)

The server logs look like this:

2022-09-27 13:51:08.652 EDT [37090:4] LOG:  server process (PID 37105)
was terminated by signal 4: Illegal instruction: 4
2022-09-27 13:51:08.652 EDT [37090:5] DETAIL:  Failed process was
running: SELECT data FROM
pg_logical_slot_get_changes('regression_slot', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1');

Both animals are running with -fsanitize=alignment and it's not
difficult to believe that the commit mentioned above could have
introduced an alignment problem where we didn't have one before, but
without a stack backtrace I don't know how to track it down. I tried
running those tests locally with -fsanitize=alignment and they passed.

Any ideas on how to track this down?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2022-09-27 Thread Melanie Plageman
v30 attached
rebased and pgstat_io_ops.c builds with meson now
also, I tested with pgstat_report_stat() only flushing when forced and
tests still pass
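
For reference, querying the view added by the attached patch would look
something like this (a sketch using the column names described in the commit
message below, not tested output):

  SELECT backend_type, io_context, alloc, read, write, extend, fsync
    FROM pg_stat_io
   WHERE backend_type = 'checkpointer';
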
From 153b06200b36891c9e07df526a86dbd913e36e3e Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Thu, 11 Aug 2022 18:28:50 -0400
Subject: [PATCH v30 3/3] Add system view tracking IO ops per backend type

Add pg_stat_io, a system view which tracks the number of IOOps (allocs,
extends, fsyncs, reads, and writes) done through each IOContext (shared
buffers, local buffers, strategy buffers) by each type of backend (e.g.
client backend, checkpointer).

Some BackendTypes do not accumulate IO operations statistics and will
not be included in the view.

Some IOContexts are not used by some BackendTypes and will not be in the
view. For example, checkpointer does not use a BufferAccessStrategy
(currently), so there will be no row for the "strategy" IOContext for
checkpointer.

Some IOOps are invalid in combination with certain IOContexts. Those
cells will be NULL in the view to distinguish between 0 observed IOOps
of that type and an invalid combination. For example, local buffers are
not fsync'd so cells for all BackendTypes for IOCONTEXT_STRATEGY and
IOOP_FSYNC will be NULL.

Some BackendTypes never perform certain IOOps. Those cells will also be
NULL in the view. For example, bgwriter should not perform reads.

View stats are populated with statistics incremented when a backend
performs an IO Operation and maintained by the cumulative statistics
subsystem.

Each row of the view shows stats for a particular BackendType and
IOContext combination (e.g. shared buffer accesses by checkpointer) and
each column in the view is the total number of IO Operations done (e.g.
writes).
So a cell in the view would be, for example, the number of shared
buffers written by checkpointer since the last stats reset.

Note that some of the cells in the view are redundant with fields in
pg_stat_bgwriter (e.g. buffers_backend), however these have been kept in
pg_stat_bgwriter for backwards compatibility. Deriving the redundant
pg_stat_bgwriter stats from the IO operations stats structures was also
problematic due to the separate reset targets for 'bgwriter' and 'io'.

Suggested by Andres Freund

Author: Melanie Plageman 
Reviewed-by: Andres Freund 
Reviewed-by: Justin Pryzby 
Reviewed-by: Kyotaro Horiguchi 
Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
---
 doc/src/sgml/monitoring.sgml | 115 ++-
 src/backend/catalog/system_views.sql |  12 ++
 src/backend/utils/adt/pgstatfuncs.c  | 117 
 src/include/catalog/pg_proc.dat  |   9 ++
 src/test/regress/expected/rules.out  |   9 ++
 src/test/regress/expected/stats.out  | 202 +++
 src/test/regress/sql/stats.sql   | 104 ++
 7 files changed, 567 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9440b41770..c7ca078bc8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -448,6 +448,15 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
  
  
 
+ 
+  pg_stat_iopg_stat_io
+  A row for each IO Context for each backend type showing
+  statistics about backend IO operations. See
+   
+   pg_stat_io for details.
+ 
+ 
+
  
   pg_stat_walpg_stat_wal
   One row only, showing statistics about WAL activity. See
@@ -3600,7 +3609,111 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
stats_reset timestamp with time zone
   
   
-   Time at which these statistics were last reset
+   Time at which these statistics were last reset.
+  
+ 
+
+   
+  
+
+ 
+
+ 
+  pg_stat_io
+
+  
+   pg_stat_io
+  
+
+  
+   The pg_stat_io view has a row for each backend type
+   and IO Context containing global data for the cluster on IO Operations done
+   by that backend type in that IO Context.
+  
+
+  
+   pg_stat_io View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+ 
+  
+   backend_type text
+  
+  
+   Type of backend (e.g. background worker, autovacuum worker).
+  
+ 
+
+ 
+  
+   io_context text
+  
+  
+   IO Context used (e.g. shared buffers, local buffers).
+  
+ 
+
+ 
+  
+   alloc bigint
+  
+  
+   Number of buffers allocated.
+  
+ 
+
+ 
+  
+   read bigint
+  
+  
+   Number of blocks read.
+  
+ 
+
+ 
+  
+   write bigint
+  
+  
+   Number of blocks written.
+  
+ 
+
+ 
+  
+   extend bigint
+  
+  
+   Number of blocks extended.
+  
+ 
+
+ 
+  
+   fsync bigint
+  
+  
+   Number of blocks fsynced.
+  
+ 

Re: pgsql: Fix perltidy breaking perlcritic

2022-09-27 Thread Dagfinn Ilmari Mannsåker
[resending to -hackers instead of -committers]

Andrew Dunstan  writes:

> On Fri, Sep 9, 2022 at 10:44 PM John Naylor 
> wrote:
>
>> On Fri, Sep 9, 2022 at 3:32 AM Andrew Dunstan  wrote:
>>
>> > A better way do do this IMNSHO is to put the eval in a block on its own
>> along with the no critic marker on its own line, like this:
>> >
>> > {
>> >## no critic (ProhibitStringyEval)
>> >eval ...
>> > }
>> >
>> > perlcritic respects block boundaries for its directives.
>>
>> I tried that in the attached -- it looks a bit nicer but requires more
>> explanation. I don't have strong feelings either way.
>>
>>
> Maybe even better would be just this, which I bet perltidy would not monkey
> with, and would require no explanation:
>
> eval "\$hash_ref = $_";  ## no critic (ProhibitStringyEval)

I didn't see this until it got committed, since I'm not subscribed to
-committers, but I think it would be even better to rely on the fact
that eval returns the value of the last expression in the string, which
also gets rid of the ugly quoting and escaping, per the attached.

- ilmari

From 8ef12d134e0a21c289796207d87244ba5f5ec92c Mon Sep 17 00:00:00 2001
From: Dagfinn Ilmari Mannsåker 
Date: Mon, 12 Sep 2022 10:43:16 +0100
Subject: [PATCH] Use return value of eval instead of assigning inside string

---
 src/backend/catalog/Catalog.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 919a828ca7..41bbabdfee 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -315,7 +315,7 @@ sub ParseData
 	# We're treating the input line as a piece of Perl, so we
 	# need to use string eval here. Tell perlcritic we know what
 	# we're doing.
-	eval "\$hash_ref = $_"; ## no critic (ProhibitStringyEval)
+	$hash_ref = eval $_;## no critic (ProhibitStringyEval)
 	if (!ref $hash_ref)
 	{
 		die "$input_file: error parsing line $.:\n$_\n";
@@ -361,7 +361,7 @@ sub ParseData
 		# the whole file at once.
 		local $/;
 		my $full_file = <$ifd>;
-		eval "\$data = $full_file"## no critic (ProhibitStringyEval)
+		$data = eval $full_file## no critic (ProhibitStringyEval)
 		  or die "error parsing $input_file\n";
 		foreach my $hash_ref (@{$data})
 		{
-- 
2.34.1



Re: making relfilenodes 56 bits

2022-09-27 Thread Robert Haas
On Tue, Sep 27, 2022 at 2:33 AM Dilip Kumar  wrote:
> Looks fine to me.

OK, committed. I also committed the 0002 patch with some wordsmithing,
and I removed a < 0 test an an unsigned value because my compiler
complained about it. 0001 turned out to make headerscheck sad, so I
just pushed a fix for that, too.

I'm not too sure about 0003. I think if we need an is_shared flag
maybe we might as well just pass the tablespace OID. The is_shared
flag seems to just make things a bit complicated for the callers for
no real benefit.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Backends stalled in 'startup' state

2022-09-27 Thread Ashwin Agrawal
On Thu, Sep 15, 2022 at 4:42 PM Ashwin Agrawal  wrote:

>
> We recently saw many backends (close to max_connection limit) get stalled
> in 'startup' in one of the production environments for Greenplum (fork of
> PostgreSQL). Tracing the reason, it was found all the tuples created by
> bootstrap (xmin=1) in pg_attribute were at super high block numbers (for
> example beyond 30,000). Tracing the reason for the backend startup stall
> exactly matched Tom's reasoning in [1]. Stalls became much longer in
> presence of sub-transaction overflow or presence of long running
> transactions as tuple visibility took longer. The thread ruled out the
> possibility of system catalog rows to be present in higher block numbers
> instead of in front for pg_attribute.
>
> This thread provides simple reproduction on the latest version of
> PostgreSQL and RCA for how bootstrap catalog entries can move to higher
> blocks and as a result cause stalls for backend starts. Simple fix to avoid
> the issue provided at the end.
>
> The cause is syncscan triggering during VACUUM FULL. VACUUM FULL rewrites
> the table by performing the seqscan as well. And
> heapam_relation_copy_for_cluster() conveys feel free to use syncscan. Hence
> logic to not start from block 0 instead some other block already in cache
> is possible and opens the possibility to move the bootstrap tuples to
> anywhere else in the table.
>
> --
> Repro
> --
> -- create database to play
> drop database if exists test;
> create database test;
> \c test
>
> -- function just to create many tables to increase pg_attribute size
> -- (ideally many column table might do the job more easily)
> CREATE OR REPLACE FUNCTION public.f(id integer)
>  RETURNS void
>  LANGUAGE plpgsql
>  STRICT
> AS $function$
> declare
>   sql text;
>   i int;
> begin
>   for i in id..id+ loop
> sql='create table if not exists tbl'||i||' (id int)';
> execute sql;
>   end loop;
> end;
> $function$;
>
> select f(1);
> select f(2);
> select f(3);
> select f(4);
>
> -- validate pg_attribute size is greater than 1/4 of shared_buffers
> -- for syncscan to trigger
> show shared_buffers;
> select pg_size_pretty(pg_relation_size('pg_attribute'));
> select ctid, xmin, attrelid, attname from pg_attribute where xmin = 1
> limit 5;
>
> -- perform seq scan of pg_attribute to page past bootstrapped tuples
> copy (select * from pg_attribute limit 2000) to '/tmp/p';
>
> -- this will internally use syncscan starting with block after bootstrap
> tuples
> -- and hence move bootstrap tuples last to higher block numbers
> vacuum full pg_attribute;
>
> --
> Sample run
> --
> show shared_buffers;
>  shared_buffers
> 
>  128MB
> (1 row)
>
> select pg_size_pretty(pg_relation_size('pg_attribute'));
>  pg_size_pretty
> 
>  40 MB
> (1 row)
>
> select ctid, xmin, attrelid, attname from pg_attribute where xmin = 1
> limit 5;
>  ctid  | xmin | attrelid |   attname
> ---+--+--+--
>  (0,1) |1 | 1255 | oid
>  (0,2) |1 | 1255 | proname
>  (0,3) |1 | 1255 | pronamespace
>  (0,4) |1 | 1255 | proowner
>  (0,5) |1 | 1255 | prolang
> (5 rows)
>
> copy (select * from pg_attribute limit 2000) to '/tmp/p';
> COPY 2000
> vacuum full pg_attribute;
> VACUUM
> select ctid, xmin, attrelid, attname from pg_attribute where xmin = 1
> limit 5;
>ctid| xmin | attrelid |   attname
> ---+--+--+--
>  (5115,14) |1 | 1255 | oid
>  (5115,15) |1 | 1255 | proname
>  (5115,16) |1 | 1255 | pronamespace
>  (5115,17) |1 | 1255 | proowner
>  (5115,18) |1 | 1255 | prolang
> (5 rows)
>
>
> Note:
> -- used logic causing the problem to fix it as well on the system :-)
> -- scan till block where bootstrap tuples are located
> select ctid, xmin, attrelid, attname from pg_attribute where xmin = 1
> limit 5;
> -- now due to syncscan triggering it will pick the blocks with bootstrap
> tuples first and help to bring them back to front
> vacuum full pg_attribute;
>
> --
> Patch to avoid the problem:
> --
> diff --git a/src/backend/access/heap/heapam_handler.c
> b/src/backend/access/heap/heapam_handler.c
> index a3414a76e8..4c031914a3 100644
> --- a/src/backend/access/heap/heapam_handler.c
> +++ b/src/backend/access/heap/heapam_handler.c
> @@ -757,7 +757,17 @@ heapam_relation_copy_for_cluster(Relation OldHeap,
> Relation NewHeap,
> pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
>
>  PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
>
> -   tableScan = 

Re: Temporary file access API

2022-09-27 Thread John Morris
  *   So I think that code simplification and easy adoption of in-memory data
changes (such as encryption or compression) are two rather distinct goals. I
admit that I'm running out of ideas how to develop a framework that'd be
useful for both.

I’m also wondering about code simplification vs a more general 
encryption/compression framework. I started with the current proposal, and I’m 
also looking at David Steele’s approach to encryption/compression in 
pgbackrest. I’m beginning to think we should do a high-level design which 
includes encryption/compression, and then implement it as a “simplification” 
without actually doing the transformations.


Re: GUC tables - use designated initializers

2022-09-27 Thread Tom Lane
Peter Smith  writes:
> Enums index a number of the GUC tables. This all relies on the
> elements being carefully arranged to be in the same order as those
> enums. There are comments to say what enum index belongs to each table
> element.
> But why not use designated initializers to enforce what the comments
> are hoping for?

Interesting proposal, but it's not clear to me that this solution makes
the code more bulletproof rather than less so.  Yeah, you removed the
order dependency, but the other concern here is that the array gets
updated at all when adding a new enum entry.  This method seems like
it'd help disguise such an oversight.  In particular, the adjacent
StaticAssertDecls about the array lengths are testing something different
than they used to, and I fear they lost some of their power.  Can we
improve those checks so they'd catch a missing entry again?
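
For reference, the pattern under discussion looks roughly like this (a
minimal sketch with made-up names, not the actual guc.c tables):

  /* Designated initializers tie each entry to its enum value, so the
   * array no longer depends on declaration order. */
  typedef enum DemoSource { SRC_DEFAULT, SRC_FILE, SRC_USER } DemoSource;

  static const char *const demo_source_names[] = {
      [SRC_DEFAULT] = "default",
      [SRC_FILE]    = "configuration file",
      [SRC_USER]    = "user",
  };

  /* The length check still catches a forgotten *trailing* entry, but a
   * gap in the middle now silently becomes a NULL element, which is the
   * concern raised above. */
  _Static_assert(sizeof(demo_source_names) / sizeof(demo_source_names[0])
                   == SRC_USER + 1,
                 "demo_source_names must cover all DemoSource values");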

> Furthermore, with this change, now the GUC table elements are able to
> be rearranged into any different order - eg alphabetical - if that
> would be useful (my patch does not do this).

If anything, that's an anti-feature IMV.  I quite dislike code where
the same set of items are dealt with in randomly different orders
in different places.

regards, tom lane




Re: kerberos/001_auth test fails on arm CPU darwin

2022-09-27 Thread Nazir Bilal Yavuz

Hi,

Thanks for the reviews!


On 9/27/2022 5:21 AM, Tom Lane wrote:

Andres Freund  writes:

Maybe we should rely on PATH, rather than hardcoding OS dependent locations?

My suggestion to consult krb5-config first was meant to allow PATH
to influence the results.  However, if that doesn't work, it's important
IMO to have a sane list of hardwired places to look in.



I updated my patch regarding these reviews.

The current logic is that it will try to find all executables in the
following order (if it finds all executables, it won't try the remaining steps):



1 - 'krb5-config --prefix'

2 - hardcoded paths (I added arm and MacPorts paths for darwin)

3 - from PATH

Also, I did some refactoring to make it easier to add other search paths in
the future and to make sure all executables are found.


Ci run after fix is applied:
https://cirrus-ci.com/build/5758254918664192

Regards,
Nazir Bilal Yavuz

From 916de33762aa72e9d9e4665ba23a1e610c585570 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Mon, 26 Sep 2022 10:56:51 +0300
Subject: [PATCH 1/2] ci: Add arm CPU for darwin

.
ci-os-only: darwin.
---
 .cirrus.yml | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 7b5cb02102..cecd978515 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -259,7 +259,6 @@ task:
   name: macOS - Monterey - Meson
 
   env:
-CPUS: 12 # always get that much for cirrusci macOS instances
 BUILD_JOBS: $CPUS
 TEST_JOBS: $CPUS # already fast enough to not be worth tuning
 
@@ -275,15 +274,26 @@ task:
 
   only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || 
$CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(macos|darwin|osx).*'
 
-  osx_instance:
-image: monterey-base
+  matrix:
+- name: macOS - Monterey - Meson - ARM CPU
+  macos_instance:
+image: ghcr.io/cirruslabs/macos-monterey-base:latest
+  env:
+BREW_PATH: /opt/homebrew
+CPUS: 4
+
+- name: macOS - Monterey - Meson - Intel CPU
+  osx_instance:
+image: monterey-base
+  env:
+BREW_PATH: /usr/local
+CPUS: 12 # always get that much for cirrusci macOS instances
 
   sysinfo_script: |
 id
 uname -a
 ulimit -a -H && ulimit -a -S
 export
-
   setup_core_files_script:
 - mkdir ${HOME}/cores
 - sudo sysctl kern.corefile="${HOME}/cores/core.%P"
@@ -322,7 +332,7 @@ task:
   ccache_cache:
 folder: $CCACHE_DIR
   configure_script: |
-brewpath="/usr/local"
+brewpath=${BREW_PATH}
 PKG_CONFIG_PATH="${brewpath}/lib/pkgconfig:${PKG_CONFIG_PATH}"
 
 for pkg in icu4c krb5 openldap openssl zstd ; do
-- 
2.25.1

From cc3f5236a6c18d9d16c8c6faa1e2044d0610f490 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Fri, 23 Sep 2022 14:30:06 +0300
Subject: [PATCH 2/2] fix: darwin: ARM CPU darwin krb5 path fix

---
 src/test/kerberos/t/001_auth.pl | 89 -
 1 file changed, 65 insertions(+), 24 deletions(-)

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index 47169a1d1e..c1211b80dd 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -30,39 +30,80 @@ elsif ($ENV{PG_TEST_EXTRA} !~ /\bkerberos\b/)
plan skip_all => 'Potentially unsafe test GSSAPI/Kerberos not enabled 
in PG_TEST_EXTRA';
 }
 
-my ($krb5_bin_dir, $krb5_sbin_dir);
-
-if ($^O eq 'darwin')
-{
-   $krb5_bin_dir  = '/usr/local/opt/krb5/bin';
-   $krb5_sbin_dir = '/usr/local/opt/krb5/sbin';
-}
-elsif ($^O eq 'freebsd')
-{
-   $krb5_bin_dir  = '/usr/local/bin';
-   $krb5_sbin_dir = '/usr/local/sbin';
-}
-elsif ($^O eq 'linux')
-{
-   $krb5_sbin_dir = '/usr/sbin';
-}
-
 my $krb5_config  = 'krb5-config';
 my $kinit= 'kinit';
 my $kdb5_util= 'kdb5_util';
 my $kadmin_local = 'kadmin.local';
 my $krb5kdc  = 'krb5kdc';
 
-if ($krb5_bin_dir && -d $krb5_bin_dir)
+# get prefix for kerberos executables and try to find them at this path
+sub test_krb5_paths
 {
-   $krb5_config = $krb5_bin_dir . '/' . $krb5_config;
-   $kinit   = $krb5_bin_dir . '/' . $kinit;
+   my ($is_krb5_found, $krb5_path) = (@_);
+
+   # if it is already found, return
+   if ($is_krb5_found)
+   {
+   return 1;
+   }
+
+   my ($kdb5_util_path, $kadmin_local_path, $krb5kdc_path, 
$krb5_config_path, $kinit_path, @krb5_file_paths, $bin, $sbin);
+
+   # remove '\n' since 'krb5-config --prefix' returns a path ending with '\n'
+   $krb5_path =~ s/\n//g;
+   # remove trailing slashes
+   $krb5_path =~ s{(.+)/\z}{$1};
+   $bin = '/bin/';
+   $sbin = '/sbin/';
+   $kdb5_util_path= $krb5_path . $sbin . $kdb5_util;
+   $kadmin_local_path = $krb5_path . $sbin . $kadmin_local;
+   $krb5kdc_path  = $krb5_path . $sbin . $krb5kdc;
+   $krb5_config_path  = $krb5_path . $bin . $krb5_config;
+   $kinit_path= $krb5_path . $bin . $kinit;
+   @krb5_file_paths = ($kdb5_util_path, 

Re: DROP OWNED BY is broken on master branch.

2022-09-27 Thread Robert Haas
On Tue, Sep 27, 2022 at 2:53 AM Rushabh Lathia  wrote:
> Yes, I was also thinking about avoiding the duplicate logic but couldn't find
> a way.  I did some quick testing with the patch, and the reported test works
> fine.  But "make check" is failing with a few failures.

Oh, woops. There was a dumb mistake in that version -- it was testing
sdepForm->dbid == SHARED_DEPENDENCY_OWNER, which is nonsense, instead
of sdepForm->dbid == MyDatabaseId. Here's a fixed version.
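
For clarity, the corrected condition amounts to this (a minimal sketch;
the surrounding scan over pg_shdepend is omitted, and sdepForm is a
Form_pg_shdepend):

```
if (sdepForm->dbid == MyDatabaseId)	/* was wrongly: == SHARED_DEPENDENCY_OWNER */
{
	/* ... consider this entry for DROP OWNED processing ... */
}
```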

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0001-Fix-bug-in-DROP-OWNED-BY.patch
Description: Binary data


Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2022-09-27 Thread Hamid Akhtar
On Tue, 26 Jul 2022 at 21:35, Simon Riggs 
wrote:

> On Fri, 15 Jul 2022 at 12:29, Aleksander Alekseev
>  wrote:
> > Personally I didn't expect that
> > merging patches in this thread would take this long. They have been in
> > "Ready for Committer" state for a long time now, and there are no known
> > issues with them other than that the unit tests for SLRU [1] should be
> > merged first.
>
> These patches look ready to me, including the SLRU tests.
>
> Even though they do very little, these patches touch many aspects of
> the code, so it would make sense to apply these as the last step in
> the CF.
>
> To encourage committers to take that next step, let's have a
> democratic vote on moving this forwards:
> +1 from me.
>

This set of patches no longer applies cleanly to the master branch. There
are lots of hunks with offsets as well as outright failures. Please rebase
the patches.

There are failures for multiple files including the one given below:

patching file src/backend/replication/logical/worker.c
Hunk #1 succeeded at 1089 (offset 1 line).
Hunk #2 succeeded at 1481 (offset 1 line).
Hunk #3 succeeded at 3322 (offset 2 lines).
Hunk #4 succeeded at 3493 (offset 2 lines).
Hunk #5 FAILED at 4009.
1 out of 5 hunks FAILED -- saving rejects to file
src/backend/replication/logical/worker.c.rej


Re: Differentiate MERGE queries with different structures

2022-09-27 Thread bt22nakamorit

On 2022-09-27 17:48, Alvaro Herrera wrote:

Pushed, thanks.

The code and the tests work fine in my environment.
Thank you Alvaro for your help, and thank you Julien for your review!

Best,
Tatsuhiro Nakamori




Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

2022-09-27 Thread Bharath Rupireddy
On Tue, Aug 9, 2022 at 2:16 PM Bharath Rupireddy
 wrote:
>
> I've explained the problem with the current HA setup with synchronous
> replication upthread at [1]. Let me reiterate it here once again.
>
> When a query is cancelled (a simple stroke of CTRL+C or
> pg_cancel_backend() call) while the txn is waiting for ack in
> SyncRepWaitForLSN(); for the client, the txn is actually committed
> (locally-committed-but-not-yet-replicated to all of sync standbys)
> like any other txn, a warning is emitted into server logs but it is of
> no use for the client (think of client as applications). There can be
> many such txns waiting for ack in SyncRepWaitForLSN() and query cancel
> can be issued on all of those sessions. The problem is that the
> subsequent reads will then be able to read all of those
> locally-committed-but-not-yet-replicated to all of sync standbys txns
> data - this is what I call an inconsistency (can we call this a
> read-after-write inconsistency?) because of lack of proper query
> cancel handling. And if the sync standbys are down or unable to come
> up for some reason, until then, the primary will be serving clients
> with the inconsistent data. BTW, I found a report of this problem here
> [2].
>
> The solution proposed for the above problem is to just 'not honor
> query cancels at all while waiting for ack in SyncRepWaitForLSN()'.
>
> When a proc die is pending, then also, there can be
> locally-committed-but-not-yet-replicated to all of sync standbys txns.
> Typically, there are two choices for the clients 1) reuse the primary
> instance after restart 2) failover to one of sync standbys. For case
> (1), there might be read-after-write inconsistency as explained above.
> For case (2), those txns might get lost completely if the failover
> target sync standby or the new primary didn't receive them and the
> other sync standbys that have received them are now ahead and need a
> special treatment (run pg_rewind) for them to be able to connect to
> new primary.
>
> The solution proposed for case (1) of the above problem is to 'process
> the ProcDiePending immediately and upon restart the first backend can
> wait until the sync standbys are caught up to ensure no inconsistent
> reads'.
> The solution proposed for case (2) of the above problem is to 'either
> run pg_rewind for the sync standbys that are ahead or use the idea
> proposed at [3]'.
>
> I hope the above explanation helps.
>
> [1] 
> https://www.postgresql.org/message-id/flat/CALj2ACUrOB59QaE6%3DjF2cFAyv1MR7fzD8tr4YM5%2BOwEYG1SNzA%40mail.gmail.com
> [2] 
> https://stackoverflow.com/questions/42686097/how-to-disable-uncommited-reads-in-postgres-synchronous-replication
> [3] 
> https://www.postgresql.org/message-id/CALj2ACX-xO-ZenQt1MWazj0Z3ziSXBMr24N_X2c0dYysPQghrw%40mail.gmail.com

I'm attaching the v2 patch rebased on the latest HEAD. Please note
that there are still some open points; I've yet to find time to think
more about them. Meanwhile, I'm posting the v2 patch to keep cfbot
happy. Any further thoughts on the overall design of the patch are
most welcome. Thanks.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From be6734c83d72333cc4ec1b2be1f4d54b875b74c8 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Tue, 27 Sep 2022 12:55:34 +
Subject: [PATCH v2] Wait specified amount of time before cancelling sync
 replication

In a PostgreSQL high availability setup with synchronous replication,
transactions typically first commit locally, then get streamed to the
synchronous standbys, and the backends that generated the transactions
wait for acknowledgement from the synchronous standbys. While waiting
for acknowledgement, the query or the transaction may get canceled, or
the backend that's waiting may be asked to exit. In either of these
cases, the wait for acknowledgement gets canceled, leaving the
transaction in an inconsistent state: it got committed locally but not
on the standbys. When the GUC synchronous_replication_naptime_before_cancel
introduced in this patch is set, the backends wait that long for the
acknowledgement before canceling the wait, so that the transaction can
become available on the synchronous standbys as well.
---
 doc/src/sgml/config.sgml  | 30 +++
 src/backend/replication/syncrep.c | 50 +++
 src/backend/utils/misc/guc_tables.c   | 12 +
 src/backend/utils/misc/postgresql.conf.sample |  2 +
 src/include/replication/syncrep.h |  3 ++
 5 files changed, 97 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d8848bc774..baeef49012 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4524,6 +4524,36 @@ ANY num_sync ( 
+  synchronous_replication_naptime_before_cancel (integer)
+  
+   synchronous_replication_naptime_before_cancel configuration 
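
Roughly, the idea could look like this inside SyncRepWaitForLSN()'s cancel
handling (a sketch only; the wait_start variable and the exact placement are
assumptions, not the patch itself):

```
if (QueryCancelPending)
{
	/*
	 * Keep waiting up to synchronous_replication_naptime_before_cancel
	 * milliseconds for the standby acknowledgement before honoring the
	 * cancel request.
	 */
	if (TimestampDifferenceExceeds(wait_start, GetCurrentTimestamp(),
								   synchronous_replication_naptime_before_cancel))
	{
		QueryCancelPending = false;
		ereport(WARNING,
				(errmsg("canceling wait for synchronous replication due to user request"),
				 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
		break;
	}
	/* else: ignore the cancel for now and continue waiting */
}
```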

Re: [RFC] building postgres with meson - v13

2022-09-27 Thread Peter Eisentraut

On 26.09.22 18:35, Andres Freund wrote:

9f5be26c1215 meson: Add docs for building with meson

I do like the overall layout of this.

The "Supported Platforms" section should be moved back to near the end
of the chapter.  I don't see a reason to move it forward, at least
none that is related to the meson issue.

The changes to the "Getting the Source" section are also not
appropriate for this patch.

We don't really support building from a tarball with meson yet (you'd need to
configure, maintainer-clean, then configure with meson), so it does make some sense...


Okay, interesting point.  I suggest that we write it as if that were 
fixed, and for the time being insert a <note> (or similar) explaining 
the above restriction.  Otherwise we'll have to rewrite it again later.





Re: A doubt about a newly added errdetail

2022-09-27 Thread Alvaro Herrera
While reading this code, I noticed that function expr_allowed_in_node()
has a very strange API: it doesn't have any return convention at all
other than "if we didn't modify errdetail_str then all is good".  I was
tempted to add an "Assert(*errdetail_msg == NULL)" at the start of it,
just to make sure that it is not called if a message is already set.

I think it would be much saner to inline the few lines of that function
in its sole caller, as in the attached.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"E pur si muove" (Galileo Galilei)
diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 8895e1be4e..f9792df05f 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -447,36 +447,6 @@ contain_mutable_or_user_functions_checker(Oid func_id, void *context)
 			func_id >= FirstNormalObjectId);
 }
 
-/*
- * Check if the node contains any disallowed object. Subroutine for
- * check_simple_rowfilter_expr_walker.
- *
- * If a disallowed object is found, *errdetail_msg is set to a (possibly
- * translated) message to use as errdetail.  If none, *errdetail_msg is not
- * modified.
- */
-static void
-expr_allowed_in_node(Node *node, ParseState *pstate, char **errdetail_msg)
-{
-	if (IsA(node, List))
-	{
-		/*
-		 * OK, we don't need to perform other expr checks for List nodes
-		 * because those are undefined for List.
-		 */
-		return;
-	}
-
-	if (exprType(node) >= FirstNormalObjectId)
-		*errdetail_msg = _("User-defined types are not allowed.");
-	else if (check_functions_in_node(node, contain_mutable_or_user_functions_checker,
-	 (void *) pstate))
-		*errdetail_msg = _("User-defined or built-in mutable functions are not allowed.");
-	else if (exprCollation(node) >= FirstNormalObjectId ||
-			 exprInputCollation(node) >= FirstNormalObjectId)
-		*errdetail_msg = _("User-defined collations are not allowed.");
-}
-
 /*
  * The row filter walker checks if the row filter expression is a "simple
  * expression".
@@ -586,12 +556,32 @@ check_simple_rowfilter_expr_walker(Node *node, ParseState *pstate)
 	}
 
 	/*
-	 * For all the supported nodes, check the types, functions, and collations
-	 * used in the nodes.
+	 * For all the supported nodes, if we haven't already found a problem,
+	 * check the types, functions, and collations used in it.
 	 */
 	if (!errdetail_msg)
-		expr_allowed_in_node(node, pstate, &errdetail_msg);
+	{
+		if (IsA(node, List))
+		{
+			/*
+			 * OK, we don't need to perform other expr checks for List nodes
+			 * because those are undefined for List.
+			 */
+		}
+		else if (exprType(node) >= FirstNormalObjectId)
+			errdetail_msg = _("User-defined types are not allowed.");
+		else if (check_functions_in_node(node, contain_mutable_or_user_functions_checker,
+		 (void *) pstate))
+			errdetail_msg = _("User-defined or built-in mutable functions are not allowed.");
+		else if (exprCollation(node) >= FirstNormalObjectId ||
+ exprInputCollation(node) >= FirstNormalObjectId)
+			errdetail_msg = _("User-defined collations are not allowed.");
+	}
 
+	/*
+	 * Finally, if we found a problem in this node, throw error now. Otherwise
+	 * keep going.
+	 */
 	if (errdetail_msg)
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),


Re: Adding a clang-format file

2022-09-27 Thread Aleksander Alekseev
Hi Peter,

> I really don't see any real problem with making something available,
> without changing any official project guidelines. It's commonplace to
> provide a clang-format file these days.

Personally I don't have anything against the idea. TimescaleDB uses
clang-format to mimic pgindent and it works quite well. One problem
worth mentioning though is that the clang-format file is dependent on
the particular version of clang-format. TSDB requires version 7 or 8
and the reason why the project can't easily switch to a newer version
is that the format has changed. So perhaps a directory would be more
appropriate than a single file.

-- 
Best regards,
Aleksander Alekseev




RE: Perform streaming logical transactions by background workers and parallel apply

2022-09-27 Thread houzj.f...@fujitsu.com
On Tuesday, September 27, 2022 2:32 PM Kuroda, Hayato/黒田 隼人 

> 
> Dear Wang,
> 
> Followings are comments for your patchset.

Thanks for the comments.

> 
> 0001
> 
> 
> 01. launcher.c - logicalrep_worker_stop_internal()
> 
> ```
> +
> +   Assert(LWLockHeldByMe(LogicalRepWorkerLock));
> +
> ```

Changed.

> I think it should be Assert(LWLockHeldByMeInMode(LogicalRepWorkerLock,
> LW_SHARED)) because the lock is released once and acquired again as
> LW_SHARED.
> If a newer function acquires the lock as LW_EXCLUSIVE and calls
> logicalrep_worker_stop_internal(),
> its lock may become weaker after calling it.
> 
> 02. launcher.c - apply_handle_stream_start()
> 
> ```
> +   /*
> +* Notify handle methods we're processing a remote
> in-progress
> +* transaction.
> +*/
> +   in_streamed_transaction = true;
> 
> -   MyLogicalRepWorker->stream_fileset = palloc(sizeof(FileSet));
> -   FileSetInit(MyLogicalRepWorker->stream_fileset);
> +   /*
> +* Start a transaction on stream start, this 
> transaction
> will be
> +* committed on the stream stop unless it is a
> tablesync worker in
> +* which case it will be committed after processing 
> all
> the
> +* messages. We need the transaction for handling the
> buffile,
> +* used for serializing the streaming data and subxact
> info.
> +*/
> +   begin_replication_step();
> ```
> 
> Previously in_streamed_transaction was set after the begin_replication_step(),
> but the ordering has been changed. Maybe we don't have to change it if there is no
> particular reason.
> 
> 03. launcher.c - apply_handle_stream_stop()
> 
> ```
> +   /* Commit the per-stream transaction */
> +   CommitTransactionCommand();
> +
> +   /* Reset per-stream context */
> +   MemoryContextReset(LogicalStreamingContext);
> +
> +   pgstat_report_activity(STATE_IDLE, NULL);
> +
> +   in_streamed_transaction = false;
> ```
> 
> Previously in_streamed_transaction was set after the MemoryContextReset(),
> but the ordering has been changed.
> Maybe we don't have to change it if there is no particular reason.

I adjusted the position of this due to some other improvements this time.

> 
> 04. applyparallelworker.c - LogicalParallelApplyLoop()
> 
> ```
> +   shmq_res = shm_mq_receive(mqh, &len, &data, false);
> ...
> +   if (ConfigReloadPending)
> +   {
> +   ConfigReloadPending = false;
> +   ProcessConfigFile(PGC_SIGHUP);
> +   }
> ```
> 
> 
> Here the parallel apply worker waits to receive messages, and
> ProcessConfigFile() is called only after dispatching them.
> It means that postgresql.conf will not be read until the parallel apply
> worker receives new messages and applies them.
> 
> It may be problematic when users change log_min_messages to debugXXX for
> debugging but streamed transactions rarely come.
> They would expect the detailed output to appear in the log from the next
> streaming chunk, but it does not.
> 
> This does not occur in the leader worker when it waits for messages from
> the publisher, because it uses libpqrcv_receive(), which works
> asynchronously.
> 
> I'm not sure whether it should be documented that the evaluation of GUCs
> may be delayed. What do you think?

I changed shm_mq_receive to asynchronous mode, which is also consistent with
what we do for the Gather node when reading data from parallel query workers.
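
For reference, the asynchronous pattern looks roughly like this (a simplified
sketch, not the patch itself; the surrounding loop details are assumptions):

```
for (;;)
{
	void	   *data;
	Size		len;
	shm_mq_result res;

	/* nowait = true: returns SHM_MQ_WOULD_BLOCK instead of sleeping */
	res = shm_mq_receive(mqh, &len, &data, true);

	if (res == SHM_MQ_WOULD_BLOCK)
	{
		/* No message pending, so SIGHUP can be serviced promptly here. */
		if (ConfigReloadPending)
		{
			ConfigReloadPending = false;
			ProcessConfigFile(PGC_SIGHUP);
		}

		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
						 -1L, WAIT_EVENT_MQ_RECEIVE);
		ResetLatch(MyLatch);
		continue;
	}

	/* ... dispatch the received message ... */
}
```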

> 
> ===
> 0004
> 
> 05. logical-replication.sgml
> 
> ```
> ...
> In that case, it may be necessary to change the streaming mode to on or off 
> and
> cause the same conflicts again so the finish LSN of the failed transaction 
> will be
> written to the server log.
>  ...
> ```
> 
> The above sentence is added by 0001, but it is not modified by 0004.
> Such transactions will be retried in streaming=on mode, so some descriptions
> related to that should be added.

Added.

Best regards,
Hou zj


RE: Perform streaming logical transactions by background workers and parallel apply

2022-09-27 Thread houzj.f...@fujitsu.com
On Monday, September 26, 2022 6:58 PM Amit Kapila  
wrote:
> 
> On Mon, Sep 26, 2022 at 8:41 AM wangw.f...@fujitsu.com
>  wrote:
> >
> > On Thur, Sep 22, 2022 at 18:12 PM Amit Kapila 
> wrote:
> >
> > > 3.
> > > ApplyWorkerMain()
> > > {
> > > ...
> > > ...
> > > +
> > > + if (server_version >= 16 &&
> > > + MySubscription->stream == SUBSTREAM_PARALLEL)
> > > + options.proto.logical.streaming = pstrdup("parallel");
> > >
> > > After deciding here whether the parallel streaming mode is enabled
> > > or not, we recheck the same thing in apply_handle_stream_abort() and
> > > parallel_apply_can_start(). In parallel_apply_can_start(), we do it
> > > via two different checks. How about storing this information, say in
> > > structure MyLogicalRepWorker, in ApplyWorkerMain() and then using it in
> > > other places?
> >
> > Improved as suggested.
> > Added a new flag "in_parallel_apply" to structure MyLogicalRepWorker.
> >
> 
> Can we name the variable in_parallel_apply as parallel_apply and set it in
> logicalrep_worker_launch() instead of in ParallelApplyWorkerMain()?

Changed.

> Few other comments:
> ==
> 1.
> + if (is_subworker &&
> + nparallelapplyworkers >= max_parallel_apply_workers_per_subscription)
> + {
> + LWLockRelease(LogicalRepWorkerLock);
> +
> + ereport(DEBUG1,
> + (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
> + errmsg("out of parallel apply workers"), errhint("You might need to
> + increase
> max_parallel_apply_workers_per_subscription.")));
> 
> I think it is better to keep the level of this as LOG. Similar messages at 
> other
> places use WARNING or LOG. Here, I prefer LOG because the system can still
> proceed without blocking anything.

Changed.

> 2.
> +/* Reset replication origin tracking. */ void
> +parallel_apply_replorigin_reset(void)
> +{
> + bool started_tx = false;
> +
> + /* This function might be called inside or outside of transaction. */
> + if (!IsTransactionState()) { StartTransactionCommand(); started_tx =
> + true; }
> 
> Why do we need a transaction in this function?

I think we don't need it and removed this in the new version patch.

> 3. Few suggestions to improve in the patch:
> diff --git a/src/backend/replication/logical/worker.c
> b/src/backend/replication/logical/worker.c
> index 1623c9e2fa..d9c519dfab 100644
> --- a/src/backend/replication/logical/worker.c
> +++ b/src/backend/replication/logical/worker.c
> @@ -1264,6 +1264,10 @@ apply_handle_stream_prepare(StringInfo s)
>   case TRANS_LEADER_SEND_TO_PARALLEL:
>   Assert(winfo);
> 
> + /*
> + * The origin can be active only in one process. See
> + * apply_handle_stream_commit.
> + */
>   parallel_apply_replorigin_reset();
> 
>   /* Send STREAM PREPARE message to the parallel apply worker. */ @@
> -1623,12 +1627,7 @@ apply_handle_stream_abort(StringInfo s)
>   (errcode(ERRCODE_PROTOCOL_VIOLATION),
>   errmsg_internal("STREAM ABORT message without STREAM STOP")));
> 
> - /*
> - * Check whether the publisher sends abort_lsn and abort_time.
> - *
> - * Note that the parallel apply worker is only started when the publisher
> - * sends abort_lsn and abort_time.
> - */
> + /* We receive abort information only when we can apply in parallel. */
>   if (MyLogicalRepWorker->in_parallel_apply)
>   read_abort_info = true;
> 
> @@ -1656,7 +1655,13 @@ apply_handle_stream_abort(StringInfo s)
>   Assert(winfo);
> 
>   if (subxid == xid)
> + {
> + /*
> + * The origin can be active only in one process. See
> + * apply_handle_stream_commit.
> + */
>   parallel_apply_replorigin_reset();
> + }
> 
>   /* Send STREAM ABORT message to the parallel apply worker. */
>   parallel_apply_send_data(winfo, s->len, s->data); @@ -1858,6 +1863,12 @@
> apply_handle_stream_commit(StringInfo s)
>   case TRANS_LEADER_SEND_TO_PARALLEL:
>   Assert(winfo);
> 
> + /*
> + * We need to reset the replication origin before sending the commit
> + * message and set it up again after confirming that parallel worker
> + * has processed the message. This is required because origin can be
> + * active only in one process at-a-time.
> + */
>   parallel_apply_replorigin_reset();
> 
>   /* Send STREAM COMMIT message to the parallel apply worker. */ diff --git
> a/src/include/replication/worker_internal.h
> b/src/include/replication/worker_internal.h
> index 4cbfb43492..2bd9664f86 100644
> --- a/src/include/replication/worker_internal.h
> +++ b/src/include/replication/worker_internal.h
> @@ -70,11 +70,7 @@ typedef struct LogicalRepWorker
>   */
>   pid_t apply_leader_pid;
> 
> - /*
> - * Indicates whether to use parallel apply workers.
> - *
> - * Determined based on streaming parameter and publisher version.
> - */
> + /* Indicates whether apply can be performed parallelly. */
>   bool in_parallel_apply;
> 

Merged, thanks.

Best regards,
Hou zj


Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

2022-09-27 Thread Bharath Rupireddy
On Tue, Sep 27, 2022 at 3:01 PM Thomas Munro  wrote:
>
> Something like the attached.

Isn't it also better to add notes in win32pread.c and win32pwrite.c
about the callers doing lseek(SEEK_SET) if they wish to, and about the
Windows implementations changing the file position? We can also add a
TODO item about replacing the pg_ versions with pread and friends
someday, when Windows fixes the issue. Having it in the commit message
and include/port.h is good, but comments in win32pread.c and
win32pwrite.c would make life easier IMO. Otherwise, the patch LGTM.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: has_privs_of_role vs. is_member_of_role, redux

2022-09-27 Thread Robert Haas
On Tue, Sep 27, 2022 at 2:05 AM Wolfgang Walther
 wrote:
> I'm just saying WITH SET FALSE should take away more of the things you
> can do (all the ownership things) to a point where it's safe to GRANT ..
> WITH INHERIT TRUE, SET FALSE and still be useful for pre-defined or
> privilege-container roles.

I don't see that as viable, either. It's too murky what you'd have to
take away to make it safe, and it sounds like stuff that naturally
falls under INHERIT rather than SET.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add common function ReplicationOriginName.

2022-09-27 Thread Aleksander Alekseev
Hi Peter,

> PSA patch v3 to combine the different replication origin name
> formatting in a single function ReplicationOriginNameForLogicalRep as
> suggested.

LGTM except for minor issues with the formatting; fixed.

> I expect you can find more like just this if you look harder than I did.

Thanks. You are right, there are more places that pass int as the
second argument of *nprintf(). I used a command:

$ grep -r nprintf ./ | perl -lne 'print if($_ !~
/nprintf\([^\,]+,\s*(sizeof|([0-9A-Z_ \-]+\,))/ )' > found.txt

... and then re-checked the results manually. This excludes patterns
like *nprintf(..., sizeof(...)) and *nprintf(..., MACRO) and leaves
only something like *nprintf(..., variable). The cases where we
subtract an integer from a Size, etc. were ignored.
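
For illustration, the kind of call site being adjusted looks roughly like
this (a contrived sketch; originname/subid are made-up names):

```
/* snprintf()'s second parameter has type size_t, so pass Size/size_t, not int */
char		originname[NAMEDATALEN];
Size		szorigname = sizeof(originname);
Oid			subid = 16384;		/* hypothetical subscription OID */

snprintf(originname, szorigname, "pg_%u", subid);
```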

I don't have a strong opinion on whether we should be really worried
by this. But in case we do, here is the patch. The order of 0001 and
0002 doesn't matter.

As I understand, ecpg uses size_t rather than Size, so for this
library I used size_t. Not 100% sure if the changes I made to
src/backend/utils/adt/jsonb.c add much value. I leave this to the
committer to decide.

-- 
Best regards,
Aleksander Alekseev


v4-0002-Add-common-function-ReplicationOriginNameForLogic.patch
Description: Binary data


v4-0001-Pass-Size-size_t-as-a-2nd-argument-of-snprintf.patch
Description: Binary data


Re: Add more docs for pg_surgery?

2022-09-27 Thread Zhang Mingli

Regards,
Zhang Mingli
On Sep 27, 2022, 00:47 +0800, Ashutosh Sharma , wrote:
>
> And further, it looks like you are doing an
> experiment on an undamaged relation, which is not recommended, as
> documented.
Yeah.
>  If the relation had been damaged, you probably would
> not be able to access it.
>
That makes some sense.
> --
> With Regards,
> Ashutosh Sharma.


Re: kerberos/001_auth test fails on arm CPU darwin

2022-09-27 Thread Peter Eisentraut

On 27.09.22 03:37, Andres Freund wrote:

Maybe we should rely on PATH, rather than hardcoding OS-dependent locations?
Or at least fall back to searching for binaries in PATH? It seems pretty odd
to hardcode all these locations without a way to influence them from outside
the test.


Homebrew intentionally does not install the krb5 and openldap packages 
into the path, because they conflict with macOS-provided software. 
However, those macOS-provided variants don't provide all the pieces we 
need for the tests.


Also, on Linux you need /usr/sbin, which is often not in the path.

So I think there is no good way around hardcoding a lot of these paths.





Fix some newly modified tab-complete changes

2022-09-27 Thread shiy.f...@fujitsu.com
Hi hackers,

I noticed a problem when using tab-completion for "GRANT": "TABLES IN SCHEMA"
should be "ALL TABLES IN SCHEMA" in the following case.

postgres=# grant all on
ALL FUNCTIONS IN SCHEMA   DATABASE              FUNCTION             PARAMETER  SCHEMA    TABLESPACE
ALL PROCEDURES IN SCHEMA  DOMAIN                information_schema.  PROCEDURE  SEQUENCE  tbl
ALL ROUTINES IN SCHEMA    FOREIGN DATA WRAPPER  LANGUAGE             public.    TABLE     TYPE
ALL SEQUENCES IN SCHEMA   FOREIGN SERVER        LARGE OBJECT         ROUTINE    TABLES IN SCHEMA

I found that it is related to the recent commit 790bf615dd, and it seems
better to fix it. I also noticed that some comments should be updated to
match the new syntax. Attached is a patch that fixes them.

Regards,
Shi yu


0001-Fix-some-newly-modified-tab-complete-changes-and-com.patch
Description:  0001-Fix-some-newly-modified-tab-complete-changes-and-com.patch


Re: A doubt about a newly added errdetail

2022-09-27 Thread Alvaro Herrera
On 2022-Sep-27, houzj.f...@fujitsu.com wrote:

> Thanks for reviewing!
> 
> Just in case I misunderstand, it seems you mean the message style[1] is OK, 
> right ?
> 
> [1]
> errdetail("Column list cannot be specified for a partitioned table when %s is 
> false.",
>"publish_via_partition_root")));

Yeah, since you're changing another word in that line, it's ok to move
the parameter line off-string.  (If you were only changing the parameter
to %s and there was no message duplication, I would reject the patch as
useless.)

I'm going over that patch now, I have a few other changes as attached,
intend to commit soon.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Ni aún el genio muy grande llegaría muy lejos
si tuviera que sacarlo todo de su propio interior" (Goethe)
diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index ba1afc21ac..35a93ad200 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -125,7 +125,8 @@ parse_publication_options(ParseState *pstate,
			if (!SplitIdentifierString(publish, ',', &publish_list))
 ereport(ERROR,
 		(errcode(ERRCODE_SYNTAX_ERROR),
-		 errmsg("invalid list syntax for \"publish\" option")));
+		 errmsg("invalid list syntax in parameter \"%s\"",
+"publish")));
 
 			/* Process the option list. */
 			foreach(lc, publish_list)
@@ -143,7 +144,8 @@ parse_publication_options(ParseState *pstate,
 else
 	ereport(ERROR,
 			(errcode(ERRCODE_SYNTAX_ERROR),
-			 errmsg("unrecognized \"publish\" value: \"%s\"", publish_opt)));
+			 errmsg("unrecognized value for publication option \"%s\": \"%s\"",
+	"publish", publish_opt)));
 			}
 		}
 		else if (strcmp(defel->defname, "publish_via_partition_root") == 0)
@@ -444,14 +446,18 @@ contain_mutable_or_user_functions_checker(Oid func_id, void *context)
 }
 
 /*
- * Check if the node contains any unallowed object. See
+ * Check if the node contains any unallowed object. Subroutine for
  * check_simple_rowfilter_expr_walker.
  *
- * Returns the error detail message in errdetail_msg for unallowed expressions.
+ * If a disallowed object is found, *errdetail_msg is set to a (possibly
+ * translated) message to use as errdetail.  If none, *errdetail_msg is not
+ * modified.
  */
 static void
 expr_allowed_in_node(Node *node, ParseState *pstate, char **errdetail_msg)
 {
+	Assert(*errdetail_msg == NULL);
+
 	if (IsA(node, List))
 	{
 		/*
@@ -590,7 +596,7 @@ check_simple_rowfilter_expr_walker(Node *node, ParseState *pstate)
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("invalid publication WHERE expression"),
- errdetail("%s", errdetail_msg),
+ errdetail_internal("%s", errdetail_msg),
  parser_errposition(pstate, exprLocation(node;
 
 	return expression_tree_walker(node, check_simple_rowfilter_expr_walker,
@@ -714,7 +720,7 @@ CheckPubRelationColumnList(List *tables, char *pubname, bool publish_schema,
 		/*
 		 * If the publication doesn't publish changes via the root partitioned
 		 * table, the partition's column list will be used. So disallow using
-		 * the column list on partitioned table in this case.
+		 * a column list on the partitioned table in this case.
 		 */
 		if (!pubviaroot &&
 			pri->relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
@@ -723,7 +729,7 @@ CheckPubRelationColumnList(List *tables, char *pubname, bool publish_schema,
 	 errmsg("cannot use column list for relation \"%s.%s\" in publication \"%s\"",
 			get_namespace_name(RelationGetNamespace(pri->relation)),
 			RelationGetRelationName(pri->relation), pubname),
-	 errdetail("Column list cannot be specified for a partitioned table when %s is false.",
+	 errdetail("Column lists cannot be specified for partitioned tables when %s is false.",
 			   "publish_via_partition_root")));
 	}
 }
@@ -1302,7 +1308,7 @@ AlterPublicationSchemas(AlterPublicationStmt *stmt,
 		errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		errmsg("cannot add schema to publication \"%s\"",
 			   stmt->pubname),
-		errdetail("Schema cannot be added if any table that specifies column list is already part of the publication."));
+		errdetail("Schemas cannot be added if any tables that specify a column list are already part of the publication."));
 
 			ReleaseSysCache(coltuple);
 		}


Extend win32 error codes to errno mapping in win32error.c

2022-09-27 Thread Bharath Rupireddy
Hi,

I recently faced an issue on Windows where one of the tests was
failing with 'unrecognized win32 error code: 123', see [1]. I figured
out that it was due to a wrong file name being sent to the open()
system call (this error is of my own making and I fixed it for that
thread). The failure occurs in dosmaperr() in win32error.c due to a
win32 error code with no errno mapping. The error code 123, i.e.
ERROR_INVALID_NAME, says "The file name, directory name, or volume
label syntax is incorrect." [2]; the closest errno mapping would be
ENOENT. I quickly looked around for other win32 error codes [2] that
don't have a mapping. I picked out some common error codes such as
ERROR_OUTOFMEMORY, ERROR_HANDLE_DISK_FULL, ERROR_INSUFFICIENT_BUFFER,
and ERROR_NOACCESS. There may be many more, but these seemed the most
common IMO.

Having the right errno mapping always helps recognize errors correctly.

I'm attaching a patch that maps the above win32 error codes to errnos
in win32error.c. I also think that we can add a note in win32error.c
mentioning the link [2], so that the mapping can be revisited whenever
an "unrecognized win32 error code: XXX" error occurs.

Thoughts?

Thanks Michael Paquier for off list chat.

[1] 
https://www.postgresql.org/message-id/CALj2ACWKvjOO-JzYpMBpk-o_o9CeKGEqMcS=yxf-pc6m+jo...@mail.gmail.com
[2] 
https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/18d8fbe8-a967-4f1c-ae50-99ca8e491d2d

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 5e7ea6dc32e60d5a4b0868596eb63516c6ad21d7 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Tue, 27 Sep 2022 08:43:02 +
Subject: [PATCH v1] Extend win32 error codes to errno mapping in win32error.c

---
 src/port/win32error.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/src/port/win32error.c b/src/port/win32error.c
index fca867ba3d..44ede0cfaa 100644
--- a/src/port/win32error.c
+++ b/src/port/win32error.c
@@ -164,6 +164,21 @@ static const struct
 	},
 	{
 		ERROR_DELETE_PENDING, ENOENT
+	},
+	{
+		ERROR_INVALID_NAME, ENOENT
+	},
+	{
+		ERROR_OUTOFMEMORY, ENOMEM
+	},
+	{
+		ERROR_HANDLE_DISK_FULL, ENOSPC
+	},
+	{
+		ERROR_INSUFFICIENT_BUFFER, EINVAL
+	},
+	{
+		ERROR_NOACCESS, EACCES
 	}
 };
 
-- 
2.34.1



Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)

2022-09-27 Thread Bharath Rupireddy
On Tue, Sep 27, 2022 at 2:20 PM Kyotaro Horiguchi
 wrote:
>
> > -1 from me. We have the function context and the structure name there
> > to represent that the parameter name 'state' is actually 'backup
> > state'. I don't think we gain anything here. Whenever the BackupState
> > is used away from any function, the variable name backup_state is
> > used, static variable in xlogfuncs.c
>
> There's no shadowing caused by the change. If we mind the same
> variable names between files, we could rename backup_state in
> xlogfuncs.c to process_backup_state or session_backup_state.

-1.

> If this is still unacceptable, I propose to change the comment. (I
> found that the previous patch forgets about do_pg_backup_stop())
>
> - * It fills in backup_state with the information required for the backup,
> + * It fills in the parameter "state" with the information required for the 
> backup,

+1. There's another place that uses backup_state in the comments. I
modified that as well. Please see the attached patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From b66c67712f3499259427c3ea32d102ac802941ac Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Tue, 27 Sep 2022 09:36:32 +
Subject: [PATCH v1] Adjust BackupState comments to not use backup_state

---
 src/backend/access/transam/xlog.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1dd6df0fe1..675f851daa 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8256,9 +8256,10 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
  * symlinks while extracting files from tar. However for consistency and
  * platform-independence, we do it the same way everywhere.
  *
- * It fills in backup_state with the information required for the backup,
- * such as the minimum WAL location that must be present to restore from
- * this backup (starttli) and the corresponding timeline ID (starttli).
+ * It fills in the parameter "state" with the information required for the
+ * backup, such as the minimum WAL location that must be present to restore
+ * from this backup (startpoint) and the corresponding timeline ID (starttli),
+ * etc.
  *
  * Every successfully started backup must be stopped by calling
  * do_pg_backup_stop() or do_pg_abort_backup(). There can be many
@@ -8574,8 +8575,9 @@ get_backup_status(void)
  * file (if required), resets sessionBackupState and so on.  It can optionally
  * wait for WAL segments to be archived.
  *
- * backup_state is filled with the information necessary to restore from this
- * backup with its stop LSN (stoppoint), its timeline ID (stoptli), etc.
+ * It fills in the parameter "state" with the information necessary to restore
+ * from this backup with its stop LSN (stoppoint), its timeline ID (stoptli),
+ * etc.
  *
  * It is the responsibility of the caller of this function to verify the
  * permissions of the calling user!
-- 
2.34.1



Re: Insertion Sort Improvements

2022-09-27 Thread Benjamin Coutu
Getting back to improvements for small sort runs, it might make sense to 
consider using in-register sorting via sorting networks for very small 
runs.

This talk is specific to database sorting and illustrates how such an approach 
can be vectorized: 
https://youtu.be/HeFwVNHsDzc?list=PLSE8ODhjZXjasmrEd2_Yi1deeE360zv5O&t=1090

It looks like some of the commercial DBMSs do this very successfully. They use 
four 512-bit registers (AVX-512) in this example, which could technically store 
4 * 4 sort elements (given that the sorting key is 64 bits and the tuple pointer 
is 64 bits). I wonder whether this could also be done with just SSE (instead of 
AVX), which the project now readily supports thanks to your recent efforts?
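
For reference, a scalar sketch of the idea (the optimal 5-comparator network
for 4 elements; a vectorized version would replace each compare-exchange with
SIMD min/max on whole registers, which works because the comparison sequence
is data-independent):

```
#define CMP_SWAP(a, i, j) \
	do { \
		if ((a)[j] < (a)[i]) \
		{ \
			int64		tmp_ = (a)[i]; \
			(a)[i] = (a)[j]; \
			(a)[j] = tmp_; \
		} \
	} while (0)

static void
sort4_network(int64 a[4])
{
	CMP_SWAP(a, 0, 1);
	CMP_SWAP(a, 2, 3);
	CMP_SWAP(a, 0, 2);
	CMP_SWAP(a, 1, 3);
	CMP_SWAP(a, 1, 2);
}
```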




Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?

2022-09-27 Thread Thomas Munro
On Tue, Sep 27, 2022 at 6:43 PM Bharath Rupireddy
 wrote:
> On Tue, Sep 27, 2022 at 3:08 AM Thomas Munro  wrote:
> >
> > I don't think so, that's an extra kernel call.  I think I'll just have
> > to revert part of my recent change that removed the pg_ prefix from
> > those function names in our code, and restore the comment that warns
> > you about the portability hazard (I thought it went away with HP-UX
> > 10, where we were literally calling lseek() before every write()).
> > The majority of users of these functions don't intermix them with
> > calls to read()/write(), so they don't care about the file position,
> > so I think it's just something we'll have to continue to be mindful of
> > in the places that do.
>
> Yes, all of the existing pwrite() callers don't care about the file
> position, but new callers, such as the one in the idea and patch
> proposed here in this thread, do care.
>
> Is this the commit cf112c122060568aa06efe4e6e6fb9b2dd4f1090 part of
> which [1] you're planning to revert?

Yeah, just the renaming parts of that.  The lseek()-based emulation is
definitely not coming back.  Something like the attached.
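
To spell out the hazard those comments warn about (a contrived sketch;
fd/buf/len/offset are placeholders):

```
/* A positioned write via the pg_ wrapper... */
if (pg_pwrite(fd, buf, len, offset) != (ssize_t) len)
	elog(ERROR, "could not write: %m");

/*
 * On POSIX systems the file position is unaffected here, but on Windows
 * (synchronous handles) it now points at offset + len.  Code that mixes
 * in plain read()/write() afterwards must lseek() explicitly to stay
 * portable.
 */
```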
From fc72be32668f0c37b899fd922d6986f39d8a6a2a Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 27 Sep 2022 21:12:03 +1300
Subject: [PATCH] Restore pg_pread and friends.

Commit cf112c12 was a little too hasty in getting rid of the pg_
prefixes where we use pread(), pwrite() and vectored variants.  We
dropped support for ancient Unixes that needed to use lseek() to
implement replacements for those, but it turned out that Windows also
changes the current position even when you pass in an offset to
ReadFile() and WriteFile(), if the file handle is synchronous.

Switching to asynchronous file handles would fix that, but would have other
complications.  For now let's just put back the pg_ prefix and add some
comments to highlight the non-standard side-effect.

Reported-by: Bharath Rupireddy 
Discussion: https://postgr.es/m/20220923202439.GA1156054%40nathanxps13
---
 .../pg_stat_statements/pg_stat_statements.c   |  4 +-
 src/backend/access/heap/rewriteheap.c |  2 +-
 src/backend/access/transam/slru.c |  4 +-
 src/backend/access/transam/xlog.c |  4 +-
 src/backend/access/transam/xlogreader.c   |  2 +-
 src/backend/access/transam/xlogrecovery.c |  2 +-
 src/backend/backup/basebackup.c   |  2 +-
 src/backend/replication/walreceiver.c |  2 +-
 src/backend/storage/file/fd.c |  8 +--
 src/backend/utils/init/miscinit.c |  2 +-
 src/bin/pg_test_fsync/pg_test_fsync.c | 50 +--
 src/include/access/xlogreader.h   |  4 +-
 src/include/port.h|  9 
 src/include/port/pg_iovec.h   | 13 -
 src/include/port/win32_port.h |  4 +-
 src/port/preadv.c |  4 +-
 src/port/pwritev.c|  4 +-
 src/port/win32pread.c |  2 +-
 src/port/win32pwrite.c|  2 +-
 19 files changed, 71 insertions(+), 53 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
b/contrib/pg_stat_statements/pg_stat_statements.c
index ba868f0de9..73439c0199 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2103,9 +2103,9 @@ qtext_store(const char *query, int query_len,
if (fd < 0)
goto error;
 
-   if (pwrite(fd, query, query_len, off) != query_len)
+   if (pg_pwrite(fd, query, query_len, off) != query_len)
goto error;
-   if (pwrite(fd, "\0", 1, off + query_len) != 1)
+   if (pg_pwrite(fd, "\0", 1, off + query_len) != 1)
goto error;
 
CloseTransientFile(fd);
diff --git a/src/backend/access/heap/rewriteheap.c 
b/src/backend/access/heap/rewriteheap.c
index 2f08fbe8d3..b01b39b008 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -1150,7 +1150,7 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
/* write out tail end of mapping file (again) */
errno = 0;
pgstat_report_wait_start(WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE);
-   if (pwrite(fd, data, len, xlrec->offset) != len)
+   if (pg_pwrite(fd, data, len, xlrec->offset) != len)
{
/* if write didn't set errno, assume problem is no disk space */
if (errno == 0)
diff --git a/src/backend/access/transam/slru.c 
b/src/backend/access/transam/slru.c
index c9a7b97949..b65cb49d7f 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -718,7 +718,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 
errno = 0;
pgstat_report_wait_start(WAIT_EVENT_SLRU_READ);
-   if (pread(fd, shared->page_buffer[slotno], BLCKSZ, offset) != BLCKSZ)
+   if (pg_pread(fd, 

RE: A doubt about a newly added errdetail

2022-09-27 Thread houzj.f...@fujitsu.com
On Tuesday, September 27, 2022 3:21 PM Alvaro Herrera  
wrote:
> 
> On 2022-Sep-27, Kyotaro Horiguchi wrote:
> 
> > By the way, this is not an issue caused by the proposed patch, I see
> > the following message in the patch.
> >
> > -errdetail("Column list cannot be used
> for a partitioned table when %s is false.",
> > +errdetail("Column list cannot be
> specified for a partitioned
> > +table when %s is false.",
> >
> "publish_via_partition_root")));
> >
> > I think that the purpose of such separation of variable names is to
> > unify multiple messages differing only by the names (to keep
> > translation labor (relatively:p) low). In that sense, that separation
> > here is useless since I see no chance of having the same message with
> > another variable in future.
> 
> Well, it also reduces chances for typos and such, so while it's not strictly
> necessary to do it this way, I tend to prefer it on new messages.  However, as
> you say it's not very interesting when there's no possibility of duplication, 
> so
> changing existing messages to this style when we have no other reason to
> change the message, is not a useful use of time.  In this case we're changing
> the message in another way too, so I think it's okay.

Thanks for reviewing!

Just in case I misunderstand: you mean that the message style [1] is OK,
right?

[1]
errdetail("Column list cannot be specified for a partitioned table when %s is 
false.",
 "publish_via_partition_root")));

Best regards,
Hou zj


Re: Fast COPY FROM based on batch insert

2022-09-27 Thread Etsuro Fujita
On Wed, Aug 10, 2022 at 5:30 PM Etsuro Fujita  wrote:
> On Wed, Aug 10, 2022 at 1:06 AM Zhihong Yu  wrote:
> > +   /* If any rows were inserted, run AFTER ROW INSERT triggers. */
> > ...
> > +   for (i = 0; i < inserted; i++)
> > +   {
> > +   TupleTableSlot *slot = rslots[i];
> > ...
> > +   slot->tts_tableOid =
> > +   RelationGetRelid(resultRelInfo->ri_RelationDesc);
> >
> > It seems the return value of 
> > `RelationGetRelid(resultRelInfo->ri_RelationDesc)` can be stored in a 
> > variable outside the for loop.
> > Inside the for loop, assign this variable to slot->tts_tableOid.
>
> Actually, I did this to match the code in ExecBatchInsert(), but that
> seems like a good idea, so I’ll update the patch as such in the next
> version.

Done.  I also adjusted the code in CopyMultiInsertBufferFlush() a bit
further.  No functional changes.  I put back in the original position
an assertion ensuring the FDW supports batching.  Sorry for the back
and forth.  Attached is an updated version of the patch.
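
(The hoisting mentioned above amounts to something like this sketch:)

```
/* Hoist the loop-invariant relation OID lookup out of the loop. */
Oid			relid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
int			i;

for (i = 0; i < inserted; i++)
{
	TupleTableSlot *slot = rslots[i];

	slot->tts_tableOid = relid;
	/* ... fire AFTER ROW INSERT triggers for this slot ... */
}
```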

Other changes are:

* The previous patch modified postgres_fdw.sql so that the existing
test cases for COPY FROM were tested in batch-insert mode.  But I
think we should keep them as-is to test the default behavior, so I
added test cases for this feature by copying-and-pasting some of the
existing test cases.  Also, the previous patch added this:

+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+   server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+   server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+   begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+   for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.

Rather than doing so, I think it would be better to use a partitioned
table defined in the above section “test tuple routing for
foreign-table partitions”, to save cycles.  So I changed this as such.

* I modified comments a bit further and updated docs.

That is it.  I will review the patch a bit more, but I feel that it is
in good shape.

Best regards,
Etsuro Fujita


v4-0001-Implementation-of-a-Bulk-COPY-FROM-efujita-3.patch
Description: Binary data


Re: [PATCH] Log details for client certificate failures

2022-09-27 Thread Masahiko Sawada
Hi,

On Tue, Sep 13, 2022 at 11:11 PM Peter Eisentraut
 wrote:
>
> On 09.09.22 00:32, Jacob Champion wrote:
> > On Thu, Jul 28, 2022 at 9:19 AM Jacob Champion  
> > wrote:
> >> On Thu, Jul 21, 2022 at 4:29 PM Jacob Champion  
> >> wrote:
> >>> v4 attempts to fix this by letting the check hooks pass
> >>> MCXT_ALLOC_NO_OOM to pg_clean_ascii(). (It's ignored in the frontend,
> >>> which just mallocs.)
> >>
> >> Ping -- should I add an open item somewhere so this isn't lost?
> >
> > Trying again. Peter, is this approach acceptable? Should I try something 
> > else?
>
> This looks fine to me.  Committed.

While looking at the recent changes for check_cluster_name() I found
this thread. Regarding the following change made by the commit
45b1a67a0fc, there is possibly a small memory leak:

 static bool
 check_cluster_name(char **newval, void **extra, GucSource source)
 {
+   char   *clean;
+
/* Only allow clean ASCII chars in the cluster name */
-   pg_clean_ascii(*newval);
+   clean = pg_clean_ascii(*newval, MCXT_ALLOC_NO_OOM);
+   if (!clean)
+   return false;
+
+   clean = guc_strdup(WARNING, clean);
+   if (!clean)
+   return false;

+   *newval = clean;
return true;
 }

pg_clean_ascii() does palloc_extended() to allocate memory in the
Postmaster context for the new string, and clean is then replaced with
the new memory allocated by guc_strdup(). No one references the memory
allocated by pg_clean_ascii() anymore, and it lasts for the postmaster's
lifetime. Valgrind memcheck also shows:

 1 bytes in 1 blocks are definitely lost in loss record 4 of 70
at 0xCD2A16: palloc_extended (mcxt.c:1239)
by 0xD09437: pg_clean_ascii (string.c:99)
by 0x7A5CF3: check_cluster_name (variable.c:1061)
by 0xCAF7CD: call_string_check_hook (guc.c:6365)
by 0xCAA724: InitializeOneGUCOption (guc.c:1439)
by 0xCAA0ED: InitializeGUCOptions (guc.c:1268)
by 0x99B245: PostmasterMain (postmaster.c:691)
by 0x858896: main (main.c:197)

I think we can fix it with the attached patch, but I'd like to discuss
whether it's worth fixing.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index c795cb7a29..e555fb3150 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -1025,17 +1025,22 @@ bool
 check_application_name(char **newval, void **extra, GucSource source)
 {
 	char	   *clean;
+	char	   *ret;
 
 	/* Only allow clean ASCII chars in the application name */
 	clean = pg_clean_ascii(*newval, MCXT_ALLOC_NO_OOM);
 	if (!clean)
 		return false;
 
-	clean = guc_strdup(WARNING, clean);
-	if (!clean)
+	ret = guc_strdup(WARNING, clean);
+	if (!ret)
+	{
+		pfree(clean);
 		return false;
+	}
 
-	*newval = clean;
+	pfree(clean);
+	*newval = ret;
 	return true;
 }
 
@@ -1056,17 +1061,22 @@ bool
 check_cluster_name(char **newval, void **extra, GucSource source)
 {
 	char	   *clean;
+	char	   *ret;
 
 	/* Only allow clean ASCII chars in the cluster name */
 	clean = pg_clean_ascii(*newval, MCXT_ALLOC_NO_OOM);
 	if (!clean)
 		return false;
 
-	clean = guc_strdup(WARNING, clean);
-	if (!clean)
+	ret = guc_strdup(WARNING, clean);
+	if (!ret)
+	{
+		pfree(clean);
 		return false;
+	}
 
-	*newval = clean;
+	pfree(clean);
+	*newval = ret;
 	return true;
 }
 


Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)

2022-09-27 Thread Kyotaro Horiguchi
At Tue, 27 Sep 2022 14:03:24 +0530, Bharath Rupireddy 
 wrote in 
> On Tue, Sep 27, 2022 at 1:54 PM Kyotaro Horiguchi
>  wrote:
> > What do you think about this?
> 
> -1 from me. We have the function context and the structure name there
> to represent that the parameter name 'state' is actually 'backup
> state'. I don't think we gain anything here. Whenever the BackupState
> is used away from any function, the variable name backup_state is
> used, e.g. the static variable in xlogfuncs.c.

There's no shadowing caused by the change. If we mind the same
variable names between files, we could rename backup_state in
xlogfuncs.c to process_backup_state or session_backup_state.

If this is still unacceptable, I propose to change the comment. (I
found that the previous patch forgets about do_pg_backup_stop())

- * It fills in backup_state with the information required for the backup,
+ * It fills in the parameter "state" with the information required for the 
backup,

(This is following the notation just above)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Differentiate MERGE queries with different structures

2022-09-27 Thread Alvaro Herrera
Pushed, thanks.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
 It does it in a really, really complicated way
 why does it need to be complicated?
 Because it's MakeMaker.




Re: Fast COPY FROM based on batch insert

2022-09-27 Thread Etsuro Fujita
On Tue, Aug 23, 2022 at 2:58 PM Andrey Lepikhov
 wrote:
> On 22/8/2022 11:44, Etsuro Fujita wrote:
> > I think the latter is more consistent with the existing error context
> > information when in CopyMultiInsertBufferFlush().  Actually, I thought
> > this too, and I think this would be useful when the COPY FROM command
> > is executed on a foreign table.  My concern, however, is the case when
> > the command is executed on a partitioned table containing foreign
> > partitions; in that case the input data would not always be sorted in
> > the partition order, so the range for an error-occurring foreign
> > partition might contain many lines with rows from other partitions,
> > which I think makes the range information less useful.  Maybe I'm too
> > worried about that, though.

> I got your point. Indeed, perhaps such info doesn't really need to be
> included in core, at least for now.

Ok.  Sorry for the late response.

Best regards,
Etsuro Fujita




Re: Data is copied twice when specifying both child and parent table in publication

2022-09-27 Thread Peter Smith
Here are my review comments for the HEAD_v11-0001 patch:

==

1. General - Another related bug?

In [1] Hou-san wrote:

For another case you mentioned (via_root used when publishing child)
CREATE PUBLICATION pub1 for TABLE parent;
CREATE PUBLICATION pub2 for TABLE child with (publish_via_partition_root);
CREATE SUBSCRIPTION sub connect xxx PUBLICATION pub1,pub2;

The expected behavior is that only the child table is published; all the changes
should be replicated using the child table's identity. We should do table sync
only for child tables, which is the same as the current behavior on HEAD. So, I
think there is no bug in this case.

~

That behaviour differs from my understanding, because the pgdocs say
that when the via_root param is true the 'child' table would use
the 'parent' identity:

[2] publish_via_partition_root determines whether changes in a
partitioned table (or on its partitions) contained in the publication
will be published using the identity and schema of the partitioned
table rather than that of the individual partitions that are actually
changed.

~

So is this another bug (slightly different from the current one being
patched), or is it just some different special behaviour? If it's
another bug, then you need to know that ASAP, because I think you may
want to fix both of them at the same time, which might impact how this
2x data copy patch should be implemented.

Or perhaps just the pgdocs need more notes about special
cases/combinations like this?

==

2. General - documentation?

For this current patch, IIUC it was decided that this is a bug because
the data gets duplicated, and then a sensible rule was settled on for
this patch to address it (e.g. publishing a child combined with
publishing a parent via_root will just ignore the child's
publication...).

So my question is - is this (new/fixed) behaviour something that a
user will be able to figure out themselves from the existing
documentation, or does this patch now need its own special notes in
the documentation?

==

3. src/backend/catalog/pg_publication.c - pg_get_publication_tables

+ foreach(lc, pub_elem_tables)
+ {
+ Oid *result = (Oid *) malloc(sizeof(Oid) * 2);
+
+ result[0] = lfirst_oid(lc);
+ result[1] = pub_elem->oid;
+ table_infos = lappend(table_infos, result);
+ }

3a.
It looks like each element in the table_infos list is a malloced object
of two Oids (the Oid of the table and the Oid of the publication). IMO it
is better to call this element 'table_info' instead of the meaningless
'result'.

~

3b.
Actually, I think it would be better if this function defines a little
2-element structure {Oid relid, Oid pubid} to use instead of this
array (which requires knowledge that [0] means relid and [1] means
pubid).

~~~

4.

+ foreach(lc, table_infos)
+ {
+ Oid *table_info_tmp = (Oid *) lfirst(lc);
+
+ if (!list_member_oid(tables, table_info_tmp[0]))
+ table_infos = foreach_delete_current(table_infos, lc);
  }
I think the '_tmp' suffix is not helpful here - IMO having another
relid variable would make this more self-explanatory.

Or better yet, adopt the suggestion of #3b and have a little struct
with self-explanatory member names.

=

5. src/backend/commands/subscriptioncmds.c - fetch_table_list

+ if (server_version >= 16)
+ appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n"

Since there is an else statement block, I think this would be more
readable if there was a statement block here too. YMMV

SUGGESTION
if (server_version >= 16)
{
appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n"
...
}

~~~

6.

+ /*
+ * Get the list of tables from publisher, the partition table whose
+ * ancestor is also in this list will be ignored, otherwise the initial
+ * data in the partition table would be replicated twice.
+ */

6a.
"from publisher, the partition" -> "from the publisher. The partition"

~

6b.
This looks like a common comment that also applied to the "if" part,
so it seems more appropriate to move it to where I indicated below.
Perhaps the whole comment needs a bit of massaging after you move
it...

+ /*
+ * Get namespace, relname and column list (if supported) of the tables
+ * belonging to the specified publications.
+ *
+ * HERE <
+ *
+ * From version 16, the parameter of the function pg_get_publication_tables
+ * can be an array of publications. The partition table whose ancestor is
+ * also published in this publication array will be filtered out in this
+ * function.
+ */


--
[1] 
https://www.postgresql.org/message-id/OS0PR01MB5716A30DDEECC59132E1084F94799%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] https://www.postgresql.org/docs/devel/sql-createpublication.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)

2022-09-27 Thread Bharath Rupireddy
On Tue, Sep 27, 2022 at 1:54 PM Kyotaro Horiguchi wrote:
>
> This commit introduced BackupState struct. The comment of
> do_pg_backup_start says that:
>
> > * It fills in backup_state with the information required for the backup,
>
> And the parameters are:
>
> > do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
> >                    BackupState *state, StringInfo tblspcmapfile)
>
> So backup_state is different from both the type BackupState and the
> parameter state.  I find it annoying.  Shouldn't we either rename the
> parameter or fix the comment?
>
> The parameter "state" sounds a bit too generic, so I prefer to rename
> the parameter to backup_state, as in the attached patch.
>
> What do you think about this?

-1 from me. We have the function context and the structure name there
to convey that the parameter name 'state' is actually the 'backup
state'. I don't think we gain anything here. Wherever BackupState is
used outside of a function, the variable name backup_state is used,
e.g. the static variable in xlogfuncs.c.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Avoid memory leaks during base backups

2022-09-27 Thread Kyotaro Horiguchi
At Tue, 27 Sep 2022 11:33:56 +0530, Bharath Rupireddy wrote in 
> On Mon, Sep 26, 2022 at 7:34 PM Tom Lane wrote:
> > This ... seems like inventing your own shape of wheel.  The
> > normal mechanism for preventing this type of leak is to put the
> > allocations in a memory context that can be reset or deallocated
> > in mainline code at the end of the operation.
> 
> Yes, that's the typical way and the patch attached does it for
> perform_base_backup(). What happens if we allocate some memory in the
> new memory context and error-out before reaching the end of operation?
> How do we deallocate such memory?

Whoever directly or indirectly catches the exception can do that.  For
example, SendBaseBackup() seems to catch exceptions from
perform_base_backup(), and bbsink_cleanup() already resides there.
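
To illustrate the usual pattern (a minimal sketch with illustrative
names, not the actual basebackup code): allocations go into a dedicated
context, mainline code deletes it on success, and whoever catches the
exception deletes the same context on the error path, so nothing leaks
either way.

    MemoryContext cxt;
    MemoryContext oldcxt;

    cxt = AllocSetContextCreate(CurrentMemoryContext,
                                "backup temporary context",
                                ALLOCSET_DEFAULT_SIZES);
    PG_TRY();
    {
        oldcxt = MemoryContextSwitchTo(cxt);
        do_backup_work();       /* hypothetical; allocates freely in cxt */
        MemoryContextSwitchTo(oldcxt);
    }
    PG_CATCH();
    {
        /* error path: free everything allocated above at once */
        MemoryContextDelete(cxt);
        PG_RE_THROW();
    }
    PG_END_TRY();

    /* success path: mainline code frees the context */
    MemoryContextDelete(cxt);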

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)

2022-09-27 Thread Kyotaro Horiguchi
This commit introduced BackupState struct. The comment of
do_pg_backup_start says that:

> * It fills in backup_state with the information required for the backup,

And the parameters are:

> do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
>                    BackupState *state, StringInfo tblspcmapfile)

So backup_state is different from both the type BackupState and the
parameter state.  I find it annoying.  Shouldn't we either rename the
parameter or fix the comment?

The parameter "state" sounds a bit too generic, so I prefer to rename
the parameter to backup_state, as in the attached patch.

What do you think about this?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1dd6df0fe1..715d5868eb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8244,10 +8244,10 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
  * function. It creates the necessary starting checkpoint and constructs the
  * backup state and tablespace map.
  *
- * Input parameters are "state" (the backup state), "fast" (if true, we do
- * the checkpoint in immediate mode to make it faster), and "tablespaces"
- * (if non-NULL, indicates a list of tablespaceinfo structs describing the
- * cluster's tablespaces.).
+ * Input parameters are "backup_state", "fast" (if true, we do the checkpoint
+ * in immediate mode to make it faster), and "tablespaces" (if non-NULL,
+ * indicates a list of tablespaceinfo structs describing the cluster's
+ * tablespaces.).
  *
  * The tablespace map contents are appended to passed-in parameter
  * tablespace_map and the caller is responsible for including it in the backup
@@ -8269,11 +8269,11 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
  */
 void
 do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
-				   BackupState *state, StringInfo tblspcmapfile)
+				   BackupState *backup_state, StringInfo tblspcmapfile)
 {
 	bool		backup_started_in_recovery = false;
 
-	Assert(state != NULL);
+	Assert(backup_state != NULL);
 	backup_started_in_recovery = RecoveryInProgress();
 
 	/*
@@ -8292,7 +8292,7 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 				 errmsg("backup label too long (max %d bytes)",
 						MAXPGPATH)));
 
-	memcpy(state->name, backupidstr, strlen(backupidstr));
+	memcpy(backup_state->name, backupidstr, strlen(backupidstr));
 
 	/*
 	 * Mark backup active in shared memory.  We must do full-page WAL writes
@@ -8385,9 +8385,9 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 		 * pointer.
 		 */
 		LWLockAcquire(ControlFileLock, LW_SHARED);
-		state->checkpointloc = ControlFile->checkPoint;
-		state->startpoint = ControlFile->checkPointCopy.redo;
-		state->starttli = ControlFile->checkPointCopy.ThisTimeLineID;
+		backup_state->checkpointloc = ControlFile->checkPoint;
+		backup_state->startpoint = ControlFile->checkPointCopy.redo;
+		backup_state->starttli = ControlFile->checkPointCopy.ThisTimeLineID;
 		checkpointfpw = ControlFile->checkPointCopy.fullPageWrites;
 		LWLockRelease(ControlFileLock);
 
@@ -8404,7 +8404,7 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 			recptr = XLogCtl->lastFpwDisableRecPtr;
 			SpinLockRelease(&XLogCtl->info_lck);
 
-			if (!checkpointfpw || state->startpoint <= recptr)
+			if (!checkpointfpw || backup_state->startpoint <= recptr)
 				ereport(ERROR,
 						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 						 errmsg("WAL generated with full_page_writes=off was replayed "
@@ -8436,9 +8436,9 @@ do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
 		 * either because only few buffers have been dirtied yet.
 		 */
 		WALInsertLockAcquireExclusive();
-		if (XLogCtl->Insert.lastBackupStart < state->startpoint)
+		if (XLogCtl->Insert.lastBackupStart < backup_state->startpoint)
 		{
-			XLogCtl->Insert.lastBackupStart = state->startpoint;
+			XLogCtl->Insert.lastBackupStart = backup_state->startpoint;
 

Re: Allow foreign keys to reference a superset of unique columns

2022-09-27 Thread David Rowley
On Tue, 27 Sept 2022 at 06:08, James Coleman wrote:
>
> On Mon, Sep 26, 2022 at 9:59 AM Wolfgang Walther wrote:
> > So no need for me to distract this thread from $subject anymore. I think
> > the idea of allowing to create unique constraints on a superset of the
> > columns of an already existing unique index is a good one, so let's
> > discuss this further.
>
> Sounds good to me!

I don't see any immediate problems with allowing UNIQUE constraints to
be supported using a unique index which contains only a subset of
columns that are mentioned in the constraint.  There would be a few
things to think about.  e.g INSERT ON CONFLICT might need some
attention as a unique constraint can be specified for use as the
arbiter.

Perhaps the patch could be broken down as follows:

0001:

* Extend ALTER TABLE ADD CONSTRAINT UNIQUE syntax to allow a column
list when specifying USING INDEX.
* Add checks to ensure the index in USING INDEX contains only columns
mentioned in the column list.
* Do any required work for INSERT ON CONFLICT. I've not looked at the
code but maybe some adjustments are required for where it gets the
list of columns.
* Address any other places that assume the supporting index contains
all columns of the unique constraint.

0002:

* Adjust transformFkeyCheckAttrs() to have it look at UNIQUE
constraints as well as unique indexes
* Ensure information_schema.referential_constraints view still works correctly.

I think that would address all of the concerns Tom mentioned in [1].
I wasn't quite sure I understood the NOT NULL concern there, since
going by RI_FKey_pk_upd_check_required(), we don't enforce FKs when
the referenced table has a NULL in the FK's columns.

David

[1] https://www.postgresql.org/message-id/3057718.1658949...@sss.pgh.pa.us




Re: Add last_vacuum_index_scans in pg_stat_all_tables

2022-09-27 Thread Alvaro Herrera
On 2022-Sep-16, Fujii Masao wrote:

> Could you tell me why the number of index scans should be tracked for
> each table? Instead, isn't it enough to have one global counter, to
> check whether the current setting of maintenance_work_mem is sufficient
> or not? That is, I'm thinking to have something like pg_stat_vacuum view
> that reports, for example, the number of vacuum runs, the total
> number of index scans, the maximum number of index scans by one
> vacuum run, the number of cancellation of vacuum because of
> lock conflicts, etc. If so, when these global counters are high or
> increasing, we can think that it may worth tuning maintenance_work_mem.

I think that there are going to be cases where some tables in a database
definitely require multiple index scans no matter what; but you
definitely want to know how many occurred for other, not so highly
trafficked tables.  So I *think* a single counter across the whole
database might not be sufficient.

The way I imagine using this (and I haven't operated databases in quite
a while so this may be all wet) is that I would have a report of which
tables have the highest numbers of indexscans, then study the detailed
vacuum reports for those tables as a way to change autovacuum_work_mem.


On the other hand, we have an absolute high cap of 1 GB for autovacuum's
work_mem, and many systems are already using that as the configured
value.  Maybe trying to fine-tune it is a waste of time.  If a 1TB table
says that it had 4 index scans, what are you going to do about it?  It's
a lost cause.  It sounds like we need more code changes so that more
memory can be used; and also changes so that that memory is used more
efficiently.  We had a patch for this; I don't know if it was
committed already.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)




Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2022-09-27 Thread Maxim Orlov
On Thu, 22 Sept 2022 at 18:13, Andres Freund wrote:

> Due to the merge of the meson based build this patch needs to be
> adjusted: https://cirrus-ci.com/build/6350479973154816
>
> Looks like you need to add amcheck--1.3--1.4.sql to the list of files to be
> installed and t/004_verify_nbtree_unique.pl to the tests.
>
> Greetings,
>
> Andres Freund
>

Thanks! Fixed.

-- 
Best regards,
Maxim Orlov.


v17-0001-Add-option-for-amcheck-and-pg_amcheck-to-check-u.patch
Description: Binary data


Re: pg_rewind WAL segments deletion pitfall

2022-09-27 Thread Kyotaro Horiguchi
At Thu, 1 Sep 2022 13:33:09 +0200, Polina Bungina wrote in 
> Here is the new version of the patch that includes the changes you
> suggested. It is smaller now but I doubt if it is as easy to understand as
> it used to be.

pg_rewind works in two steps. First it constructs a file map that
decides the action for each file; second, it performs file operations
according to that map. So, if we are going to do something to some
files, that action should be recorded in the file map, I think.

Regarding the patch, pg_rewind starts reading segments from the
divergence point back to the nearest checkpoint, then moves forward
during rewinding. So, the fact that SimpleXLogPageRead has read a
segment suggests that the segment is required during the next startup.
So I don't think we need to move around the keepWalSeg flag.  All
files that are wanted while rewinding should be preserved
unconditionally.

It's annoying that the file paths for the file map and for open(2)
have different top directories. But sharing the same path string
between the two seems rather ugly..

I feel uncomfortable directly touching the internals of file_entry_t
outside filemap.c. I'd like to hide the internals in filemap.c, but
pg_rewind already does that..

+   /*
+* Some entries (WAL segments) already have an action assigned
+* (see SimpleXLogPageRead()).
+*/
+   if (entry->action == FILE_ACTION_NONE)
+   continue;
entry->action = decide_file_action(entry);

It might be more reasonable to call decide_file_action() when action
is UNDECIDED.
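
That is, something like this (just a sketch, assuming the file map
initializes entries to an UNDECIDED action value):

	if (entry->action == FILE_ACTION_UNDECIDED)
		entry->action = decide_file_action(entry);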

> The need of manipulations with the target’s pg_wal/archive_status directory
> is a question to discuss…
>
> At first glance it seems to be useless for .ready files: checkpointer
> process will anyway recreate them if archiving is enabled on the rewound
> old primary and we will finally have them in the archive. As for the .done
> files, it seems reasonable to follow the pg_basebackup logic and keep .done
> files together with the corresponding segments (those between the last
> common checkpoint and the point of divergence) to protect them from being
> archived once again.
> 
> But on the other hand it seems to be not that straightforward: imagine we
> have WAL segment X on the target along with X.done file and we decide to
> preserve them both (or we download it from archive and force .done file
> creation), while archive_mode was set to ‘always’ and the source (promoted
> replica) also still has WAL segment X and X.ready file. After pg_rewind we
> will end up with both X.ready and X.done, which seems to be not a good
> situation (but most likely not critical either).

Thanks for the thought. Yes, it's not so straightforward. And, as you
mentioned, the worst result of not doing that is that some
already-archived segments are archived again at the next run, which is
generally harmless. So I think we're OK to ignore that in this patch,
and create another patch later if we still want to do it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [RFC] building postgres with meson - v13

2022-09-27 Thread John Naylor
On Tue, Sep 27, 2022 at 2:06 AM Andres Freund wrote:
>
> On 2022-09-26 15:18:29 +0700, John Naylor wrote:
> > Either way it doesn't exist on this machine. I was able to get a working
> > build with
> >
> > /usr/local/Homebrew/Library/Homebrew/os/mac/pkgconfig
>
> Hm - what did you need this path for - I don't think that should be
> needed.

I just cargo-culted the pattern from Arm (before I figured out it was Arm)
and used the "find" command to look for the directories by name. I tried
again without specifying any of the three directory flags, and I can run
the tests getting:

Ok: 233
Expected Fail:  0
Fail:   0
Unexpected Pass:0
Skipped:2
Timeout:0

...which is fine for me since I don't do much development on MacOS nowadays.

> > 1) /opt/homebrew/ seems to be an "Apple silicon" path?
>
> Yea, it's /usr/local on x86-64, based on what was required to make
> macos CI work. I updated the wiki page, half-blindly - it'd be nice
> if you could confirm that that works?

Not sure if you intended for me to try the full script in your last
response or just what's in the wiki page, but for the latter (on commit
bed0927aeb0c6), it fails at

[1656/2199] Linking target src/bin/psql/psql
FAILED: src/bin/psql/psql
clang  -o src/bin/psql/psql
src/bin/psql/psql.p/meson-generated_.._psqlscanslash.c.o
src/bin/psql/psql.p/meson-generated_.._sql_help.c.o
src/bin/psql/psql.p/command.c.o src/bin/psql/psql.p/common.c.o
src/bin/psql/psql.p/copy.c.o src/bin/psql/psql.p/crosstabview.c.o
src/bin/psql/psql.p/describe.c.o src/bin/psql/psql.p/help.c.o
src/bin/psql/psql.p/input.c.o src/bin/psql/psql.p/large_obj.c.o
src/bin/psql/psql.p/mainloop.c.o src/bin/psql/psql.p/prompt.c.o
src/bin/psql/psql.p/startup.c.o src/bin/psql/psql.p/stringutils.c.o
src/bin/psql/psql.p/tab-complete.c.o src/bin/psql/psql.p/variables.c.o
-L/usr/local/opt/readline/lib -L/usr/local/opt/gettext/lib
-L/usr/local/opt/zlib/lib -L/usr/local/opt/openssl/lib
-I/usr/local/opt/readline/include -I/usr/local/opt/gettext/include
-I/usr/local/opt/zlib/include -I/usr/local/opt/openssl/include
-Wl,-dead_strip_dylibs -Wl,-headerpad_max_install_names
-Wl,-undefined,error -isysroot
/Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk
-Wl,-rpath,@loader_path/../../interfaces/libpq -Wl,-rpath,/usr/local/lib
-Wl,-rpath,/usr/local/Cellar/zstd/1.5.2/lib src/fe_utils/libpgfeutils.a
src/common/libpgcommon.a src/common/libpgcommon_ryu.a
src/common/libpgcommon_config_info.a src/port/libpgport.a
src/port/libpgport_crc.a src/interfaces/libpq/libpq.5.dylib -lm
/usr/local/lib/libintl.dylib -ledit -lz
/usr/local/Cellar/zstd/1.5.2/lib/libzstd.dylib -lz -lz -lz
Undefined symbols for architecture x86_64:
  "_rl_completion_suppress_quote", referenced from:
  _psql_completion in tab-complete.c.o
  _quote_file_name in tab-complete.c.o
  _complete_from_files in tab-complete.c.o
  "_rl_filename_dequoting_function", referenced from:
  _initialize_readline in tab-complete.c.o
  "_rl_filename_quote_characters", referenced from:
  _initialize_readline in tab-complete.c.o
  "_rl_filename_quoting_function", referenced from:
  _initialize_readline in tab-complete.c.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see
invocation)

--
John Naylor
EDB: http://www.enterprisedb.com


Re: A doubt about a newly added errdetail

2022-09-27 Thread Alvaro Herrera
On 2022-Sep-27, Kyotaro Horiguchi wrote:

> By the way, although this is not an issue caused by the proposed
> patch, I see the following message in the patch.
> 
> -   errdetail("Column list cannot be used for a partitioned table when %s is false.",
> +   errdetail("Column list cannot be specified for a partitioned table when %s is false.",
>                "publish_via_partition_root")));
> 
> I think that the purpose of such separation of variable names is to
> unify multiple messages differing only by the names (to keep
> translation labor (relatively :p) low). In that sense, the separation
> here is useless, since I see no chance of having the same message with
> another variable in the future.

Well, it also reduces chances for typos and such, so while it's not
strictly necessary to do it this way, I tend to prefer it in new
messages.  However, as you say, it's not very interesting when there's
no possibility of duplication, so changing existing messages to this
style when we have no other reason to change the message is not a
useful use of time.  In this case we're changing the message in another
way too, so I think it's okay.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)




Re: DROP OWNED BY is broken on master branch.

2022-09-27 Thread Rushabh Lathia
On Mon, Sep 26, 2022 at 11:46 PM Robert Haas wrote:

> On Mon, Sep 26, 2022 at 3:44 AM Rushabh Lathia wrote:
> > Commit 6566133c5f52771198aca07ed18f84519fac1be7 ensures that
> > pg_auth_members.grantor is always valid.  This commit changed the
> > shdepDropOwned() function and combined the SHARED_DEPENDENCY_ACL
> > and SHARED_DEPENDENCY_OWNER cases.  In that process it removed the
> > condition for local objects in the owner dependency.
> >
> > case SHARED_DEPENDENCY_OWNER:
> > -   /* If a local object, save it for deletion below */
> > -   if (sdepForm->dbid == MyDatabaseId)
> > +   /* Save it for deletion below */
> >
> > The case ends up with the above error because of the removed condition.
> >
> > Please find the attached patch which fixes the case.
>
> Thanks for the report. I think it would be preferable not to duplicate
> the logic as your version does, though, so here's a slightly different
> version that avoids that.
>

Yes, I was also thinking about avoiding the duplicate logic but couldn't
find a way.  I did some quick testing with the patch, and the reported
test is working fine.  But "make check" is failing with a few failures.


> Per Michael's suggestion, I have also written a test case and included
> it in this version.
>

Thanks for this.


> Comments?
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


-- 
Rushabh Lathia


Re: HOT chain validation in verify_heapam()

2022-09-27 Thread Himanshu Upadhyaya
On Tue, Sep 27, 2022 at 1:35 AM Robert Haas wrote:

> On Sat, Sep 24, 2022 at 8:45 AM Himanshu Upadhyaya wrote:
> > Here our objective is to validate that if both the predecessor's xmin and
> > the current tuple's xmin are the same, then the predecessor's cmin must be
> > less than the current tuple's cmin. When both tuples' xmins are the same,
> > I think the predecessor's t_cid will always hold a combo CID.
> > Then either one or both tuples will always have a combo CID, and skipping
> > this check based on "either tuple has a combo CID" will make this if
> > condition evaluate to false.
>
> Fair point. I think maybe we should just remove the CID validation
> altogether. I thought initially that it would be possible to have a
> newer update with a numerically lower combo CID, but after some
> experimentation I don't see a way to do it. However, it also doesn't
> seem very useful to me to check that the combo CIDs are in ascending
> order. I mean, even if that's not supposed to happen and does anyway,
> there aren't really any enduring consequences, because command IDs are
> ignored completely outside of the transaction that performed the
> operation originally. So even if the combo CIDs were set to completely
> random values, is that really corruption? At most it messes things up
> for the duration of one transaction. And if we just have plain CIDs
> rather than combo CIDs, the same thing is true: they could be totally
> messed up and it wouldn't really matter beyond the lifetime of that
> one transaction.
>
> Also, it would be a lot more tempting to check this if we could check
> it in all cases, but we can't. If a tuple is inserted in transaction
> T1 and then updated twice in transaction T2, we'll have only one combo
> CID and nothing to compare it against, nor any way to decode what CMIN
> and CMAX it originally represented. And this is probably a pretty
> common type of case.
>
> ok, I will be removing this entire validation of cmin/cid in my next patch.


> >> +if (TransactionIdEquals(pred_xmin, curr_xmin) &&
> >> +    (TransactionIdEquals(curr_xmin, curr_xmax) ||
> >> +     !TransactionIdIsValid(curr_xmax)) && pred_cmin >= curr_cmin)
> >>
> >> I don't understand the reason for the middle part of this condition --
> >> TransactionIdEquals(curr_xmin, curr_xmax) ||
> >> !TransactionIdIsValid(curr_xmax). I suppose the comment is meant to
> >> explain this, but I still don't get it. If a tuple with XMIN 12345
> >> CMIN 2 is updated to produce a tuple with XMIN 12345 CMIN 1, that's
> >> corruption, regardless of what the XMAX of the second tuple may happen
> >> to be.
> >
> >  tuple | t_xmin | t_xmax | t_cid | t_ctid |            tuple_data_split             | heap_tuple_infomask_flags
> > -------+--------+--------+-------+--------+-----------------------------------------+---------------------------
> >      1 |    971 |    971 |     0 | (0,3)  | {"\\x1774657374312020202020","\\x0100"} | ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED}",{})
> >      2 |    971 |      0 |     1 | (0,2)  | {"\\x1774657374322020202020","\\x0200"} | ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
> >      3 |    971 |    971 |     1 | (0,4)  | {"\\x1774657374322020202020","\\x0100"} | ("{HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
> >      4 |    971 |    972 |     0 | (0,5)  | {"\\x1774657374332020202020","\\x0100"} | ("{HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
> >      5 |    972 |      0 |     0 | (0,5)  | {"\\x1774657374342020202020","\\x0100"} | ("{HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{})
> >
> > In the above case, tuples 1->3->4 are inserted and updated by xid 971, and
> > tuple 4 is next updated by xid 972. Here the t_cid of tuple 4 is 0 whereas
> > its predecessor's t_cid is 1, because tuple 4's t_cid holds the command ID
> > of the deleting transaction (cmax); that is why we need to check the xmax
> > of the tuple.
> >
> > Please correct me if I am missing anything here?
>
> Hmm, I see, so basically you're trying to check whether the CID field
> contains a CMIN as opposed to a CMAX. But I'm not sure this test is
> entirely reliable, because heap_prepare/execute_freeze_tuple() can set
> a tuple's xmax to InvalidTransactionId even after it's had some other
> value, and that won't do anything to the contents of the CID field.
>

OK, got it. As we are removing this cmin/cid validation, we don't need
any change here.

-- 
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com


RE: Perform streaming logical transactions by background workers and parallel apply

2022-09-27 Thread kuroda.hay...@fujitsu.com
Dear Wang,

The following are my comments on your patchset.


0001


01. launcher.c - logicalrep_worker_stop_internal()

```
+
+   Assert(LWLockHeldByMe(LogicalRepWorkerLock));
+
```

I think it should be Assert(LWLockHeldByMeInMode(LogicalRepWorkerLock,
LW_SHARED)), because the lock is released once and acquired again as
LW_SHARED. If a newer function were to acquire the lock as LW_EXCLUSIVE
and call logicalrep_worker_stop_internal(), its lock would become weaker
after calling it.
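
That is, the suggested assertion would be:

```
Assert(LWLockHeldByMeInMode(LogicalRepWorkerLock, LW_SHARED));
```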

02. worker.c - apply_handle_stream_start()

```
+   /*
+    * Notify handle methods we're processing a remote in-progress
+    * transaction.
+    */
+   in_streamed_transaction = true;
 
-   MyLogicalRepWorker->stream_fileset = palloc(sizeof(FileSet));
-   FileSetInit(MyLogicalRepWorker->stream_fileset);
+   /*
+    * Start a transaction on stream start, this transaction will be
+    * committed on the stream stop unless it is a tablesync worker in
+    * which case it will be committed after processing all the
+    * messages. We need the transaction for handling the buffile,
+    * used for serializing the streaming data and subxact info.
+    */
+   begin_replication_step();
```

Previously in_streamed_transaction was set after begin_replication_step(),
but this patch changes the ordering. Maybe we shouldn't change it if
there is no particular reason.

03. worker.c - apply_handle_stream_stop()

```
+   /* Commit the per-stream transaction */
+   CommitTransactionCommand();
+
+   /* Reset per-stream context */
+   MemoryContextReset(LogicalStreamingContext);
+
+   pgstat_report_activity(STATE_IDLE, NULL);
+
+   in_streamed_transaction = false;
```

Previously in_streamed_transaction was set after MemoryContextReset(),
but this patch changes the ordering. Maybe we shouldn't change it if
there is no particular reason.

04. applyparallelworker.c - LogicalParallelApplyLoop()

```
+   shmq_res = shm_mq_receive(mqh, &len, &data, false);
...
+   if (ConfigReloadPending)
+   {
+   ConfigReloadPending = false;
+   ProcessConfigFile(PGC_SIGHUP);
+   }
```


Here the parallel apply worker waits to receive messages, and
ProcessConfigFile() is called only after one has been dispatched.
It means that the configuration file will not be re-read until the
parallel apply worker receives new messages and applies them.

It may be problematic when users change log_min_messages to debugXXX
for debugging but streamed transactions rarely come.
They expect detailed output to appear in the log from the next streaming
chunk, but it does not.

This does not occur in the leader worker when it waits for messages from
the publisher, because it uses libpqrcv_receive(), which works
asynchronously.

I'm not sure whether it should be documented that the evaluation of GUCs
may be delayed. What do you think?
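
One conceivable alternative shape (just a sketch, not the patch's actual
code; the wait event name is illustrative) is to receive in nowait mode
and sleep on the latch, so that a pending SIGHUP is serviced even while
no message arrives:

```
for (;;)
{
	Size		len;
	void	   *data;
	shm_mq_result shmq_res;

	shmq_res = shm_mq_receive(mqh, &len, &data, true);	/* nowait */

	if (shmq_res == SHM_MQ_WOULD_BLOCK)
	{
		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
						 -1L, WAIT_EVENT_MQ_RECEIVE);
		ResetLatch(MyLatch);

		if (ConfigReloadPending)
		{
			ConfigReloadPending = false;
			ProcessConfigFile(PGC_SIGHUP);
		}
		continue;
	}

	/* ... dispatch the received message as before ... */
}
```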

===
0004

05. logical-replication.sgml

```
...
In that case, it may be necessary to change the streaming mode to on or off and 
cause
the same conflicts again so the finish LSN of the failed transaction will be 
written to the server log.
 ...
```

The above sentence is added by 0001, but it is not modified by 0004.
Such transactions will be retried in streaming=on mode, so some
description of that behavior should be added.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: add checkpoint stats of snapshot and mapping files of pg_logical dir

2022-09-27 Thread Bharath Rupireddy
On Mon, Sep 26, 2022 at 3:21 AM Justin Pryzby wrote:
>
> > This patch needs an update. It is failing on Windows for the test case
> > "33/238 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade with an
> > "exit status 2" error.
> >
> > The details are available on cfbot.cputube.org.
> > https://cirrus-ci.com/task/5709014662119424?logs=check_world#L267
>
> It failed like:
> https://api.cirrus-ci.com/v1/artifact/task/5709014662119424/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/regress_log_002_pg_upgrade
> [19:35:31.017](76.307s) ok 10 - run of pg_upgrade for new instance
> [19:35:31.017](0.000s) not ok 11 - pg_upgrade_output.d/ removed after pg_upgrade success
> [19:35:31.018](0.001s) #   Failed test 'pg_upgrade_output.d/ removed after pg_upgrade success'
> #   at C:/cirrus/src/bin/pg_upgrade/t/002_pg_upgrade.pl line 261.
>
> This is a transient failure; it normally passes:
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3389
>
> Actually, it's the same problem as reported here:
> 20220919213217.ptqfdlcc5idk5...@awork3.anarazel.de

Thanks, Justin, for identifying and linking the issue. I've sent a note
in the related thread [1], just for the record.

I've changed the CF entry for this thread
https://commitfest.postgresql.org/39/3389/ back to RfC as the issue is
unrelated to this patch.

[1] 
https://www.postgresql.org/message-id/CALj2ACXAQqwwfPzbQr0C6spKq6f9rR6PYHWwm-_o%3D_pMuJ2oxg%40mail.gmail.com

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



