Re: HOT chain validation in verify_heapam()

2023-03-22 Thread Himanshu Upadhyaya
On Thu, Mar 23, 2023 at 2:15 AM Andres Freund  wrote:

>
> Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is
> set
> and expects to find an item it can dereference - but I don't think that's
> something we can rely on: Afaics HOT pruning can break chains, but doesn't
> reset xmax.
>
We have the code below which, I think, takes care of the xmin and xmax
matching; only if they match do we add the tuples to the predecessor array.
/*
 * If the next line pointer is a redirect, or if it's a tuple
 * but the XMAX of this tuple doesn't match the XMIN of the next
 * tuple, then the two aren't part of the same update chain and
 * there is nothing more to do.
 */
if (ItemIdIsRedirected(next_lp))
    continue;
curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, curr_lp);
curr_xmax = HeapTupleHeaderGetUpdateXid(curr_htup);
next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp);
next_xmin = HeapTupleHeaderGetXmin(next_htup);
if (!TransactionIdIsValid(curr_xmax) ||
    !TransactionIdEquals(curr_xmax, next_xmin))
    continue;

-- 
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com


Re: Allow logical replication to copy tables in binary format

2023-03-22 Thread Amit Kapila
On Wed, Mar 22, 2023 at 4:06 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> > The patch looks mostly good to me. However, I have one
> > question/comment as follows:
> >
> > -   <varlistentry>
> > +   <varlistentry id="...">
> >  binary (boolean)
> >  
> >
> > To allow references to the binary option, we add the varlistentry id
> > here. It looks slightly odd to me to add id for just one entry, see
> > commit 78ee60ed84bb3a1cf0b6bd9a715dcbcf252a90f5 where we have
> > purposefully added ids to allow future references. Shall we add id to
> > other options as well on this page?
>
> I have analyzed the same points and made a patch that can be applied atop
> v19-0001.
> Please check the 0002 patch.
>

Pushed the 0001. It may be better to start a separate thread for 0002.

-- 
With Regards,
Amit Kapila.




Re: Add n_tup_newpage_upd to pg_stat table views

2023-03-22 Thread Corey Huinker
>
>
> * No more dedicated struct to carry around the type of an update.
>
> We just use two boolean arguments to the pgstats function instead. The
> struct didn't seem to be adding much, and it was distracting to track
> the information this way within heap_update().
>

That's probably a good move, especially if we start tallying updates that
use TOAST.
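
For readers skimming the thread, here is a minimal self-contained sketch
(purely illustrative - the names and shapes are not the committed
PostgreSQL code) of the counting rule being discussed: every update bumps
the plain counter, while "hot" and "newpage" are mutually exclusive
refinements, since a HOT update by definition keeps the new tuple version
on the same page.

#include <stdbool.h>

typedef struct TableCounts
{
    long    tuples_updated;          /* all attempted updates */
    long    tuples_hot_updated;      /* new version on same page, no new index entries */
    long    tuples_newpage_updated;  /* new version placed on a different page */
} TableCounts;

/* "hot" and "newpage" cannot both be true: HOT implies same-page placement */
static void
count_update(TableCounts *counts, bool hot, bool newpage)
{
    counts->tuples_updated++;
    if (hot)
        counts->tuples_hot_updated++;
    else if (newpage)
        counts->tuples_newpage_updated++;
}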


Re: Error "initial slot snapshot too large" in create replication slot

2023-03-22 Thread Kyotaro Horiguchi
At Wed, 22 Mar 2023 14:27:40 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Mon, 20 Mar 2023 13:46:51 -0400, "Gregory Stark (as CFM)" 
>  wrote in 
> > Kyotaro Horiguchi, any chance you'll be able to work on this for this
> > commitfest? If so shout (or anyone else is planning to push it over
> > the line Andres?) otherwise I'll move it on to the next release.
> 
> Ugh, sorry for being lazy.  I have lost track of the conversation. I'm
> currently working on this and will come back soon with a new version.

I realized that attempting to make SnapshotData.xip expansible was
making procarray.c and snapmgr.c too complicated. The main reason is
that SnapshotData is allocated in various ways: on the stack, with
palloc including the xip/subxip arrays, with palloc and the xip/subxip
arrays then allocated separately, or statically with the xip/subxip
arrays malloc'ed later. This variety was making the expansion logic a
mess.

So I went back to square one and decided to use subxip as an extension
for the xip array instead.

As the comment added in SnapBuildInitialSnapshot mentions, I don't
think we can reliably identify top-level XIDs. So, this patch just
increases the allowed number of XIDs by also using the subxip array.
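
To illustrate the consequence (a hypothetical sketch, not code from the
patch): because a snapshot built this way is never suboverflowed, an XID
membership test can treat subxip purely as an overflow area for xip, with
no top-level/subtransaction distinction between the two arrays.

static bool
XidInInitialSnapshot(Snapshot snap, TransactionId xid)
{
	/* xip holds the first GetMaxSnapshotXidCount() active XIDs... */
	for (int i = 0; i < snap->xcnt; i++)
		if (TransactionIdEquals(snap->xip[i], xid))
			return true;

	/* ...and subxip holds the spillover, top-level or not */
	for (int i = 0; i < snap->subxcnt; i++)
		if (TransactionIdEquals(snap->subxip[i], xid))
			return true;

	return false;
}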

(The title of the patch was changed accordingly.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 4a41002eea7eaa18bd51521798c19d191d9c6d8c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 19 Jul 2022 11:50:29 +0900
Subject: [PATCH v7] Lift initial snapshot limit for logical replication

The replication command CREATE_REPLICATION_SLOT tends to fail with
"snapshot too large" when many subtransactions are active. This
problem stems from SnapBuildInitialSnapshot attempting to generate a
snapshot in which all active XIDs, including subxids, are stored
within the xip array. This patch mitigates the constraint on the
acceptable number of transaction IDs by utilizing the subxid array.

Author: Dilip Kumar and Kyotaro Horiguchi
---
 src/backend/replication/logical/snapbuild.c | 50 +
 src/backend/storage/ipc/procarray.c | 18 ++--
 src/include/access/transam.h| 33 ++
 3 files changed, 77 insertions(+), 24 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 62542827e4..516e9ddfc8 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -565,10 +565,14 @@ Snapshot
 SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
+	int			ntxn;
+	int			xiplen;
 	TransactionId xid;
 	TransactionId safeXid;
 	TransactionId *newxip;
+	TransactionId *newsubxip = NULL;
 	int			newxcnt = 0;
+	int			newsubxcnt = 0;
 
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
 	Assert(builder->building_full_snapshot);
@@ -610,9 +614,35 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 	MyProc->xmin = snap->xmin;
 
-	/* allocate in transaction context */
-	newxip = (TransactionId *)
-		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+	/*
+	 * Allocate in transaction context
+	 *
+	 * Since we know all the active XIDs, this snapshot won't be
+	 * suboverflowed. However, if the number of XIDs surpasses the XID arrays'
+	 * capacity, we cannot establish this snapshot because we don't know which
+	 * XIDs are top-level. Since the original snapshot is historical, we can't
+	 * reliably identify the top-level XIDs. Thus, we have no choice but to
+	 * fail when there are too many active XIDs.
+	 */
+	ntxn = TransactionIdDistance(snap->xmin, snap->xmax) - snap->xcnt;
+	xiplen = GetMaxSnapshotXidCount();
+	newxip = (TransactionId *) palloc(sizeof(TransactionId) * xiplen);
+
+	if (ntxn > GetMaxSnapshotXidCount())
+	{
+		/*
+		 * Since we can't identify the top-level XIDs, we can't create a
+		 * suboverflowed snapshot.
+		 */
+		if (ntxn > xiplen + GetMaxSnapshotSubxidCount())
+			ereport(ERROR,
+	(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+	 errmsg("initial slot snapshot too large")));
+			
+		newsubxip = (TransactionId *)
+			palloc(sizeof(TransactionId) * GetMaxSnapshotSubxidCount());
+	}
+		
 
 	/*
 	 * snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -633,21 +663,23 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 		if (test == NULL)
 		{
-			if (newxcnt >= GetMaxSnapshotXidCount())
-ereport(ERROR,
-		(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-		 errmsg("initial slot snapshot too large")));
-
-			newxip[newxcnt++] = xid;
+			if (newxcnt < xiplen)
+newxip[newxcnt++] = xid;
+			else
+newsubxip[newsubxcnt++] = xid;
 		}
 
 		TransactionIdAdvance(xid);
 	}
 
+	Assert(newxcnt + newsubxcnt == ntxn);
+
 	/* adjust remaining snapshot fields as needed */
 	snap->snapshot_type = SNAPSHOT_MVCC;
 	snap->xcnt = newxcnt;
 	snap->xip = newxip;
+	snap->subxcnt = newsubxcnt;
+	snap->subxip = newsubxip;
 
 	return snap;
 }
diff --git 

Re: Data is copied twice when specifying both child and parent table in publication

2023-03-22 Thread Peter Smith
Here are some review comments for patch v20-0001.

==
General.

1.
That function 'pg_get_publication_tables' does not seem to be
described in the PG documentation. Why isn't it in the "System Catalog
Information Functions" table [1]?

I asked this same question a long time ago but then the reply [2] was
like "it doesn't seem to be a function provided to users".

Well, perhaps that just means that the documentation has been
accidentally missing for a long time. Does anybody know for sure if
the omission of this function from the documentation is deliberate? If
nobody here knows, then maybe this can be asked/addressed in a
separate thread.

==
src/backend/catalog/pg_publication.c

2. filter_partitions

(review comment from my v19 review)

> 2a.
> My previous review [1] (see #1) suggested putting most code within the
> condition. AFAICT my comment still is applicable but was not yet
> addressed.

22/3 Wang-san replied: "Personally, I prefer the current style because
the approach you mentioned adds some indentations."

Sure, but there is more than just indentation/style differences here.
Currently, there is some unnecessary code executed if the table is not
a partition. And the reader cannot tell at-a-glance if (skip) will be
true/false without looking more closely at the loop logic. So, I think
changing it would be better, but anyway I won't debate it any more
because it's not a functional problem.

==
src/backend/commands/subscriptioncmds.c

3. fetch_table_list

+ /* Get the list of tables from the publisher. */
+ if (server_version >= 16)
+ {
+ StringInfoData pub_names;

- appendStringInfoString(&cmd, "FROM pg_catalog.pg_publication_tables t\n"
-" WHERE t.pubname IN (");
- get_publications_str(publications, &cmd, true);
- appendStringInfoChar(&cmd, ')');
+ initStringInfo(&pub_names);
+ get_publications_str(publications, &pub_names, true);
+
+ /*
+ * From version 16, we allowed passing multiple publications to the
+ * function pg_get_publication_tables. This helped to filter out the
+ * partition table whose ancestor is also published in this
+ * publication array.
+ *
+ * Join pg_get_publication_tables with pg_publication to exclude
+ * non-existing publications.
+ */
+ appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n"
+ "  ( SELECT array_agg(a.attname ORDER BY a.attnum)\n"
+ "FROM pg_attribute a\n"
+ "WHERE a.attrelid = GPT.relid AND\n"
+ "  a.attnum = ANY(GPT.attrs)\n"
+ "  ) AS attnames\n"
+ " FROM pg_class C\n"
+ "   JOIN pg_namespace N ON N.oid = C.relnamespace\n"
+ "   JOIN ( SELECT (pg_get_publication_tables(VARIADIC array_agg(pubname::text))).*\n"
+ "  FROM pg_publication\n"
+ "  WHERE pubname IN ( %s )) as GPT\n"
+ "   ON GPT.relid = C.oid\n",
+ pub_names.data);
+
+ pfree(pub_names.data);
+ }
+ else
+ {
+ appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename \n");
+
+ /* Get column lists for each relation if the publisher supports it */
+ if (check_columnlist)
+ appendStringInfoString(&cmd, ", t.attnames\n");
+
+ appendStringInfoString(&cmd, "FROM pg_catalog.pg_publication_tables t\n"
+" WHERE t.pubname IN (");
+ get_publications_str(publications, &cmd, true);
+ appendStringInfoChar(&cmd, ')');
+ }

I noticed the SQL "if" part is using uppercase aliases, but the SQL in
the "else" part is using lowercase aliases. I think it would be better
to be consistent (pick one).

==
src/test/subscription/t/013_partition.pl

4.
-# for tab4, we publish changes through the "middle" partitioned table
+# If we subscribe only to pub_lower_level, changes for tab4 will be published
+# through the "middle" partition table. However, since we will be subscribing
+# to both pub_lower_level and pub_all (see subscription sub2 below), we will
+# publish changes via the root table (tab4).
 $node_publisher->safe_psql('postgres',
  "CREATE PUBLICATION pub_lower_level FOR TABLE tab4_1 WITH
(publish_via_partition_root = true)"
 );

~

This comment seemed a bit overkill IMO. I don't think you need to say
much here except maybe:

# Note that subscription "sub2" will later subscribe simultaneously to
# both pub_lower_level (i.e. just table tab4_1) and pub_all.

~~~

5.
I think maybe you could have another test scenario where you INSERT
something into tab4_1_1 while only the publication for tab4_1 has
publish_via_partition_root=true.

~~~

6.
AFAICT the tab4 tests only test the initial sync, not normal
replication. Maybe some more normal (post-sync) INSERTs are needed for
tab4, tab4_1, and tab4_1_1?


==
src/test/subscription/t/028_row_filter.pl

7.
+# insert data into partitioned table.
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO tab_rowfilter_viaroot_part(a) VALUES(13), (17)");
+
 $node_subscriber->safe_psql('postgres',
  "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr
application_name=$appname' PUBLICATION tap_pub_1, tap_pub_2,
tap_pub_3, tap_pub_4a, tap_pub_4b, 

Re: doc: add missing "id" attributes to extension packaging page

2023-03-22 Thread Karl O. Pinc
On Tue, 21 Mar 2023 23:16:25 +0100
Brar Piening  wrote:

> On 17.01.2023 at 23:43, Karl O. Pinc wrote:
> > It's good you asked.  After poking about the XSLT 1.0 spec to
> > refresh my memory I abandoned the "mode" approach entirely.  The
> > real "right way" is to use the import mechanism.  

> After quite some time trying to figure out why things don't work as
> intended, I ended up reading the XSLT 1.0 spec.
> 
> As the name already suggests, <xsl:apply-imports> is closely related
> to <xsl:apply-templates> with the difference that it *applies* a
> *template rule* from an imported style sheet instead of applying a
> template rule from the current style sheet
> (https://www.w3.org/TR/1999/REC-xslt-19991116#apply-imports). What it
> does not do is *call* a *named template*
> (https://www.w3.org/TR/1999/REC-xslt-19991116#named-templates).
> 
> What this basically means is that in XSLT 1.0 you can use
> <xsl:apply-imports> to override template rules (<xsl:template
> match="this-pattern-inside-match-makes-it-a-template-rule">) but you
> cannot use it to override named templates (<xsl:template
> name="this-id-inside-name-makes-it-a-named-template">). If you want to
> override named templates you basically have to duplicate and change
> them.
> 
> While there are mechanisms to call overridden named templates in XSLT
> 3, this is out of scope here, since we're bound to XSLT 1.0.

(It was reassuring to see you take the steps above; I once did exactly
the same with <xsl:apply-imports> and had the same excitement and
disappointment.  I feel validated.  ;-)

(One of my disappointments is that xsltproc supports only XSLT 1.0,
and nothing later.  IIRC, apparently one big reason is not the amount
of work needed to develop the program but the work required to develop
a test suite to validate conformance.)

> As a consequence, there was little I could change in the initial patch
> to avoid the code duplication, and all attempts to do so ultimately
> led to even longer and more complex code without really reducing the
> amount of duplication.

You're quite right.  I clearly didn't have my XSLT turned on.  Importing
only works when templates are matched, not called by name.

Sorry for the extra work I've put you through.

> The <xsl:apply-imports> approach actually does work in the
> varlistentry case, although it doesn't really change a lot
> regarding the length of the patch (it's a bit nicer though, since in
> this case it really avoids duplication).


You've put in a lot of good work.  I'm attaching 2 patches
with only minor changes.


001-add-needed-ids_v1.patch

This separates out the addition of ids from the XSLT changes, just
to keep things tidy.  Content is from your patch.


002-make_html_ids_discoverable_v4.patch

I changed the linked text, the #, so that the leading space
is not linked.  This is arguable, as the extra space makes
it easier to put the mouse on the region.  But it seems
tidy.

I've tidied things up so the lines are no longer than 80 chars.

> I've also taken the advice
> to terminate the build and print the xpath if a required id is
> missing.

This looks awesome.  I love the xpath!  I've changed the format of the
error message.  What do you think?  (Try it out by _not_ applying
001-add-needed-ids_v1.patch.)

Also, the error message now has leading and trailing newlines to make
it stand out.  I'm normally against this sort of thing but thought I'd
add it anyway for others to review.

I'm ready to send these on to a committer but if you don't
like what I did please send more patches for me to review.


Outstanding questions (for committer?):

The 002-make_html_ids_discoverable_v4.patch generates the xhtml "id"
attributes using an XSLT element with a "namespace" attribute.  I'm
unclear on the relationship PG has with xhtml and namespaces.  It looks
right to me, since the generated html has the same namespace name
appearing in the xmlns attribute of the html tag, but somebody who
knows more than me might want to review this.

Using the namespace attribute does not seem to have affected the
generated html, as far as my random sampling of output can tell.


What character should be used to represent a link anchor?

I've left your choice of "#" in the patch.

If we go to Unicode, my preference is the text paperclip 📎︎.

Here's a table of all the choices I came up with, there may be others
that are suitable.  (The hex representations are Python 3 string
literals.  So print("\U0001f4ce\ufe0e") prints the text paperclip.)

Hash mark                  #    (ASCII, used in the patch, \u0023)
Anchor                     ⚓    \u2693
Place of interest          ⌘    \u2318
Bullseye                   ◎    \u25ce
Link (emoji variant)       🔗    \U0001f517
Link (text variant)        🔗︎    \U0001f517\ufe0e
Paperclip (emoji variant)  📎    \U0001f4ce
Paperclip (text variant)   📎︎    \U0001f4ce\ufe0e


Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein
diff --git a/doc/src/sgml/pgwalinspect.sgml b/doc/src/sgml/pgwalinspect.sgml
index 9a0241a8d6..62d9f9eb22 100644
--- a/doc/src/sgml/pgwalinspect.sgml
+++ 

Re: Combine pg_walinspect till_end_of_wal functions with others

2023-03-22 Thread Michael Paquier
On Thu, Mar 16, 2023 at 01:17:59PM +0530, Bharath Rupireddy wrote:
> FWIW, I rebased the tests tweaking patch and attached it here as v9.
> This should keep the pg_walinspect tests consistent across comments,
> spaces, new lines and using count(*) >= 1 for all successful function
> executions. Thoughts?

Mostly OK by me, so applied after tweaking a few tiny things.  The
rewrites of the queries where we should have more than one record and
the removal of count() for the failure cases have been kept as
proposed, as are most of the comments.
--
Michael




Re: WAL Insertion Lock Improvements

2023-03-22 Thread adherent postgres
Hi Andres Freund
This patch improves performance significantly, and Commitfest 2023-03 is
coming to an end. Has it not been committed yet because the patch still
needs to be improved?

Best wishes

From: Nathan Bossart 
Sent: February 21, 2023 13:49
To: Bharath Rupireddy 
Cc: Andres Freund ; PostgreSQL Hackers 

Subject: Re: WAL Insertion Lock Improvements

On Thu, Feb 09, 2023 at 11:51:28AM +0530, Bharath Rupireddy wrote:
> On Thu, Feb 9, 2023 at 3:36 AM Nathan Bossart  
> wrote:
>> Overall, I think this patch is in reasonable shape.
>
> Thanks for reviewing. Please see the attached v5 patch.

I'm marking this as ready-for-committer.  I think a couple of the comments
could use some small adjustments, but that probably doesn't need to hold up
this patch.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Orphaned wait event

2023-03-22 Thread Thomas Munro
Hi,

Commit dee663f7 made WAIT_EVENT_SLRU_FLUSH_SYNC redundant, so here's a
patch to remove it.

In case it's useful again, here's how I noticed:

for X in `grep WAIT_EVENT_ src/include/utils/wait_event.h |
          sed '/^#/d;s/,//;s/ = .*//'`
do
  if ! ( git grep $X |
         grep -v src/include/utils/wait_event.h |
         grep -v src/backend/utils/activity/wait_event.c |
         grep $X > /dev/null )
  then
    echo "$X is not used"
  fi
done
From 5bd5d5448b1fbd01833069af72bf12d19eef42dd Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 23 Mar 2023 14:36:27 +1300
Subject: [PATCH] Remove orphaned wait event.

Commit dee663f7 made WAIT_EVENT_SLRU_FLUSH_SYNC redundant.

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 7ab4424bf1..45259efd0f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1468,11 +1468,6 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   ReplicationSlotWrite
   Waiting for a write to a replication slot control file.
  
- 
-  SLRUFlushSync
-  Waiting for SLRU data to reach durable storage during a checkpoint
-   or database shutdown.
- 
  
   SLRURead
   Waiting for a read of an SLRU page.
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 7940d64639..23a85345b7 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -672,9 +672,6 @@ pgstat_get_wait_io(WaitEventIO w)
 		case WAIT_EVENT_REPLICATION_SLOT_WRITE:
 			event_name = "ReplicationSlotWrite";
 			break;
-		case WAIT_EVENT_SLRU_FLUSH_SYNC:
-			event_name = "SLRUFlushSync";
-			break;
 		case WAIT_EVENT_SLRU_READ:
 			event_name = "SLRURead";
 			break;
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 518d3b0a1f..14b7ac5cf6 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -207,7 +207,6 @@ typedef enum
 	WAIT_EVENT_REPLICATION_SLOT_RESTORE_SYNC,
 	WAIT_EVENT_REPLICATION_SLOT_SYNC,
 	WAIT_EVENT_REPLICATION_SLOT_WRITE,
-	WAIT_EVENT_SLRU_FLUSH_SYNC,
 	WAIT_EVENT_SLRU_READ,
 	WAIT_EVENT_SLRU_SYNC,
 	WAIT_EVENT_SLRU_WRITE,
-- 
2.39.2



Re: refactoring basebackup.c

2023-03-22 Thread Thomas Munro
On Thu, Mar 23, 2023 at 2:50 PM Thomas Munro  wrote:
> In rem: commit 3500ccc3,
>
> for X in `grep -E '^[^*]+event_name = "' src/backend/utils/activity/wait_event.c |
>           sed 's/^.* = "//;s/";$//;/unknown/d'`
> do
>   if ! git grep "$X" doc/src/sgml/monitoring.sgml > /dev/null
>   then
>     echo "$X is not documented"
>   fi
> done
>
> BaseBackupSync is not documented
> BaseBackupWrite is not documented

[Resending with a trimmed CC: list, because the mailing list told me to
due to a blocked account; sorry if you already got the above.]




[PATCH] Remove unnecessary unbind in LDAP search+bind mode

2023-03-22 Thread Anatoly Zaretsky
Hi!

Comments in src/backend/libpq/auth.c [1] say:
(after successfully finding the final DN to check the user-supplied
password against)
/* Unbind and disconnect from the LDAP server */
and later
/*
* Need to re-initialize the LDAP connection, so that we can bind to
* it with a different username.
*/

But the protocol actually permits multiple subsequent authentications
("binds" in LDAP parlance) over a single connection [2].
Moreover, inspection of the code revision history of mod_authnz_ldap,
pam_ldap, Bugzilla, and the MediaWiki LDAP authentication plugin shows that
they've been doing this bind-after-search over the same LDAP connection for
~20 years without any evidence of interoperability troubles.

(mod_authnz_ldap and pam_ldap are listed in the PostgreSQL documentation as
examples of other software implementing this scheme. Bugzilla and MediaWiki
are the original patch author's motivating examples [3])

Also, it might be interesting to consider this note from the current
revision of the protocol RFC [4]:
"The Unbind operation is not the antithesis of the Bind operation as the
name implies. The naming of these operations are historical. The Unbind
operation should be thought of as the "quit" operation."

So, it seems like the whole connection re-initialization thing was just a
confusion caused by this very unfortunate "historical" naming, and can be
safely removed, thus saving quite a few network round-trips, especially for
the case of ldaps/starttls.
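
To make the flow concrete, here is a minimal sketch (assuming the
OpenLDAP client API; error handling is trimmed and the function name is
illustrative, not the actual auth.c code) of search+bind without tearing
down the connection between the search bind and the final bind:

#include <ldap.h>

static int
search_then_bind(LDAP *ld, const char *search_dn, const char *search_pw,
                 const char *base, const char *filter, const char *user_pw)
{
	LDAPMessage *res;
	LDAPMessage *entry;
	char	   *dn;
	int			rc;

	/* bind as the search user and locate the entry's DN */
	rc = ldap_simple_bind_s(ld, search_dn, search_pw);
	if (rc != LDAP_SUCCESS)
		return rc;
	rc = ldap_search_s(ld, base, LDAP_SCOPE_SUBTREE, filter, NULL, 0, &res);
	if (rc != LDAP_SUCCESS)
		return rc;
	entry = ldap_first_entry(ld, res);
	if (entry == NULL)
	{
		ldap_msgfree(res);
		return LDAP_NO_SUCH_OBJECT;
	}
	dn = ldap_get_dn(ld, entry);

	/* no unbind/reconnect: just bind again on the same connection */
	rc = ldap_simple_bind_s(ld, dn, user_pw);

	ldap_memfree(dn);
	ldap_msgfree(res);
	return rc;
}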

[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/libpq/auth.c;h=bc0cf26b122a1b28c20fe037ec851c0e99b1ffb6;hb=HEAD#l2603
[2] https://www.rfc-editor.org/rfc/rfc4511#section-4.2.1
[3]
https://www.postgresql.org/message-id/4c0112730909141334n201cadf3x2e288528a97883ca%40mail.gmail.com
[4] https://www.rfc-editor.org/rfc/rfc4511#section-4.3
-- 
Best regards,
Anatoly Zaretsky


v1-0001-Remove-unnecessary-unbind-in-LDAP-search-bind-mod.patch
Description: Binary data


RE: Simplify some codes in pgoutput

2023-03-22 Thread houzj.f...@fujitsu.com
On Monday, March 20, 2023 5:20 PM houzj.f...@fujitsu.com wrote:
> 
> On Thursday, March 16, 2023 12:30 PM Amit Kapila 
> wrote:
> 
> >
> > On Wed, Mar 15, 2023 at 2:00 PM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > > I noticed that there are some duplicated codes in pgoutput_change()
> > function
> > > which can be simplified, and here is an attempt to do that.
> > >
> >
> > For REORDER_BUFFER_CHANGE_DELETE, when the old tuple is missing, after
> > this patch, we will still send BEGIN and do OutputPluginWrite, etc.
> > Also, it will try to perform row_filter when none of old_slot or
> > new_slot is set. I don't know for which particular case we have such
> > handling of missing old tuples for deletes, but that might require changes
> > in your proposed patch.
> 
> I researched this a bit. I think the old tuple will be null only if the
> modified table doesn't have a PK or replica identity when the DELETE
> happens (see heap_delete()), but in that case the DELETE won't be allowed
> to be replicated (e.g. the DELETE will either error out or be filtered by
> the table-level filter in pgoutput_change).
> 
> I also checked this for system tables, and in that case it is null but
> reorderbuffer doesn't forward it. For a user_catalog_table, similarly, the
> DELETE should be filtered by the table filter in pgoutput_change as well.
> 
> So, I think we can remove this check and log.
> And here is the new version patch which removes that for now.

After rethinking this, it seems better to leave this check in for now.
Although it may be unnecessary, we can remove it later as a separate patch
once we are sure about this. So, here is a patch that adds the check back.

Best Regards,
Hou zj



v3-0001-simplify-the-code-in-pgoutput_change.patch
Description: v3-0001-simplify-the-code-in-pgoutput_change.patch


Re: HOT chain validation in verify_heapam()

2023-03-22 Thread Andres Freund
Hi,

On 2023-03-22 13:45:52 -0700, Andres Freund wrote:
> On 2023-03-22 09:19:18 -0400, Robert Haas wrote:
> > On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev
> >  wrote:
> > > The patch needed a rebase due to a4f23f9b. PFA v12.
> >
> > I have committed this after tidying up a bunch of things in the test
> > case file that I found too difficult to understand -- or in some cases
> > just incorrect, like:
> 
> As noticed by Andrew
> https://postgr.es/m/bfa5bd2b-c0e6-9d65-62ce-97f4766b1c42%40dunslane.net and
> then reproduced on HEAD by me, there's something not right here.

skink / valgrind reported in a while back and found another issue:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2023-03-22%2021%3A53%3A41

==2490364== VALGRINDERROR-BEGIN
==2490364== Conditional jump or move depends on uninitialised value(s)
==2490364==at 0x11D459F2: check_tuple_visibility (verify_heapam.c:1379)
==2490364==by 0x11D46262: check_tuple (verify_heapam.c:1812)
==2490364==by 0x11D46FDF: verify_heapam (verify_heapam.c:535)
==2490364==by 0x3D5B2C: ExecMakeTableFunctionResult (execSRF.c:235)
==2490364==by 0x3E8225: FunctionNext (nodeFunctionscan.c:95)
==2490364==by 0x3D6685: ExecScanFetch (execScan.c:133)
==2490364==by 0x3D6709: ExecScan (execScan.c:182)
==2490364==by 0x3E813A: ExecFunctionScan (nodeFunctionscan.c:270)
==2490364==by 0x3D31C4: ExecProcNodeFirst (execProcnode.c:464)
==2490364==by 0x3FF7E7: ExecProcNode (executor.h:262)
==2490364==by 0x3FFB15: ExecNestLoop (nodeNestloop.c:160)
==2490364==by 0x3D31C4: ExecProcNodeFirst (execProcnode.c:464)
==2490364==  Uninitialised value was created by a stack allocation
==2490364==at 0x11D45325: check_tuple_visibility (verify_heapam.c:994)
==2490364== 
==2490364== VALGRINDERROR-END
==2490364== VALGRINDERROR-BEGIN
==2490364== Conditional jump or move depends on uninitialised value(s)
==2490364==at 0x11D45AC6: check_tuple_visibility (verify_heapam.c:1379)
==2490364==by 0x11D46262: check_tuple (verify_heapam.c:1812)
==2490364==by 0x11D46FDF: verify_heapam (verify_heapam.c:535)
==2490364==by 0x3D5B2C: ExecMakeTableFunctionResult (execSRF.c:235)
==2490364==by 0x3E8225: FunctionNext (nodeFunctionscan.c:95)
==2490364==by 0x3D6685: ExecScanFetch (execScan.c:133)
==2490364==by 0x3D6709: ExecScan (execScan.c:182)
==2490364==by 0x3E813A: ExecFunctionScan (nodeFunctionscan.c:270)
==2490364==by 0x3D31C4: ExecProcNodeFirst (execProcnode.c:464)
==2490364==by 0x3FF7E7: ExecProcNode (executor.h:262)
==2490364==by 0x3FFB15: ExecNestLoop (nodeNestloop.c:160)
==2490364==by 0x3D31C4: ExecProcNodeFirst (execProcnode.c:464)
==2490364==  Uninitialised value was created by a stack allocation
==2490364==at 0x11D45325: check_tuple_visibility (verify_heapam.c:994)
==2490364==

Greetings,

Andres Freund




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-22 Thread Andres Freund
Hi,

On 2023-03-21 01:25:11 +0300, Alexander Korotkov wrote:
> I'm going to push patchset v15 if no objections.

Just saw that this went in - didn't catch up with the thread before,
unfortunately. At the very least I'd like to see some more work on cleaning up
the lazy tuple slot stuff. It's replete with unnecessary multiple-evaluation
hazards - I realize that there's some of those already, but I don't think we
should go further down that route. As far as I can tell there's no need for
any of this to be macros.


> From a8b4e8a7b27815e013ea07b8cc9ac68541a9ac07 Mon Sep 17 00:00:00 2001
> From: Alexander Korotkov 
> Date: Tue, 21 Mar 2023 00:34:15 +0300
> Subject: [PATCH 1/2] Evade extra table_tuple_fetch_row_version() in
>  ExecUpdate()/ExecDelete()
>
> When we lock a tuple using table_tuple_lock(), we at the same time fetch
> the locked tuple into the slot.  In this case we can skip the extra
> table_tuple_fetch_row_version(), since we've already fetched the 'old'
> tuple and nobody can change it concurrently while it's locked.
>
> Discussion: 
> https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com
> Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp
> Reviewed-by: Andres Freund, Chris Travers
> ---
>  src/backend/executor/nodeModifyTable.c | 48 +++---
>  1 file changed, 35 insertions(+), 13 deletions(-)
>
> diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
> index 3a673895082..93ebfdbb0d8 100644
> --- a/src/backend/executor/nodeModifyTable.c
> +++ b/src/backend/executor/nodeModifyTable.c
> @@ -1559,6 +1559,22 @@ ldelete:
>   {
>   case TM_Ok:
>   Assert(context->tmfd.traversed);
> +
> + /*
> +  * Save locked tuple for further processing of
> +  * RETURNING clause.
> +  */
> + if (processReturning &&
> + resultRelInfo->ri_projectReturning &&
> + !resultRelInfo->ri_FdwRoutine)
> + {
> + TupleTableSlot *returningSlot;
> +
> + returningSlot = ExecGetReturningSlot(estate, resultRelInfo);
> + ExecCopySlot(returningSlot, inputslot);
> + ExecMaterializeSlot(returningSlot);
> + }
> +
>   epqslot = EvalPlanQual(context->epqstate,
>  resultRelationDesc,
>  resultRelInfo->ri_RangeTableIndex,

This seems a bit byzantine. We use inputslot = EvalPlanQualSlot(...) to make
EvalPlanQual() a bit cheaper, because that avoids a slot copy inside
EvalPlanQual().  But now we copy and materialize that slot anyway - and we do
so even if EPQ fails. And we afaics also do it when epqreturnslot is set, in
which case we'll afaics never use the copied slot.

Read the next paragraph below before replying to the above - I don't think
this is right for other reasons:

> @@ -1673,12 +1689,17 @@ ldelete:
>   }
>   else
>   {
> + /*
> +  * Tuple can be already fetched to the returning slot in case
> +  * we've previously locked it.  Fetch the tuple only if the slot
> +  * is empty.
> +  */
>   slot = ExecGetReturningSlot(estate, resultRelInfo);
>   if (oldtuple != NULL)
>   {
>   ExecForceStoreHeapTuple(oldtuple, slot, false);
>   }
> - else
> + else if (TupIsNull(slot))
>   {
>   if (!table_tuple_fetch_row_version(resultRelationDesc, tupleid,
>SnapshotAny, slot))


I don't think this is correct as-is - what if ExecDelete() is called with some
older tuple in the returning slot? If we don't enter the TM_Updated path, it
won't get updated, and we'll return the wrong tuple.  It 

Re: Add n_tup_newpage_upd to pg_stat table views

2023-03-22 Thread Michael Paquier
On Wed, Mar 22, 2023 at 05:14:08PM -0700, Peter Geoghegan wrote:
> * Small adjustments to the documentation.
> 
> Nearby related items were tweaked slightly to make everything fit
> together a bit better. For example, the description of n_tup_hot_upd
> is revised to make it obvious that n_tup_hot_upd counts row updates
> that can never get counted under the new n_tup_newpage_upd counter.

@@ -168,6 +168,7 @@ typedef struct PgStat_TableCounts
PgStat_Counter t_tuples_updated;
PgStat_Counter t_tuples_deleted;
PgStat_Counter t_tuples_hot_updated;
+   PgStat_Counter t_tuples_newpage_updated;
boolt_truncdropped;

I have in the works something that's going to rename these fields to
drop the "t_" prefix, to ease some global refactoring in pgstatfuncs.c
so that we have less repetitive code in the functions that grab these
counters.  I don't think you need to name the field without the prefix
here; just an FYI that this is going to be immediately renamed ;)
--
Michael




Re: Add pg_walinspect function with block info columns

2023-03-22 Thread Peter Geoghegan
On Wed, Mar 22, 2023 at 5:14 PM Michael Paquier  wrote:
> On Wed, Mar 22, 2023 at 05:05:10PM -0700, Peter Geoghegan wrote:
> > I'd go further than that myself: I haven't had any use for FPIs at
> > all. If I was going to do something with FPIs then I'd just use
> > pg_waldump, since I'd likely want to get them onto the filesystem for
> > analysis anyway. (Just my experience.)
>
> FWIW, being able to get access to raw FPIs with a SQL interface is
> useful if you cannot log into the host, which is something that a lot
> of cloud providers don't allow, and not everybody is able to have
> access to archived WAL segments in a different host.

I'm not saying that it's not ever useful. Just that finding FPIs
interesting isn't necessarily all that common when using
pg_get_wal_block_info (or won't be, once it has those additional
columns from pg_get_wal_records_info).


-- 
Peter Geoghegan




Re: Add pg_walinspect function with block info columns

2023-03-22 Thread Michael Paquier
On Wed, Mar 22, 2023 at 05:05:10PM -0700, Peter Geoghegan wrote:
> I'd go further than that myself: I haven't had any use for FPIs at
> all. If I was going to do something with FPIs then I'd just use
> pg_waldump, since I'd likely want to get them onto the filesystem for
> analysis anyway. (Just my experience.)

FWIW, being able to get access to raw FPIs with a SQL interface is
useful if you cannot log into the host, which is something that a lot
of cloud providers don't allow, and not everybody is able to have
access to archived WAL segments in a different host.
--
Michael




Re: Add n_tup_newpage_upd to pg_stat table views

2023-03-22 Thread Peter Geoghegan
On Fri, Mar 17, 2023 at 3:22 PM Peter Geoghegan  wrote:
> I think that this is pretty close to being committable already.

Attached revision has some small tweaks by me. Going to commit this
revised version tomorrow morning.

Changes:

* No more dedicated struct to carry around the type of an update.

We just use two boolean arguments to the pgstats function instead. The
struct didn't seem to be adding much, and it was distracting to track
the information this way within heap_update().

* Small adjustments to the documentation.

Nearby related items were tweaked slightly to make everything fit
together a bit better. For example, the description of n_tup_hot_upd
is revised to make it obvious that n_tup_hot_upd counts row updates
that can never get counted under the new n_tup_newpage_upd counter.

--
Peter Geoghegan
From 23d768e87e95e421e583e2f0dc06bd36534081a6 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Wed, 22 Mar 2023 11:58:18 -0700
Subject: [PATCH v2] Count the number of new page updates in pgstats.

Bump catalog and stats format versions.

Corey Huinker, with some tweaks by me.

Author: Corey Huinker 
Reviewed-By: Peter Geoghegan 
Reviewed-By: Andres Freund 
Discussion: https://postgr.es/m/CADkLM=ded21M9iZ36hHm-vj2rE2d=zcKpUQMds__Xm2pxLfHKA@mail.gmail.com
---
 src/include/catalog/catversion.h |  2 +-
 src/include/catalog/pg_proc.dat  | 10 
 src/include/pgstat.h | 10 
 src/backend/access/heap/heapam.c |  2 +-
 src/backend/catalog/system_views.sql |  4 +++-
 src/backend/utils/activity/pgstat_relation.c | 12 --
 src/backend/utils/adt/pgstatfuncs.c  | 18 ++
 doc/src/sgml/monitoring.sgml | 25 
 src/test/regress/expected/rules.out  | 12 +++---
 9 files changed, 78 insertions(+), 17 deletions(-)

diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index e94528a7c..0c0915885 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
  */
 
 /*							yyyymmddN */
-#define CATALOG_VERSION_NO	202303181
+#define CATALOG_VERSION_NO	202303221
 
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5cf87aeb2..7c358cff1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5360,6 +5360,11 @@
   proname => 'pg_stat_get_tuples_hot_updated', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_tuples_hot_updated' },
+{ oid => '8614',
+  descr => 'statistics: number of tuples updated onto a new page',
+  proname => 'pg_stat_get_tuples_newpage_updated', provolatile => 's',
+  proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_tuples_newpage_updated' },
 { oid => '2878', descr => 'statistics: number of live tuples',
   proname => 'pg_stat_get_live_tuples', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => 'oid',
@@ -5823,6 +5828,11 @@
   proname => 'pg_stat_get_xact_tuples_hot_updated', provolatile => 'v',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_xact_tuples_hot_updated' },
+{ oid => '8615',
+  descr => 'statistics: number of tuples updated onto a new page in current transaction',
+  proname => 'pg_stat_get_xact_tuples_newpage_updated', provolatile => 'v',
+  proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_xact_tuples_newpage_updated' },
 { oid => '3044',
   descr => 'statistics: number of blocks fetched in current transaction',
   proname => 'pg_stat_get_xact_blocks_fetched', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1e418b682..46d053422 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -151,8 +151,8 @@ typedef struct PgStat_BackendSubEntry
  * the index AM, while tuples_fetched is the number of tuples successfully
  * fetched by heap_fetch under the control of simple indexscans for this index.
  *
- * tuples_inserted/updated/deleted/hot_updated count attempted actions,
- * regardless of whether the transaction committed.  delta_live_tuples,
+ * tuples_inserted/updated/deleted/hot_updated/newpage_updated count attempted
+ * actions, regardless of whether the transaction committed.  delta_live_tuples,
  * delta_dead_tuples, and changed_tuples are set depending on commit or abort.
  * Note that delta_live_tuples and delta_dead_tuples can be negative!
  * --
@@ -168,6 +168,7 @@ typedef struct PgStat_TableCounts
 	PgStat_Counter t_tuples_updated;
 	PgStat_Counter t_tuples_deleted;
 	PgStat_Counter t_tuples_hot_updated;
+	PgStat_Counter t_tuples_newpage_updated;
 	bool		t_truncdropped;
 
 	PgStat_Counter t_delta_live_tuples;
@@ -234,7 +235,7 @@ typedef struct PgStat_TableXactStatus
  * 

Re: Add pg_walinspect function with block info columns

2023-03-22 Thread Peter Geoghegan
On Tue, Mar 21, 2023 at 11:33 PM Michael Paquier  wrote:
> > The new pg_get_wal_block_info outputs columns in an order that doesn't
> > seem like the most useful possible order to me. This gives us another
> > reason to have separate GetWALRecordInfo and GetWALBlockInfo utility
> > functions rather than sharing logic for building output tuples.
> >
> > Specifically, I think that pg_get_wal_block_info should ouput the
> > "primary key" columns first:
> >
> > reltablespace, reldatabase, relfilenode, blockid, start_lsn, end_lsn
>
> It seems to me that this is up to the one using this SQL?

If you see it that way, then why does it matter what I may want to do
with the declared order?

> I am not
> sure I follow why this is important.  For the cases you have poked
> at, I guess it is, but it strikes me that it is just natural to shape
> that to match the C structures we use for the WAL records
> themselves, so the other way around.

I don't feel very strongly about it, but it seems better to highlight
the difference that exists between this and pg_get_wal_records_info.

> Hence, it seems to me that 0002 has the order pretty much right.
> What's the point in adding the description, by the way?  Only
> consistency with the other function?  Is that really useful if you
> want to apply more quals when retrieving some block data?

I don't understand. It's useful to include the description for the
same reason as it's useful to include it in pg_get_wal_records_info.
Why wouldn't it be useful?

Most individual records that have any block_ref blocks have exactly
one. Most individual WAL records are very simple record types. So
pg_get_wal_block_info just isn't going to look that different to
pg_get_wal_records_info, once they share most of the same columns. The
way that pg_get_wal_block_info disaggregates on block won't make the
output look all that different. So each distinct "description" will
usually only appear once in pg_get_wal_block_info anyway.

-- 
Peter Geoghegan




Re: Remove nonmeaningful prefixes in PgStat_* fields

2023-03-22 Thread Michael Paquier
On Wed, Mar 22, 2023 at 02:52:23PM -0400, Melanie Plageman wrote:
> This comment still has the t_ prefix as does the one for tuples_updated
> and deleted.
> 
> otherwise, LGTM.

Good catch.  pgstat_count_heap_update() has a t_tuples_hot_updated,
and pgstat_update_heap_dead_tuples() a t_delta_dead_tuples on top of
the three you have just reported.

I have grepped for all the fields renamed, and nothing else stands
out.  However, my eyes don't have a 100% accuracy, either.
--
Michael




Re: Add pg_walinspect function with block info columns

2023-03-22 Thread Peter Geoghegan
On Wed, Mar 22, 2023 at 8:35 AM Melanie Plageman
 wrote:
> After reading what you said, I was interested to see how substantial the
> I/O cost with non-compressed FPI would be.
>
> Using a patch with a parameter to pg_get_wal_block_info() to skip
> outputting FPI, I found that on a fast local nvme ssd, the timing
> difference between doing so and not still isn't huge -- 9 seconds when
> outputting the FPI vs 8.5 seconds when skipping outputting FPI. (with
> ~50,000 records all with non-compressed FPIs).
>
> However, perhaps obviously, the I/O cost is worse.
> Doing nothing but
>
>   SELECT *  FROM pg_get_wal_block_info(:start_lsn, :end_lsn, true)
> where fpi is not null;
>
> per iostat, the write latency was double for the query which output fpi
> from the one that didn't and the wkB/s was much higher.

I think that we should also have something like the patch that you
wrote to skip FPIs. It's not something that I feel as strongly about
as the main point about including all the fields from
pg_get_wal_records_info, but it does seem worth doing.

> I have had use for block info without seeing the FPIs, personally.

I'd go further than that myself: I haven't had any use for FPIs at
all. If I was going to do something with FPIs then I'd just use
pg_waldump, since I'd likely want to get them onto the filesystem for
analysis anyway. (Just my experience.)

-- 
Peter Geoghegan




Re: Options to rowwise persist result of stable/immutable function with RECORD result

2023-03-22 Thread David G. Johnston
On Wed, Mar 22, 2023 at 4:46 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Wed, Mar 22, 2023 at 4:32 PM Eske Rahn  wrote:
>
>> Hi,
>>
>> Thanks for the quick answer *:-D*
>>
>> That was a nice side effect of LATERAL.
>>
>> In the example, the calling code also gets simplified:
>>
>> WITH x AS (
>>   SELECT clock_timestamp() rowstart, *, clock_timestamp() rowend FROM (
>> SELECT '1' inp UNION
>> SELECT '2'
>>   ) y,  LATERAL septima.foo(inp) g
>> )
>> SELECT * FROM x;
>>
>>
>> That solved the issue at hand, in a much better way. Thanks
>>
>> Though I still fail to see *why* the other way should generally call the
>> function for every column in the *result* record - if the function is
>> STABLE or IMMUTABLE.
>>
>
> It gets rewritten to be effectively:
>
> select func_call(...).col1, func_call(...).col2, func_call(...).col3
>
> under the assumption that repeating the function call will be cheap and
> side-effect free.  It was never ideal but fixing that form of optimization
> was harder than implementing LATERAL where the multi-column result has a
> natural output in the form of a multi-column table.  A normal function call
> in the target list really means "return a single value" which is at odds
> with writing .* after it.
>
>
Actually, it is less "optimization" and more "SQL is strongly typed and all
columns must be defined during query compilation".

David J.


Re: Options to rowwise persist result of stable/immutable function with RECORD result

2023-03-22 Thread David G. Johnston
On Wed, Mar 22, 2023 at 4:32 PM Eske Rahn  wrote:

> Hi,
>
> Thanks for the quick answer *:-D*
>
> That was a nice side effect of LATERAL.
>
> In the example, the calling code also gets simplified:
>
> WITH x AS (
>   SELECT clock_timestamp() rowstart, *, clock_timestamp() rowend FROM (
> SELECT '1' inp UNION
> SELECT '2'
>   ) y,  LATERAL septima.foo(inp) g
> )
> SELECT * FROM x;
>
>
> That solved the issue at hand, in a much better way. Thanks
>
> Though I still fail to see *why* the other way should generally call the
> function for every column in the *result* record - if the function is
> STABLE or IMMUTABLE.
>

It gets rewritten to be effectively:

select func_call(...).col1, func_call(...).col2, func_call(...).col3

under the assumption that repeating the function call will be cheap and
side-effect free.  It was never ideal but fixing that form of optimization
was harder than implementing LATERAL where the multi-column result has a
natural output in the form of a multi-column table.  A normal function call
in the target list really means "return a single value" which is at odds
with writing .* after it.

David J.


Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

2023-03-22 Thread Michael Paquier
On Wed, Mar 22, 2023 at 02:21:12PM -0400, Melanie Plageman wrote:
> Apologies as I know this docs update has already been committed, but
> buffers fetched and blocks fetched both feel weird to me. If you have a
> cache hit, you don't end up really "fetching" anything at all (since
> pgstat_count_buffer_read() is called before ReadBuffer_common() and we
> don't know if it is a hit or miss yet). And, I would normally associate
> fetching with fetching a block into a buffer. It seems like this counter
> is really reflecting the number of buffers acquired or used.

Well, it is the number of times we've requested a block read, though
it may not actually be a read if the block was in the cache already.

> This isn't really the fault of this patch since that member was already
> called blocks_fetched.

The original documentation of these functions added by 46aa77c refers
to "block fetch requests" and "block requests found in cache", so that
would not be right either based on your opinion here.  If you find
"fetch" to be incorrect in this context, here is another idea:
- "block read requests" for blocks_fetched().
- "block read requested but actually found in cache" for blocks_hit().

All the system views care only about the difference between both
counters to count the number of physical reads really done.
--
Michael




Re: Improve logging when using Huge Pages

2023-03-22 Thread Michael Paquier
On Wed, Mar 22, 2023 at 05:18:28PM -0500, Justin Pryzby wrote:
> Wow, good point.  I think to make it work we'd need to put
> huge_pages_active into BackendParameters and handle it in
> save_backend_variables().  If so, that'd be a strong argument for using a
> GUC, which already has all the necessary infrastructure for exposing the
> server's state.

I am afraid so, duplicating an existing infrastructure for a need like
the one of this thread is not really appealing.
--
Michael




Re: Improve logging when using Huge Pages

2023-03-22 Thread Justin Pryzby
On Wed, Mar 22, 2023 at 04:37:01PM +0900, Michael Paquier wrote:
> I have noted something..  For the WIN32 case, we have that:
> 
> +++ b/src/backend/port/win32_shmem.c
> @@ -327,6 +327,8 @@ retry:
> Sleep(1000);
> continue;
> }
> +
> +   huge_pages_active = ((flProtect & SEC_LARGE_PAGES) != 0);
> break;
> 
> Are you sure that this is correct?  This is set in
> PGSharedMemoryCreate(), part of CreateSharedMemoryAndSemaphores() in
> the startup sequence that creates the shmem segment.  However, for a
> normal backend created by EXEC_BACKEND, SubPostmasterMain() reattaches
> to an existing shared memory segment, so we don't go through the
> creation path that would set huge_pages_active for the process just
> started, (note that InitPostmasterChild() switches IsUnderPostmaster,
> bypassing the shmem segment creation).

Wow, good point.  I think to make it work we'd need to put
huge_pages_active into BackendParameters and handle it in
save_backend_variables().  If so, that'd be a strong argument for using a
GUC, which already has all the necessary infrastructure for exposing the
server's state.
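
A hypothetical sketch of that idea (struct and function shapes are
illustrative only, not the actual postmaster code): the process that
creates the segment records the state, save_backend_variables()
serializes it for the EXEC_BACKEND child, and the child restores it
before use.

#include <stdbool.h>

/* set by the process that actually maps the shmem segment */
static bool huge_pages_active = false;

typedef struct BackendParameters
{
	bool		huge_pages_active;	/* the real struct carries many more fields */
} BackendParameters;

static void
save_backend_variables(BackendParameters *param)
{
	param->huge_pages_active = huge_pages_active;
}

static void
restore_backend_variables(const BackendParameters *param)
{
	huge_pages_active = param->huge_pages_active;
}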

-- 
Justin




Re: HOT chain validation in verify_heapam()

2023-03-22 Thread Andres Freund
Hi,

On 2023-03-22 14:56:22 -0700, Andres Freund wrote:
> A patch addressing some, but not all, of those is attached. With that I don't
> see any crashes or false-positives anymore.

That patch missed that, as committed, the first if (ItemIdIsRedirected())
check sets lp_valid[n] = true even if the target of the redirect is unused.

With that fixed, 004_verify_heapam doesn't cause a crash anymore - it
doesn't pass though, because there's a bunch of unadjusted error messages.

Andres
diff --git i/contrib/amcheck/verify_heapam.c w/contrib/amcheck/verify_heapam.c
index 663fb65dee6..d0c462e28ea 100644
--- i/contrib/amcheck/verify_heapam.c
+++ w/contrib/amcheck/verify_heapam.c
@@ -486,14 +486,24 @@ verify_heapam(PG_FUNCTION_ARGS)
 	report_corruption(,
 	  psprintf("line pointer redirection to unused item at offset %u",
 			   (unsigned) rdoffnum));
-
-/*
- * Record the fact that this line pointer has passed basic
- * sanity checking, and also the offset number to which it
- * points.
- */
-lp_valid[ctx.offnum] = true;
-successor[ctx.offnum] = rdoffnum;
+else if (ItemIdIsDead(rditem))
+	report_corruption(,
+	  psprintf("line pointer redirection to dead item at offset %u",
+			   (unsigned) rdoffnum));
+else if (ItemIdIsRedirected(rditem))
+	report_corruption(,
+	  psprintf("line pointer redirection to another redirect at offset %u",
+			   (unsigned) rdoffnum));
+else
+{
+	/*
+	 * Record the fact that this line pointer has passed basic
+	 * sanity checking, and also the offset number to which it
+	 * points.
+	 */
+	lp_valid[ctx.offnum] = true;
+	successor[ctx.offnum] = rdoffnum;
+}
 continue;
 			}
 
@@ -564,16 +574,19 @@ verify_heapam(PG_FUNCTION_ARGS)
 			TransactionId next_xmin;
 			OffsetNumber nextoffnum = successor[ctx.offnum];
 
+			/* the current line pointer may not have a successor */
+			if (nextoffnum == InvalidOffsetNumber)
+continue;
+
 			/*
-			 * The current line pointer may not have a successor, either
-			 * because it's not valid or because it didn't point to anything.
-			 * In either case, we have to give up.
-			 *
-			 * If the current line pointer does point to something, it's
-			 * possible that the target line pointer isn't valid. We have to
-			 * give up in that case, too.
+			 * The successor is located beyond the end of the line item array,
+			 * which can happen when the array is truncated.
 			 */
-			if (nextoffnum == InvalidOffsetNumber || !lp_valid[nextoffnum])
+			if (nextoffnum > maxoff)
+continue;
+
+			/* the successor is not valid, have to give up */
+			if (!lp_valid[nextoffnum])
 continue;
 
 			/* We have two valid line pointers that we can examine. */
@@ -583,14 +596,8 @@ verify_heapam(PG_FUNCTION_ARGS)
 			/* Handle the cases where the current line pointer is a redirect. */
 			if (ItemIdIsRedirected(curr_lp))
 			{
-/* Can't redirect to another redirect. */
-if (ItemIdIsRedirected(next_lp))
-{
-	report_corruption(,
-	  psprintf("redirected line pointer points to another redirected line pointer at offset %u",
-			   (unsigned) nextoffnum));
-	continue;
-}
+/* should have filtered out all the other cases */
+Assert(ItemIdIsNormal(next_lp));
 
 /* Can only redirect to a HOT tuple. */
 next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp);
@@ -602,8 +609,12 @@ verify_heapam(PG_FUNCTION_ARGS)
 }
 
 /*
- * Redirects are created by updates, so successor should be
- * the result of an update.
+ * Redirects are created by HOT updates, so successor should
+ * be the result of a HOT update.
+ *
+ * XXX: HeapTupleHeaderIsHeapOnly() should always imply
+ * HEAP_UPDATED. This should be checked even when the tuple
+ * isn't a target of a redirect.
  */
 if ((next_htup->t_infomask & HEAP_UPDATED) == 0)
 {
@@ -665,15 +676,18 @@ verify_heapam(PG_FUNCTION_ARGS)
 			 * tuple should be marked as a heap-only tuple. Conversely, if the
 			 * current tuple isn't marked as HOT-updated, then the next tuple
 			 * shouldn't be marked as a heap-only tuple.
+			 *
+			 * NB: Can't use HeapTupleHeaderIsHotUpdated() as it checks if
+			 * hint bits indicate xmin/xmax aborted.
 			 */
-			if (!HeapTupleHeaderIsHotUpdated(curr_htup) &&
+			if (!(curr_htup->t_infomask2 & HEAP_HOT_UPDATED) &&
 HeapTupleHeaderIsHeapOnly(next_htup))
 			{
 report_corruption(,
   psprintf("non-heap-only update produced a heap-only tuple at offset %u",
 		   (unsigned) nextoffnum));
 			}
-			if (HeapTupleHeaderIsHotUpdated(curr_htup) &&
+			if ((curr_htup->t_infomask2 & HEAP_HOT_UPDATED) &&
 !HeapTupleHeaderIsHeapOnly(next_htup))
 			{
 report_corruption(,


Re: HOT chain validation in verify_heapam()

2023-03-22 Thread Andres Freund
Hi,

On 2023-03-22 13:45:52 -0700, Andres Freund wrote:
> On 2023-03-22 09:19:18 -0400, Robert Haas wrote:
> > On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev
> >  wrote:
> > > The patch needed a rebase due to a4f23f9b. PFA v12.
> >
> > I have committed this after tidying up a bunch of things in the test
> > case file that I found too difficult to understand -- or in some cases
> > just incorrect, like:
>
> As noticed by Andrew
> https://postgr.es/m/bfa5bd2b-c0e6-9d65-62ce-97f4766b1c42%40dunslane.net and
> then reproduced on HEAD by me, there's something not right here.
>
> At the very least there's missing verification that tids actually exist in
> the "Update chain validation" loop, leading to:
> TRAP: failed Assert("ItemIdHasStorage(itemId)"), File:
> "../../../../home/andres/src/postgresql/src/include/storage/bufpage.h",
> Line: 355, PID: 645093
>
> Which makes sense - the earlier loop adds t_ctid to the successor array, which
> we then query without checking if there still is such a tid on the page.

It's not quite so simple - I see now that the lp_valid check tries to prevent
that. However, it's not sufficient, because there is no guarantee that
lp_valid[nextoffnum] is initialized. Consider what happens if t_ctid of a heap
tuple points to beyond the end of the item array - lp_valid[nextoffnum] won't
be initialized.

Why are redirections now checked in two places? There already was an
ItemIdIsUsed() check in the "/* Perform tuple checks */" loop, but now there's
the ItemIdIsRedirected() check in the "Update chain validation" loop as well
- and the output of that is confusing, because it'll just mention the target
of the redirect.


I also think it's not quite right that some of the checks inside if
(ItemIdIsRedirected()) continue in case of corruption while others don't.
While there's a later continue, that means the corrupt tuples get added to
the predecessor array.  Similarly, in the non-redirect portion, the successor
array gets filled with corrupt tuples, which doesn't seem quite right to me.


A patch addressing some, but not all, of those is attached. With that I don't
see any crashes or false-positives anymore.

Greetings,

Andres Freund
diff --git i/contrib/amcheck/verify_heapam.c w/contrib/amcheck/verify_heapam.c
index 663fb65dee6..43f93f7784a 100644
--- i/contrib/amcheck/verify_heapam.c
+++ w/contrib/amcheck/verify_heapam.c
@@ -486,6 +486,14 @@ verify_heapam(PG_FUNCTION_ARGS)
 	report_corruption(&ctx,
 	  psprintf("line pointer redirection to unused item at offset %u",
 			   (unsigned) rdoffnum));
+else if (ItemIdIsDead(rditem))
+	report_corruption(&ctx,
+	  psprintf("line pointer redirection to dead item at offset %u",
+			   (unsigned) rdoffnum));
+else if (ItemIdIsRedirected(rditem))
+	report_corruption(&ctx,
+	  psprintf("line pointer redirection to another redirect at offset %u",
+			   (unsigned) rdoffnum));
 
 /*
  * Record the fact that this line pointer has passed basic
@@ -564,16 +572,19 @@ verify_heapam(PG_FUNCTION_ARGS)
 			TransactionId next_xmin;
 			OffsetNumber nextoffnum = successor[ctx.offnum];
 
+			/* the current line pointer may not have a successor */
+			if (nextoffnum == InvalidOffsetNumber)
+continue;
+
 			/*
-			 * The current line pointer may not have a successor, either
-			 * because it's not valid or because it didn't point to anything.
-			 * In either case, we have to give up.
-			 *
-			 * If the current line pointer does point to something, it's
-			 * possible that the target line pointer isn't valid. We have to
-			 * give up in that case, too.
+			 * The successor is located beyond the end of the line item array,
+			 * which can happen when the array is truncated.
 			 */
-			if (nextoffnum == InvalidOffsetNumber || !lp_valid[nextoffnum])
+			if (nextoffnum > maxoff)
+continue;
+
+			/* the successor is not valid, have to give up */
+			if (!lp_valid[nextoffnum])
 continue;
 
 			/* We have two valid line pointers that we can examine. */
@@ -583,14 +594,8 @@ verify_heapam(PG_FUNCTION_ARGS)
 			/* Handle the cases where the current line pointer is a redirect. */
 			if (ItemIdIsRedirected(curr_lp))
 			{
-/* Can't redirect to another redirect. */
-if (ItemIdIsRedirected(next_lp))
-{
-	report_corruption(&ctx,
-	  psprintf("redirected line pointer points to another redirected line pointer at offset %u",
-			   (unsigned) nextoffnum));
-	continue;
-}
+/* should have filtered out all the other cases */
+Assert(ItemIdIsNormal(next_lp));
 
 /* Can only redirect to a HOT tuple. */
 next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp);
@@ -602,8 +607,12 @@ verify_heapam(PG_FUNCTION_ARGS)
 }
 
 /*
- * Redirects are created by updates, so successor should be
- * the result of an update.
+ * Redirects are created by HOT updates, so successor should
+ * be the result of a HOT update.
+ *
+ * XXX: HeapTupleHeaderIsHeapOnly() should always imply
+ * HEAP_UPDATED. This should be checked even when the tuple
+ * isn't a target of a redirect.

Re: Options to rowwise persist result of stable/immutable function with RECORD result

2023-03-22 Thread David G. Johnston
On Tuesday, March 21, 2023, Eske Rahn  wrote:

> Hi,
>
> I have noticed a rather odd behaviour that is not strictly a bug, but is
> unexpected.
>
> It is when an immutable (or stable) PG function returns results in a
> record structure: a select on these calls the function repeatedly, once
> for each element requested from the output record.
>

The LATERAL join modifier exists to handle this kind of situation.
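
For instance, with the septima.foo function from the quoted message (a
minimal sketch; the table and column names here are made up):

SELECT t.id, f.a, f.b, f.c, f.d, f.e
FROM some_table t
CROSS JOIN LATERAL septima.foo(t.inp) AS f;

-- foo() is evaluated once per row of some_table, and the five column
-- references no longer trigger five separate calls.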

David J.


Re: HOT chain validation in verify_heapam()

2023-03-22 Thread Peter Geoghegan
On Wed, Mar 22, 2023 at 2:14 PM Peter Geoghegan  wrote:
> > Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is 
> > set
> > and expects to find an item it can dereference - but I don't think that's
> > something we can rely on: Afaics HOT pruning can break chains, but doesn't
> > reset xmax.
>
> I think that we need two passes to be completely thorough. An initial
> pass, that works pretty much as-is, plus a second pass that locates
> any orphaned heap-only tuples -- heap-only tuples that were not deemed
> part of a valid HOT chain during the first pass. These remaining
> orphaned heap-only tuples should be verified as having come from
> now-aborted transactions (they should definitely be fully DEAD) --
> otherwise we have corruption.

I see that there is a second pass over the heap page in
verify_heapam(), in fact. Kind of. I'm referring to this loop:

/*
 * An update chain can start either with a non-heap-only tuple or with
 * a redirect line pointer, but not with a heap-only tuple.
 *
 * (This check is in a separate loop because we need the predecessor
 * array to be fully populated before we can perform it.)
 */
for (ctx.offnum = FirstOffsetNumber;
 ctx.offnum <= maxoff;
 ctx.offnum = OffsetNumberNext(ctx.offnum))
{
if (xmin_commit_status_ok[ctx.offnum] &&
(xmin_commit_status[ctx.offnum] == XID_COMMITTED ||
 xmin_commit_status[ctx.offnum] == XID_IN_PROGRESS) &&
predecessor[ctx.offnum] == InvalidOffsetNumber)
{
ItemId  curr_lp;

curr_lp = PageGetItemId(ctx.page, ctx.offnum);
if (!ItemIdIsRedirected(curr_lp))
{
HeapTupleHeader curr_htup;

curr_htup = (HeapTupleHeader)
PageGetItem(ctx.page, curr_lp);
if (HeapTupleHeaderIsHeapOnly(curr_htup))
                report_corruption(&ctx,
                                  psprintf("tuple is root of chain but is marked as heap-only tuple"));
}
}
}


However, this "second pass over page" loop has roughly the same
problem as the nearby HeapTupleHeaderIsHotUpdated() coding pattern: it
doesn't account for the fact that a tuple whose xmin was
XID_IN_PROGRESS a little earlier on may not be in that state once we
reach the second pass loop. Concurrent transaction abort needs to be
accounted for. The loop needs to recheck xmin status, at least in the
initially-XID_IN_PROGRESS-xmin case.

--
Peter Geoghegan




Re: [BUG] pg_stat_statements and extended query protocol

2023-03-22 Thread Imseih (AWS), Sami
> What about using a uint64 for calls? That seems more appropriate to me (even 
> if
> queryDesc->totaltime->calls will be passed (which is int64), but that's 
> already
> also the case for the "rows" argument and 
> queryDesc->totaltime->rows_processed)

That's fair


> I'm not sure it's worth mentioning that the new counters are "currently" used 
> with the ExecutorRun.

Sure, I suppose these fields could be used outside of ExecutorRun. Good point.


> Also, I wonder if "rows" (and not rows_processed) would not be a better 
> naming.

Agree.

I went with rows_processed initially, since it was accumulating es_processed,
but per the previous point, this instrumentation could be used outside of
ExecutorRun.

v3 addresses the comments.


Regards,


--
Sami Imseih
Amazon Web Services (AWS)





v3-0001-Correct-accumulation-of-counters-for-extended-query-.patch
Description: v3-0001-Correct-accumulation-of-counters-for-extended-query-.patch


Options to rowwise persist result of stable/immutable function with RECORD result

2023-03-22 Thread Eske Rahn
Hi,

I have noticed a rather odd behaviour that is not strictly a bug, but is
unexpected.

It is when an immutable (or stable) PG function returns results in a
record structure: a select on these calls the function repeatedly, once
for each element requested from the output record.

See below for an example.

Sure, I can work around this by returning an array, or by materialising the
result as a whole with e.g. a materialised CTE, but what I'm looking for is
*materialising just the individual row* during processing, if the function
is to be called on many rows.

Obviously in theory the returned record could be very complex, so we might
not want it materialised in general, but an option to do so would be nice.
I would suggest that a WITH could be marked with a new "MATERIALIZED *ROW*"
option (reusing already reserved keywords).

Note how I have set the cost to an extreme value below; in this test, the
value does not affect the behaviour.

The result set here has five elements; if I change the type to VOLATILE,
the execution time is reduced by a factor of five (see the difference
between the timestamps of lines one and two). It is directly proportional
to the number of elements requested from the record (here I requested all).
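
(As far as I understand, the star-expansion is rewritten into one copy of
the function call per output column, which is where the proportionality
comes from. Roughly:

SELECT (septima.foo('x')).*;
-- behaves as if one had written:
SELECT (septima.foo('x')).a, (septima.foo('x')).b, (septima.foo('x')).c,
       (septima.foo('x')).d, (septima.foo('x')).e;
)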

(The real-life scenario is a function that, by a list of regex expressions,
splits up the input into numerous fields. I noticed the behaviour because a
RAISE added for debugging put out the same output repeatedly.)

-

DROP TYPE IF EXISTS septima.foo_type CASCADE;
CREATE TYPE septima.foo_type AS (a text, b text, c text, d text, e text);
DROP FUNCTION IF EXISTS septima.foo(text);
CREATE OR REPLACE FUNCTION septima.foo(inp text) RETURNS septima.foo_type
AS
$BODY$
DECLARE
  result_record septima.foo_type;
  i BIGINT :=12345678;
BEGIN
  WHILE 0 < i LOOP ...

https://septima.dk
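
A minimal sketch of the kind of function described above -- the loop bound,
the debug RAISE, and the returned field values are illustrative assumptions,
not the original code:

CREATE OR REPLACE FUNCTION septima.foo(inp text) RETURNS septima.foo_type
AS
$BODY$
DECLARE
  result_record septima.foo_type;
  i BIGINT := 12345678;
BEGIN
  WHILE 0 < i LOOP        -- burn cycles so repeated calls are measurable
    i := i - 1;
  END LOOP;
  RAISE NOTICE 'foo(%) evaluated at %', inp, clock_timestamp();
  SELECT inp || '-a', inp || '-b', inp || '-c', inp || '-d', inp || '-e'
    INTO result_record;
  RETURN result_record;
END
$BODY$ LANGUAGE plpgsql IMMUTABLE COST 1000000;

-- Requesting all five columns of the IMMUTABLE version raises the NOTICE
-- five times; declared VOLATILE, it is raised only once.
SELECT (septima.foo('x')).*;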


Re: Initial Schema Sync for Logical Replication

2023-03-22 Thread Euler Taveira
On Tue, Mar 21, 2023, at 8:18 AM, Amit Kapila wrote:
> Now, how do we avoid these problems even if we have our own version of
> functionality similar to pg_dump for selected objects? I guess we will
> face similar problems. If so, we may need to deny schema sync in any
> such case.
There are 2 approaches for initial DDL synchronization:

1) generate the DDL command on the publisher, stream it and apply it as-is on
the subscriber;
2) generate a DDL representation (JSON, for example) on the publisher, stream
it, transform it into a DDL command on subscriber and apply it.

The option (1) is simpler and faster than option (2) because it does not
require an additional step (transformation). However, option (2) is more
flexible than option (1) because it allows you to create a DDL command even if a
feature was removed from the subscriber and the publisher version is less than
the subscriber version or a feature was added to the publisher and the
publisher version is greater than the subscriber version. Of course there are
exceptions and it should forbid the transformation (in this case, it can be
controlled by the protocol version -- LOGICALREP_PROTO_FOOBAR_VERSION_NUM). A
decision must be made: simple/restrictive vs complex/flexible.

One of the main use cases for logical replication is migration (X -> Y where X
< Y). Postgres generally does not remove features but it might happen (such as
WITH OIDS syntax) and it would break the DDL replication (option 1). In the
downgrade case (X -> Y where X > Y), it might break the DDL replication if a
new syntax is introduced in X. Having said that, IMO option (1) is fragile if
we want to support DDL replication between different Postgres versions. It
might eventually work but there is no guarantee.

Per discussion [1], I think if we agree that Alvaro's DDL deparse patch is
the way to go with DDL replication, it seems wise that it should be used for
initial DDL synchronization as well.


[1] 
https://www.postgresql.org/message-id/CAA4eK1%2Bw_dFytBiv3RxbOL76_noMzmX0QGTc8uS%3Dbc2WaPVoow%40mail.gmail.com



--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: HOT chain validation in verify_heapam()

2023-03-22 Thread Peter Geoghegan
On Wed, Mar 22, 2023 at 1:45 PM Andres Freund  wrote:
> At the very least there's missing verification that tids actually exist in 
> the
> "Update chain validation" loop, leading to:
> TRAP: failed Assert("ItemIdHasStorage(itemId)"), File: 
> "../../../../home/andres/src/postgresql/src/include/storage/bufpage.h", Line: 
> 355, PID: 645093
>
> Which makes sense - the earlier loop adds t_ctid to the successor array, which
> we then query without checking if there still is such a tid on the page.
>
> I suspect we don't just need a !ItemIdIsUsed(), but also a check against the
> max offset on the page.

We definitely need to do it that way, since a heap-only tuple's t_ctid
is allowed to point to almost anything. I guess it can't point to some
completely different heap block, but that's about the only
restriction. In particular, it can point to an item that's past the
end of the page following line pointer array truncation (truncation
can happen during pruning or when the second heap pass takes place in
VACUUM).

OTOH the rules for LP_REDIRECT items *are* very strict. They need to
be, since it's the root item of the HOT chain, referenced by TIDs in
indexes, and have no heap tuple header metadata to use in cross-checks
that take place during HOT chain traversal (traversal by code in
places such as heap_hot_search_buffer).

> WRT these failures:
> non-heap-only update produced a heap-only tuple at offset 20
>
> I think that's largely a consequence of HeapTupleHeaderIsHotUpdated()'s
> definition:

That has to be a problem for verify_heapam.

> Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is set
> and expects to find an item it can dereference - but I don't think that's
> something we can rely on: Afaics HOT pruning can break chains, but doesn't
> reset xmax.

I think that we need two passes to be completely thorough. An initial
pass, that works pretty much as-is, plus a second pass that locates
any orphaned heap-only tuples -- heap-only tuples that were not deemed
part of a valid HOT chain during the first pass. These remaining
orphaned heap-only tuples should be verified as having come from
now-aborted transactions (they should definitely be fully DEAD) --
otherwise we have corruption.

That's what my abandoned patch to make heap pruning more robust did,
you'll recall.

-- 
Peter Geoghegan




Re: HOT chain validation in verify_heapam()

2023-03-22 Thread Andres Freund
Hi,

On 2023-03-22 09:19:18 -0400, Robert Haas wrote:
> On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev
>  wrote:
> > The patch needed a rebase due to a4f23f9b. PFA v12.
>
> I have committed this after tidying up a bunch of things in the test
> case file that I found too difficult to understand -- or in some cases
> just incorrect, like:

As noticed by Andrew
https://postgr.es/m/bfa5bd2b-c0e6-9d65-62ce-97f4766b1c42%40dunslane.net and
then reproduced on HEAD by me, there's something not right here.

At the very least there's missing verification that tids actually exist in the
"Update chain validation" loop, leading to:
TRAP: failed Assert("ItemIdHasStorage(itemId)"), File: 
"../../../../home/andres/src/postgresql/src/include/storage/bufpage.h", Line: 
355, PID: 645093

Which makes sense - the earlier loop adds t_ctid to the successor array, which
we then query without checking if there still is such a tid on the page.

I suspect we don't just need a !ItemIdIsUsed(), but also a check against the
max offset on the page.

WRT these failures:
non-heap-only update produced a heap-only tuple at offset 20

I think that's largely a consequence of HeapTupleHeaderIsHotUpdated()'s
definition:
/*
 * Note that we stop considering a tuple HOT-updated as soon as it is known
 * aborted or the would-be updating transaction is known aborted.  For best
 * efficiency, check tuple visibility before using this macro, so that the
 * INVALID bits will be as up to date as possible.
 */
#define HeapTupleHeaderIsHotUpdated(tup) \
( \
((tup)->t_infomask2 & HEAP_HOT_UPDATED) != 0 && \
((tup)->t_infomask & HEAP_XMAX_INVALID) == 0 && \
!HeapTupleHeaderXminInvalid(tup) \
)


Currently the new verify_heapam() follows ctid chains when XMAX_INVALID is set
and expects to find an item it can dereference - but I don't think that's
something we can rely on: Afaics HOT pruning can break chains, but doesn't
reset xmax.

Greetings,

Andres Freund




Re: Set arbitrary GUC options during initdb

2023-03-22 Thread Tom Lane
Andres Freund  writes:
> This commit unfortunately broke --wal-segsize. If I use a slightly larger than
> the default setting, I get:
> initdb --wal-segsize 64 somepath
> running bootstrap script ... 2023-03-22 13:06:41.282 PDT [639848] FATAL:  
> "min_wal_size" must be at least twice "wal_segment_size"

[ confused... ]  Oh, I see the problem.  This:

/* set default max_wal_size and min_wal_size */
snprintf(repltok, sizeof(repltok), "min_wal_size = %s",
 pretty_wal_size(DEFAULT_MIN_WAL_SEGS));
conflines = replace_token(conflines, "#min_wal_size = 80MB", repltok);

looks like it's setting a compile-time-constant value of min_wal_size;
at least that's what I thought it was doing when I revised the code.
But it isn't, because somebody had the brilliant idea of making
pretty_wal_size() depend on the wal_segment_size_mb variable.

Will fix, thanks for report.

regards, tom lane




Re: Set arbitrary GUC options during initdb

2023-03-22 Thread Andres Freund
Hi,

This commit unfortunately broke --wal-segsize. If I use a slightly larger than
the default setting, I get:

initdb --wal-segsize 64 somepath
running bootstrap script ... 2023-03-22 13:06:41.282 PDT [639848] FATAL:  
"min_wal_size" must be at least twice "wal_segment_size"

Greetings,

Andres Freund




Re: Non-superuser subscription owners

2023-03-22 Thread Jeff Davis
On Wed, 2023-03-22 at 12:16 -0400, Robert Haas wrote:
> If nobody's too unhappy with the idea, I plan to commit this soon,
> both because I think that the feature is useful, and also because I
> think it's an important security improvement.

Is there any chance I can convince you to separate the privileges of
using a connection string and creating a subscription, as I
suggested[1] earlier?

It would be useful for dblink, and I also plan to propose CREATE
SUBSCRIPTION ... SERVER for v17 (it was too late for 16), for which it
would also be useful to make the distinction.

You seemed to generally think it was a reasonable idea, but wanted to
wait for the other patch. I think it's the right breakdown of
privileges even now, and I don't see a reason to give ourselves a
headache by trying to split up the privileges later.

Regards,
Jeff Davis

[1]
https://www.postgresql.org/message-id/fa1190c117c2455f2dd968a1a09f796ccef27b29.ca...@j-davis.com




Re: Can we avoid chdir'ing in resolve_symlinks() ?

2023-03-22 Thread Tom Lane
Peter Eisentraut  writes:
> On 05.10.22 15:59, Tom Lane wrote:
>> What did you think of the compromise proposal to change only
>> the paths that pg_config outputs?  I've not tried to code that,
>> but I think it should be feasible.

> I don't think I understand what this proposal actually means.  What 
> would be the behavior of pg_config and how would it be different from 
> before?

What I had in mind was:

* server and most frontend programs keep the same behavior, that is
fully resolve their executable's path to an absolute path (and then
navigate to the rest of the installation from there); but now they'll
use realpath() to avoid chdir'ing while they do that.

* pg_config applies realpath() if its initial PATH search produces a
relative path to the executable, or if the last component of that path
is a symlink.  Otherwise leave it alone, which would have the effect of
not expanding directory-level symlinks.

I think that changing pg_config's behavior would be enough to resolve
the complaints you listed, but perhaps I'm missing some fine points.

regards, tom lane




Re: On login trigger: take three

2023-03-22 Thread Daniel Gustafsson
> On 22 Mar 2023, at 18:54, Robert Haas  wrote:

> Basically, I think 0001 is a good idea -- I'm much more nervous about
> 0002. I think we should get 0001 polished up and committed.

Correct me if I'm wrong, but I believe you commented on v27-0001 of the login
event trigger patch series?  Sorry about the confusion if so, this is a very
old and lengthy thread with many twists and turns.  This series was closed
downthread when the original authors of login EVT left, and the 0001 GUC patch
extracted into its own thread.  That patch now lives at:

https://commitfest.postgresql.org/42/4013/

This thread was then later revived by Mikhail Gribkov, but without 0001,
instead referring to the above patch for that part.

The new patch for 0001 is quite different, and I welcome your eyes on that
since I agree with you that it would be a good idea.

--
Daniel Gustafsson





Re: HOT chain validation in verify_heapam()

2023-03-22 Thread Tom Lane
Robert Haas  writes:
> I have committed this after tidying up a bunch of things in the test
> case file that I found too difficult to understand -- or in some cases
> just incorrect, like:

My animal mamba doesn't like this one bit.

I suspect the reason is that it's big-endian (PPC) and the endianness
hacking in the test is simply wrong:

syswrite($file,
 pack("L", $ENDIANNESS eq 'little' ? 0x00010019 : 0x19000100))
or BAIL_OUT("syswrite failed: $!");

pack's L code should already be performing an endianness swap, so why
are we doing another one in the argument?

regards, tom lane




Re: Request for comment on setting binary format output per session

2023-03-22 Thread Jeff Davis
On Wed, 2023-03-22 at 14:42 -0400, Tom Lane wrote:
> This isn't going to help much unless we change the wire protocol
> so that RowDescription messages carry these UUIDs instead of
> (or in addition to?) the OIDs of the column datatypes.  While
> that's not completely out of the question, it's a heavy lift
> that will affect multiple layers of client code along with the
> server.

I'm not sure that's a hard requirement. I pointed out a similar
solution for type names here:

https://www.postgresql.org/message-id/4297b9e310172b9a1e6d737e21ad8796d0ab7b03.ca...@j-davis.com

In other words: if the Bind message depends on knowing the OID
mappings, that forces an extra round-trip; but if the client doesn't
need the mapping until it receives its first result, then it can use
pipelining to avoid the extra round-trip.

(I haven't actually tried it and I don't know if it's very reasonable
to expect the client to do this.)

> Also, what about container types?  I doubt it's sane for
> array-of-foo to have a UUID that's unrelated to the one for foo.
> Composites and ranges would need some intelligence too if we
> don't want them to be unduly complicated to process.

That's a good point. I don't know if that is a major design issue or
not; but it certainly adds complexity to the proposal and/or clients
implementing it.

Regards,
Jeff Davis





Re: meson documentation build open issues

2023-03-22 Thread Andres Freund
Hi,

On 2023-03-20 10:32:49 -0700, Andres Freund wrote:
> On 2023-03-20 11:58:08 +0100, Peter Eisentraut wrote:
> > Oh, this patch set grew quite quickly. ;-)
> 
> Yep :)

Unless somebody sees a reason to wait, I am planning to commit:
  meson: add install-{quiet, world} targets
  meson: add install-{docs,doc-html,doc-man} targets
  meson: make install_test_files more generic, rename to install_files

While I don't think we necessarily have a settled path forward for .css and .svg,
the above are independent of that.


For the .svg: I wonder if we should just inline the images in the chunked
html, just like we do in the single page one. It's not like we reuse one image
across a lot of pages, so there's no bandwidth saved from having the images
separate...

For the .css: docbook-xsl actually has support for writing the .css: [1] - but
it requires the .css file be valid xml. I wonder if the cleanest approach would
be to have a build step to create .css.xml - then the non-chunked build's
generate.css.header would do the right thing.


I'll start a new thread for
  docs: speed up docs build by special-casing the gentext.template
  VERY WIP: parallel doc generation
after the feature freeze.


After looking into it a tiny bit more, it seems we should use neither pandoc
nor dbtoepub for epub generation.

All dbtoepub does is invoke the docbook-xsl support for epubs and zip
the result - except it doesn't use our stylesheets, so it looks randomly
different and doesn't use our speedups. At the very least we should use our
customizations, if we want epub support. Or we should just remove it.

Pandoc unfortunately doesn't do docbook well enough to be usable for now to
directly parse our docbook.

Regards,

Andres

[1] 
https://docbook.sourceforge.net/release/xsl/current/doc/html/custom.css.source.html




Re: Remove nonmeaningful prefixes in PgStat_* fields

2023-03-22 Thread Melanie Plageman
On Mon, Mar 20, 2023 at 10:05:21AM +0100, Drouvot, Bertrand wrote:
> Hi,
> 
> On 3/20/23 8:32 AM, Michael Paquier wrote:
> > 
> >  /* Total time previously charged to function, as of function start 
> > */
> > -   instr_time  save_f_total_time;
> > +   instr_time  save_total_time;
> > I have something to say about this one, though..  I find this change a
> > bit confusing.  It may be better kept as it is, or I think that we'd
> > better rename also "save_total" and "start" to reflect in a better way
> > what they do, because removing "f_" reduces the meaning of the field
> > with the two others in the same structure.
> 
> Thanks for looking at it!
> 
> Good point and keeping it as it is currently would not
> affect the work that is/will be done in [1].
> 
> So, please find attached V2 taking this comment into account.

> diff --git a/src/backend/utils/adt/pgstatfuncs.c 
> b/src/backend/utils/adt/pgstatfuncs.c
> index 35c6d46555..4f21fb2dc2 100644
> --- a/src/backend/utils/adt/pgstatfuncs.c
> +++ b/src/backend/utils/adt/pgstatfuncs.c
> @@ -1552,7 +1552,7 @@ pg_stat_get_xact_tuples_inserted(PG_FUNCTION_ARGS)
>   result = 0;
>   else
>   {
> - result = tabentry->t_counts.t_tuples_inserted;
> + result = tabentry->counts.tuples_inserted;

This comment still has the t_ prefix as does the one for tuples_updated
and deleted.

otherwise, LGTM.

>   /* live subtransactions' counts aren't in t_tuples_inserted yet 
> */
>   for (trans = tabentry->trans; trans != NULL; trans = 
> trans->upper)
>   result += trans->tuples_inserted;
> @@ -1573,7 +1573,7 @@ pg_stat_get_xact_tuples_updated(PG_FUNCTION_ARGS)
>   result = 0;
>   else
>   {
> - result = tabentry->t_counts.t_tuples_updated;
> + result = tabentry->counts.tuples_updated;
>   /* live subtransactions' counts aren't in t_tuples_updated yet 
> */
>   for (trans = tabentry->trans; trans != NULL; trans = 
> trans->upper)
>   result += trans->tuples_updated;
> @@ -1594,7 +1594,7 @@ pg_stat_get_xact_tuples_deleted(PG_FUNCTION_ARGS)
>   result = 0;
>   else
>   {
> - result = tabentry->t_counts.t_tuples_deleted;
> + result = tabentry->counts.tuples_deleted;
>   /* live subtransactions' counts aren't in t_tuples_deleted yet 
> */
>   for (trans = tabentry->trans; trans != NULL; trans = 
> trans->upper)
>   result += trans->tuples_deleted;
> @@ -1613,7 +1613,7 @@ pg_stat_get_xact_tuples_hot_updated(PG_FUNCTION_ARGS)
>   if ((tabentry = find_tabstat_entry(relid)) == NULL)
>   result = 0;
>   else
> - result = (int64) (tabentry->t_counts.t_tuples_hot_updated);
> + result = (int64) (tabentry->counts.tuples_hot_updated);
>  
>   PG_RETURN_INT64(result);
>  }

- Melanie




Re: Request for comment on setting binary format output per session

2023-03-22 Thread Tom Lane
Jeff Davis  writes:
> On Wed, 2023-03-22 at 10:21 +0100, Peter Eisentraut wrote:
>> I've been thinking that we need some new kind of identifier to allow 
>> clients to process types in more sophisticated ways.
>> For example, each type could be (self-)assigned a UUID, which is fixed 
>> for that type no matter in which schema or under what extension name or 
>> with what OID it is installed.  Client libraries could then hardcode 
>> that UUID for processing the types.  Conversely, the UUID could be 
>> changed if the wire format of the type is changed, without having to 
>> change the type name.

> That sounds reasonable to me. It could also be useful for other
> extension objects (or the extension itself) to avoid other kinds of
> weirdness from name collisions or major version updates or extensions
> that depend on other extensions.

This isn't going to help much unless we change the wire protocol
so that RowDescription messages carry these UUIDs instead of
(or in addition to?) the OIDs of the column datatypes.  While
that's not completely out of the question, it's a heavy lift
that will affect multiple layers of client code along with the
server.

Also, what about container types?  I doubt it's sane for
array-of-foo to have a UUID that's unrelated to the one for foo.
Composites and ranges would need some intelligence too if we
don't want them to be unduly complicated to process.

regards, tom lane




Re: Set arbitrary GUC options during initdb

2023-03-22 Thread Tom Lane
I wrote:
> Peter Eisentraut  writes:
>> I would remove the
>> #if DEF_PGPORT != 5432
>> This was in the previous code too, but now if we remove it, then we 
>> don't have any more hardcoded 5432 left, which seems like a nice 
>> improvement in cleanliness.

> Hm.  That'll waste a few cycles during initdb; not sure if the extra
> cleanliness is worth it.  It's not like that number is going to change.

After further thought I did it as you suggest.  I think the only case
where we really care about shaving milliseconds from initdb is in debug
builds (e.g. buildfarm), which very likely get built with nondefault
DEF_PGPORT anyway.

I did get a bee in my bonnet about how replace_token (and now
replace_guc_value) leak memory like there's no tomorrow.  The leakage
amounts to about a megabyte per run according to valgrind, and it's
not going anywhere but up as we add more calls of those functions.
So I made a quick change to redefine them in a less leak-prone way.

regards, tom lane




Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

2023-03-22 Thread Melanie Plageman
On Mon, Mar 20, 2023 at 6:58 AM Drouvot, Bertrand
 wrote:
>
> Hi,
>
> On 3/20/23 12:43 AM, Michael Paquier wrote:
> > At the
> > end, documenting both still sounds like the best move to me.
>
> Agree.
>
> Please find attached 
> v1-0001-pg_stat_get_xact_blocks_fetched-and_hit-doc.patch doing so.
>
> I did not put the exact same wording as the one being removed in ddfc2d9, as:
>
> - For pg_stat_get_xact_blocks_hit(), I think it's better to be closer to,
> say, the pg_statio_all_tables.heap_blks_hit definition.
>
> - For pg_stat_get_xact_blocks_fetched(), I think that using "buffer" is 
> better (than block) as at the
> end it's related to pgstat_count_buffer_read().
>
> At the end there is a choice to be made for both regarding the wording:
> block vs buffer. Indeed, their counters get incremented in "buffer" macros
> while being retrieved in those "blocks" functions.
>
> "Buffer" sounds more appropriate to me, so the attached has been done that 
> way.

Apologies as I know this docs update has already been committed, but
buffers fetched and blocks fetched both feel weird to me. If you have a
cache hit, you don't end up really "fetching" anything at all (since
pgstat_count_buffer_read() is called before ReadBuffer_common() and we
don't know if it is a hit or miss yet). And, I would normally associate
fetching with fetching a block into a buffer. It seems like this counter
is really reflecting the number of buffers acquired or used.

tuples_fetched makes more sense because a tuple is "fetched" into a
slot.

This isn't really the fault of this patch since that member was already
called blocks_fetched.

- Melanie




Re: psql: Add role's membership options to the \du+ command

2023-03-22 Thread Pavel Luzanov
In the previous version, I didn't notice (unlike cfbot) the compiler 
warning. Fixed in version 6.


-
Pavel Luzanov
From 1b8b5743df23637b70e8d4ad0df0e1f892c595f3 Mon Sep 17 00:00:00 2001
From: Pavel Luzanov 
Date: Wed, 22 Mar 2023 20:54:41 +0300
Subject: [PATCH v6] psql: show membership options in the \du command

Format for the "Member of" column completely redesigned.
Shown within each row, in newline-separated format, are the memberships granted to
the role.  The presentation includes both the name of the grantor
as well as the membership permissions (in an abbreviated format:
a for admin option, i for inherit option, s for set option.)
The word 'empty' is printed in the case that none of those permissions are granted.
---
 doc/src/sgml/ref/psql-ref.sgml | 30 +++---
 src/bin/psql/describe.c| 63 --
 src/test/regress/expected/psql.out | 49 +++
 src/test/regress/sql/psql.sql  | 30 ++
 4 files changed, 146 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 7b8ae9fac3..03e7da93de 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1727,9 +1727,18 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
 S modifier to include system roles.
 If pattern is specified,
 only those roles whose names match the pattern are listed.
-If the form \dg+ is used, additional information
-is shown about each role; currently this adds the comment for each
-role.
+
+
+Shown within each row, in newline-separated format, are the memberships granted to
+the role.  The presentation includes both the name of the grantor
+as well as the membership permissions (in an abbreviated format:
+a for admin option, i for inherit option,
+s for set option.) The word empty is printed in
+the case that none of those permissions are granted.
+See the GRANT command for their meaning.
+
+
+If the form \dg+ is used, the comment attached to the role is shown.
 
 
   
@@ -1969,9 +1978,18 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
 S modifier to include system roles.
 If pattern is specified,
 only those roles whose names match the pattern are listed.
-If the form \du+ is used, additional information
-is shown about each role; currently this adds the comment for each
-role.
+
+
+Shown within each row, in newline-separated format, are the memberships granted to
+the role.  The presentation includes both the name of the grantor
+as well as the membership permissions (in an abbreviated format:
+a for admin option, i for inherit option,
+s for set option.) The word empty is printed in
+the case that none of those permissions are granted.
+See the GRANT command for their meaning.
+
+
+If the form \du+ is used, the comment attached to the role is shown.
 
 
   
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 99e28f607e..7f2b7c9363 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3631,24 +3631,42 @@ describeRoles(const char *pattern, bool verbose, bool showSystem)
	printfPQExpBuffer(&buf,
 	  "SELECT r.rolname, r.rolsuper, r.rolinherit,\n"
 	  "  r.rolcreaterole, r.rolcreatedb, r.rolcanlogin,\n"
-	  "  r.rolconnlimit, r.rolvaliduntil,\n"
-	  "  ARRAY(SELECT b.rolname\n"
-	  "FROM pg_catalog.pg_auth_members m\n"
-	  "JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid)\n"
-	  "WHERE m.member = r.oid) as memberof");
+	  "  r.rolconnlimit, r.rolvaliduntil, r.rolreplication,\n");
 
-	if (verbose)
-	{
		appendPQExpBufferStr(&buf, "\n, pg_catalog.shobj_description(r.oid, 'pg_authid') AS description");
-		ncols++;
-	}
	appendPQExpBufferStr(&buf, "\n, r.rolreplication");
+	if (pset.sversion >= 16)
+		appendPQExpBufferStr(&buf,
+			 "  (SELECT pg_catalog.string_agg(\n"
+			 "pg_catalog.format('%I from %I (%s)',\n"
+			 "  b.rolname, m.grantor::pg_catalog.regrole::pg_catalog.text,\n"
+			 "  pg_catalog.regexp_replace(\n"
+			 "pg_catalog.concat_ws(', ',\n"
+			 "  CASE WHEN m.admin_option THEN 'a' END,\n"
+			 "  CASE WHEN m.inherit_option THEN 'i' END,\n"
+			 "  CASE WHEN m.set_option THEN 's' END),\n"
+			 "  '^$', 'empty')\n"
+			 "), E'\\n'\n"
+			 "ORDER BY b.rolname, m.grantor::pg_catalog.regrole::pg_catalog.text)\n"
+			 "  FROM pg_catalog.pg_auth_members m\n"
+			 "   JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid)\n"
+			 "  WHERE m.member = r.oid) as memberof");
+	else
+		appendPQExpBufferStr(,
+			 

Re: ICU locale validation / canonicalization

2023-03-22 Thread Jeff Davis
On Tue, 2023-03-21 at 10:35 +0100, Peter Eisentraut wrote:
> [PATCH v6 1/6] Support language tags in older ICU versions (53 and
>   earlier).
> 
> In pg_import_system_collations(), this is now redundant and can be 
> simplified:
> 
> -   if (!pg_is_ascii(langtag) || !pg_is_ascii(iculocstr))
> +   if (!pg_is_ascii(langtag) || !pg_is_ascii(langtag))
> 
> icu_set_collation_attributes() needs more commenting about what is
> going 
> on.  My guess is that uloc_canonicalize() converts from language tag
> to 
> ICU locale ID, and then the existing logic to parse that apart would 
> apply.  Is that how it works?

Fixed the redundancy, added some comments, and committed 0001.

> [PATCH v6 2/6] Wrap ICU ucol_open().
> 
> It makes sense to try to unify some of this.  But I find the naming 
> confusing.  If I see pg_ucol_open(), then I would expect that all
> calls 
> to ucol_open() would be replaced by this.  But here it's only a few, 
> without explanation.  (pg_ucol_open() has no explanation at all
> AFAICT.)

The remaining callsite which doesn't use the wrapper is in initdb.c,
which can't call into pg_locale.c, and has different intentions. initdb
uses ucol_open to get the default locale if icu_locale is not
specified; and it also uses ucol open to verify that the locale can be
opened (whether specified or the default). (Aside: I created a tiny
0004 patch which makes this difference more clear and adds a nice
comment.)

There's no reason to use a wrapper when getting the default locale,
because it's just passing NULL anyway.

When verifying that the locale can be opened, ucol_open() doesn't catch
many problems anyway, so I'm not sure it's worth a lot of effort to
copy these extra checks that the wrapper does into initdb.c. For
instance, what's the value in replacing "und" with "root" if opening
either will succeed? Parsing the attributes can potentially catch
problems, but the later patch 0006 will check the attributes when
converting to a language tag at initdb time.

So I'm inclined to just leave initdb alone in patches 0002 and 0003.

> I have in my notes that check_icu_locale() and make_icu_collator() 
> should be combined into a single function.  I think that would be a 
> better way to slice it.

That would leave out get_collation_actual_version(), which should
handle the same fixups for attributes and the "und" locale.

> Btw., I had intentionally not written code like this
> 
> +#if U_ICU_VERSION_MAJOR_NUM < 54
> +   icu_set_collation_attributes(collator, loc_str);
> +#endif
> 
> The disadvantage of doing it that way is that you then need to dig
> out 
> an old version of ICU in order to check whether the code compiles at 
> all.  With the current code, you can be sure that that code compiles
> if 
> you make changes elsewhere.

I was wondering about that -- thank you, I changed it back to use "if"
rather than "#ifdef".


New series attached (starting at 0002 to better correspond to the
previous series).


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From fbe03dc596b5e12f4dda60269e044caa58f8be32 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Tue, 14 Mar 2023 21:21:17 -0700
Subject: [PATCH v7 2/7] Wrap ICU ucol_open().

Hide details of supporting older ICU versions in a wrapper
function. The current code only needs to handle
icu_set_collation_attributes(), but a subsequent commit will add
additional version-specific code.
---
 src/backend/utils/adt/pg_locale.c | 70 +++
 1 file changed, 43 insertions(+), 27 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index c3ede994be..dd0786dff5 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -140,6 +140,7 @@ static char *IsoLocaleName(const char *);
  */
 static UConverter *icu_converter = NULL;
 
+static UCollator *pg_ucol_open(const char *loc_str);
 static void init_icu_converter(void);
 static size_t uchar_length(UConverter *converter,
 		   const char *str, int32_t len);
@@ -1430,17 +1431,8 @@ make_icu_collator(const char *iculocstr,
 {
 #ifdef USE_ICU
 	UCollator  *collator;
-	UErrorCode	status;
 
-	status = U_ZERO_ERROR;
-	collator = ucol_open(iculocstr, &status);
-	if (U_FAILURE(status))
-		ereport(ERROR,
-				(errmsg("could not open collator for locale \"%s\": %s",
-						iculocstr, u_errorName(status))));
-
-	if (U_ICU_VERSION_MAJOR_NUM < 54)
-		icu_set_collation_attributes(collator, iculocstr);
+	collator = pg_ucol_open(iculocstr);
 
 	/*
 	 * If rules are specified, we extract the rules of the standard collation,
@@ -1451,6 +1443,7 @@ make_icu_collator(const char *iculocstr,
 		const UChar *default_rules;
 		UChar	   *agg_rules;
 		UChar	   *my_rules;
+		UErrorCode	status;
 		int32_t		length;
 
 		default_rules = ucol_getRules(collator, &length);
@@ -1722,16 +1715,11 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	if (collprovider == COLLPROVIDER_ICU)
 	{
 		UCollator  *collator;
-		

Re: Request for comment on setting binary format output per session

2023-03-22 Thread Jeff Davis
On Wed, 2023-03-22 at 10:12 +0100, Peter Eisentraut wrote:
> Sending type names is kind of useless if what comes back with the
> result 
> (RowDescription) are OIDs anyway.
> 
> The client would presumably have some code like
> 
> if (typeoid == 555)
>  parseThatType();
> 
> So it already needs to know about the OIDs of all the types it is 
> interested in.

Technically it's still an improvement because you can avoid an extra
round-trip. The client library can pipeline a query like:

   SELECT typname, oid FROM pg_type
 WHERE typname IN (...list of supported type names...);

when the client first connects, and then go ahead and send whatever
queries you want without waiting for the response. When you get back
the result of the pg_type query, you cache the mapping, and use it to
process any other results you get.

That avoids introducing an extra round-trip. I'm not sure if that's a
reasonable thing to expect the client to do, so I agree that we should
offer a better way.

Regards,
Jeff Davis





Re: Initial Schema Sync for Logical Replication

2023-03-22 Thread Zheng Li
> > Yes. Do we have any concrete use case where the subscriber is an older
> > version, in the first place?
> >
>
> As per my understanding, it is mostly due to the reason that it can
> work today. Today, during an off-list discussion with Jonathan on this
> point, he pointed me to a similar incompatibility in MySQL
> replication. See the "SQL incompatibilities" section in doc[1]. Also,
> please note that this applies not only to initial sync but also to
> schema sync during replication. I don't think it would be feasible to
> keep such cross-version compatibility for DDL replication.

I think it's possible to make DDL replication cross-version
compatible, by making the DDL deparser version-aware: the deparsed
JSON blob can have a PG version in it, and the destination server can
process the versioned JSON blob by transforming anything incompatible
according to the original version and its own version.

Regards,
Zane




Re: Request for comment on setting binary format output per session

2023-03-22 Thread Jeff Davis
On Wed, 2023-03-22 at 10:21 +0100, Peter Eisentraut wrote:
> I've been thinking that we need some new kind of identifier to allow 
> clients to process types in more sophisticated ways.
> 
> For example, each type could be (self-)assigned a UUID, which is
> fixed 
> for that type no matter in which schema or under what extension name
> or 
> with what OID it is installed.  Client libraries could then hardcode 
> that UUID for processing the types.  Conversely, the UUID could be 
> changed if the wire format of the type is changed, without having to 
> change the type name.

That sounds reasonable to me. It could also be useful for other
extension objects (or the extension itself) to avoid other kinds of
weirdness from name collisions or major version updates or extensions
that depend on other extensions.

Regards,
Jeff Davis





Re: On login trigger: take three

2023-03-22 Thread Robert Haas
On Tue, Mar 15, 2022 at 4:52 PM Daniel Gustafsson  wrote:
> Yeah, that was the previously posted v25 from the author (who adopted it from
> the original author).  I took the liberty to quickly poke at the review
> comments you had left as well as the ones that I had found to try and progress
> the patch.  0001 should really go in its own thread though to not hide it 
> from
> anyone interested who isn't looking at this thread.

Some comments on 0001:

- In general, I think we should prefer to phrase options in terms of
what is done, rather than what is not done. For instance, the
corresponding GUC for row-level security is row_security={on|off}, not
ignore_row_security.

- I think it's odd that the GUC in question doesn't accept true and
false and our usual synonyms for those values. I suggest that it
should, even if we want to add more possible values later.

- "ignoreing" is misspelled. So is gux-ignore-event-trigger. "Even
triggers" -> "Event triggers".

- Perhaps the documentation for the GUC should mention that the GUC is
not relevant in single-user mode because event triggers don't fire
then anyway.

- "Disable event triggers during the session." isn't a very good
description because there is in theory nothing to prevent this from
being set in postgresql.conf.
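
Following the row_security precedent, the user-facing knob might then look
something like this (the GUC name and accepted values here are purely
illustrative, not what the patch currently implements):

SET event_triggers = off;  -- also on/true/false, like other boolean GUCs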

Basically, I think 0001 is a good idea -- I'm much more nervous about
0002. I think we should get 0001 polished up and committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-22 Thread Andres Freund
Hi,

Tom, see below - I wonder if should provide one more piece of infrastructure
around the saved error stuff...


Have you measured whether this has negative performance effects when *NOT*
using the new option?


As-is this does not work with FORMAT BINARY - and converting the binary input
functions to support soft errors won't happen for 16. So I think you need to
raise an error if BINARY and IGNORE_DATATYPE_ERRORS are specified.


On 2023-03-22 22:34:20 +0900, torikoshia wrote:
> @@ -985,9 +986,28 @@ CopyFrom(CopyFromState cstate)
>  
>   ExecClearTuple(myslot);
>  
> + if (cstate->opts.ignore_datatype_errors)
> + {
> + escontext.details_wanted = true;
> + cstate->escontext = escontext;
> + }

I think it might be worth pulling this out of the loop. That does mean you'd
have to reset escontext.error_occurred after an error, but that doesn't seem
too bad, you need to do other cleanup anyway.


> @@ -956,10 +957,20 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
> 				values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
> 			}
> 			else
> -				values[m] = InputFunctionCall(&in_functions[m],
> -											  string,
> -											  typioparams[m],
> -											  att->atttypmod);
> +				/* If IGNORE_DATATYPE_ERRORS is enabled skip rows with datatype errors */
> +				if (!InputFunctionCallSafe(&in_functions[m],
> +										   string,
> +										   typioparams[m],
> +										   att->atttypmod,
> +										   (Node *) &cstate->escontext,
> +										   &values[m]))
> +				{
> +					cstate->ignored_errors++;
> +
> +					ereport(WARNING,
> +							errmsg("%s", cstate->escontext.error_data->message));

That isn't right - you lose all the details of the message. As is you'd also
leak the error context.

I think the best bet for now is to do something like
/* adjust elevel so we don't jump out */
cstate->escontext.error_data->elevel = WARNING;
/* despite the name, this won't raise an error if elevel < ERROR */
ThrowErrorData(cstate->escontext.error_data);

I wonder if we ought to provide a wrapper for this? It could e.g. know to
mention the original elevel and such?


I don't think NextCopyFrom() is the right place to emit this warning - it
e.g. is also called from file_fdw.c, which might want to do something else
with the error. From a layering POV it seems cleaner to do this in
CopyFrom(). You already have a check for escontext.error_occurred there
anyway.



> @@ -3378,6 +3378,10 @@ copy_opt_item:
> 			{
> 				$$ = makeDefElem("freeze", (Node *) makeBoolean(true), @1);
> 			}
> +			| IGNORE_DATATYPE_ERRORS
> +			{
> +				$$ = makeDefElem("ignore_datatype_errors", (Node *) makeBoolean(true), @1);
> +			}
> 			| DELIMITER opt_as Sconst
> 			{
> 				$$ = makeDefElem("delimiter", (Node *) makeString($3), @1);


I think we shouldn't add a new keyword for this, but only support this via
/* new COPY option syntax */
copy_generic_opt_list:
copy_generic_opt_elem

Further increasing the size of the grammar with random keywords when we have
more generic ways to represent them seems unnecessary.


> +-- tests for IGNORE_DATATYPE_ERRORS option
> +CREATE TABLE check_ign_err (n int, m int[], k int);
> +COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
> +1	{1}	1
> +a	{2}	2
> +3	{3}	33
> +4	{a, 4}	4
> +
> +5	{5}	5
> +\.
> +SELECT * FROM check_ign_err;
> +

I suggest adding a few more tests (rough sketch after the list):

- COPY with a datatype error that can't be handled as a soft error
- test documenting that COPY FORMAT BINARY is incompatible with 
IGNORE_DATATYPE_ERRORS
- a soft error showing the error context - although that will require some
  care to avoid the function name + line in the output
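
Something along these lines, perhaps (a rough sketch only -- the expected
failure modes are assumptions to be verified against the patch, and the
domain trick relies on the CHECK expression itself raising a hard error):

-- binary COPY cannot use soft input errors, so this should error out:
COPY check_ign_err FROM STDIN WITH (FORMAT binary, ignore_datatype_errors);

-- an error raised while evaluating a CHECK expression is not a soft input
-- error, so this should still abort the COPY despite the option:
CREATE DOMAIN hard_err_int AS int CHECK (1 / VALUE > 0);
CREATE TABLE check_ign_err2 (n hard_err_int);
COPY check_ign_err2 FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
0
\.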

Greetings,

Andres Freund




Re: Add pg_walinspect function with block info columns

2023-03-22 Thread Melanie Plageman
On Mon, Mar 20, 2023 at 7:34 PM Peter Geoghegan  wrote:
> On Sun, Mar 19, 2023 at 8:21 PM Kyotaro Horiguchi  
> wrote:
> > It is not an issue with this patch, but as I look at this version, I'm
> > starting to feel uneasy about the subtle differences between what
> > GetWALRecordsInfo and GetWALBlockInfo do. One solution might be to
> > have GetWALBlockInfo return a values array for each block, but that
> > could make things more complex than needed. Alternatively, could we
> > get GetWALRecordsInfo to call tuplestore_putvalues() internally? This
> > way, both functions can manage the temporary memory context within
> > themselves.
>
>  Agreed. I'm also not sure what to do about it, though.
>
> > This means GetWALBlockInfo overwrites the last two columns generated
> > by GetWalRecordInfo, but I don't think this approach is clean and
> > stable. I agree we don't want the final columns in a block info tuple
> > but we don't want to duplicate the common code path.
>
> > I initially thought we could divide the function into
> > GetWALCommonInfo(), GetWALRecordInfo() and GetWALBlockInfo(), but it
> > doesn't seem that simple.. In the end, I think we should have separate
> > GetWALRecordInfo() and GetWALBlockInfo() that have duplicate
> > "values[i++] = .." lines.
>
> I agree. A little redundancy is better when the alternative is fragile
> code, and I'm pretty sure that that applies here -- there won't be
> very many duplicated lines, and the final code will be significantly
> clearer. There can be a comment about keeping GetWALRecordInfo and
> GetWALBlockInfo in sync.

So, I also agree that it is better to have the two separate functions
instead of overwriting the last two columns. As for keeping them in
sync, we could define the number of common columns as a macro like:

#define WALINSPECT_INFO_NUM_COMMON_COLS 10

and use that to calculate the size of the values/nulls array in
GetWalRecordInfo() and GetWALBlockInfo() (assuming a new version where
those two functions duplicate the setting of values[x] = y).

That way, if a new column of information is added and one of the two
functions forgets to set it in the values array, it would still cause an
empty column and it will be easier for the programmer to see it needs to
be added.

We could even define an enum like:
  typedef enum walinspect_common_col
  {
WALINSPECT_START_LSN,
WALINSPECT_END_LSN,
WALINSPECT_PREV_LSN,
WALINSPECT_XID,
WALINSPECT_RMGR,
WALINSPECT_REC_TYPE,
WALINSPECT_REC_LENGTH,
WALINSPECT_MAIN_DATA_LENGTH,
WALINSPECT_FPILEN,
WALINSPECT_DESC,
WALINSPECT_NUM_COMMON_COL,
  } walinspect_common_col;

and set values in both functions like
  values[WALINSPECT_FPILEN] = y
if we kept the order of common columns the same and as the first N
columns for both functions. This would keep us from having to manually
update a macro like WALINSPECT_INFO_NUM_COMMON_COLS.

Though, I'm not sure how much value that actually adds.

- Melanie




Re: allow_in_place_tablespaces vs. pg_basebackup

2023-03-22 Thread Robert Haas
On Tue, Mar 21, 2023 at 7:59 PM Michael Paquier  wrote:
> On Mon, Mar 20, 2023 at 07:56:42AM -0400, Robert Haas wrote:
> > Anyone want to comment on this?
>
> I have not checked the patch in details, but perhaps this needs at
> least one test?

Sure. I was sort of hoping to get feedback on the overall plan first,
but it's easy enough to add a test so I've done that in the attached
version. I've put it in a separate file because 010_pg_basebackup.pl
is pushing a thousand lines and it's got a ton of unrelated tests in
there already. The test included here fails on master, but passes with
the patch.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0001-Fix-pg_basebackup-with-in-place-tablespaces-some-.patch
Description: Binary data


Re: Set arbitrary GUC options during initdb

2023-03-22 Thread Tom Lane
Peter Eisentraut  writes:
> This patch looks good to me.  It's a very nice simplification of the 
> initdb.c code, even without the new feature.

Thanks for looking!

> I found that the addition of
> #include 
> didn't appear to be necessary.  Maybe it was required before 
> guc_value_requires_quotes() was changed?

There's still an isspace() added by the patch ... oh, the #include
is not needed because port.h includes ctype.h.  That's spectacularly
awful from an include-footprint standpoint, but I can't say that
I want to go fix it right this minute.

> I would remove the
> #if DEF_PGPORT != 5432
> This was in the previous code too, but now if we remove it, then we 
> don't have any more hardcoded 5432 left, which seems like a nice 
> improvement in cleanliness.

Hm.  That'll waste a few cycles during initdb; not sure if the extra
cleanliness is worth it.  It's not like that number is going to change.

regards, tom lane




Re: CREATE DATABASE ... STRATEGY WAL_LOG issues

2023-03-22 Thread Andres Freund
On 2023-03-22 09:58:58 -0400, Robert Haas wrote:
> On Wed, Mar 22, 2023 at 1:12 AM Andres Freund  wrote:
> > Patch with the two minimal fixes attached. As we don't know whether it's 
> > worth
> > changing the strategy, the more minimal fixes seem more appropriate.
> 
> LGTM.

Thanks for checking. Pushed.




Re: Should we remove vacuum_defer_cleanup_age?

2023-03-22 Thread Andres Freund
Hi,

On 2023-03-22 11:44:20 -0500, Justin Pryzby wrote:
> On Sat, Mar 18, 2023 at 10:33:57AM +0100, Alvaro Herrera wrote:
> > On 2023-Mar-17, Andres Freund wrote:
> > 
> > > I started writing a test for vacuum_defer_cleanup_age while working on 
> > > the fix
> > > referenced above, but now I am wondering if said energy would be better 
> > > spent
> > removing vacuum_defer_cleanup_age altogether.
> > 
> > +1  I agree it's not useful anymore.
> > 
> > > I don't think I have the cycles to push this through in the next weeks, 
> > > but if
> > > we agree removing vacuum_defer_cleanup_age is a good idea, it seems like a
> > > good idea to mark it as deprecated in 16?
> > 
> > Hmm, for the time being, can we just "disable" it by disallowing to set
> > the GUC to a value different from 0?  Then we can remove the code later
> > in the cycle at leisure.
> 
> It can be useful to do a "rolling transition", and it's something I do
> often.
> 
> But I can't see why that would be useful here?  It seems like something
> that could be done after the feature freeze.  It's removing a feature,
> not adding one.

It wasn't actually that much work to write a patch to remove
vacuum_defer_cleanup_age, see the attached.

I don't know whether others think we should apply it this release, given the
"late submission", but I tend to think it's not worth carrying the
complication of vacuum_defer_cleanup_age forward.

Greetings,

Andres Freund
>From 24b519015bb1922a9746eda2ebea8702ccdd486f Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 22 Mar 2023 09:56:39 -0700
Subject: [PATCH v1] Remove vacuum_defer_cleanup_age

Discussion: https://postgr.es/m/20230317230930.nhsgk3qfk7f4a...@awork3.anarazel.de
---
 src/include/storage/standby.h |   1 -
 src/backend/storage/ipc/procarray.c   | 105 ++
 src/backend/storage/ipc/standby.c |   1 -
 src/backend/utils/misc/guc_tables.c   |   9 --
 src/backend/utils/misc/postgresql.conf.sample |   1 -
 src/bin/pg_upgrade/server.c   |   5 +-
 doc/src/sgml/config.sgml  |  35 --
 doc/src/sgml/high-availability.sgml   |  19 +---
 8 files changed, 11 insertions(+), 165 deletions(-)

diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 2effdea126f..4bc23adf516 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -21,7 +21,6 @@
 #include "storage/standbydefs.h"
 
 /* User-settable GUC parameters */
-extern PGDLLIMPORT int vacuum_defer_cleanup_age;
 extern PGDLLIMPORT int max_standby_archive_delay;
 extern PGDLLIMPORT int max_standby_streaming_delay;
 extern PGDLLIMPORT bool log_recovery_conflict_waits;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index ea91ce355f2..106b184a3e6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -367,9 +367,6 @@ static inline void ProcArrayEndTransactionInternal(PGPROC *proc, TransactionId l
 static void ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid);
 static void MaintainLatestCompletedXid(TransactionId latestXid);
 static void MaintainLatestCompletedXidRecovery(TransactionId latestXid);
-static void TransactionIdRetreatSafely(TransactionId *xid,
-	   int retreat_by,
-	   FullTransactionId rel);
 
 static inline FullTransactionId FullXidRelativeTo(FullTransactionId rel,
   TransactionId xid);
@@ -1709,10 +1706,7 @@ TransactionIdIsActive(TransactionId xid)
  * do about that --- data is only protected if the walsender runs continuously
  * while queries are executed on the standby.  (The Hot Standby code deals
  * with such cases by failing standby queries that needed to access
- * already-removed data, so there's no integrity bug.)  The computed values
- * are also adjusted with vacuum_defer_cleanup_age, so increasing that setting
- * on the fly is another easy way to make horizons move backwards, with no
- * consequences for data integrity.
+ * already-removed data, so there's no integrity bug.)
  *
  * Note: the approximate horizons (see definition of GlobalVisState) are
  * updated by the computations done here. That's currently required for
@@ -1877,50 +1871,11 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
 			TransactionIdOlder(h->data_oldest_nonremovable, kaxmin);
 		/* temp relations cannot be accessed in recovery */
 	}
-	else
-	{
-		/*
-		 * Compute the cutoff XID by subtracting vacuum_defer_cleanup_age.
-		 *
-		 * vacuum_defer_cleanup_age provides some additional "slop" for the
-		 * benefit of hot standby queries on standby servers.  This is quick
-		 * and dirty, and perhaps not all that useful unless the primary has a
-		 * predictable transaction rate, but it offers some protection when
-		 * there's no walsender connection.  Note that we are assuming
-		 * vacuum_defer_cleanup_age isn't large enough to cause wraparound ---
-		 * so guc.c should limit it to no 

Re: Should we remove vacuum_defer_cleanup_age?

2023-03-22 Thread Justin Pryzby
On Sat, Mar 18, 2023 at 10:33:57AM +0100, Alvaro Herrera wrote:
> On 2023-Mar-17, Andres Freund wrote:
> 
> > I started writing a test for vacuum_defer_cleanup_age while working on the 
> > fix
> > referenced above, but now I am wondering if said energy would be better 
> > spent
> > removing vacuum_defer_cleanup_age alltogether.
> 
> +1  I agree it's not useful anymore.
> 
> > I don't think I have the cycles to push this through in the next weeks, but 
> > if
> > we agree removing vacuum_defer_cleanup_age is a good idea, it seems like a
> > good idea to mark it as deprecated in 16?
> 
> Hmm, for the time being, can we just "disable" it by disallowing to set
> the GUC to a value different from 0?  Then we can remove the code later
> in the cycle at leisure.

It can be useful to do a "rolling transition", and it's something I do
often.

But I can't see why that would be useful here?  It seems like something
that could be done after the feature freeze.  It's removing a feature,
not adding one.

-- 
Justin




Re: Add pg_walinspect function with block info columns

2023-03-22 Thread Melanie Plageman
On Wed, Mar 22, 2023 at 11:35 AM Melanie Plageman
 wrote:
>
> On Fri, Mar 17, 2023 at 8:51 PM Michael Paquier  wrote:
> >
> > On Fri, Mar 17, 2023 at 04:36:58PM -0700, Peter Geoghegan wrote:
> > > I'm sure that they will do that much more than they would have
> > > otherwise. Since we'll have made pg_get_wal_block_info() so much more
> > > useful than pg_get_wal_records_info() for many important use cases.
> > > Why is that a bad thing? Are you concerned about the overhead of
> > > pulling in FPIs when pg_get_wal_block_info() is run, if Bharath's
> > > patch is committed? That could be a problem, I suppose -- but it would
> > > be good to get more data on that. Do you think that this will be much
> > > of an issue, Bharath?
> >
> > Yes.  The CPU cost is one thing, but I am also worrying about the
> > I/O cost with a tuplestore spilling to disk a large number of FPIs,
> > and some workloads can generate WAL so as FPIs is what makes for most
> > of the contents stored in the WAL.  (wal_compression is very effective
> > in such cases, for example.)
>
> I had done some analysis about CPU costs for decompressing FPI upthread
> in [1], finding that adding a parameter to allow skipping outputting FPI
> would not have much impact when FPI are compressed, as decompressing the
> images comprised very little of the overall time.
>
> After reading what you said, I was interested to see how substantial the
> I/O cost with non-compressed FPI would be.
>
> Using a patch with a parameter to pg_get_wal_block_info() to skip
> outputting FPI, I found that on a fast local nvme ssd, the timing
> difference between doing so and not still isn't huge -- 9 seconds when
> outputting the FPI vs 8.5 seconds when skipping outputting FPI. (with
> ~50,000 records all with non-compressed FPIs).
>
> However, perhaps obviously, the I/O cost is worse.
> Doing nothing but
>
>   SELECT *  FROM pg_get_wal_block_info(:start_lsn, :end_lsn, true)
> where fpi is not null;

Sorry, I should have been more clear: similar results with a select list
simply excluding fpi and no where clause.

- Melanie




Re: Bytea PL/Perl transform

2023-03-22 Thread Иван Панченко

 
  
>Wednesday, 22 March 2023, 12:45 +03:00 from Daniel Gustafsson :
> 
>> On 18 Mar 2023, at 23:25, Иван Панченко < w...@mail.ru > wrote:
>>
>> Hi,
>> PostgreSQL passes bytea arguments to PL/Perl functions as hexadecimal 
>> strings, which is not only inconvenient, but also memory and time consuming.
>> So I decided to propose a simple transform extension to pass bytea as native 
>> Perl octet strings.
>> Please find the patch attached.
>Thanks for the patch, I recommend registering this in the currently open
>Commitfest to make sure it's kept track of:
>
>https://commitfest.postgresql.org/43/
Thanks, done:
https://commitfest.postgresql.org/43/4252/
>
>--
>Daniel Gustafsson
> 
--
Ivan
 

Re: Non-superuser subscription owners

2023-03-22 Thread Robert Haas
On Wed, Mar 8, 2023 at 2:47 PM Andres Freund  wrote:
> Hm - it still feels wrong that we error out in case of failure, despite the
> comment to the function saying:
>  * Returns NULL on error and fills the err with palloc'ed error message.

I've amended the comment so that it explains why it's done that way.

> Other than this, the change looks ready to me.

Well, it still needed documentation changes and pg_dump changes. I've
added those in the attached version.

If nobody's too unhappy with the idea, I plan to commit this soon,
both because I think that the feature is useful, and also because I
think it's an important security improvement. Since replication is
currently run as the subscription owner, any table owner can
compromise the subscription owner's account, which is really bad, but
if the subscription owner can be a non-superuser, it's a little bit
less bad. From a security point of view, I think the right thing to do
and what would improve security a lot more is to run replication as
the table owner rather than the subscription owner. I've posted a
patch for that at
http://postgr.es/m/CA+TgmoaSCkg9ww9oppPqqs+9RVqCexYCE6Aq=usypfnoode...@mail.gmail.com
and AFAICT everyone agrees with the idea, even if the patch itself
hasn't yet attracted any code reviews. But although the two patches
are fairly closely related, this seems to be a good idea whether that
moves forward or not, and that seems to be a good idea whether this
moves forward or not. As this one has had more review and discussion,
my current thought is to try to get this one committed first.
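
For context, a sketch of the user-facing behavior this patch aims to allow
(the role name comes from the patch file name; the connection values are
assumed, and the real patch also imposes additional requirements):

```
GRANT pg_create_subscription TO alice;         -- alice is not a superuser
-- alice can then create subscriptions herself:
CREATE SUBSCRIPTION sub
  CONNECTION 'host=publisher dbname=postgres'  -- assumed connection string
  PUBLICATION pub;
```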

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v5-0001-Add-new-predefined-role-pg_create_subscriptions.patch
Description: Binary data


Re: [PATCH] Report the query string that caused a memory error under Valgrind

2023-03-22 Thread Peter Eisentraut

On 31.01.23 15:00, Onur Tirtir wrote:
We use Valgrind --together with the suppression file provided in 
Postgres repo-- to test Citus extension against memory errors.


We replace /bin/postgres executable with a simple bash script that 
executes the original postgres executable under Valgrind and then we run 
our usual regression tests.


However, it is quite hard to understand which query caused a memory 
error in the stack traces that has been dumped into valgrind logfile.


For this reason, we propose the attached patch to allow Valgrind to 
report the query string that caused a memory error right after the 
relevant stack trace.


I believe this would not only be useful for Citus but also for Postgres 
and other extensions in their valgrind-testing process.


I can see how this could be useful.  But this only handles queries using 
the simple protocol.  At least the extended protocol should be handled 
as well.  Maybe it would be better to move this up to PostgresMain() and 
handle all protocol messages?






Re: Add pg_walinspect function with block info columns

2023-03-22 Thread Melanie Plageman
On Fri, Mar 17, 2023 at 8:51 PM Michael Paquier  wrote:
>
> On Fri, Mar 17, 2023 at 04:36:58PM -0700, Peter Geoghegan wrote:
> > I'm sure that they will do that much more than they would have
> > otherwise. Since we'll have made pg_get_wal_block_info() so much more
> > useful than pg_get_wal_records_info() for many important use cases.
> > Why is that a bad thing? Are you concerned about the overhead of
> > pulling in FPIs when pg_get_wal_block_info() is run, if Bharath's
> > patch is committed? That could be a problem, I suppose -- but it would
> > be good to get more data on that. Do you think that this will be much
> > of an issue, Bharath?
>
> Yes.  The CPU cost is one thing, but I am also worrying about the
> I/O cost with a tuplestore spilling to disk a large number of FPIs,
> and some workloads can generate WAL so as FPIs is what makes for most
> of the contents stored in the WAL.  (wal_compression is very effective
> in such cases, for example.)

I had done some analysis about CPU costs for decompressing FPI upthread
in [1], finding that adding a parameter to allow skipping outputting FPI
would not have much impact when FPI are compressed, as decompressing the
images comprised very little of the overall time.

After reading what you said, I was interested to see how substantial the
I/O cost with non-compressed FPI would be.

Using a patch with a parameter to pg_get_wal_block_info() to skip
outputting FPI, I found that on a fast local nvme ssd, the timing
difference between doing so and not still isn't huge -- 9 seconds when
outputting the FPI vs 8.5 seconds when skipping outputting FPI. (with
~50,000 records all with non-compressed FPIs).

However, perhaps obviously, the I/O cost is worse.
Doing nothing but

  SELECT *  FROM pg_get_wal_block_info(:start_lsn, :end_lsn, true)
where fpi is not null;

per iostat, the write latency was double for the query which output fpi
from the one that didn't and the wkB/s was much higher. This is probably
obvious, but I'm just wondering if it makes sense to have such a
parameter to avoid impacting a system which is doing concurrent I/O with
walinspect.

I have had use for block info without seeing the FPIs, personally.
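
To make the parameter idea concrete, here is a sketch (the third argument is
from the experimental patch discussed here, not a committed API):

```
SELECT * FROM pg_get_wal_block_info(:start_lsn, :end_lsn, false);
-- passing false would skip materializing the FPIs, avoiding most of the
-- tuplestore write traffic measured above; true keeps today's behavior
```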

- Melanie

[1] 
https://www.postgresql.org/message-id/CAAKRu_bJvbcYBRj2cN6G2xV7B7-Ja%2BpjTO1nEnEhRR8OXYiABA%40mail.gmail.com




Re: meson: Non-feature feature options

2023-03-22 Thread Peter Eisentraut

On 22.03.23 11:16, Nazir Bilal Yavuz wrote:

Hi,

On Wed, 15 Mar 2023 at 11:12, Peter Eisentraut
 wrote:


On 14.03.23 15:07, Nazir Bilal Yavuz wrote:

I think the uuid side of this is making this way too complicated.  I'm
content leaving this as a manual option for now.

There is much more value in making the ssl option work automatically.
So I would welcome a patch that just makes -Dssl=auto work smoothly,
perhaps using the "trick" that Andres described.

I tried to implement what we did for ssl to uuid as well, do you have
any comments?


For the uuid option, we have three different choices.  What should be
the search order and why?


Docs [1] say that the OSSP uuid library is not well maintained and is
becoming increasingly difficult to port to newer platforms, so we can
put 'uuid-ossp' at the end. Between 'uuid-e2fs' and 'uuid-bsd', I
believe 'uuid-e2fs' is used more often than 'uuid-bsd'.
Hence, they can be ordered as 'uuid-e2fs', 'uuid-bsd', 'uuid-ossp'.

Does that make sense?


I think that's fair.





Re: ResourceOwner refactoring

2023-03-22 Thread Aleksander Alekseev
Hi,

I noticed that the patchset didn't make much progress since February
and decided to give it another round of code review.

> [...]. But in general, end-of-transaction activities should be kept
> simple, especially between the release phases, so I feel that having to
> remember extra resources there is a bad code smell and we shouldn't
> encourage that.

+1

> I added a test module in src/test/modules/test_resowner to test those cases.

This is much appreciated, as well as extensive changes made to READMEs
and the comments.

> [...] New patchset attached.
> I added it as a separate patch on top of the previous patches, as
> v13-0003-Release-resources-in-
> priority-order.patch, to make it easier to
> see what changed after the previous patchset version. But it should be
> squashed with patch 0002 before committing.

My examination, which besides reading the code included running it
under sanitizer and checking the code coverage, didn't reveal any
major problems with the patchset.

Certain "should never happen in practice" scenarios seem not to be
test-covered in resowner.c, particularly:

```
elog(ERROR, "ResourceOwnerEnlarge called after release started");
elog(ERROR, "ResourceOwnerRemember called but array was full");
elog(ERROR, "ResourceOwnerForget called for %s after release started",
kind->name);
elog(ERROR, "%s %p is not owned by resource owner %s",
elog(ERROR, "ResourceOwnerForget called for %s after release started",
kind->name);
elog(ERROR, "lock reference %p is not owned by resource owner %s"
```

I didn't check whether these or similar code paths were tested before
the patch and I don't have a strong opinion on whether we should test
these scenarios. Personally I'm OK with the fact that these few lines
are not covered with tests.

The following procedures are never executed:

* RegisterResourceReleaseCallback()
* UnregisterResourceReleaseCallback()

They are never actually called anymore due to the changes in 0005.
Previously they were used by contrib/pgcrypto. I suggest dropping this
part of the API since it seems to be redundant now. This will simplify
the implementation even further.

This patch, although moderately complicated, was moved between several
commitfests. I think it would be great if it made it to PG16. I'm
inclined to change the status of the patchset to RfC in a bit, unless
anyone has a second opinion.

--
Best regards,
Aleksander Alekseev




Re: User functions for building SCRAM secrets

2023-03-22 Thread Jonathan S. Katz

On 3/22/23 2:48 AM, Michael Paquier wrote:

On Tue, Feb 14, 2023 at 06:16:18PM -0500, Jonathan S. Katz wrote:

I opted for the approach in [2]. v5 contains the branching logic for the
UTF8 only tests, and the corresponding output files. I tested locally on
macOS against both UTF8 +  C locales.


I was reading this thread again, and pondered on this particular
point:
https://www.postgresql.org/message-id/caawbhmhjcfc4oaga_7yluhtj6j+rxey+bodrygzndaflgfz...@mail.gmail.com

We've had our share of complains over the years that Postgres logs
password data in the logs with various DDLs, so I'd tend to agree that
this is not a practice we should try to encourage more.  The
parameterization of the SCRAM verifiers through GUCs (like Daniel's
https://commitfest.postgresql.org/42/4201/ for the iteration number)
is more promising because it is possible to not have to send the
password over the wire with once we let libpq take care of the
computation, and the server would not know about that


I generally agree with not allowing password data to be in logs, but in 
practice, there are a variety of tools and extensions that obfuscate or 
remove passwords from PostgreSQL logs. Additionally, this function is 
not targeted for SQL statements directly, but stored procedures.


For example, an extension like "pg_tle" exposes the ability for someone 
to write a "check_password_hook" directly from PL/pgSQL[1] (and other 
languages). As we've made it a best practice to pre-hash the password on 
the client-side, a user who wants to write a check password hook against 
a SCRAM verifier needs to be able to compare the verifier against some 
existing set of plaintext criteria, and has to write their own function 
to do it. I have heard several users who have asked to do this, and the 
only feedback I can give them is "implement your own SCRAM build secret 
function."


And, if my PostgreSQL server _is_ the client, e.g. it's making a dblink 
call to another PostgreSQL server, the only way it can modify a password 
is by sending the plaintext credential over the wire.


I don't see how the parameterization work applies here -- would we allow 
salts to be parameterized? -- and it still would not allow the server to 
build out a SCRAM secret for these cases.


Maybe I'm not conveying the problem this is solving -- I'm happy to go 
one more round trying to make it clearer -- but if this is not clear, 
it'd be good to at least develop an alternative approach to this before 
withdrawing the patch.


Thanks,

Jonathan

[1] 
https://github.com/aws/pg_tle/blob/main/docs/06_plpgsql_examples.md#example-password-check-hook-against-bad-password-dictionary


OpenPGP_signature
Description: OpenPGP digital signature


Re: CREATE DATABASE ... STRATEGY WAL_LOG issues

2023-03-22 Thread Robert Haas
On Wed, Mar 22, 2023 at 1:12 AM Andres Freund  wrote:
> Patch with the two minimal fixes attached. As we don't know whether it's worth
> changing the strategy, the more minimal fixes seem more appropriate.

LGTM.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Initial Schema Sync for Logical Replication

2023-03-22 Thread Masahiko Sawada
On Wed, Mar 22, 2023 at 2:16 PM Amit Kapila  wrote:
>
> On Wed, Mar 22, 2023 at 8:29 AM Masahiko Sawada  wrote:
> >
> > On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila  wrote:
> > >
> > > On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira  wrote:
> > >
> > > > You should
> > > > exclude them removing these objects from the TOC before running 
> > > > pg_restore or
> > > > adding a few pg_dump options to exclude these objects. Another issue is 
> > > > related
> > > > to different version. Let's say the publisher has a version ahead of the
> > > > subscriber version, a new table syntax can easily break your logical
> > > > replication setup. IMO pg_dump doesn't seem like a good solution for 
> > > > initial
> > > > synchronization.
> > > >
> > > > Instead, the backend should provide infrastructure to obtain the 
> > > > required DDL
> > > > commands for the specific (set of) tables. This can work around the 
> > > > issues from
> > > > the previous paragraph:
> > > >
> > > ...
> > > > * don't need to worry about different versions.
> > > >
> > >
> > > AFAICU some of the reasons why pg_dump is not allowed to dump from the
> > > newer version are as follows: (a) there could be more columns in the
> > > newer version of the system catalog and then Select * type of stuff
> > > won't work because the client won't have knowledge of additional
> > > columns. (b) the newer version could have new features (represented by
> > > say new columns in existing catalogs or new catalogs) that the older
> > > version of pg_dump has no knowledge of and will fail to get that data
> > > and hence an inconsistent dump. The subscriber will easily be not in
> > > sync due to that.
> > >
> > > Now, how do we avoid these problems even if we have our own version of
> > > functionality similar to pg_dump for selected objects? I guess we will
> > > face similar problems.
> >
> > Right. I think that such functionality needs to return DDL commands
> > that can be executed on the requested version.
> >
> > > If so, we may need to deny schema sync in any such case.
> >
> > Yes. Do we have any concrete use case where the subscriber is an older
> > version, in the first place?
> >
>
> As per my understanding, it is mostly due to the reason that it can
> work today. Today, during an off-list discussion with Jonathan on this
> point, he pointed me to a similar incompatibility in MySQL
> replication. See the "SQL incompatibilities" section in doc[1]. Also,
> please note that this applies not only to initial sync but also to
> schema sync during replication. I don't think it would be feasible to
> keep such cross-version compatibility for DDL replication.

Makes sense to me.

> Having said above, I don't intend that we must use pg_dump from the
> subscriber for the purpose of initial sync. I think the idea at this
> stage is to primarily write a POC patch to see what difficulties we
> may face. The other options that we could try out are (a) try to
> duplicate parts of pg_dump code in some way (by extracting required
> code) for the subscription's initial sync, or (b) have a common code
> (probably as a library or some other way) for the required
> functionality. There could be more possibilities that we may not have
> thought of yet. But the main point is that for approaches other than
> using pg_dump, we should consider ways to avoid duplicity of various
> parts of its code. Due to this, I think before ruling out using
> pg_dump, we should be clear about its risks and limitations.
>
> Thoughts?
>

Agreed. My biggest concern about approaches other than using pg_dump
is the same; the code duplication that could increase the maintenance
costs. We should clarify what points of using pg_dump is not a good
idea, and also analyze alternative ideas in depth.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-22 Thread torikoshia

On 2023-03-17 21:23, torikoshia wrote:

On 2023-03-07 18:09, Daniel Gustafsson wrote:

On 7 Mar 2023, at 09:35, Damir Belyalov  wrote:


I felt just logging "Error: %ld" would make people wonder the meaning of
the %ld. Logging something like "Error: %ld data type errors were
found" might be clearer.

Thanks. For more clarity, changed the message to: "Errors were found: 
%".


I'm not convinced that this adds enough clarity to assist the user.  We also
shouldn't use "error" in a WARNING log since the user has explicitly asked to
skip rows on error, so it's not an error per se.

+1


How about something like:

  ereport(WARNING,
          (errmsg("%ld rows were skipped due to data type incompatibility",
                  cstate->ignored_errors),
           errhint("Skipped rows can be inspected in the database log for reprocessing.")));

Since skipped rows cannot be inspected in the log when
log_error_verbosity is set to terse,
it might be better without this errhint.


Removed errhint.

Modified some code since v3 couldn't be applied to HEAD anymore.

Also modified v3 patch as below:


65 +   if (cstate->opts.ignore_datatype_errors)
66 +   cstate->ignored_errors = 0;
67 +


It seems not necessary since cstate is initialized by palloc0() in 
BeginCopyFrom().



134 +   ereport(LOG,
135 +           errmsg("%s", cstate->escontext.error_data->message));
136 +
137 +   return true;


Since LOG means 'Reports information of interest to administrators'
according to the manual[1], datatype errors should not be logged as
LOG. I put it back to WARNING.

[1] 
https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS
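
For readers following the thread, a usage sketch of the proposed option
(syntax as in the attached v4 patch; not committed):

```
CREATE TABLE t (a int, b date);
COPY t FROM '/tmp/data.csv' WITH (FORMAT csv, IGNORE_DATATYPE_ERRORS);
-- rows whose values make a type's input function fail are skipped, and a
-- WARNING like "N rows were skipped due to data type incompatibility"
-- is emitted at the end
```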



--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 6764d7e0f21ca266d7426cb922fd00e5138ec857 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Wed, 22 Mar 2023 22:00:15 +0900
Subject: [PATCH v4] Add new COPY option IGNORE_DATATYPE_ERRORS

Add new COPY option IGNORE_DATATYPE_ERRORS.

Currently the entire COPY fails even when there is just one row with
unexpected data regarding data type or range.
IGNORE_DATATYPE_ERRORS ignores these errors, skips such rows, and
copies the data that does not contain problems.

This patch uses the soft error handling infrastructure, which
is introduced by d9f7f5d32f20.

Author: Damir Belyalov, Atsushi Torikoshi

---
 doc/src/sgml/ref/copy.sgml   | 12 
 src/backend/commands/copy.c  |  8 
 src/backend/commands/copyfrom.c  | 20 
 src/backend/commands/copyfromparse.c | 19 +++
 src/backend/parser/gram.y|  8 +++-
 src/include/commands/copy.h  |  1 +
 src/include/commands/copyfrom_internal.h |  3 +++
 src/include/parser/kwlist.h  |  1 +
 src/test/regress/expected/copy2.out  | 15 +++
 src/test/regress/sql/copy2.sql   | 12 
 10 files changed, 94 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 5e591ed2e6..168b1c05d9 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { table_name [ ( format_name
 FREEZE [ boolean ]
+IGNORE_DATATYPE_ERRORS [ boolean ]
 DELIMITER 'delimiter_character'
 NULL 'null_string'
 HEADER [ boolean | MATCH ]
@@ -234,6 +235,17 @@ COPY { table_name [ ( 

 
+   
+IGNORE_DATATYPE_ERRORS
+
+ 
+  Drops rows that contain malformed data while copying.  These are rows
+  with columns where the data type's input function raises an error.
+  Outputs warnings about rows with incorrect data to the system log.
+ 
+
+   
+

 DELIMITER
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f14fae3308..02d911abbe 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
+	bool		ignore_datatype_errors_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
 			freeze_specified = true;
 			opts_out->freeze = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+		{
+			if (ignore_datatype_errors_specified)
+errorConflictingDefElem(defel, pstate);
+			ignore_datatype_errors_specified = true;
+			opts_out->ignore_datatype_errors = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 80bca79cd0..85c47f54b2 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -953,6 +953,7 @@ CopyFrom(CopyFromState cstate)
 	{
 		TupleTableSlot *myslot;
 		bool		skip_tuple;
+		ErrorSaveContext 

Re: HOT chain validation in verify_heapam()

2023-03-22 Thread Robert Haas
On Fri, Mar 17, 2023 at 8:31 AM Aleksander Alekseev
 wrote:
> The patch needed a rebase due to a4f23f9b. PFA v12.

I have committed this after tidying up a bunch of things in the test
case file that I found too difficult to understand -- or in some cases
just incorrect, like:

 elsif ($offnum == 35)
 {
-# set xmax to invalid transaction id.
 $tup->{t_xmin} = $in_progress_xid;
 $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
 push @expected,

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Make EXPLAIN generate a generic plan for a parameterized query

2023-03-22 Thread Laurenz Albe
Thanks for the review!

On Tue, 2023-03-21 at 16:32 +0100, Christoph Berg wrote:
> I have some comments:
> 
> > This allows EXPLAIN to generate generic plans for parameterized statements
> > (that have parameter placeholders like $1 in the statement text).
> 
> > +   
> > +    GENERIC_PLAN
> > +    
> > + 
> > +  Generate a generic plan for the statement (see  > linkend="sql-prepare"/>
> > +  for details about generic plans).  The statement can contain 
> > parameter
> > +  placeholders like $1 (but then it has to be a 
> > statement
> > +  that supports parameters).  This option cannot be used together with
> > +  ANALYZE, since a statement with unknown parameters
> > +  cannot be executed.
> 
> Like in the commit message quoted above, I would put more emphasis on
> "parameterized query" here:
> 
>   Allow the statement to contain parameter placeholders like
>   $1 and generate a generic plan for it.
>   This option cannot be used together with ANALYZE.

I went with

   Allow the statement to contain parameter placeholders like
   $1 and generate a generic plan for it.
   See  for details about generic plans
   and the statements that support parameters.
   This option cannot be used together with ANALYZE.

> > +   /* check that GENERIC_PLAN is not used with EXPLAIN ANALYZE */
> > +   if (es->generic && es->analyze)
> > +   ereport(ERROR,
> > +   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > +    errmsg("EXPLAIN ANALYZE cannot be used 
> > with GENERIC_PLAN")));
> 
> To put that in line with the other error messages in that context, I'd
> inject an extra "option":
> 
>   errmsg("EXPLAIN option ANALYZE cannot be used with GENERIC_PLAN")));

Done.

> > --- a/src/test/regress/sql/explain.sql
> > +++ b/src/test/regress/sql/explain.sql
> > [...]
> > +create extension if not exists postgres_fdw;
> 
> "create extension postgres_fdw" cannot be used from src/test/regress/
> since contrib/ might not have been built.

Ouch.  Good catch.

> I suggest leaving this test in place here, but with local tables (to
> show that plan time pruning using the one provided parameter works),
> and add a comment here explaining that is being tested:
> 
> -- create a partition hierarchy to show that plan time pruning removes
> -- the key1=2 table but generates a generic plan for key2=$1

I did that, with a different comment.

> The test involving postgres_fdw is still necessary to exercise the new
> EXEC_FLAG_EXPLAIN_GENERIC code path, but needs to be moved elsewhere,
> probably src/test/modules/.

Tests for postgres_fdw are in contrib/postgres_fdw/sql/postgres_fdw.sql,
so I added the test there.

Version 9 of the patch is attached.
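
For reference, a usage sketch of the new option (table and column names
assumed):

```
EXPLAIN (GENERIC_PLAN)
SELECT * FROM tab WHERE col = $1;
-- $1 stays an unknown parameter and a generic plan is shown;
-- combining ANALYZE with GENERIC_PLAN is rejected, as discussed above
```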

Yours,
Laurenz Albe
From 85aa88280069ca2befe7f4308d7e6f724cdb143a Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Wed, 22 Mar 2023 14:08:49 +0100
Subject: [PATCH] Add EXPLAIN option GENERIC_PLAN

This allows EXPLAIN to generate generic plans for parameterized statements
(that have parameter placeholders like $1 in the statement text).
Invent a new executor flag EXEC_FLAG_EXPLAIN_GENERIC that disables runtime
partition pruning for such plans: that would fail if the non-existing
parameters are involved, and we want to show all subplans anyway.

Author: Laurenz Albe
Reviewed-by: Julien Rouhaud, Christoph Berg, Michel Pelletier, Jim Jones
Discussion: https://postgr.es/m/0a29b954b10b57f0d135fe12aa0909bd41883eb0.camel%40cybertec.at
---
 .../postgres_fdw/expected/postgres_fdw.out| 30 +
 contrib/postgres_fdw/sql/postgres_fdw.sql | 25 +++
 doc/src/sgml/ref/explain.sgml | 14 +++
 src/backend/commands/explain.c| 11 +
 src/backend/executor/execMain.c   |  3 ++
 src/backend/executor/execPartition.c  |  9 ++--
 src/backend/parser/analyze.c  | 29 +
 src/include/commands/explain.h|  1 +
 src/include/executor/executor.h   | 18 +---
 src/test/regress/expected/explain.out | 42 +++
 src/test/regress/sql/explain.sql  | 24 +++
 11 files changed, 197 insertions(+), 9 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 04a3ef450c..25b91ab2e1 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11783,3 +11783,33 @@ ANALYZE analyze_table;
 -- cleanup
 DROP FOREIGN TABLE analyze_ftable;
 DROP TABLE analyze_table;
+-- ===
+-- test EXPLAIN (GENERIC_PLAN) with foreign partitions
+-- ===
+-- this is needed to exercise the EXEC_FLAG_EXPLAIN_GENERIC flag
+CREATE TABLE gen_part (
+key1 integer NOT NULL,
+key2 integer NOT NULL
+) PARTITION BY LIST (key1);
+CREATE TABLE gen_part_1
+PARTITION OF 

Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

2023-03-22 Thread Jelte Fennema
Rebased after conflicts with bfc9497ece01c7c45437bc36387cb1ebe346f4d2

Also included the fix for feedback from Daniel on patch 2, which he
had shared in the load balancing thread.

On Wed, 15 Mar 2023 at 09:49, Jelte Fennema  wrote:
>
> The rebase was indeed trivial (git handled everything automatically),
> because my first patch was doing a superset of the changes that were
> committed in b6dfee28f. Attached are the new patches.
>
> On Tue, 14 Mar 2023 at 19:04, Greg Stark  wrote:
> >
> > On Tue, 14 Mar 2023 at 13:59, Tom Lane  wrote:
> > >
> > > "Gregory Stark (as CFM)"  writes:
> > > > It looks like this needs a big rebase in fea-uth.c fe-auth-scram.c and
> > > > fe-connect.c. Every hunk is failing which perhaps means the code
> > > > you're patching has been moved or refactored?
> > >
> > > The cfbot is giving up after
> > > v14-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch fails,
> > > but that's been superseded (at least in part) by b6dfee28f.
> >
> > Ah, same with Jelte Fennema's patch for load balancing in libpq.
> >
> > --
> > greg


v16-0003-Return-2-from-pqReadData-on-EOF.patch
Description: Binary data


v16-0005-Start-using-new-libpq-cancel-APIs.patch
Description: Binary data


v16-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owne.patch
Description: Binary data


v16-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch
Description: Binary data


v16-0004-Add-non-blocking-version-of-PQcancel.patch
Description: Binary data


Re: Doc: Improve note about copying into postgres_fdw foreign tables in batch

2023-03-22 Thread Tatsuo Ishii
> While not the fault of this patch I find it confusing that we mix <literal>
> and <productname> for marking up "postgres_fdw", the latter seemingly more
> correct (and less commonly used) than <literal>.

I think we traditionally use <literal> for an extension module (file)
name. It seems the <productname> is used when we want to refer to objects
other than files.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: [EXTERNAL] Support load balancing in libpq

2023-03-22 Thread Jelte Fennema
Rebased patch after conflicts with bfc9497ece01c7c45437bc36387cb1ebe346f4d2


v14-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch
Description: Binary data


v14-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owne.patch
Description: Binary data


v14-0003-Support-load-balancing-in-libpq.patch
Description: Binary data


v14-0005-Remove-unnecessary-check-from-Fisher-Yates-imple.patch
Description: Binary data


v14-0004-Add-DNS-based-libpq-load-balancing-test.patch
Description: Binary data


Re: Avoid use deprecated Windows Memory API

2023-03-22 Thread Ranier Vilela
Em qua., 22 de mar. de 2023 às 07:01, Daniel Gustafsson 
escreveu:

> > On 19 Mar 2023, at 23:41, Michael Paquier  wrote:
> >
> > On Fri, Mar 17, 2023 at 12:19:56PM -0300, Ranier Vilela wrote:
> >> Rebased to latest Head.
> >
> > I was looking at this thread, and echoing Daniel's and Alvaro's
> > arguments, I don't quite see why this patch is something we need.  I
> > would recommend to mark it as rejected and move on.
>
> Unless the claimed performance improvement is measured and provides a
> speedup,
> and the loss of zeroing memory is guaranteed safe, there doesn't seem to be
> much value provided.
>
At no time was it suggested that there would be performance gains.
The patch proposes adjusting the code to use the API that the vendor asks you to use.
However, I see no point in discussing a defunct patch.

regards,
Ranier Vilela


Re: Request for comment on setting binary format output per session

2023-03-22 Thread Dave Cramer
If I recall the protocol-extension design correctly, such a setting
could only be set at session start, which could be annoying --- at the
very least we'd have to tolerate entries for unrecognized data types,
since clients couldn't be expected to have checked the list against
the current server in advance.

The protocol extension design has the drawback that it can only be set at
startup.
What if we were to allow changes to the setting after startup if the client
passed the cancel key as a unique identifier that only the driver would
know?

Dave Cramer





Re: Request for comment on setting binary format output per session

2023-03-22 Thread Dave Cramer
If there's some extension that offers type "mytype", and perhaps allows
it to be installed in any schema, then it seems that the client library
would know how to parse all instances of "mytype" regardless of the
schema or search_path.

I may be overthinking this.
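
For concreteness, the resolution hazard being discussed looks roughly like
this (a sketch):

```
CREATE SCHEMA foo;
CREATE SCHEMA bar;
CREATE TYPE foo.mytype AS ENUM ('a', 'b');
CREATE TYPE bar.mytype AS (x int, y int);
SET search_path = foo, bar;  -- unqualified "mytype" resolves to foo.mytype
SET search_path = bar, foo;  -- now it resolves to bar.mytype, whose binary
                             -- wire format is entirely different
```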

Dave Cramer


On Tue, 21 Mar 2023 at 17:47, Jeff Davis  wrote:

> On Tue, 2023-03-21 at 09:22 -0400, Dave Cramer wrote:
> > As Jeff mentioned there is a visibility problem if the search path is
> > changed. The simplest solution IMO is to look up the OID at the time
> > the format is requested and use the OID going forward to format the
> > output as binary. If the search path changes and a type with the same
> > name is now first in the search path then the data would be returned
> > in text.
>
> The binary format parameter would ordinarily be set by the maintainer
> of the client library, who knows nothing about the schema the client
> might be accessing, and nothing about the search_path that might be
> set. They would only know which binary parsers they've already written
> and included with their client library.
>
> With that in mind, using search_path at all seems weird. Why would a
> change in search_path affect which types the client library knows how
> to parse? If the client library knows how to parse "foo.mytype"'s
> binary representation, and you change the search path such that it
> finds "bar.mytype" instead, did the client library all of a sudden
> forget how to parse "foo.mytype" and learn to parse "bar.mytype"?
>
> If there's some extension that offers type "mytype", and perhaps allows
> it to be installed in any schema, then it seems that the client library
> would know how to parse all instances of "mytype" regardless of the
> schema or search_path.
>
> Of course, a potential problem is that ordinary users can create types
> (e.g. enum types) and so you'd have to be careful about some tricks
> where someone shadows a well-known extension in order to confuse the
> client with unexpected binary data (not sure if that's a security
> concern or not, just thinking out loud).
>
> One solution might be that unqualified type names would work on all
> types of that name (in any schema) that are owned by a superuser,
> regardless of search_path. Most extension scripts will be run as
> superuser anyway. It would feel a little magical, which I don't like,
> but would work in any practical case I can think of.
>
> Another solution would be to have some extra catalog field in pg_type
> that would be a "binary format identifier" and use that rather than the
> type name to match up binary parsers with the proper type.
>
> Am I over-thinking this?
>
> Regards,
> Jeff Davis
>
>


Re: Doc: Improve note about copying into postgres_fdw foreign tables in batch

2023-03-22 Thread Daniel Gustafsson
> On 22 Mar 2023, at 12:58, Etsuro Fujita  wrote:
> 
> On Fri, Feb 17, 2023 at 5:45 PM Etsuro Fujita  wrote:
>> Here is a small patch to improve the note, which was added by commit
>> 97da48246 ("Allow batch insertion during COPY into a foreign table."),
>> by adding an explanation about how the actual number of rows
>> postgres_fdw inserts at once is determined in the COPY case, including
>> a limitation that does not apply to the INSERT case.
> 
> Does anyone want to comment on this?

Patch looks good to me, but I agree with Tatsuo downthread that "similar way to
the insert case" reads better.  Theoretically the number could be different
from 1000 if MAX_BUFFERED_TUPLES was changed in the build, but that's a
non-default not worth spending time explaining.

+   the actual number of rows postgres_fdw copies at

While not the fault of this patch I find it confusing that we mix <literal>
and <productname> for marking up "postgres_fdw", the latter seemingly more
correct (and less commonly used) than <literal>.
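
Returning to the batched-COPY behavior the note describes, a sketch of the
interaction (the foreign table setup is assumed):

```
ALTER FOREIGN TABLE ft OPTIONS (ADD batch_size '5000');
COPY ft FROM '/tmp/rows.csv' WITH (FORMAT csv);
-- despite batch_size = 5000, postgres_fdw flushes at most 1000 rows per
-- remote batch here, per the MAX_BUFFERED_TUPLES limit mentioned above
```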

--
Daniel Gustafsson





Re: Doc: Improve note about copying into postgres_fdw foreign tables in batch

2023-03-22 Thread Tatsuo Ishii
> On Fri, Feb 17, 2023 at 5:45 PM Etsuro Fujita  wrote:
>> Here is a small patch to improve the note, which was added by commit
>> 97da48246 ("Allow batch insertion during COPY into a foreign table."),
>> by adding an explanation about how the actual number of rows
>> postgres_fdw inserts at once is determined in the COPY case, including
>> a limitation that does not apply to the INSERT case.
> 
> Does anyone want to comment on this?

>
> -   This option also applies when copying into foreign tables.
> +   This option also applies when copying into foreign tables.  In that 
> case
> +   the actual number of rows postgres_fdw copies at
> +   once is determined in a similar way to in the insert case, but it is

"similar way to in" should be "similar way to", maybe?

> +   limited to at most 1000 due to implementation restrictions of the
> +   COPY command.
>
>   
>  

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp


Re: Doc: Improve note about copying into postgres_fdw foreign tables in batch

2023-03-22 Thread Etsuro Fujita
On Fri, Feb 17, 2023 at 5:45 PM Etsuro Fujita  wrote:
> Here is a small patch to improve the note, which was added by commit
> 97da48246 ("Allow batch insertion during COPY into a foreign table."),
> by adding an explanation about how the actual number of rows
> postgres_fdw inserts at once is determined in the COPY case, including
> a limitation that does not apply to the INSERT case.

Does anyone want to comment on this?

Best regards,
Etsuro Fujita




Re: Comment in preptlist.c

2023-03-22 Thread Etsuro Fujita
On Wed, Mar 22, 2023 at 4:50 PM David Rowley  wrote:
> And now it just clicked with me why Tom left this. Sorry for stepping
> on your toes here.

No problem at all.

Best regards,
Etsuro Fujita




Re: Question: Do we have a rule to use "PostgreSQL" and "PostgreSQL" separately?

2023-03-22 Thread Daniel Gustafsson
> On 22 Mar 2023, at 04:19, Hayato Kuroda (Fujitsu)  
> wrote:

I have also grepped to detect other wrong markups, and I think at least
this "PostgreSQL" should be changed. PSA the patch.

I agree with that analysis, this instance should be marked up with
<productname> but not the other ones.  I'll go ahead with your patch after some
testing.

--
Daniel Gustafsson





Re: Save a few bytes in pg_attribute

2023-03-22 Thread Matthias van de Meent
On Wed, 22 Mar 2023 at 10:42, Peter Eisentraut
 wrote:
>
> On 21.03.23 18:46, Andres Freund wrote:
> > FWIW, I think we should consider getting rid of attcacheoff. I doubt it's
> > worth its weight these days, because deforming via slots starts at the
> > beginning anyway. The overhead of maintaining it is not insubstantial, and
> > it's just architecturally ugly to to update tupledescs continually.
>
> Btw., could attcacheoff be int16?

I had the same thought in '21, and in the patch linked upthread[0] I
added an extra comment on the field:

> + Note: Although the maximum offset encountered in stored tuples is
> + limited to the max BLCKSZ (2**15), FormData_pg_attribute is used for
> + all internal tuples as well, so attcacheoff may be larger for those
> + tuples, and it is therefore not safe to use int16.

So, we can't reduce its size while we use attcacheoff for
(de)serialization of tuples with up to MaxAttributeNumber (=INT16_MAX)
of attributes which each can be larger than one byte (such as in
tuplestore, tuplesort, spilling hash aggregates, ...)

Kind regards,

Matthias van de Meent

[0] 
https://www.postgresql.org/message-id/caeze2wh8-metsryzx_ubj-uv6kb+2ynzhaejmedubjhmgus...@mail.gmail.com




Re: [BUG] pg_stat_statements and extended query protocol

2023-03-22 Thread Drouvot, Bertrand

Hi,

On 3/21/23 2:16 PM, Imseih (AWS), Sami wrote:

This indeed feels a bit more natural seen from here, after looking at
the code paths using an Instrumentation in the executor and explain,
for example. At least, this stresses me much less than adding 16
bytes to EState for something restricted to the extended protocol when
it comes to monitoring capabilities.


Attached is the patch that uses Instrumentation.



Thanks, I think this new approach makes sense.

-  const BufferUsage *bufusage,
+  int64 calls, const BufferUsage *bufusage,

What about using a uint64 for calls? That seems more appropriate to me (even if
queryDesc->totaltime->calls will be passed (which is int64), but that's already
also the case for the "rows" argument and queryDesc->totaltime->rows_processed)

@@ -88,6 +88,8 @@ typedef struct Instrumentation
        double      nfiltered2;     /* # of tuples removed by "other" quals */
        BufferUsage bufusage;       /* total buffer usage */
        WalUsage    walusage;       /* total WAL usage */
+       int64       calls;          /* # of total calls to ExecutorRun */
+       int64       rows_processed; /* # of total rows processed in ExecutorRun */

I'm not sure it's worth mentioning that the new counters are "currently" used 
with ExecutorRun.

What about just "total calls" and "total rows processed" (or "total rows", see 
below)?

Also, I wonder if "rows" (and not rows_processed) would not be a better naming.

Those last comments regarding the Instrumentation are made because ISTM that in
the end their usage could vary depending on the use case of the Instrumentation.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: Initial Schema Sync for Logical Replication

2023-03-22 Thread Kumar, Sachin
> From: Amit Kapila 
> Sent: Wednesday, March 22, 2023 5:16 AM
> To: Masahiko Sawada 
> Cc: Euler Taveira ; Kumar, Sachin
> ; Alvaro Herrera ; pgsql-
> hack...@lists.postgresql.org; Jonathan S. Katz 
> Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
> 
> CAUTION: This email originated from outside of the organization. Do not click
> links or open attachments unless you can confirm the sender and know the
> content is safe.
> 
> 
> 
> On Wed, Mar 22, 2023 at 8:29 AM Masahiko Sawada
>  wrote:
> >
> > On Tue, Mar 21, 2023 at 8:18 PM Amit Kapila 
> wrote:
> > >
> > > On Tue, Mar 21, 2023 at 7:32 AM Euler Taveira 
> wrote:
> > >
> > > > You should
> > > > exclude them removing these objects from the TOC before running
> > > > pg_restore or adding a few pg_dump options to exclude these
> > > > objects. Another issue is related to different version. Let's say
> > > > the publisher has a version ahead of the subscriber version, a new
> > > > table syntax can easily break your logical replication setup. IMO
> > > > pg_dump doesn't seem like a good solution for initial synchronization.
> > > >
> > > > Instead, the backend should provide infrastructure to obtain the
> > > > required DDL commands for the specific (set of) tables. This can
> > > > work around the issues from the previous paragraph:
> > > >
> > > ...
> > > > * don't need to worry about different versions.
> > > >
> > >
> > > AFAICU some of the reasons why pg_dump is not allowed to dump from
> > > the newer version are as follows: (a) there could be more columns in
> > > the newer version of the system catalog and then Select * type of
> > > stuff won't work because the client won't have knowledge of
> > > additional columns. (b) the newer version could have new features
> > > (represented by say new columns in existing catalogs or new
> > > catalogs) that the older version of pg_dump has no knowledge of and
> > > will fail to get that data and hence an inconsistent dump. The
> > > subscriber will easily be not in sync due to that.
> > >
> > > Now, how do we avoid these problems even if we have our own version
> > > of functionality similar to pg_dump for selected objects? I guess we
> > > will face similar problems.
> >
> > Right. I think that such functionality needs to return DDL commands
> > that can be executed on the requested version.
> >
> > > If so, we may need to deny schema sync in any such case.
> >
> > Yes. Do we have any concrete use case where the subscriber is an older
> > version, in the first place?
> >
> 
> As per my understanding, it is mostly due to the reason that it can work
> today. Today, during an off-list discussion with Jonathan on this point, he
> pointed me to a similar incompatibility in MySQL replication. See the "SQL
> incompatibilities" section in doc[1]. Also, please note that this applies not
> only to initial sync but also to schema sync during replication. I don't 
> think it
> would be feasible to keep such cross-version compatibility for DDL
> replication.
> 
> Having said above, I don't intend that we must use pg_dump from the
> subscriber for the purpose of initial sync. I think the idea at this stage is 
> to
> primarily write a POC patch to see what difficulties we may face. The other
> options that we could try out are (a) try to duplicate parts of pg_dump code
> in some way (by extracting required
> code) for the subscription's initial sync, or (b) have a common code (probably
> as a library or some other way) for the required functionality. There could be
> more possibilities that we may not have thought of yet. But the main point is
> that for approaches other than using pg_dump, we should consider ways to
> avoid duplicity of various parts of its code. Due to this, I think before 
> ruling
> out using pg_dump, we should be clear about its risks and limitations.
> 
> Thoughts?
There is one more thing which needs to be considered even if we use
pg_dump/pg_restore: we still need to have a way to get the CREATE TABLE
statement for tables, if we want to support concurrent DDLs on the publisher.
>8. TableSync process should check the state of table , if it is 
>SUBREL_STATE_CREATE it should
>get the latest definition from the publisher and recreate the table. (We have 
>to recreate
>the table even if there are no changes). Then it should go into copy table 
>mode as usual.
Unless there is a different way to support concurrent DDLs, or we go for
blocking the publisher till the initial sync is completed.
Regards
Sachin
> 
> [1] - https://dev.mysql.com/doc/refman/8.0/en/replication-
> compatibility.html
> [2] - https://www.postgresql.org/message-
> id/CAAD30U%2BpVmfKwUKy8cbZOnUXyguJ-
> uBNejwD75Kyo%3DOjdQGJ9g%40mail.gmail.com
> 
> --
> With Regards,
> Amit Kapila.


RE: Allow logical replication to copy tables in binary format

2023-03-22 Thread Hayato Kuroda (Fujitsu)
Dear Amit, hackers,

> The patch looks mostly good to me. However, I have one
> question/comment as follows:
> 
> -   
> +   
>  binary (boolean)
>  
> 
> To allow references to the binary option, we add the varlistentry id
> here. It looks slightly odd to me to add id for just one entry, see
> commit 78ee60ed84bb3a1cf0b6bd9a715dcbcf252a90f5 where we have
> purposefully added ids to allow future references. Shall we add id to
> other options as well on this page?

I have analyzed same points and made patch that could be applied atop v19-0001.
Please check 0002 patch.
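
For context, a usage sketch of what the 0001 patch enables (connection
values assumed):

```
CREATE SUBSCRIPTION mysub
  CONNECTION 'host=publisher dbname=postgres'  -- assumed values
  PUBLICATION mypub
  WITH (binary = true);
-- with the patch, the initial table copy is also done in binary format,
-- not just the subsequent row changes
```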

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v19-0001-Allow-logical-replication-to-copy-table-in-binary.patch
Description:  v19-0001-Allow-logical-replication-to-copy-table-in-binary.patch


v19-0002-Add-XML-ID-attributes-to-create_subscription.sgm.patch
Description:  v19-0002-Add-XML-ID-attributes-to-create_subscription.sgm.patch


Re: Commitfest 2023-03 starting tomorrow!

2023-03-22 Thread Daniel Gustafsson
> On 22 Mar 2023, at 10:39, Peter Eisentraut 
>  wrote:

> Personally, if a patch isn't rebased up to the minute doesn't bother me at 
> all.  It's easy to check out as of when the email was sent (or extra bonus 
> points for using git format-patch --base).  Now, rebasing every month or so 
> is nice, but daily rebases during a commit fest are almost more distracting 
> than just leaving it.

+1.  As long as the patch is rebased and builds/tests green when the CF starts
I'm not too worried about not having it always rebased during the CF.  If
resolving the conflicts are non-trivial/obvious then of course, but if only to
stay recent and avoid fuzz in applying then it's more distracting.

--
Daniel Gustafsson





Re: About a recently-added permission-related error message

2023-03-22 Thread Yugo NAGATA
On Mon, 20 Mar 2023 17:05:41 +0900 (JST)
Kyotaro Horiguchi  wrote:

> I found an error message added by de4d456b406bf502341ef526710d3f764b41e2c8.
> 
> When I incorrectly configured the primary_conninfo with the wrong
> user, I received the following message on the server logs of both
> servers involved in a physical replcation set.
> 
> [27022:walsender] FATAL:  permission denied to start WAL sender
> [27022:walsender] DETAIL:  Only roles with the REPLICATION attribute may 
> start a WAL sender process.
> 
> I'm not sure if adding the user name in the log prefix is a common
> practice, but without it, the log line might not have enough
> information. Unlike other permission-related messages, this message is
> not the something human operators receive in response to their
> actions. It seems similar to connection authorization logs where the
> user name is important. So, I'd like to propose the following
> alternative.

I am not sure whether this change is necessary because the error message
will appear in the log of the standby server and users can easily know
the connection user just by checking primary_conninfo.

> [27022:walsender] DETAIL:  The connection user "r1" requires the REPLICATION 
> attribute.

However, if we need this change, how about using
"DETAIL: The connection user "r1" must have the REPLICATION attribute."
This pattern is used in other parts like check_object_ownership() and
AlterRole(). The user name is not included there, though.
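
For context, the scenario that produces this message (a sketch):

```
CREATE ROLE r1 LOGIN;  -- note: no REPLICATION attribute
-- with primary_conninfo = 'user=r1 ...' on the standby, the primary logs:
--   FATAL:  permission denied to start WAL sender
--   DETAIL:  Only roles with the REPLICATION attribute may start a WAL sender process.
```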

Regards,
Yugo Nagata

> What do you think about this change?
> 
> regards.
> 
> -- 
> Kyotaro Horiguchi
> NTT Open Source Software Center


-- 
Yugo NAGATA 




Re: meson: Non-feature feature options

2023-03-22 Thread Nazir Bilal Yavuz
Hi,

On Wed, 15 Mar 2023 at 11:12, Peter Eisentraut
 wrote:
>
> On 14.03.23 15:07, Nazir Bilal Yavuz wrote:
> >> I think the uuid side of this is making this way too complicated.  I'm
> >> content leaving this as a manual option for now.
> >>
> >> There is much more value in making the ssl option work automatically.
> >> So I would welcome a patch that just makes -Dssl=auto work smoothly,
> >> perhaps using the "trick" that Andres described.
> > I tried to implement what we did for ssl to uuid as well, do you have
> > any comments?
>
> For the uuid option, we have three different choices.  What should be
> the search order and why?

Docs [1] say that the OSSP uuid library is not well maintained and is
becoming increasingly difficult to port to newer platforms, so we can
put 'uuid-ossp' at the end. Between 'uuid-e2fs' and 'uuid-bsd', I
believe 'uuid-e2fs' is used more often than 'uuid-bsd'.
Hence, they can be ordered as 'uuid-e2fs', 'uuid-bsd', 'uuid-ossp'.

Does that make sense?

Regards,
Nazir Bilal Yavuz
Microsoft

[1] https://www.postgresql.org/docs/current/uuid-ossp.html




Re: Issue attaching a table to a partitioned table with an auto-referenced foreign key

2023-03-22 Thread Guillaume Lelarge
One last ping, hoping someone will have more time now than in January.

Perhaps my test is wrong, but I'd like to know why.

Thanks.
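
(For readers without the attached script, here is a sketch that reproduces
the shape of the show_constraints() output quoted below. It is an
approximation; the real attached function may differ.)

```
CREATE OR REPLACE FUNCTION show_constraints()
RETURNS TABLE (conname name, t regclass, tref regclass, coparent name)
LANGUAGE sql AS $$
  SELECT c.conname,
         c.conrelid::regclass,
         c.confrelid::regclass,
         p.conname                      -- name of the parent constraint
  FROM pg_constraint c
  LEFT JOIN pg_constraint p ON p.oid = c.conparentid
  WHERE c.contype = 'f'
  ORDER BY c.conname, c.conrelid::regclass::text;
$$;
```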

On Tue, Jan 17, 2023 at 16:53, Guillaume Lelarge wrote:

> Quick ping, just to make sure someone can get a look at this issue :)
> Thanks.
>
>
> On Fri, Jan 6, 2023 at 11:07, Guillaume Lelarge wrote:
>
>> Hello,
>>
>> One of our customers has an issue with partitions and foreign keys. He
>> works on a v13, but the issue is also present on v15.
>>
>> I attach a SQL script showing the issue, and the results on 13.7, 13.9,
>> and 15.1. But I'll explain the script here, and its behaviour on 13.9.
>>
>> There is one partitioned table, two partitions and a foreign key. The
>> foreign key references the same table:
>>
>> create table t1 (
>>   c1 bigint not null,
>>   c1_old bigint null,
>>   c2 bigint not null,
>>   c2_old bigint null,
>>   primary key (c1, c2)
>>   )
>>   partition by list (c1);
>> create table t1_a   partition of t1 for values in (1);
>> create table t1_def partition of t1 default;
>> alter table t1 add foreign key (c1_old, c2_old) references t1 (c1, c2) on
>> delete restrict on update restrict;
>>
>> I've a SQL function that shows me some information from pg_constraints
>> (code of the function in the SQL script attached). Here is the result of
>> this function after creating the table, its partitions, and its foreign key:
>>
>> select * from show_constraints();
>> conname |   t|  tref  |   coparent
>> +++---
>>  t1_c1_old_c2_old_fkey  | t1 | t1 |
>>  t1_c1_old_c2_old_fkey  | t1_a   | t1 | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey  | t1_def | t1 | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey1 | t1 | t1_a   | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey2 | t1 | t1_def | t1_c1_old_c2_old_fkey
>> (5 rows)
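>>
>> For readers without the attachment, a rough equivalent of show_constraints()
>> (reconstructed from its output here; the exact definition is in the attached
>> script) would be:
>>
>> create function show_constraints()
>>   returns table (conname name, t regclass, tref regclass, coparent name)
>>   language sql
>> as $$
>>   select c.conname,
>>          c.conrelid::regclass,
>>          c.confrelid::regclass,
>>          p.conname
>>     from pg_constraint c
>>     left join pg_constraint p on p.oid = c.conparentid
>>    where c.contype = 'f'
>>    order by c.conname, c.conrelid::regclass::text;
>> $$;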
>>
>> The constraint works great:
>>
>> insert into t1 values(1, NULL, 2, NULL);
>> insert into t1 values(2, 1,2, 2);
>> delete from t1 where c1 = 1;
>> psql:ticket15010_v3.sql:34: ERROR:  update or delete on table "t1_a"
>> violates foreign key constraint "t1_c1_old_c2_old_fkey1" on table "t1"
>> DETAIL:  Key (c1, c2)=(1, 2) is still referenced from table "t1".
>>
>> This error is normal, since the row I want to delete is referenced by the
>> other row.
>>
>> If I try to detach the partition, it also gives me an error.
>>
>> alter table t1 detach partition t1_a;
>> psql:ticket15010_v3.sql:36: ERROR:  removing partition "t1_a" violates
>> foreign key constraint "t1_c1_old_c2_old_fkey1"
>> DETAIL:  Key (c1_old, c2_old)=(1, 2) is still referenced from table "t1".
>>
>> Sounds good to me too (well, I'd like it to be smarter and find that the
>> constraint is still good after the detach, but I can understand why it
>> won't allow it).
>>
>> The pg_constraint contents didn't change, of course:
>>
>> select * from show_constraints();
>> conname |   t|  tref  |   coparent
>> +++---
>>  t1_c1_old_c2_old_fkey  | t1 | t1 |
>>  t1_c1_old_c2_old_fkey  | t1_a   | t1 | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey  | t1_def | t1 | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey1 | t1 | t1_a   | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey2 | t1 | t1_def | t1_c1_old_c2_old_fkey
>> (5 rows)
>>
>> Now, I'll delete the whole table contents, and I'll detach the partition:
>>
>> delete from t1;
>> alter table t1 detach partition t1_a;
>>
>> It seems to work, but the content of pg_constraint is weird:
>>
>> select * from show_constraints();
>> conname |   t|  tref  |   coparent
>> +++---
>>  t1_c1_old_c2_old_fkey  | t1 | t1 |
>>  t1_c1_old_c2_old_fkey  | t1_a   | t1 |
>>  t1_c1_old_c2_old_fkey  | t1_def | t1 | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey2 | t1 | t1_def | t1_c1_old_c2_old_fkey
>> (4 rows)
>>
>> I understand why the ('t1_c1_old_c2_old_fkey1', 't1', 't1_a',
>> 't1_c1_old_c2_old_fkey') tuple is gone, but I don't understand why the
>> ('t1_c1_old_c2_old_fkey', 't1_a', 't1', NULL) tuple is still there.
>>
>> Anyway, I attach the partition:
>>
>> alter table t1 attach partition t1_a for values in (1);
>>
>> But pg_constraint has not changed:
>>
>> select * from show_constraints();
>> conname |   t|  tref  |   coparent
>> +++---
>>  t1_c1_old_c2_old_fkey  | t1 | t1 |
>>  t1_c1_old_c2_old_fkey  | t1_a   | t1 | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey  | t1_def | t1 | t1_c1_old_c2_old_fkey
>>  t1_c1_old_c2_old_fkey2 | t1 | t1_def | t1_c1_old_c2_old_fkey
>> (4 rows)
>>
>> I was expecting to see the fifth tuple coming back, but alas, no.
>>
>> And as a result, the foreign key doesn't work anymore:
>>
>> insert into t1 

RE: Data is copied twice when specifying both child and parent table in publication

2023-03-22 Thread wangw.f...@fujitsu.com
On Wed, Mar 22, 2023 at 14:32 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> Dear Wang,
> 
> Thank you for updating the patch! Following are comments for v19-0001.

Thanks for your comments.

> 01. logical-replication.sgml
> 
> I found the following statement in logical-replication.sgml. I think this may
> cause misreading, because it's OK when publications list partitions and
> publish_via_partition_root is true.
> 
> ```
>   
>A subscriber node may have multiple subscriptions if desired.  It is
>possible to define multiple subscriptions between a single
>publisher-subscriber pair, in which case care must be taken to ensure
>that the subscribed publication objects don't overlap.
>   
> ```
> 
> How about adding "If publications set publish_via_partition_root to true and
> they publish partitions that have the same partitioned table, only a change to
> the partitioned table is published from the publisher." or something like that?

I think these are two different scenarios: the scenario mentioned here is
multiple subscriptions on the subscriber node, while the scenario we fixed
this time is a single subscription on the subscriber node. So, it seems that
these two notes are not strongly related.

> 02. filter_partitions
> 
> IIUC this function can be refactored like the following to avoid the "skip" flag.
> What do you think?
> 
> ```
> @@ -209,7 +209,6 @@ filter_partitions(List *table_infos)
> 
> foreach(lc, table_infos)
> {
> -   bool    skip = false;
> List   *ancestors = NIL;
> ListCell   *lc2;
> published_rel  *table_info = (published_rel *) lfirst(lc);
> @@ -224,13 +223,10 @@ filter_partitions(List *table_infos)
> /* Is ancestor exists in the published table list? */
> if (is_ancestor_member_tableinfos(ancestor, 
> table_infos))
> {
> -   skip = true;
> +   table_infos = foreach_delete_current(table_infos, lc);
> break;
> }
> }
> -
> -   if (skip)
> -   table_infos = foreach_delete_current(table_infos, lc);
> }
>  }
> ```

I think this approach deletes a cell of the outer loop's list from within the
inner loop. IIUC, foreach_delete_current can only be used inside the loop that
iterates over the list being modified, to delete that loop's current cell.
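
In other words, the deletion needs to stay in the loop that owns the cell. A
distilled sketch of the pattern we keep (should_drop() is just a stand-in for
the ancestor check):

foreach(lc, items)              /* this loop owns 'lc' */
{
    bool        drop = false;
    ListCell   *lc2;

    foreach(lc2, sublist)
    {
        if (should_drop(lfirst(lc2)))
        {
            drop = true;        /* only record the decision here */
            break;
        }
    }

    if (drop)
        items = foreach_delete_current(items, lc);  /* delete in the owning loop */
}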

> 03. fetch_table_list
> 
> ```
> +   /* Get the list of tables from the publisher. */
> +   if (server_version >= 16)
> ```
> 
> I think a boolean variable can be used to check it, like check_columnlist.
> How about "use_extended_function" or something?

Since we only need it once, I think it's fine not to add a new variable.

Regards,
Wang Wei


RE: Data is copied twice when specifying both child and parent table in publication

2023-03-22 Thread wangw.f...@fujitsu.com
On Wed, Mar 22, 2023 at 12:50 PM Peter Smith  wrote:
> Here are some review comments for patch code of HEAD_v19-0001

Thanks for your comments.

> ==
> doc/src/sgml/ref/create_publication.sgml
> 
> 1.
> + 
> +  There can be a case where a subscription combines multiple
> +  publications. If a root partitioned table is published by any
> +  subscribed publications which set
> +  publish_via_partition_root = true, changes on 
> this
> +  root partitioned table (or on its partitions) will be published 
> using
> +  the identity and schema of this root partitioned table rather than
> +  that of the individual partitions.
> + 
> 
> 1a.
> The paragraph prior to this one just refers to "partitioned tables"
> instead of "root partitioned table", so IMO we should continue with
> the same terminology.

Changed.

> I also modified the remaining text slightly. AFAIK my suggestion
> conveys exactly the same information but is shorter.
> 
> SUGGESTION
> There can be a case where one subscription combines multiple
> publications. If any of those publications has set
> publish_via_partition_root = true, then changes in
> a partitioned table (or on its partitions) will be published using the
> identity and schema of the partitioned table.

Sorry, I'm not sure about this.
I'm not a native speaker of English, but it seems like the following use case is
not explained:
```
create table t1 (a int primary key) partition by range (a);
create table t2 (a int primary key) partition by range (a);
create table t3 (a int primary key);
alter table t1 attach partition t2 default;
alter table t2 attach partition t3 default;

create publication p1 for table t1;
create publication p2_via for table t2 with(publish_via_partition_root);
create publication p3 for table t3;
```
If we subscribe to p1, p2_via and p3 at the same time, then t2's identity and
schema will be used instead of t1's (and of course not t3's).
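
For example (sketch; connection string elided):

```
create subscription sub1 connection '...' publication p1, p2_via, p3;
```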

> ~
> 
> 1b.
> Shouldn't that paragraph (or possibly somewhere in the CREATE
> SUBSCRIPTION notes?) also explain that in this scenario the logical
> replication will only publish one set of changes? After all, this is
> the whole point of the patch, but I am not sure if the user will know
> of this behaviour from the current documentation.

It seems to me that what you're explaining is what users already expect, so we
don't need to explain it.
BTW, IIUC, when a user wants to use the "via_root" option, they should first read
the pg-docs to confirm the meaning of and notes about this option. So, I suspect
that adding this explanation in other documentation would be redundant.

> ==
> src/backend/catalog/pg_publication.c
> 
> 2. filter_partitions
> 
> BEFORE:
> static void
> filter_partitions(List *table_infos)
> {
> ListCell   *lc;
> 
> foreach(lc, table_infos)
> {
> bool skip = false;
> List *ancestors = NIL;
> ListCell *lc2;
> published_rel *table_info = (published_rel *) lfirst(lc);
> 
> if (get_rel_relispartition(table_info->relid))
> ancestors = get_partition_ancestors(table_info->relid);
> 
> foreach(lc2, ancestors)
> {
> Oid ancestor = lfirst_oid(lc2);
> 
> /* Is ancestor exists in the published table list? */
> if (is_ancestor_member_tableinfos(ancestor, table_infos))
> {
> skip = true;
> break;
> }
> }
> 
> if (skip)
> table_infos = foreach_delete_current(table_infos, lc);
> }
> }
> 
> ~
> 
> 2a.
> My previous review [1] (see #1) suggested putting most of the code within the
> condition. AFAICT my comment is still applicable but has not yet been
> addressed.

Personally, I prefer the current style, because the approach you mentioned adds
a level of indentation.

> 2b.
> IMO the comment "/* Is ancestor exists in the published table list?
> */" is unnecessary because it is already clear what is the purpose of
> the function named "is_ancestor_member_tableinfos".

Removed.

> ==
> src/backend/commands/subscriptioncmds.c
> 
> 3.
>  fetch_table_list(WalReceiverConn *wrconn, List *publications)
>  {
>   WalRcvExecResult *res;
> - StringInfoData cmd;
> + StringInfoData cmd,
> + pub_names;
>   TupleTableSlot *slot;
>   Oid tableRow[3] = {TEXTOID, TEXTOID, NAMEARRAYOID};
>   List *tablelist = NIL;
> - bool check_columnlist = (walrcv_server_version(wrconn) >= 15);
> + int server_version = walrcv_server_version(wrconn);
> + bool check_columnlist = (server_version >= 15);
> +
> + initStringInfo(&pub_names);
> + get_publications_str(publications, &pub_names, true);
> 
>   initStringInfo(&cmd);
>   appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename\n");
> 
> - /* Get column lists for each relation if the publisher supports it */
> - if (check_columnlist)
> - appendStringInfoString(&cmd, ", t.attnames\n");
> + /* Get the list of tables from the publisher. */
> + if (server_version >= 16)
> + {
> 
> ~
> 
> I think the 'pub_names' is only needed within that ">= 16" condition.
> 
> So all the code below can be moved into that scope, can't it?
> 
> + 

Re: Avoid use deprecated Windows Memory API

2023-03-22 Thread Daniel Gustafsson
> On 19 Mar 2023, at 23:41, Michael Paquier  wrote:
> 
> On Fri, Mar 17, 2023 at 12:19:56PM -0300, Ranier Vilela wrote:
>> Rebased to latest Head.
> 
> I was looking at this thread, and echoing Daniel's and Alvaro's
> arguments, I don't quite see why this patch is something we need.  I
> would recommend to mark it as rejected and move on.

Unless the claimed performance improvement is measured and provides a speedup,
and the loss of zeroing memory is guaranteed safe, there doesn't seem to be
much value provided.

--
Daniel Gustafsson





Re: Allow logical replication to copy tables in binary format

2023-03-22 Thread Amit Kapila
On Wed, Mar 22, 2023 at 9:00 AM shiy.f...@fujitsu.com
 wrote:
>
> On Wed Mar 22, 2023 7:29 AM Peter Smith  wrote:
> >
> > Thanks for all the patch updates. Patch v19 LGTM.
> >
>
> +1
>

The patch looks mostly good to me. However, I have one
question/comment as follows:

-   <varlistentry>
+   <varlistentry id="sql-createsubscription-binary">
     <term><literal>binary</literal> (<type>boolean</type>)</term>
     <listitem>

To allow references to the binary option, we add the varlistentry id
here. It looks slightly odd to me to add an id for just one entry; see
commit 78ee60ed84bb3a1cf0b6bd9a715dcbcf252a90f5, where we
purposefully added ids to allow future references. Shall we add ids to
the other options as well on this page?
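
(That way, other pages could reference each option directly, e.g. with
<xref linkend="sql-createsubscription-binary"/>, assuming an id of that shape.)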

-- 
With Regards,
Amit Kapila.




Re: gcc 13 warnings

2023-03-22 Thread Peter Eisentraut

On 18.03.23 00:54, Andres Freund wrote:

I think a good compromise would be buildtype=debugoptimized, which is -O2
with debug symbols, which also sort of matches the default in the autoconf
world.


Looks like that'd result in a slightly worse build with msvc: afaict /OPT:REF
wouldn't end up being used, since it doesn't get specified and is automatically
disabled if /DEBUG is specified. I guess we can live with that.


I looked up what /OPT:REF does 
(https://learn.microsoft.com/en-us/cpp/build/reference/opt-optimizations?view=msvc-170), 
and it seems pretty obscure to me, at least for development builds.





Re: Bytea PL/Perl transform

2023-03-22 Thread Daniel Gustafsson
> On 18 Mar 2023, at 23:25, Иван Панченко  wrote:
> 
> Hi,
> PostgreSQL passes bytea arguments to PL/Perl functions as hexadecimal
> strings, which is not only inconvenient, but also memory- and time-consuming.
> So I decided to propose a simple transform extension to pass bytea as native
> Perl octet strings.
> Please find the patch attached.
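> 
> A usage sketch (the extension and function names here are illustrative,
> not necessarily the ones in the patch):
> 
>   -- the extension name is assumed
>   CREATE EXTENSION bytea_plperl CASCADE;
> 
>   CREATE FUNCTION perl_octet_length(b bytea) RETURNS int
>     TRANSFORM FOR TYPE bytea
>     LANGUAGE plperl
>     AS $$ return length($_[0]); $$;
> 
>   SELECT perl_octet_length('\x00ff'::bytea);
>   -- 2 with the transform; without it, $_[0] arrives as the
>   -- 6-character hex string '\x00ff'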

Thanks for the patch, I recommend registering this in the currently open
Commitfest to make sure it's kept track of:

https://commitfest.postgresql.org/43/

--
Daniel Gustafsson





Re: Save a few bytes in pg_attribute

2023-03-22 Thread Peter Eisentraut

On 21.03.23 18:46, Andres Freund wrote:

FWIW, I think we should consider getting rid of attcacheoff. I doubt it's
worth its weight these days, because deforming via slots starts at the
beginning anyway. The overhead of maintaining it is not insubstantial, and
it's just architecturally ugly to update tupledescs continually.


Btw., could attcacheoff be int16?





Re: Commitfest 2023-03 starting tomorrow!

2023-03-22 Thread Peter Eisentraut

On 21.03.23 10:59, Alvaro Herrera wrote:

This led me to suggesting that perhaps we need to be more lenient when
it comes to new contributors.  As I said, for seasoned contributors,
it's not a problem to keep up with our requirements, however silly they
are.  But people who spend their evenings a whole week or month trying
to understand how to patch for one thing that they want, to be received
by six months of silence followed by a constant influx of "please rebase
please rebase please rebase", no useful feedback, and termination with
"eh, you haven't rebased for the 1001th time, your patch has been WoA
for X days, we're setting it RwF, feel free to return next year" ...
they are most certainly off-put and will*not*  try again next year.


Personally, a patch not being rebased up to the minute doesn't bother me 
at all.  It's easy to check out the tree as of when the email was sent (extra 
bonus points for using git format-patch --base).  Now, rebasing every 
month or so is nice, but daily rebases during a commitfest are almost 
more distracting than just leaving it.
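
For the record, that looks something like:

    git format-patch -v2 --base=origin/master origin/master..HEAD

which records the base commit id in the patch series, so a reviewer can check
out exactly that tree even after the branch has moved on.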





