Reducing logs produced by TAP tests running pg_regress on crash

2022-06-22 Thread Michael Paquier
Hi all,
(Thomas and Andres in CC.)

Andres has complained a couple of days ago about the quantity of logs
that can be fed into the TAP tests running pg_regress:
https://www.postgresql.org/message-id/20220603195318.qk4voicqfdhls...@alap3.anarazel.de

This concerns the TAP tests of pg_upgrade, as well as
027_stream_regress.pl, where a crash while running the tests would
show a couple of megs worth of regression.diffs.  Most of the output
generated does not make much sense to keep in this case, and the
parallel schedule makes it harder to spot the exact query involved
in the crash (if a query is involved at all), while the backend logs,
with the PIDs they track, should be enough to identify the problem.

One idea to limit the useless output is to check the status of the
cluster after running the regression test suite (restart_after_crash
is disabled by default in Cluster.pm), and to skip any follow-up logic
in these tests if the cluster is not running anymore, as in the
attached.

Thoughts?
--
Michael
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 3f11540e18..b3ea3cfba3 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -115,6 +115,10 @@ else
 		  . "--max-concurrent-tests=20 "
 		  . "--inputdir=\"$inputdir\" "
 		  . "--outputdir=\"$outputdir\"");
+
+	die "node has crashed while running pg_regress\n"
+	  if ($oldnode->status) != 0;
+
 	if ($rc != 0)
 	{
 		# Dump out the regression diffs file, if there is one
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index c8c7bc5045..463293bc86 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -995,6 +995,29 @@ sub promote
 	return;
 }
 
+=item $node->status()
+
+Wrapper for pg_ctl status.  Returns the exit code of the command.
+
+=cut
+
+sub status
+{
+	my ($self)  = @_;
+	my $port= $self->port;
+	my $pgdata  = $self->data_dir;
+	my $logfile = $self->logfile;
+	my $name= $self->name;
+	my $ret;
+
+	local %ENV = $self->_get_env();
+
+	print "### Retrieving status of node \"$name\"\n";
+	$ret = PostgreSQL::Test::Utils::system_log('pg_ctl', '-D', $pgdata,
+		'-l', $logfile, 'status');
+	return $ret;
+}
+
 =pod
 
 =item $node->logrotate()
diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl
index fdb4ea0bf5..e3a8b35bd5 100644
--- a/src/test/recovery/t/027_stream_regress.pl
+++ b/src/test/recovery/t/027_stream_regress.pl
@@ -74,6 +74,10 @@ my $rc =
 	  . "--max-concurrent-tests=20 "
 	  . "--inputdir=../regress "
 	  . "--outputdir=\"$outputdir\"");
+
+die "node has crashed while running pg_regress\n"
+  if ($node_primary->status) != 0;
+
 if ($rc != 0)
 {
 	# Dump out the regression diffs file, if there is one




Re: Add header support to text format and matching feature

2022-06-22 Thread Julien Rouhaud
On Thu, Jun 23, 2022 at 01:26:29PM +0900, Michael Paquier wrote:
> On Wed, Jun 22, 2022 at 08:00:15PM +0900, Michael Paquier wrote:
> > OK.  As this is originally a feature you have committed, I originally
> > thought that you would take care of it, even if I sent a patch.  I'll
> > handle that tomorrow then, if that's fine for you, of course.  Happy
> > to help.
> 
> And done.  Thanks.

Thanks!




Re: [PATCH] Completed unaccent dictionary with many missing characters

2022-06-22 Thread Michael Paquier
On Tue, Jun 21, 2022 at 03:41:48PM +0200, Przemysław Sztoch wrote:
> Thomas Munro wrote on 21.06.2022 02:53:
>> Oh, we're using CLDR 41, which reminds me: CLDR 36 added SOUND
>> RECORDING COPYRIGHT[1] so we could drop it from special_cases().

Indeed.

>> Hmm, is it possible to get rid of CYRILLIC CAPITAL LETTER IO and
>> CYRILLIC SMALL LETTER IO by adding Cyrillic to PLAIN_LETTER_RANGES?

That's a good point.  Quite a few Cyrillic characters are visibly
missing a conversion.

>> That'd leave just DEGREE CELSIUS and DEGREE FAHRENHEIT.  Not sure how
>> to kill those last two special cases -- they should be directly
>> replaced by their decomposition.
>> 
>> [1] https://unicode-org.atlassian.net/browse/CLDR-11383
>
> In patch v3, support for Cyrillic is added.
> The special character function has been purged.
> Added support for category So - Other Symbol. This category includes
> characters from special_cases().

I think that we'd better split v3 into more patches to keep each
improvement isolated.  The addition of cyrillic characters in the
range of letters and the removal of the sound copyright from the
special cases can be done on their own, before considering the
original case tackled by this thread.
--
Michael




Re: Add header support to text format and matching feature

2022-06-22 Thread Michael Paquier
On Wed, Jun 22, 2022 at 08:00:15PM +0900, Michael Paquier wrote:
> OK.  As this is originally a feature you have committed, I originally
> thought that you would take care of it, even if I sent a patch.  I'll
> handle that tomorrow then, if that's fine for you, of course.  Happy
> to help.

And done.  Thanks.
--
Michael




RE: tablesync copy ignores publication actions

2022-06-22 Thread shiy.f...@fujitsu.com
On Wed, Jun 22, 2022 4:49 PM Peter Smith  wrote:
> 
> On Wed, Jun 22, 2022 at 2:18 PM Amit Kapila 
> wrote:
> >
> > On Thu, Jun 16, 2022 at 6:07 AM Peter Smith 
> wrote:
> >
> > >
> > > Thank you for your review comments. Those reported mistakes are fixed
> > > in the attached patch v3.
> > >
> >
> > This patch looks mostly good to me except for a few minor comments
> > which are mentioned below. It is not very clear in which branch(es) we
> > should commit this patch? As per my understanding, this is a
> > pre-existing behavior but we want to document it because (a) It was
> > not already documented, and (b) we followed it for row filters in
> > PG-15 it seems that should be explained. So, we have the following
> > options (a) commit it only for PG-15, (b) commit for PG-15 and
> > backpatch the relevant sections, or (c) commit it when branch opens
> > for PG-16. What do you or others think?
> 
> Even though this is a very old docs omission, AFAIK nobody ever raised
> it as a problem before. It only became more important because of the
> PG15 row-filters. So I think option (a) is ok.
> 

I also think option (a) is ok.

> 
> PSA patch v4 to address all the above review comments.
> 

Thanks for updating the patch. It looks good to me.

Besides, I tested the examples in the patch, and there's no problem.

Regards,
Shi yu


Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

2022-06-22 Thread Julien Rouhaud
On Wed, Jun 22, 2022 at 11:05:54PM +, Imseih (AWS), Sami wrote:
> >Can you describe how it's kept in sync, and how it makes sure that the 
> > property
> >is maintained over restart / gc?  I don't see any change in the code for 
> > the
> >2nd part so I don't see how it could work (and after a quick test it 
> > indeed
> >doesn't).
>
> There is a routine called qtext_hash_sync which removes all entries from
> the qtext_hash and reloads it with all the query ids from pgss_hash.
> [...]
> At all the points where it's called, an exclusive lock is held. This allows
> for syncing all the present queryid's in pgss_hash with qtext_hash.

So your approach is to let the current gc / file loading behavior happen as
before and construct your mapping hash using the resulting query text / offset
info.  That can't work.

> >2nd part so I don't see how it could work (and after a quick test it 
> > indeed
> >doesn't).
>
> Can you tell me what test you used to determine it is not in sync?

What test did you use to determine it is in sync?  Have you checked how the gc/
file loading actually work?

In my case I just checked the size of the query text file after running some
script that makes sure that the same few queries are run by multiple
different roles, then:

Size of $PGDATA/pg_stat_tmp/pgss_query_texts.stat: 559B
pg_ctl restart
Size of $PGDATA/pg_stat_tmp/pgss_query_texts.stat: 2383B
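
That kind of check can also be scripted from SQL.  A rough sketch (assuming
the pg_stat_statements extension is installed, that roles role_a and role_b
exist and are granted to the caller, and that the caller is allowed to use
pg_stat_file(); the file path is the default one under the data directory):

    SELECT pg_stat_statements_reset();

    SET ROLE role_a; SELECT count(*) FROM pg_class; RESET ROLE;
    SET ROLE role_b; SELECT count(*) FROM pg_class; RESET ROLE;

    -- size of the external query text file before the restart
    SELECT size FROM pg_stat_file('pg_stat_tmp/pgss_query_texts.stat');

    -- then restart the server, run the two SELECTs again, and compare the
    -- size reported here to see whether any deduplication survived the
    -- file reload.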

> >Can you share more details on the benchmarks you did?  Did you also run
> >benchmark on workloads that induce entry eviction, with and without need 
> > for
> >gc?  Eviction in pgss is already very expensive, and making things worse 
> > just
> >to save a bit of disk space doesn't seem like a good compromise.
>
> Sorry this was poorly explained by me. I went back and did some benchmarks.
> Attached is the script and results. But here is a summary:
> On an EC2 r5.2xlarge, the benchmark I performed is:
> 1. create 10k tables
> 2. create 5 users
> 3. run a pgbench script that performs per transaction a select on
> a randomly chosen table for each of the 5 users.
> 4. 2 variants of the test executed. 1 variant is with the default
> pg_stat_statements.max = 5000
> and one test with a larger pg_stat_statements.max = 1.

But you wrote:

> ##
> ## pg_stat_statements.max = 15000
> ##

So which one is it?

>
> So 10-15% is not accurate. I originally tested on a less powerful machine.
> For this benchmark I see a 6% increase in TPS (732k vs 683k) when a larger
> pg_stat_statements.max is used and there are fewer gc/deallocations.
> Both tests show a drop in gc/deallocations and a net increase
> in tps. Less GC makes sense since the external file has fewer duplicate SQLs.

On the other hand you're rebuilding the new query_offset hashtable every time
there's an entry eviction, which seems quite expensive.

Also, as I mentioned you didn't change any of the heuristics for
need_gc_qtexts().  So if the query texts are indeed deduplicated, doesn't it
mean that gc will artificially be called less often?  The wanted target of
"50% bloat" will become "50% bloat assuming no deduplication is done", and
the average query text file size will stay the same whether the query texts
are deduplicated or not.

I'm wondering whether the improvements you see are due to the patch or simply
due to artificially calling gc less often?  What are the results if, instead
of using vanilla pg_stat_statements, you patch it to perform roughly the same
number of gc cycles as your version does?

Also your benchmark workload is very friendly with your feature, what are the
results with other workloads?  Having the results with query texts that aren't
artificially long would be interesting for instance, after fixing the problems
mentioned previously.

Also, you said that if you run that benchmark with a single user you don't see
any regression.  I don't see how rebuilding an extra hashtable in
entry_dealloc(), i.e. while holding an exclusive lwlock, without saving any
other work elsewhere, could be free.

Looking at the script, you have:
echo "log_min_messages=debug1" >> $PGDATA/postgresql.conf; \

Is that really necessary?  Couldn't you upgrade the gc message to a higher
level for your benchmark need, or expose some new counter in
pg_stat_statements_info maybe?  Have you done the benchmark using a debug build
or normal build?




Re: Make COPY extendable in order to support Parquet and other formats

2022-06-22 Thread Tom Lane
Andres Freund  writes:
> On 2022-06-22 16:59:16 +0530, Ashutosh Bapat wrote:
>> IIUC, you want extensibility in FORMAT argument to COPY command
>> https://www.postgresql.org/docs/current/sql-copy.html. Where the
>> format is pluggable. That seems useful.

> Agreed.

Ditto.

> I suspect that we'd first need a patch to refactor the existing copy code a
> good bit to clean things up. After that it hopefully will be possible to plug
> in a new format without being too intrusive.

I think that step 1 ought to be to convert the existing formats into
plug-ins, and demonstrate that there's no significant loss of performance.

regards, tom lane




Re: Skipping logical replication transactions on subscriber side

2022-06-22 Thread Noah Misch
On Wed, Jun 22, 2022 at 09:50:02AM -0400, Robert Haas wrote:
> On Wed, Jun 22, 2022 at 12:28 AM Noah Misch  wrote:
> > Here's how I rank the options, from most-preferred to least-preferred:
> >
> > 1. Put new eight-byte fields at the front of each catalog, when in doubt.
> > 2. On systems where double alignment differs from int64 alignment, require
> >NAMEDATALEN%8==0.  Upgrading to v16 would require dump/reload for AIX 
> > users
> >changing NAMEDATALEN to conform to the new restriction.
> > 3. Introduce new typalign values.  Upgrading to v16 would require 
> > dump/reload
> >for all AIX users.
> > 4. De-support AIX.
> > 5. From above, "Somehow forcing values that are sometimes 4-byte aligned and
> >sometimes 8-byte aligned to be 8-byte alignment on all platforms".
> >Upgrading to v16 would require dump/reload for all AIX users.
> > 6. Require -qalign=natural on AIX.  Upgrading to v16 would require 
> > dump/reload
> >and possible system library rebuilds for all AIX users.
> >
> > I gather (1) isn't at the top of your ranking, or you wouldn't have written
> > in.  What do you think of (2)?
> 
> (2) pleases me in the sense that it seems to inconvenience very few
> people, perhaps no one, in order to avoid inconveniencing a larger
> number of people. However, it doesn't seem sufficient.

Here's a more-verbose description of (2), with additions about what it does
and doesn't achieve:

2. On systems where double alignment differs from int64 alignment, require
   NAMEDATALEN%8==0.  Modify the test from commits 79b716c and c1da0ac to stop
   treating "name" fields specially.  The test will still fail for AIX
   compatibility violations, but "name" columns no longer limit your field
   position candidates like they do today (today == option (1)).  Upgrading to
   v16 would require dump/reload for AIX users changing NAMEDATALEN to conform
   to the new restriction.  (I'm not sure pg_upgrade checks NAMEDATALEN
   compatibility, but it should require at least one of: same NAMEDATALEN, or
   absence of "name" columns in user tables.)

> If I understand
> correctly, even a catalog that includes no NameData column can have a
> problem.

Correct.

On Wed, Jun 22, 2022 at 10:39:20AM -0400, Tom Lane wrote:
> It appears that what we've got on AIX is that typalign 'd' overstates the
> actual alignment requirement for 'double', which is safe from the SIGBUS
> angle.

On AIX, typalign='d' states the exact alignment requirement for 'double'.  It
understates the alignment requirement for int64_t.

> I don't think we want rules exactly, what we need is mechanical verification
> that the field orderings in use are safe.

Commits 79b716c and c1da0ac did that.




Re: Perform streaming logical transactions by background workers and parallel apply

2022-06-22 Thread Peter Smith
FYI - the latest patch set v12* on this thread no longer applies.

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
v12-0003-A-temporary-patch-that-includes-patch-in-another.patch
error: patch failed: src/backend/replication/logical/relation.c:307
error: src/backend/replication/logical/relation.c: patch does not apply
error: patch failed: src/backend/replication/logical/worker.c:2358
error: src/backend/replication/logical/worker.c: patch does not apply
error: patch failed: src/test/subscription/t/013_partition.pl:868
error: src/test/subscription/t/013_partition.pl: patch does not apply
[postgres@CentOS7-x64 oss_postgres_misc]$

~~

I know the v12-0003 was meant just as a temporary patch for something
that may now already be pushed, but it cannot simply be skipped either,
because then v12-0004 will also fail to apply.

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
v12-0004-Add-some-checks-before-using-apply-background-wo.patch
error: patch failed: src/backend/replication/logical/relation.c:433
error: src/backend/replication/logical/relation.c: patch does not apply
error: patch failed: src/backend/replication/logical/worker.c:2403
error: src/backend/replication/logical/worker.c: patch does not apply
[postgres@CentOS7-x64 oss_postgres_misc]$

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Feature Proposal: Connection Pool Optimization - Change the Connection User

2022-06-22 Thread Todd Hubers
Hi Everyone,

Here is a progress update. I have an established team of 2 fulltime systems
programmers who have been working on this area for a couple of months now.

   - Past:
      - *Impersonation* - a prototype has been completed for Option-1
        "Impersonation".
      - *Benchmarking* - has been completed on a range of options,
        including Option-0 "Reconnection".
      - Both Impersonation and the Benchmarking are currently on the
        backburner.
   - Current: *Notification Concentrator* - This is not PostgreSQL codebase
     work. This project makes NOTIFY/LISTEN work in Odyssey (and others) while
     in Transaction mode. (Until now, NOTIFY/LISTEN could only work in SESSION
     mode.) We intend to also build patches for PgBouncer and other popular
     connection pool systems.
      - It works in Session mode, but my organisation needs it to work in
        Transaction mode. It works by intercepting LISTEN/UNLISTEN in SQL and
        redirecting them to a single shared connection. There will be a Pub/Sub
        system within Odyssey. The LISTEN/UNLISTEN is only sent for the first
        subscriber or last unsubscriber, respectively. The NOTIFICATION messages
        are then dispatched to the subscriber list. At most one SESSION
        connection is required. (A SQL-level sketch of the intended behaviour
        follows this list.)
   - Next:
      - *Update Benchmarking:* I then expect to update benchmarks with a range
        of prototype solutions, with both Impersonation and Notification
        Concentrator, for final review.
      - *Publishing Benchmarking:* I will send our results here, and offer
        a patch for such benchmarking code.
      - *Final Implementation:* The team will finalise code for
        production-grade implementations, and tests.
      - *Patches:* Then my team will submit a patch for PostgreSQL, Odyssey
        and others, working to polish anything else that might be required of us.
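
As a SQL-level sketch of that intended behaviour (the channel name and the
per-client comments are illustrative only, not taken from any existing
implementation):

    -- client 1, via the pooler:
    LISTEN orders_channel;    -- first local subscriber: the pooler forwards a
                              -- single LISTEN on its shared session connection
    -- client 2, via the pooler:
    LISTEN orders_channel;    -- already subscribed upstream: registered locally only

    -- any backend:
    NOTIFY orders_channel, 'new order';   -- one notification reaches the pooler's
                                          -- shared connection and is fanned out to
                                          -- both clients

    -- client 1:
    UNLISTEN orders_channel;  -- one local subscriber remains: nothing sent upstream
    -- client 2:
    UNLISTEN orders_channel;  -- last local subscriber: the pooler forwards UNLISTEN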

Todd

On Wed, 2 Feb 2022 at 10:56, Todd Hubers  wrote:

> Hi Everyone,
>
> Benchmarking work has commenced, and is ongoing.
>
>- *OPTIONS 5/6/7* - `SET SESSION AUTHORIZATION` takes double the time
>of a single separate SimpleQuery. This is to be expected, because double
>the amount of SimpleQuery messages are being sent, and that requires a full
>SimpleQuery/Result/Ready cycle. If there is significant latency between a
>Connection Pooler and the database, this delay is amplified. It would be
>possible to concatenate text into a single SimpleQuery. In the real world,
>the performance impact MAY be negligible.
>- *OPTION 0* - The time to reconnect (start a new connection from
>scratch with a different username/password) was found to be faster than
>using `SET SESSION AUTHORIZATION`.
>- *OPTION 1* - My team is continuing to explore a distinct Impersonate
>message (Option-1). We are completing a prototype-quality implementation,
>and then benchmarking it. Given that Option-1 is asynchronous (Request and
>expect to succeed) and it can even be included within the same TCP packet
>as the SimpleQuery (at times), we expect the performance will be better
>than restarting a connection, and not impacted by links of higher latency.
>
> I will be recording benchmark results in the document:
> https://docs.google.com/document/d/1u6mVKEHfKtR80UrMLNYrp5D6cCSW1_arcTaZ9HcAKlw/edit#
> after completion of the OPTION-1 prototype and benchmarking of that
> prototype.
>
> Note: In order to accommodate something like OPTION-8, an Impersonation
> message might have a flag (valid for 1x SimpleQuery only, then
> automatically restore back to the last user).
>
> Regards,
>
> Todd
>
>
> On Fri, 7 Jan 2022 at 10:55, Todd Hubers  wrote:
>
>> Hi Everyone,
>>
>> I have started working on this:
>>
>>- Benchmarking - increasingly more comprehensive benchmarking
>>- Prototyping - to simulate the change of users (toggling back and
>>forth)
>>- Draft Implementation - of OPTION-1 (New Protocol Message)
>>- (Then: Working with Odyssey and PgBouncer to add support (when the
>>GRANT role privilege is available))
>>
>> I hope to have a patch ready by the end of March.
>>
>> Regards,
>>
>> Todd
>>
>> On Wed, 24 Nov 2021 at 02:46, Todd Hubers  wrote:
>>
>>>
>>> Hi Jacob and Daniel,
>>>
>>> Thanks for your feedback.
>>>
>>> >@Daniel - I think thats conflating session_user and current_user, SET
>>> ROLE is not a login event. This is by design and discussed in the
>>> documentation..
>>>
>>> Agreed, I am using those terms loosely. I have updated option 4 in the
>>> proposal document. I have crossed it out. Option 5 is more suitable "SET
>>> SESSION AUTHORIZATION" for further consideration.
>>>
>>> >@Daniel - but it's important to remember that we need to cover the
>>> functionality in terms of *tests* first, performance benchmarking is
>>> another concern.
>>>
>>> For implementation absolutely, but not for a basic feasibility
>>> prototype. A quick non-secure non-reliable prototype is probably an
>>> important first-step to confirming which 

RE: Handle infinite recursion in logical replication setup

2022-06-22 Thread shiy.f...@fujitsu.com
On Mon, Jun 20, 2022 7:55 PM vignesh C  wrote:
> 
> Thanks for the comment, the v22 patch attached has the changes for the
> same.

Thanks for updating the patch, here are some comments on 0003 patch.

1. 032_origin.pl
+###
+# Join 3rd node (node_C) to the existing 2 nodes(node_A & node_B) bidirectional
+# replication setup when the existing nodes (node_A & node_B) and the new node
+# (node_C) does not have any data.
+###

"does not have any data" should be "do not have any data" I think.

2.
The comment for 032_origin.pl:

# Test the CREATE SUBSCRIPTION 'origin' parameter.

After applying this patch, this file tests not only the 'origin' parameter, but
also the "copy_data" parameter, so should we modify this comment?

Besides, should we change the file name in this patch? It looks more like test
cases for bidirectional logical replication.

3. subscriptioncmds.c
/* Set default values for the boolean supported options. */
...
if (IsSet(supported_opts, SUBOPT_CREATE_SLOT))
opts->create_slot = true;
if (IsSet(supported_opts, SUBOPT_COPY_DATA))
-   opts->copy_data = true;
+   opts->copy_data = COPY_DATA_ON;
if (IsSet(supported_opts, SUBOPT_REFRESH))
opts->refresh = true;
if (IsSet(supported_opts, SUBOPT_BINARY))

"copy_data" option is not Boolean now, which is inconsistent with the comment.
So maybe we can change the comment here? ("the boolean supported options" ->
"the supported options")

Regards,
Shi yu


Re: [PoC] Let libpq reject unexpected authentication requests

2022-06-22 Thread David G. Johnston
On Thu, Jun 9, 2022 at 4:30 PM Jacob Champion 
wrote:

> On Wed, Jun 8, 2022 at 9:58 PM Michael Paquier 
> wrote:
>
> > One
> > interesting case comes down to stuff like channel_binding=require
> > require_auth="md5,scram-sha-256", where I think that we should still
> > fail even if the server asks for MD5 and enforce an equivalent of an
> > AND grammar through the parameters.  This reasoning limits the
> > dependencies between each parameter and treats the areas where these
> > are checked independently, which is what check_expected_areq() does
> > for channel binding.  So that's more robust at the end.
>
> Agreed.
>
>
That just makes me want to not implement OR'ing...

The existing set of individual parameters doesn't work as an API for
try-and-fallback.

Something like this would be less problematic when it comes to setting multiple
related options:

--auth-try
"1;sslmode=require;channel_binding=require;method=scram-sha-256;sslcert=/tmp/machine.cert;sslkey=/tmp/machine.key"
--auth-try
"2;sslmode=require;method=cert;sslcert=/tmp/me.cert;sslkey=/tmp/me.key"
--auth-try "3;sslmode=prefer;method=md5"

Absent that radical idea, require_auth probably shouldn't change any
behavior relative to what exists today without require_auth specified, when
the chosen method happens anyway.  So whatever happens today with an md5
password prompt while channel_binding is set to require should still happen
(not in the mood right now to figure out how to test that on an instance
compiled against HEAD).

David J.


Re: Missing reference to pgstat_replslot.c in pgstat.c

2022-06-22 Thread Andres Freund
Hi,

On 2022-06-22 08:29:03 +0200, Drouvot, Bertrand wrote:
> I think there's a missing reference to pgstat_replslot.c in pgstat.c.

Indeed...

> Attached a tiny patch to fix it.

Thanks. Pushed.

Greetings,

Andres Freund




Re: pg_upgrade (12->14) fails on aggregate

2022-06-22 Thread Justin Pryzby
On Fri, Jun 17, 2022 at 10:14:13AM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Thu, Jun 16, 2022 at 10:01 PM Justin Pryzby  wrote:
> >> To me, oid>=16384 seems more hard-wired than namespace!='pg_catalog'.
> 
> > Extensions can be installed into pg_catalog, but they can't get
> > low-numbered OIDs.
> 
> Exactly.  (To be clear, I had in mind writing something involving
> FirstNormalObjectId, not that you should put literal "16384" in the
> code.)

Actually, 16384 is already used in two other places in check.c, so ...
done like that for consistency.
Also fixes parentheses, typos, and renames vars.
From 44a66cddbc1f32c16ee7229b85fe76aada4689aa Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 10 Jun 2022 11:17:36 -0500
Subject: [PATCH v3] WIP: pg_upgrade --check: detect old polymorphics from
 pre-14

These fail when upgrading from pre-14 (as expected), but it should fail during
pg_upgrade --check, and not after dumping the cluster and in the middle of
restoring it.

CREATE AGGREGATE array_accum(anyelement) (sfunc=array_append, stype=anyarray, initcond='{}');
CREATE OPERATOR !@# (PROCEDURE = array_append, LEFTARG=anyarray, rightarg=anyelement);

See also:
9e38c2bb5093ceb0c04d6315ccd8975bd17add66
97f73a978fc1aca59c6ad765548ce0096d95a923
---
 src/bin/pg_upgrade/check.c | 124 +
 1 file changed, 124 insertions(+)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index ace7387edaf..da1781237ba 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -31,6 +31,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_old_polymorphics(ClusterInfo *cluster);
 static char *get_canonical_locale_name(int category, const char *locale);
 
 
@@ -122,6 +123,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1300)
 		check_for_user_defined_postfix_ops(&old_cluster);
 
+	/*
+	 * PG 14 changed polymorphic functions from anyarray to anycompatiblearray.
+	 */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1300)
+		check_for_old_polymorphics(&old_cluster);
+
 	/*
 	 * Pre-PG 12 allowed tables to be declared WITH OIDS, which is not
 	 * supported anymore. Verify there are none, iff applicable.
@@ -775,6 +782,123 @@ check_proper_datallowconn(ClusterInfo *cluster)
 }
 
 
+/*
+ *	check_for_old_polymorphics()
+ *
+ *	Make sure nothing is using old polymorphic functions with
+ *	anyarray/anyelement rather than the new anycompatible variants.
+ */
+static void
+check_for_old_polymorphics(ClusterInfo *cluster)
+{
+	PGresult   *res;
+	FILE	   *script = NULL;
+	char		output_path[MAXPGPATH];
+
+	prep_status("Checking for use of old polymorphic functions");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "databases_with_old_polymorphics.txt");
+
+#define OLD_POLYMORPHICS \
+	"'array_append(anyarray,anyelement)', " \
+	"'array_cat(anyarray,anyarray)', " \
+	"'array_position(anyarray,anyelement)', " \
+	"'array_position(anyarray,anyelement,integer)', " \
+	"'array_positions(anyarray,anyelement)', " \
+	"'array_prepend(anyelement,anyarray)', " \
+	"'array_remove(anyarray,anyelement)', " \
+	"'array_replace(anyarray,anyelement,anyelement)', " \
+	"'width_bucket(anyelement,anyarray)' "
+
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
 		bool		db_used = false;
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+		int			ntups;
+		int			i_objkind,
+	i_objname;
+		/*
+		 * The query below hardcodes FirstNormalObjectId as 16384 rather than
+		 * interpolating that C #define into the query because, if that
+		 * #define is ever changed, the cutoff we want to use is the value
+		 * used by pre-version 14 servers, not that of some future version.
+		 */
+
+		res = executeQueryOrDie(conn,
+/* Aggregate transition functions */
+"SELECT 'aggregate' AS objkind, p.oid::regprocedure::text AS objname "
+"FROM pg_proc AS p "
+"JOIN pg_aggregate AS a ON a.aggfnoid=p.oid "
+"JOIN pg_proc AS transfn ON transfn.oid=a.aggtransfn "
+"JOIN pg_namespace AS transnsp ON transnsp.oid=transfn.pronamespace "
+"WHERE p.oid >= 16384 "
+/* Before v11, used proisagg=true, and afterwards uses prokind='a' */
+"AND transnsp.nspname = 'pg_catalog' "
+"AND a.aggtransfn = ANY(ARRAY[" OLD_POLYMORPHICS "]::regprocedure[]) "
+// "AND aggtranstype='anyarray'::regtype
+
+/* Aggregate final functions */
+"UNION ALL "
+"SELECT 'aggregate' AS objkind, p.oid::regprocedure::text AS objname "
+"FROM pg_proc AS p "
+"JOIN pg_aggregate AS a ON a.aggfnoid=p.oid "
+"JOIN 

Re: Make COPY extendable in order to support Parquet and other formats

2022-06-22 Thread Andres Freund
Hi,

On 2022-06-22 16:59:16 +0530, Ashutosh Bapat wrote:
> On Tue, Jun 21, 2022 at 3:26 PM Aleksander Alekseev
>  wrote:
> 
> >
> > In other words, personally I'm unaware of use cases when somebody
> > needs a complete read/write FDW or TableAM implementation for formats
> > like Parquet, ORC, etc. Also to my knowledge they are not particularly
> > optimized for this.
> >
> 
> IIUC, you want extensibility in FORMAT argument to COPY command
> https://www.postgresql.org/docs/current/sql-copy.html. Where the
> format is pluggable. That seems useful.

Agreed.

But I think it needs quite a bit of care. Just plugging in a bunch of per-row
(or worse, per-field) switches into COPY's input / output parsing will make the
code even harder to read and even slower.

I suspect that we'd first need a patch to refactor the existing copy code a
good bit to clean things up. After that it hopefully will be possible to plug
in a new format without being too intrusive.
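
To make the user-visible goal concrete, usage might eventually look something
like this (purely hypothetical: no such extension or format handler exists
today, and the registration API is exactly the part that would need to be
designed):

    CREATE EXTENSION copy_parquet;   -- hypothetical extension providing the format
    COPY measurements TO '/tmp/measurements.parquet' (FORMAT parquet);
    COPY measurements FROM '/tmp/measurements.parquet' (FORMAT parquet);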

I know little about parquet - can it support FROM STDIN efficiently?

Greetings,

Andres Freund




Re: [PoC] Let libpq reject unexpected authentication requests

2022-06-22 Thread Jacob Champion
On Mon, Jun 13, 2022 at 10:00 PM Michael Paquier  wrote:
> > Personally I think the ability to provide a default of `!password` is
> > very compelling. Any allowlist we hardcode won't be future-proof (see
> > also my response to Laurenz upthread [1]).
>
> Hm, perhaps.  You could use that as a default at application level,
> but the default set in libpq should be backward-compatible (aka allow
> everything, even trust where the backend just sends AUTH_REQ_OK).
> This is unfortunate but there is a point in not breaking any user's
> application, as well, particularly with ldap, and note that we have to
> keep a certain amount of things backward-compatible as a very common
> practice of packaging with Postgres is to have libpq link to binaries
> across *multiple* major versions (Debian is one if I recall), with the
> newest version of libpq used for all.

I am 100% with you on this, but there's been a lot of chatter around
removing plaintext password support from libpq. I would much rather
see them rejected by default than removed entirely. !password would
provide an easy path for that.

> > I'm pretty motivated to provide the ability to say "I want cert auth
> > only, nothing else." Using a separate parameter would mean we'd need
> > something like `require_auth=none`, but I think that makes a certain
> > amount of sense.
>
> If the default of require_auth is backward-compatible and allows
> everything, using a different parameter for the certs won't matter
> anyway?

If you want cert authentication only, allowing "everything" will let
the server extract a password and then you're back at square one.
There needs to be a way to prohibit all explicit authentication
requests.

> My suggestion is to reword the error message so as the reason and the
> main error message can be treated as two independent things.  You are
> right to apply two times libpq_gettext(), once to "reason" and once to
> the main string.

Ah, thanks for the clarification. Done that way in v3.

> My point would be to either register a max flag in the set of
> AUTH_REQ_* in pqcomm.h so as we never go above 32 with an assertion to
> make sure that this would never overflow, but I would add a comment in
> pqcomm.h telling that we also do bitwise operations, relying on the
> number of AUTH_REQ_* flags, and that we'd better be careful once the
> number of flags gets higher than 32.  There is some margin, still that
> could be easily forgotten.

Makes sense; done.

v3 also removes "cert" from require_auth while I work on a replacement
connection option, fixes the bugs/suggestions pointed out upthread,
and adds a documentation first draft. I tried combining this with the
NOT work but it was too much to juggle, so that'll wait for a v4+,
along with require_auth=none and "cert mode".

Thanks for the detailed review!
--Jacob
commit c299b1abf0ffc477371cdd0d0d789e824da56328
Author: Jacob Champion 
Date:   Fri Jun 10 11:20:08 2022 -0700

squash! libpq: let client reject unexpected auth methods

Per review suggestions:

- Fix a memory leak when parsing require_auth.
- Add PGREQUIREAUTH to the envvars.
- Remove the "cert" method from require_auth; this will be handled with
  a separate conninfo option later.
- Fix builds without GSSAPI.
- Test combinations of require_auth and channel_binding.
- Statically assert when AUTH_REQ_* grows too large to hold safely in
  our allowed_auth_methods bitmask.
- Improve ease of translation.
- Add a first draft of documentation.

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 37ec3cb4e5..00990ce47d 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1221,6 +1221,76 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  require_auth
+  
+  
+Specifies the authentication method that the client requires from the
+server. If the server does not use the required method to authenticate
+the client, or if the authentication handshake is not fully completed by
+the server, the connection will fail. A comma-separated list of methods
+may also be provided, of which the server must use exactly one in order
+for the connection to succeed. By default, any authentication method is
+accepted, and the server is free to skip authentication altogether.
+  
+  
+The following methods may be specified:
+
+
+ 
+  password
+  
+   
+The server must request plaintext password authentication.
+   
+  
+ 
+
+ 
+  md5
+  
+   
+The server must request MD5 hashed password authentication.
+   
+  
+ 
+
+ 
+  gss
+  
+   
+The server must either request a Kerberos handshake via
+GSSAPI or establish a
+GSS-encrypted channel (see also
+ 

Re: Amcheck verification of GiST and GIN

2022-06-22 Thread Andres Freund
Hi,

I think having amcheck for more indexes is great.

On 2022-06-22 20:40:56 +0300, Andrey Borodin wrote:
>> diff --git a/contrib/amcheck/amcheck.c b/contrib/amcheck/amcheck.c
> new file mode 100644
> index 00..7a222719dd
> --- /dev/null
> +++ b/contrib/amcheck/amcheck.c
> @@ -0,0 +1,187 @@
> +/*-
> + *
> + * amcheck.c
> + *   Utility functions common to all access methods.

This'd likely be easier to read if the reorganization were split into its own
commit.

I'd also split gin / gist support. It's a large enough patch that that imo
makes reviewing easier.


> +void amcheck_lock_relation_and_check(Oid indrelid, IndexCheckableCallback 
> checkable,
> + 
> IndexDoCheckCallback check, LOCKMODE lockmode, void *state)

Might be worth pgindenting - the void for function definitions (but not for
declarations) is typically on its own line in PG code.


> +static GistCheckState
> +gist_init_heapallindexed(Relation rel)
> +{
> + int64   total_pages;
> + int64   total_elems;
> + uint64  seed;
> + GistCheckState result;
> +
> + /*
> + * Size Bloom filter based on estimated number of tuples in index
> + */
> + total_pages = RelationGetNumberOfBlocks(rel);
> + total_elems = Max(total_pages * (MaxOffsetNumber / 5),
> + (int64) rel->rd_rel->reltuples);
> + /* Generate a random seed to avoid repetition */
> + seed = pg_prng_uint64(_global_prng_state);
> + /* Create Bloom filter to fingerprint index */
> + result.filter = bloom_create(total_elems, maintenance_work_mem, seed);
> +
> + /*
> +  * Register our own snapshot
> +  */
> + result.snapshot = RegisterSnapshot(GetTransactionSnapshot());

FWIW, comments like this, that just restate exactly what the code does, are
imo not helpful.  Also, there's a trailing space :)


Greetings,

Andres Freund




Re: SLRUs in the main buffer pool - Page Header definitions

2022-06-22 Thread Andres Freund
Hi,

On 2022-06-22 19:12:14 -0400, Tom Lane wrote:
> "Bagga, Rishu"  writes:
> > The current SLRU pages do not have any header, so there is a need to
> > create a new page header format for these. Our investigations revealed
> > that we need to:
> 
> > 1. track LSN to ensure durability and consistency of all pages (for redo
> >    and full page write purposes)
> > 2. have a checksum (for page correctness verification).
> > 3. A flag to identify if the page is a relational or BufferedObject
> > 4. Track version information.
> 
> Isn't this a nonstarter from the standpoint of pg_upgrade?

We're rewriting some relation forks as part of pg_upgrade (visibility map
IIRC?), so rewriting an SLRU is likely not prohibitive - there's much more of
a limit to the SLRU sizes than the number and aggregate size of relation
forks.

Greetings,

Andres Freund




Re: SLRUs in the main buffer pool - Page Header definitions

2022-06-22 Thread Tom Lane
"Bagga, Rishu"  writes:
> The current SLRU pages do not have any header, so there is a need to
> create a new page header format for these. Our investigations revealed
> that we need to:

> 1. track LSN to ensure durability and consistency of all pages (for redo
>    and full page write purposes)
> 2. have a checksum (for page correctness verification).
> 3. A flag to identify if the page is a relational or BufferedObject
> 4. Track version information.

Isn't this a nonstarter from the standpoint of pg_upgrade?

regards, tom lane




Re: fix stats_fetch_consistency value in postgresql.conf.sample

2022-06-22 Thread Andres Freund
Hi,

On 2022-06-17 09:43:58 +0900, Kyotaro Horiguchi wrote:
> +/*
> + * Convert value to unitless value according to the specified GUC variable
> + */
> +Datum
> +pg_config_unitless_value(PG_FUNCTION_ARGS)
> +{
> + char   *name = "";
> + char   *value = "";
> + struct config_generic *record;
> + char   *result = "";
> + void   *extra;
> + union config_var_val val;
> + const char *p;
> + char   buffer[256];
> +
> + if (!PG_ARGISNULL(0))
> + name = text_to_cstring(PG_GETARG_TEXT_PP(0));
> + if (!PG_ARGISNULL(1))
> + value = text_to_cstring(PG_GETARG_TEXT_PP(1));
> +
> + record = find_option(name, true, false, ERROR);
> +
> + parse_and_validate_value(record, name, value, PGC_S_TEST, WARNING,
> +  &val, &extra);
> +

Hm. I think this should error out for options that the user doesn't have the
permissions for - good. I suggest adding a test for that.
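
Something along these lines, perhaps (a sketch only: pg_config_unitless_value()
is the function added by this patch, the role name is made up, and the error on
the second call is the behaviour being asked for here, not something already
verified):

    SELECT pg_config_unitless_value('shared_buffers', '128MB');
    -- expected: '16384' (shared_buffers is measured in 8kB pages)

    SET ROLE regress_unprivileged_user;
    SELECT pg_config_unitless_value('session_preload_libraries', 'whatever');
    -- expected: a permissions error rather than a value
    RESET ROLE;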


> + switch (record->vartype)
> + {
> + case PGC_BOOL:
> + result = (val.boolval ? "on" : "off");
> + break;
> + case PGC_INT:
> + snprintf(buffer, sizeof(buffer), "%d", val.intval);
> + result = pstrdup(buffer);
> + break;
> + case PGC_REAL:
> + snprintf(buffer, sizeof(buffer), "%g", val.realval);
> + result = pstrdup(buffer);
> + break;
> + case PGC_STRING:
> + p = val.stringval;
> + if (p == NULL)
> + p = "";
> + result = pstrdup(p);
> + break;

Is this a good idea? I wonder if we shouldn't instead return NULL, rather than
making NULL and "" indistinguishable.

Not that it matters for efficiency here, but why are you pstrdup'ing the
buffers? cstring_to_text() will already copy the string, no?


> +# parameter names that cannot get consistency check performed
> +my @ignored_parameters = (

Perhaps worth adding comments explaining why these can't get checked?


> +foreach my $line (split("\n", $all_params))
> +{
> + my @f = split('\|', $line);
> + fail("query returned wrong number of columns: $#f : $line") if ($#f != 4);
> + $all_params_hash{$f[0]}->{type} = $f[1];
> + $all_params_hash{$f[0]}->{unit} = $f[2];
> + $all_params_hash{$f[0]}->{bootval} = $f[3];
> +}
>

Might look a bit nicer to generate the hash in a local variable and then
assign to $all_params_hash{$f[0]} once, rather than repeating that part
multiple times.


> - if ($line =~ m/^#?([_[:alpha:]]+) = .*/)
> + if ($line =~ m/^#?([_[:alpha:]]+) = (.*)$/)
>   {
>   # Lower-case conversion matters for some of the GUCs.
>   my $param_name = lc($1);
>  
> + # extract value
> + my $file_value = $2;
> + $file_value =~ s/\s*#.*$//; # strip trailing comment
> + $file_value =~ s/^'(.*)'$/$1/;  # strip quotes
> +
>   # Ignore some exceptions.
>   next if $param_name eq "include";
>   next if $param_name eq "include_dir";

So there's now two ignore mechanisms? Why not just handle include[_dir] via
@ignored_parameters?


> @@ -66,19 +94,39 @@ while (my $line = <$contents>)
>   # Update the list of GUCs found in the sample file, for the
>   # follow-up tests.
>   push @gucs_in_file, $param_name;
> +
> + # Check for consistency between bootval and file value.

You're not checking the consistency here though?


> + if (!grep { $_ eq $param_name } @ignored_parameters)
> + {
> + push (@check_elems, "('$param_name','$file_value')");
> + }
>   }
>  }

>  
>  close $contents;
>  
> +# Run consistency check between config-file's default value and boot
> +# values.  To show sample setting that is not found in the view, use
> +# LEFT JOIN and make sure pg_settings.name is not NULL.
> +my $check_query =
> +  'SELECT f.n, f.v, s.boot_val FROM (VALUES '.
> +  join(',', @check_elems).
> +  ') f(n,v) LEFT JOIN pg_settings s ON lower(s.name) = f.n '.
> +  "WHERE pg_config_unitless_value(f.n, f.v) <> COALESCE(s.boot_val, '') ".
> +  'OR s.name IS NULL';
> +
> +print $check_query;
> +
> +is ($node->safe_psql('postgres', $check_query), '',
> + 'check if fileval-bootval consistency is fine');

"fileval-bootval" isn't that easy to understand, "is fine" doesn't quite sound
right. Maybe something like "GUC values in .sample and boot value match"?


I wonder if it'd not result in easier to understand output if the query just
called pg_config_unitless_value() for all the .sample values, but then did the
comparison of the results in perl.


Greetings,

Andres Freund




Re: Tightening behaviour for non-immutable behaviour in immutable functions

2022-06-22 Thread Greg Stark
On Thu, 16 Jun 2022 at 12:04, Greg Stark  wrote:
>
> Providing a fixed up search_path for users would give them a smoother
> upgrade process where we can give a warning only if the search_path is
> changed substantively which is much less likely.
>
> I'm still quite concerned about the performance impact of having
> search_path on so many functions. It causes a cache flush which could
> be pretty painful on a frequently called function such as one in an
> index expression during a data load

So it seems I missed a small change in the Postgres SQL function world,
namely the SQL standard syntax and the prosqlbody column from e717a9a18.

This feature is huge. It's awesome! It basically provides the lexical
scoping feature I was hoping to implement. Any immutable SQL function in the
standard syntax can be safely used in indexes or elsewhere regardless of your
search_path, as all the names are already resolved.
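
For example (the table and function are made up, but the syntax is the
standard-body form from e717a9a18 that ends up in prosqlbody):

    CREATE FUNCTION norm_email(text) RETURNS text
        LANGUAGE SQL IMMUTABLE
        RETURN lower(btrim($1));

    -- lower() and btrim() were resolved when the function was created, so a
    -- later search_path change cannot redirect them out from under this index:
    CREATE TABLE users (email text);
    CREATE INDEX ON users (norm_email(email));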

I'm now thinking we should just provide a LEXICAL option on Postgres
style functions to implement the same name path and store sqlbody for
them as well. They would have to be bound by the same restrictions
(notably no polymorphic parameters) but otherwise I think it should be
straightforward.

Functions defined this way would always be safe for pg_dump regardless
of the search_path used to define them and would also protect users
from accidentally corrupting indexes when users have different
search_paths.

This doesn't really address plpgsql functions of course, I doubt we
can do the same thing.


--
greg




Re: Support logical replication of DDLs

2022-06-22 Thread Zheng Li
> Here are some points in my mind about the two approaches discussed here.
>
> 1) search_patch vs schema qualify
>
> Again, I still think it will bring more flexibility and security by schema 
> qualify the
> objects in DDL command as mentioned before[1].

I wonder what security concerns you have? We certainly don't want to
log the search_path
if there are serious security issues.

> Besides, a schema qualified DDL is also more appropriate for other use
> cases(e.g. a table-level replication). As it's possible the schema is 
> different
> between pub/sub and it's easy to cause unexpected and undetectable failure if
> we just log the search_path.
>
> It makes more sense to me to have the same style WAL log(schema qualified) for
> both database level or table level replication as it will bring more
> flexibility.

I think it's reasonable to consider using different formats for the
two different use cases.
Especially if the space and time overhead of the deparser format
sticks out. I also don't
think we need to use the deparser for global objects DDL such as ROLE
statements because
no schema qualification is needed. Also another issue with ROLE
statements is that they
are not captured by event triggers currently.

> > "Create Table As .." is already handled by setting the skipData flag of the
> > statement parsetreee before replay:
>
> 2) About the handling of CREATE TABLE AS:
>
> I think it's not a appropriate approach to set the skipdata flag on subscriber
> as it cannot handle EXECUTE command in CTAS.
>
> CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DT');
>
> The Prepared statement is a temporary object which we don't replicate. So if
> you directly execute the original SQL on subscriber, even if you set skipdata
> it will fail.
>
> I think it difficult to make this work as you need handle the create/drop of
> this prepared statement. And even if we extended subscriber's code to make it
> work, it doesn't seems like a standard and elegant approach.

This is indeed an interesting case, thanks for pointing this out. One
lightweight solution I can think of is to directly deparse the parsetree on
the publisher into a simple CREATE TABLE statement without the prepared
statement, and then replicate the simple CREATE TABLE statement.
This doesn't have to involve the json format though.
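
As a hypothetical sketch of that idea (not output from any existing code):

    -- on the publisher:
    PREPARE q5 (int, text) AS SELECT $1 AS threshold, $2 AS code;
    CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DT');

    -- what could be logged for the subscriber instead, with the EXECUTE
    -- dependency stripped (the rows would follow as ordinary DML changes):
    CREATE TABLE q5_prep_results (threshold integer, code text);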

> > "Alter Table .. " that rewrites with volatile expressions can also be 
> > handled
> > without any syntax change, by enabling the table rewrite replication and
> > converting the rewrite inserts to updates. ZJ's patch introduced this 
> > solution.
>
> 3) About the handling of ALTER TABLE rewrite.
>
> The approach I proposed before is based on the event trigger + deparser
> approach. We were able to improve that approach as we don't need to replicate
> the rewrite in many cases. For example: we don't need to replicate rewrite dml
> if there is no volatile/mutable function. We should check and filter these 
> case
> at publisher (e.g. via deparser) instead of checking that at subscriber.

Surely we can make the check about volatile/mutable functions on the
publisher side
as well. It doesn't have to be done via the deparser.

> Besides, as discussed, we need to give warning or error for the cases when DDL
> contains volatile function which would be executed[2]. We should check this at
> publisher as well(via deparser).

Again, I think the check doesn't have to be done via the deparser.

> > I think for such cases it's not full database replication and we could 
> > treat it as
> > table level DDL replication, i.e. use the the deparser format.
>
> 4) I think the point could be that we should make the WAL log format 
> extendable
> so that we can extend it to support more useful feature(table filter/schema
> maps/DDL filter). If we just WAL log the original SQL, it seems it's difficult
> to extend it in the future ?

My point is that for full replication/version upgrade use cases we
don't need to worry about extending
it for features such as schema mapping. Because such use cases
naturally want to keep identical schema
structures.

With Regards,
Zheng




Re: SLRUs in the main buffer pool - Page Header definitions

2022-06-22 Thread Andres Freund
Hi,

On 2022-06-22 21:06:29 +, Bagga, Rishu wrote:
> 3. A flag to identify if the page is a relational or BufferedObject

Why is this needed in the page header?


> Using the new BufferedObject page header will be space efficient but
> introduces a significant change in the codebase to now track two types
> of page header data. During upgrade, all SLRU files that exist on the
> system must be converted to the new format with page header. This will
> require rewriting all the SLRU pages with the page header as part of
> pg_upgrade.

How are you proposing to deal with this in the "key" to "offset in SLRU"
mapping? E.g. converting an xid to an offset in the pg_xact SLRU. I assume
you're thinking of dealing with this by making the conversion math a bit more
complicated?
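
For illustration, the kind of arithmetic change involved, using assumed
numbers (8192-byte pages, a 16-byte BufferedObject header, and pg_xact's four
transaction statuses per byte; only the header layout comes from the
proposal):

    SELECT 8192 * 4        AS xids_per_pg_xact_page_today,
           (8192 - 16) * 4 AS xids_per_pg_xact_page_with_header;
    -- 32768 vs. 32704: once the divisor is no longer a power of two, the
    -- xid -> (page, offset) mapping becomes a real division/modulo rather
    -- than a simple shift and mask.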

Greetings,

Andres Freund




SLRUs in the main buffer pool - Page Header definitions

2022-06-22 Thread Bagga, Rishu
Hi all,

PostgreSQL currently maintains several data structures in the SLRU
cache. The SLRU cache has scaling and sizing challenges because of its
simple implementation. The goal is to move these caches to the common
buffer cache to benefit from the stronger capabilities of the common
buffercache code. At AWS, we are building on the patch shared by Thomas
Munro [1], which treats the SLRU pages as part of a pseudo-database
of ID 9. We will refer to the pages belonging to SLRU components as
BufferedObject pages going forward.

The current SLRU pages do not have any header, so there is a need to
create a new page header format for these. Our investigations revealed
that we need to:

1. track LSN to ensure durability and consistency of all pages (for redo
   and full page write purposes)
2. have a checksum (for page correctness verification).
3. A flag to identify if the page is a relational or BufferedObject
4. Track version information.

We are suggesting a minimal BufferedObject page header
to be the following, overlapping with the key fields near the beginning
of the regular PageHeaderData:

typedef struct BufferedObjectPageHeaderData
{
    PageXLogRecPtr pd_lsn;
    uint16_t       pd_checksum;
    uint16_t       pd_flags;
    uint16_t       pd_pagesize_version;
} BufferedObjectPageHeaderData;

For reference, the regular page header looks like the following:
typedef struct PageHeaderData
{
    PageXLogRecPtr    pd_lsn;
    uint16_t    pd_checksum;
    uint16_t    pd_flags;
    LocationIndex   pd_lower;
    LocationIndex   pd_upper;
    LocationIndex   pd_special;
    uint16_t           pd_pagesize_version;
    TransactionId   pd_prune_xid;
    ItemIdData      pd_linp[FLEXIBLE_ARRAY_MEMBER];
} PageHeaderData;

After careful review, we have trimmed out the heap and index specific
fields from the suggested header that do not add any value to SLRU
components. We plan to use pd_lsn, pd_checksum, and pd_pagesize_version
in the same way that they are in relational pages. These fields are
needed to ensure consistency, durability and page correctness.

We will use the 4th bit of pd_flags to identify a BufferedObject page.
If the bit is set then this denotes a BufferedObject page. Today, bits
1 - 3 are used for determining if there are any free line pointers, if
the page is full, and if all tuples on the page are visible to
everyone, respectively. We will use this information accordingly in the
storage manager to determine which callback functions to use for file
I/O operations. This approach allows the buffercache to have a
universal method to quickly determine what type of page it is dealing
with at any time.

Using the new BufferedObject page header will be space efficient but
introduces a significant change in the codebase to now track two types
of page header data. During upgrade, all SLRU files that exist on the
system must be converted to the new format with page header. This will
require rewriting all the SLRU pages with the page header as part of
pg_upgrade.

We believe that this is the correct approach for the long run. We would
love feedback if there are additional items of data that should be
tracked as well. Alternatively, we could re-use the existing page
header and the unused fields could be used as a padding. This feels
like an unclean approach but would avoid having two page header types
in the database.



[1] - 
https://www.postgresql.org/message-id/flat/ca+hukgkayze99b-jk9nomp-2bdqagirc4ojv+bfxghngdie...@mail.gmail.com



Discussed with: Joe Conway, Nathan Bossart, Shawn Debnath


Rishu Bagga

Amazon Web Services (AWS)





Re: explain analyze rows=%.0f

2022-06-22 Thread Ibrar Ahmed
On Thu, Jun 23, 2022 at 1:04 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Wed, Jun 22, 2022 at 12:11 PM Ibrar Ahmed 
> wrote:
>
>> On Thu, Jun 23, 2022 at 12:01 AM Tom Lane  wrote:
>>
>>> Robert Haas  writes:
>>> > On Jun 2, 2009, at 9:41 AM, Simon Riggs  wrote:
>>> >> You're right that the number of significant digits already exceeds the
>>> >> true accuracy of the computation. I think what Robert wants to see is
>>> >> the exact value used in the calc, so the estimates can be checked more
>>> >> thoroughly than is currently possible.
>>>
>>> > Bingo.
>>>
>>> Uh, the planner's estimate *is* an integer.  What was under discussion
>>> (I thought) was showing some fractional digits in the case where EXPLAIN
>>> ANALYZE is outputting a measured row count that is an average over
>>> multiple loops, and therefore isn't necessarily an integer.  In that
>>> case the measured value can be considered arbitrarily precise --- though
>>> I think in practice one or two fractional digits would be plenty.
>>>
>>> regards, tom lane
>>>
>>>
>>> Hi,
>> I was looking at the TODO list and found that the issue requires
>> a quick fix. Attached is a patch which shows output like this.
>>
>
> Quick code review:
>
> + "actual rows=%.0f loops=%.0f": " rows=%.2f loops=%.0f",
>
> The leading space before the else block "rows" does not belong.
>
> There should be a space after the colon.
>
Thanks, David, for your quick response. I have updated the patch.


> The word "actual" that you are dropping in the else block seems like it
> should belong - it is a header for the entire section not just a modifier
> for the word "rows".  This is evidenced by the timing block verbiage where
> rows is standalone and the word actual comes before time.  In short, only
> the format specifier should change under the current scheme.  Both sections.
>
> - WRITE_FLOAT_FIELD(rows, "%.0f");
> + WRITE_FLOAT_FIELD(rows, "%.2f");
>
> This one looks suspicious, though I haven't dug into the code to see
> exactly what all is being touched.  That it doesn't have an nloops
> condition like everything else stands out.
>
I was also thinking about that, but I don't see any harm, since we
ultimately truncate that decimal at a later stage of the code in the case of
loops = 1.


> Tooling that expects an integer is the only downside I see here, but I
> concur that the usability of always showing two decimal places when nloops
> > 1 overcomes any objection I have on those grounds.
>
> David J.
>
>

-- 
Ibrar Ahmed


explain_float_row_v2.patch
Description: Binary data


Re: replacing role-level NOINHERIT with a grant-level option

2022-06-22 Thread Robert Haas
On Thu, Jun 2, 2022 at 1:17 PM Tom Lane  wrote:
> Point 2 would cause every existing pg_dumpall script to fail, which
> seems like kind of a large gotcha.  Less unpleasant alternatives
> could include
>
> * Continue to accept the syntax, but ignore it, maybe with a WARNING
> for the alternative that doesn't correspond to the new behavior.
>
> * Keep pg_authid.rolinherit, and have it act as supplying the default
> behavior for subsequent GRANTs to that role.

Here's a rather small patch that does it the first way.

I know that Peter Eisentraut didn't like the idea of removing the
role-level option, but I don't know why, so I don't know whether doing
this the second way instead would address his objection.

I suppose if we did it the second way, we could make the syntax GRANT
granted_role TO recipient_role WITH INHERIT { TRUE | FALSE | DEFAULT
}, and DEFAULT would copy the current value of the rolinherit
property, so that changing the rolinherit property later would not
affect previous grants. The reverse is also possible: with the same
syntax, the rolinherit column could be changed from bool to "char",
storing t/f/d, and 'd' could mean the value of the rolinherit property
at time of use; thus, changing rolinherit would affect previous grants
performed using WITH INHERIT DEFAULT but not those that specified WITH
INHERIT TRUE or WITH INHERIT FALSE.
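
To make the two variants concrete, here is a rough sketch of the proposed
syntax (none of this exists yet; the role names are invented):

GRANT pg_read_all_stats TO alice WITH INHERIT TRUE;     -- always inherited
GRANT auditor_role      TO alice WITH INHERIT FALSE;    -- never inherited
GRANT reporting_role    TO alice WITH INHERIT DEFAULT;  -- falls back to alice's
                                                        -- rolinherit, captured at
                                                        -- GRANT time or looked up
                                                        -- at use time depending on
                                                        -- the variant chosen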

In some sense, this whole scheme seems backwards to me: wouldn't it be
more natural if the default were based on the role being granted
rather than the role to which it is granted? If I decide to GRANT
some_predefined_role TO some_user, it seems like I am really likely to
want the INHERIT behavior, whereas if I GRANT some_other_user TO
some_user, it could go either way depending on the use case. But this
may be water under the bridge: introducing a way to set the default
that is backwards from the way that the current role-level property
works may be too confusing to be worthy of any consideration at all.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v1-0002-Replace-NO-INHERIT-property-for-roles-with-a-gran.patch
Description: Binary data


v1-0001-Fake-version-stamp.patch
Description: Binary data


Re: explain analyze rows=%.0f

2022-06-22 Thread David G. Johnston
On Wed, Jun 22, 2022 at 12:11 PM Ibrar Ahmed  wrote:

> On Thu, Jun 23, 2022 at 12:01 AM Tom Lane  wrote:
>
>> Robert Haas  writes:
>> > On Jun 2, 2009, at 9:41 AM, Simon Riggs  wrote:
>> >> You're right that the number of significant digits already exceeds the
>> >> true accuracy of the computation. I think what Robert wants to see is
>> >> the exact value used in the calc, so the estimates can be checked more
>> >> thoroughly than is currently possible.
>>
>> > Bingo.
>>
>> Uh, the planner's estimate *is* an integer.  What was under discussion
>> (I thought) was showing some fractional digits in the case where EXPLAIN
>> ANALYZE is outputting a measured row count that is an average over
>> multiple loops, and therefore isn't necessarily an integer.  In that
>> case the measured value can be considered arbitrarily precise --- though
>> I think in practice one or two fractional digits would be plenty.
>>
>> regards, tom lane
>>
>>
>> Hi,
> I was looking at the TODO list and found that the issue requires
> a quick fix. Attached is a patch which shows output like this.
>

Quick code review:

+ "actual rows=%.0f loops=%.0f": " rows=%.2f loops=%.0f",

The leading space before the else block "rows" does not belong.

There should be a space after the colon.

The word "actual" that you are dropping in the else block seems like it
should belong - it is a header for the entire section not just a modifier
for the word "rows".  This is evidenced by the timing block verbiage where
rows is standalone and the word actual comes before time.  In short, only
the format specifier should change under the current scheme.  Both sections.

- WRITE_FLOAT_FIELD(rows, "%.0f");
+ WRITE_FLOAT_FIELD(rows, "%.2f");

This one looks suspicious, though I haven't dug into the code to see
exactly what all is being touched.  That it doesn't have an nloops
condition like everything else stands out.

Tooling that expects an integer is the only downside I see here, but I
concur that the usability of always showing two decimal places when nloops
> 1 overcomes any objection I have on those grounds.

David J.


Re: gcc -ftabstop option

2022-06-22 Thread Daniel Gustafsson
> On 21 Jun 2022, at 12:49, Peter Eisentraut 
>  wrote:

> Second, it enables the compiler's detection of confusingly indented code to 
> work more correctly.

Wouldn't we also need to add -Wmisleading-indentation for this to trigger any
warnings?  Adding -ftabstop only allows the compiler to be able to properly
detect it.

--
Daniel Gustafsson   https://vmware.com/





Re: Amcheck verification of GiST and GIN

2022-06-22 Thread Nikolay Samokhvalov
On Wed, Jun 22, 2022 at 11:35 AM Andrey Borodin 
wrote:

> > On 30 May 2022, at 12:40, Andrey Borodin  wrote:
> >
> > What do you think?
>
> Hi Andrey!
>

Hi Andrey!

Since you're talking to yourself, just wanted to support you – this is an
important thing, definitely should be very useful for many projects; I hope
to find time to test it in the next few days.

Thanks for working on it.


Re: explain analyze rows=%.0f

2022-06-22 Thread Ibrar Ahmed
On Thu, Jun 23, 2022 at 12:01 AM Tom Lane  wrote:

> Robert Haas  writes:
> > On Jun 2, 2009, at 9:41 AM, Simon Riggs  wrote:
> >> You're right that the number of significant digits already exceeds the
> >> true accuracy of the computation. I think what Robert wants to see is
> >> the exact value used in the calc, so the estimates can be checked more
> >> thoroughly than is currently possible.
>
> > Bingo.
>
> Uh, the planner's estimate *is* an integer.  What was under discussion
> (I thought) was showing some fractional digits in the case where EXPLAIN
> ANALYZE is outputting a measured row count that is an average over
> multiple loops, and therefore isn't necessarily an integer.  In that
> case the measured value can be considered arbitrarily precise --- though
> I think in practice one or two fractional digits would be plenty.
>
> regards, tom lane
>
>
Hi,
I was looking at the TODO list and found that the issue requires
a quick fix. Attached is a patch which shows output like this. It shows the
fractional digits in case of loops > 1.

postgres=# explain analyze select * from foo;
  QUERY PLAN
--
 Seq Scan on foo  (cost=0.00..64414.79 rows=2326379 width=8) (actual
time=0.025..277.096 rows=2344671 loops=1)
 Planning Time: 0.516 ms
 Execution Time: 356.993 ms
(3 rows)

postgres=# explain analyze select * from foo where b = (select c from
bar where c = 1);
 QUERY PLAN

 Seq Scan on foo  (cost=8094.37..78325.11 rows=2326379 width=8)
(actual time=72.352..519.159 rows=2344671 loops=1)
   Filter: (b = $1)
   InitPlan 1 (returns $1)
  ->  Gather  (cost=1000.00..8094.37 rows=1 width=4) (actual
time=0.872..72.434 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on bar  (cost=0.00..7094.27 rows=1
width=4) (actual time=41.931..65.382 rows=0.33 loops=3)
 Filter: (c = 1)
 Rows Removed by Filter: 245457
 Planning Time: 0.277 ms
 Execution Time: 597.795 ms
(11 rows)



-- 
Ibrar Ahmed


explain_float_row.patch
Description: Binary data


Re: Column Redaction

2022-06-22 Thread Ibrar Ahmed
On Wed, Jun 22, 2022 at 11:53 PM Simon Riggs  wrote:

> Postgres currently supports column level SELECT privileges.
>
> 1. If we want to confirm a credit card number, we can issue SELECT 1
> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>
> 2. If we want to look for card fraud, we need to be able to use the
> full card number to join to transaction data and look up blocked card
> lists etc..
>
> 3. We want to block the direct retrieval of card numbers for
> additional security.
> In some cases, we might want to return an answer like '**** **** **** 7733'
>
> We can't do all of the above with current facilities inside the database.
>
> The ability to mask output for data in certain cases, for the purpose
> of security, is known lately as data redaction, or column-level data
> redaction.
>
> The best way to support this requirement would be to allow columns to
> have an additional "output formatting function". This would be
> executed only when data is about to be returned by a query. All other
> uses of that would not restrict the data.
>
> This would have other uses as well, such as default report formats, so
> we can store financial amounts as NUMERIC, but format them on
> retrieval as $12,345.78 etc..
>
> Suggested user interface would be...
>FORMAT functionname(parameters, if any)
>
> e.g.
> CREATE TABLE customer
> ( id ...
> ...
> , stored_card_number  NUMERIC FORMAT pci_card_number_redaction()
> ...
> );
>
> We'd need to implement something to allow pg_dump to ignore format
> functions. I suggest the best way to do that is by providing a BACKUP
> role that can be delegated to other users. We would then allow a
> parameter for SET output_formatting = on | off, which can only be set
> by superuser and BACKUP role, then have pg_dump issue SET
> output_formatting = off explicitly when it runs.
>
> Do we want redaction in PostgreSQL?
> Do we want it generalised into output format functions?
>
> --
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hack...@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>
Hi,
Do we still have some interest in this? People generally seemed to like
the idea. If yes, I am happy to work on it and can send a complete
design first.
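
For what it's worth, the redaction function named in the quoted example could
be as simple as the following sketch (illustrative only; the signature is
guesswork, and the FORMAT clause itself is the part that does not exist):

CREATE FUNCTION pci_card_number_redaction(card_number numeric)
RETURNS text
LANGUAGE sql STABLE
AS $$
    SELECT '**** **** **** ' || right(card_number::text, 4);
$$;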

-- 
Ibrar Ahmed


Re: Amcheck verification of GiST and GIN

2022-06-22 Thread Andrey Borodin


> On 30 May 2022, at 12:40, Andrey Borodin  wrote:
> 
> What do you think?

Hi Andrey!

Here's a version with better tests. I've made sure that the GiST tests actually
trigger page reuse after deletion, and I've enhanced the comments in both the
GiST and GIN test scripts. I hope you'll like it.

GIN heapallindexed is still a no-op check. I'm looking forward to hearing any
ideas on what it could be.


Best regards, Andrey Borodin.


v11-0001-Amcheck-for-GIN-and-GiST.patch
Description: Binary data




Re: Devel docs on website reloading

2022-06-22 Thread Peter Geoghegan
On Wed, Jun 22, 2022 at 8:45 AM Bruce Momjian  wrote:
> That is a big help for committers who want to email a URL of the new
> doc commit output.

Yes, it's a nice "quality of life" improvement.

Thanks for finally getting this done!
-- 
Peter Geoghegan




Re: amcheck is using a wrong macro to check compressed-ness

2022-06-22 Thread Robert Haas
On Tue, Jun 21, 2022 at 9:56 PM Michael Paquier  wrote:
> On Thu, Jun 09, 2022 at 10:48:27AM +0900, Michael Paquier wrote:
> > Three weeks later, ping.  Robert, could you look at this thread?
>
> And again.  beta2 is planned to next week, and this is still an open
> item.  I could look at that by myself, but I always tend to get easily
> confused with all the VARATT macros when it comes to compressed blobs,
> so it would take a bit of time.

Oops, I missed this thread. I think the patch is correct, so I have
committed it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: SYSTEM_USER reserved word implementation

2022-06-22 Thread David G. Johnston
On Wed, Jun 22, 2022 at 9:28 AM Tom Lane  wrote:

> Joe Conway  writes:
> > On 6/22/22 11:52, Tom Lane wrote:
> >> I think a case could be made for ONLY returning non-null when authn_id
> >> represents some externally-verified identifier (OS user ID gotten via
> >> peer identification, Kerberos principal, etc).
>
> > But -1 on that.
>
> > I think any time we have a non-null authn_id we should expose it. Are
> > there examples of cases when we have authn_id but for some reason don't
> > trust the value of it?
>
> I'm more concerned about whether we have a consistent story about what
> SYSTEM_USER means (another way of saying "what type is it").  If it's
> just the same as SESSION_USER it doesn't seem like we've added much.
>
> Maybe, instead of just being the raw user identifier, it should be
> something like "auth_method:user_identifier" so that one can tell
> what the identifier actually is and how it was verified.
>
>
I was thinking this was trying to make the following possible:

psql -U postgres
# set session authorization other_superuser;
# set role other_role;
# select system_user, session_user, current_user;
postgres | other_superuser | other_role

Though admittedly using "system" for that seems somehow wrong.
connection_user would make more sense.  Then the system_user would be, if
applicable, an external identifier that got matched with the assigned
connection_user.  I can definitely see having the external identifier be a
structured value.

David J.


Re: SYSTEM_USER reserved word implementation

2022-06-22 Thread Jacob Champion
On Wed, Jun 22, 2022 at 9:26 AM Joe Conway  wrote:
> On 6/22/22 11:35, Jacob Champion wrote:
> > On Wed, Jun 22, 2022 at 8:10 AM Joe Conway  wrote:
> Why would you want to do it differently than
> SessionUserId/OuterUserId/CurrentUserId? It is analogous, no?

Like I said, now there are two different sources of truth, and
additional code to sync the two, and two different APIs to set what
should be a single write-once attribute. But if SystemUser is instead
derived from authn_id, like what's just been proposed with
`method:authn_id`, I think there's a better argument for separating
the two.

--Jacob




Re: SYSTEM_USER reserved word implementation

2022-06-22 Thread Joe Conway

On 6/22/22 12:28, Tom Lane wrote:

Joe Conway  writes:

On 6/22/22 11:52, Tom Lane wrote:

I think a case could be made for ONLY returning non-null when authn_id
represents some externally-verified identifier (OS user ID gotten via
peer identification, Kerberos principal, etc).



But -1 on that.


I think any time we have a non-null authn_id we should expose it. Are 
there examples of cases when we have authn_id but for some reason don't 
trust the value of it?


I'm more concerned about whether we have a consistent story about what
SYSTEM_USER means (another way of saying "what type is it").  If it's
just the same as SESSION_USER it doesn't seem like we've added much.

Maybe, instead of just being the raw user identifier, it should be
something like "auth_method:user_identifier" so that one can tell
what the identifier actually is and how it was verified.


Oh, that's an interesting thought -- I like that.

--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: SYSTEM_USER reserved word implementation

2022-06-22 Thread Tom Lane
Joe Conway  writes:
> On 6/22/22 11:52, Tom Lane wrote:
>> I think a case could be made for ONLY returning non-null when authn_id
>> represents some externally-verified identifier (OS user ID gotten via
>> peer identification, Kerberos principal, etc).

> But -1 on that.

> I think any time we have a non-null authn_id we should expose it. Are 
> there examples of cases when we have authn_id but for some reason don't 
> trust the value of it?

I'm more concerned about whether we have a consistent story about what
SYSTEM_USER means (another way of saying "what type is it").  If it's
just the same as SESSION_USER it doesn't seem like we've added much.

Maybe, instead of just being the raw user identifier, it should be
something like "auth_method:user_identifier" so that one can tell
what the identifier actually is and how it was verified.
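
For instance, something along these lines (hypothetical output; the Kerberos
principal is borrowed from an example elsewhere in the thread):

postgres=> SELECT system_user;
         system_user
------------------------------
 gss:bertrand@BDTFOREST.LOCAL
(1 row)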

regards, tom lane




Re: SYSTEM_USER reserved word implementation

2022-06-22 Thread Joe Conway

On 6/22/22 11:35, Jacob Champion wrote:

On Wed, Jun 22, 2022 at 8:10 AM Joe Conway  wrote:

--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -473,6 +473,7 @@ static Oid AuthenticatedUserId = InvalidOid;
 static Oid SessionUserId = InvalidOid;
 static Oid OuterUserId = InvalidOid;
 static Oid CurrentUserId = InvalidOid;
+static const char *SystemUser = NULL;

 /* We also have to remember the superuser state of some of these levels */
 static bool AuthenticatedUserIsSuperuser = false;


What's the rationale for introducing a new global for this? A downside
is that now there are two sources of truth, for a security-critical
attribute of the connection.


Why would you want to do it differently than 
SessionUserId/OuterUserId/CurrentUserId? It is analogous, no?


--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: SYSTEM_USER reserved word implementation

2022-06-22 Thread Joe Conway

On 6/22/22 11:52, Tom Lane wrote:

Jacob Champion  writes:

On Wed, Jun 22, 2022 at 8:10 AM Joe Conway  wrote:

In case port->authn_id is NULL then the patch is returning the SESSION_USER for 
the SYSTEM_USER. Perhaps it should return NULL instead.



If the spec says that SYSTEM_USER "represents the operating system
user", but we don't actually know who that user was (authn_id is
NULL), then I think SYSTEM_USER should also be NULL so as not to
mislead auditors.


Yeah, that seems like a fundamental type mismatch.  If we don't know
the OS user identifier, substituting a SQL role name is surely not
the right thing.


+1 agreed


I think a case could be made for ONLY returning non-null when authn_id
represents some externally-verified identifier (OS user ID gotten via
peer identification, Kerberos principal, etc).


But -1 on that.

I think any time we have a non-null authn_id we should expose it. Are 
there examples of cases when we have authn_id but for some reason don't 
trust the value of it?



--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: SYSTEM_USER reserved word implementation

2022-06-22 Thread Tom Lane
Jacob Champion  writes:
> On Wed, Jun 22, 2022 at 8:10 AM Joe Conway  wrote:
>> In case port->authn_id is NULL then the patch is returning the SESSION_USER 
>> for the SYSTEM_USER. Perhaps it should return NULL instead.

> If the spec says that SYSTEM_USER "represents the operating system
> user", but we don't actually know who that user was (authn_id is
> NULL), then I think SYSTEM_USER should also be NULL so as not to
> mislead auditors.

Yeah, that seems like a fundamental type mismatch.  If we don't know
the OS user identifier, substituting a SQL role name is surely not
the right thing.

I think a case could be made for ONLY returning non-null when authn_id
represents some externally-verified identifier (OS user ID gotten via
peer identification, Kerberos principal, etc).

regards, tom lane




Re: Devel docs on website reloading

2022-06-22 Thread Bruce Momjian
On Tue, Jun 21, 2022 at 11:43:40AM +0200, Magnus Hagander wrote:
> On Wed, Nov 18, 2020 at 1:44 PM Magnus Hagander  wrote:
> No that'd be unrelated. We don't have a dedicated buildfarm animal for
> it, we just piggyback on the existing run, which runs on any changes,
> not just docs.
> 
> Less than 2 years later this is now actually done. That is, we are now loading
> the docs from a git checkout instead of a tarball, which means the devel docs
> are now stamped with the git revision that they were built from (and they're
> only rebuilt when something changes the docs, but when they do, it should
> normally take <2 minutes for the new docs to appear on the website)

That is a big help for committers who want to email a URL of the new
doc commit output.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: SYSTEM_USER reserved word implementation

2022-06-22 Thread Jacob Champion
On Wed, Jun 22, 2022 at 8:10 AM Joe Conway  wrote:
> On the contrary, I would argue that not having the identifier for the
> external "user" available is a security concern. Ideally you want to be
> able to trace actions inside Postgres to the actual user that invoked them.

If auditing is also the use case for SYSTEM_USER, you'll probably want
to review the arguments for making it available to parallel workers
that were made in the other thread [1].

Initial comments on the patch:

> In case port->authn_id is NULL then the patch is returning the SESSION_USER 
> for the SYSTEM_USER. Perhaps it should return NULL instead.

If the spec says that SYSTEM_USER "represents the operating system
user", but we don't actually know who that user was (authn_id is
NULL), then I think SYSTEM_USER should also be NULL so as not to
mislead auditors.

> --- a/src/backend/utils/init/miscinit.c
> +++ b/src/backend/utils/init/miscinit.c
> @@ -473,6 +473,7 @@ static Oid AuthenticatedUserId = InvalidOid;
>  static Oid SessionUserId = InvalidOid;
>  static Oid OuterUserId = InvalidOid;
>  static Oid CurrentUserId = InvalidOid;
> +static const char *SystemUser = NULL;
>
>  /* We also have to remember the superuser state of some of these levels */
>  static bool AuthenticatedUserIsSuperuser = false;

What's the rationale for introducing a new global for this? A downside
is that now there are two sources of truth, for a security-critical
attribute of the connection.

--Jacob

[1] 
https://www.postgresql.org/message-id/flat/793d990837ae5c06a558d58d62de9378ab525d83.camel%40vmware.com




Re: SYSTEM_USER reserved word implementation

2022-06-22 Thread Tom Lane
Joe Conway  writes:
> On 6/22/22 10:51, Tom Lane wrote:
>> My immediate guess would be that the SQL committee only intends
>> to deal in SQL role names and therefore SYSTEM_USER is defined
>> to return one of those, but I've not gone looking in the spec
>> to be sure.

> I only have a draft copy, but in SQL 2016 I find relatively thin 
> documentation for what SYSTEM_USER is supposed to represent:

>The value specified by SYSTEM_USER is equal to an
>implementation-defined string that represents the
>operating system user who executed the SQL-client
>module that contains the externally-invoked procedure
>whose execution caused the SYSTEM_USER <system user specification> to be evaluated.

Huh.  Okay, if it's implementation-defined then we can define it
as "whatever auth.c put into authn_id".  Objection withdrawn.

regards, tom lane




Re: SYSTEM_USER reserved word implementation

2022-06-22 Thread Joe Conway

On 6/22/22 10:51, Tom Lane wrote:

My immediate guess would be that the SQL committee only intends
to deal in SQL role names and therefore SYSTEM_USER is defined
to return one of those, but I've not gone looking in the spec
to be sure.


I only have a draft copy, but in SQL 2016 I find relatively thin 
documentation for what SYSTEM_USER is supposed to represent:


  The value specified by SYSTEM_USER is equal to an
  implementation-defined string that represents the
  operating system user who executed the SQL-client
  module that contains the externally-invoked procedure
  whose execution caused the SYSTEM_USER <system user specification> to be evaluated.


I'm also not that clear on what we expect authn_id to be, but
a quick troll in the code makes it look like it's not necessarily
a SQL role name, but might be some external identifier such as a
Kerberos principal.  If that's the case I think it's going to be
inappropriate to use SQL-spec syntax to return it.  I don't object
to inventing some PG-specific function for the purpose, though.


To me the Kerberos principal makes perfect sense given the definition above.


BTW, are there any security concerns about exposing such identifiers?


On the contrary, I would argue that not having the identifier for the 
external "user" available is a security concern. Ideally you want to be 
able to trace actions inside Postgres to the actual user that invoked them.


--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Postgres perl module namespace

2022-06-22 Thread Andrew Dunstan


On 2022-06-22 We 03:21, Noah Misch wrote:
> On Tue, Apr 19, 2022 at 07:24:58PM -0400, Andrew Dunstan wrote:
>> On 2022-04-19 Tu 18:39, Michael Paquier wrote:
>>> +*generate_ascii_string = *TestLib::generate_ascii_string;
>>> +*slurp_dir = *TestLib::slurp_dir;
>>> +*slurp_file = *TestLib::slurp_file;
>>>
>>> I am not sure if it is possible and my perl-fu is limited in this
>>> area, but could a failure be enforced when loading this path if a new
>>> routine added in TestLib.pm is forgotten in this list?
>> Not very easily that I'm aware of, but maybe some superior perl wizard
>> will know better.
> One can alias the symbol table, like https://metacpan.org/pod/Package::Alias
> does.  I'm attaching what I plan to use.  Today, check-world fails after
>
>   sed -i 's/TestLib/PostgreSQL::Test::Utils/g; 
> s/PostgresNode/PostgreSQL::Test::Cluster/g' **/*.pl
>
> on REL_14_STABLE, because today's alias list is incomplete.  With this change,
> the same check-world passes.

Nice. 30 years of writing perl and I'm still learning of nifty features.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Skipping logical replication transactions on subscriber side

2022-06-22 Thread Robert Haas
On Wed, Jun 22, 2022 at 11:01 AM Tom Lane  wrote:
> Given that we haven't run into this before, it seems like a reasonable
> bet that the problem will seldom arise.  So as long as we have a
> cross-check I'm all right with calling it good and moving on.  Expending
> a whole lot of work to improve the situation seems uncalled-for.

All right. Well, I'm on record as not liking that solution, but
obviously you can and do feel differently.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2022-06-22 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jun 22, 2022 at 10:39 AM Tom Lane  wrote:
>> I don't especially want to
>> invent an additional typalign code that we could only test on legacy
>> platforms.

> I agree with that, but I don't think that having the developers
> enforce alignment rules by reordering catalog columns for the sake of
> legacy platforms is appealing either.

Given that we haven't run into this before, it seems like a reasonable
bet that the problem will seldom arise.  So as long as we have a
cross-check I'm all right with calling it good and moving on.  Expending
a whole lot of work to improve the situation seems uncalled-for.

When and if we get to a point where we're ready to break on-disk
compatibility for user tables, perhaps revisiting the alignment
rules would be an appropriate component of that.  I don't see that
happening in the foreseeable future, though.

regards, tom lane




Re: Skipping logical replication transactions on subscriber side

2022-06-22 Thread Robert Haas
On Wed, Jun 22, 2022 at 10:39 AM Tom Lane  wrote:
> That's a fundamental misreading of the situation.  typalign is essential
> on alignment-picky architectures, else you will get a SIGBUS fault
> when trying to fetch a multibyte value (whether it's just going to get
> stored into a Datum array is not very relevant here).

I mean, that problem is easily worked around. Maybe you think memcpy
would be a lot slower than a direct assignment, but "essential" is a
strong word.

> I concur that Noah's description of #2 is not an accurate statement
> of the rules we'd have to impose to be sure that the C structs line up
> with the actual tuple layouts.  I don't think we want rules exactly,
> what we need is mechanical verification that the field orderings in
> use are safe.  The last time I looked at this thread, what was being
> discussed was (a) re-ordering pg_subscription's columns and (b)
> adding some kind of regression test to verify that all catalogs meet
> the expectation of 'd'-aligned fields not needing alignment padding
> that an AIX compiler might choose not to insert.  That still seems
> like the most plausible answer to me.  I don't especially want to
> invent an additional typalign code that we could only test on legacy
> platforms.

I agree with that, but I don't think that having the developers
enforce alignment rules by reordering catalog columns for the sake of
legacy platforms is appealing either.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: SYSTEM_USER reserved word implementation

2022-06-22 Thread Tom Lane
Joe Conway  writes:
> On 6/22/22 09:49, Tom Lane wrote:
>> On what grounds do you argue that that's the appropriate meaning of
>> SYSTEM_USER?

> What else do you imagine it might mean?

I was hoping for some citation of the SQL spec.

> Here is SQL Server interpretation for example:
>"SYSTEM_USER
>Returns the name of the current data store user
>as identified by the operating system."

Meh.  That's as clear as mud.  (a) A big part of the question here
is what is meant by "current" user, in the face of operations like
SET ROLE.  (b) "as identified by the operating system" does more to
confuse me than anything else.  The operating system only deals in
OS user names; does that wording mean that what you get back is an OS
user name rather than a SQL role name?

My immediate guess would be that the SQL committee only intends
to deal in SQL role names and therefore SYSTEM_USER is defined
to return one of those, but I've not gone looking in the spec
to be sure.

I'm also not that clear on what we expect authn_id to be, but
a quick troll in the code makes it look like it's not necessarily
a SQL role name, but might be some external identifier such as a
Kerberos principal.  If that's the case I think it's going to be
inappropriate to use SQL-spec syntax to return it.  I don't object
to inventing some PG-specific function for the purpose, though.

BTW, are there any security concerns about exposing such identifiers?

regards, tom lane




Re: Skipping logical replication transactions on subscriber side

2022-06-22 Thread Tom Lane
[ sorry for not having tracked this thread more closely ... ]

Robert Haas  writes:
> Regarding (1), it is my opinion that the only real value of typalign
> is for system catalogs, and specifically that it lets you put the
> fields in an order that is aesthetically pleasing rather than worrying
> about alignment considerations. After all, if we just ordered the
> fields by descending alignment requirement, we could get rid of
> typalign altogether (at least, if we didn't care about backward
> compatibility). User tables would get smaller because we'd get rid of
> alignment padding, and I don't think we'd see much impact on
> performance because, for user tables, we copy the values into a datum
> array before doing anything interesting with them. So (1) seems to me
> to be conceding that typalign is unfit for the only purpose it has.

That's a fundamental misreading of the situation.  typalign is essential
on alignment-picky architectures, else you will get a SIGBUS fault
when trying to fetch a multibyte value (whether it's just going to get
stored into a Datum array is not very relevant here).

It appears that what we've got on AIX is that typalign 'd' overstates the
actual alignment requirement for 'double', which is safe from the SIGBUS
angle.  However, it is a problem for our usage with system catalogs,
where our C struct declarations may not line up with the way that a
tuple is constructed by the tuple assembly routines.

I concur that Noah's description of #2 is not an accurate statement
of the rules we'd have to impose to be sure that the C structs line up
with the actual tuple layouts.  I don't think we want rules exactly,
what we need is mechanical verification that the field orderings in
use are safe.  The last time I looked at this thread, what was being
discussed was (a) re-ordering pg_subscription's columns and (b)
adding some kind of regression test to verify that all catalogs meet
the expectation of 'd'-aligned fields not needing alignment padding
that an AIX compiler might choose not to insert.  That still seems
like the most plausible answer to me.  I don't especially want to
invent an additional typalign code that we could only test on legacy
platforms.

regards, tom lane




Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

2022-06-22 Thread Robert Haas
On Mon, Jun 20, 2022 at 9:35 PM Kyotaro Horiguchi
 wrote:
> Unfortunately it doesn't work because we read a record already known
> to be complete again at the end of recovery.  It is the reason of
> "abortedRecPtr < xlogreader->EndRecPtr" in my PoC patch.  Without it,
> abrotedRecPtr is erased when it is actually needed.  I don't like that
> expedient-looking condition, but the strict condition for resetting
> abortedRecPtr is iff "we have read a complete record at the same or
> grater LSN ofabortedRecPtr"...

Yeah, we need to work around that somehow. I noticed the
record-rereading behavior when I was working on some patches for this
release cycle and I think what we ought to do is get rid of that. It
serves no purpose other than to make things complicated. However, we
shouldn't back-patch a change like that, I think, so we'll need to
work around the issue somehow. Perhaps it'd be better to return these
values to the caller somehow and then the caller can decide whether to
save them based on context. The last re-read can choose not to do so.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




ALTER TABLE uses a bistate but not for toast tables

2022-06-22 Thread Justin Pryzby
ATRewriteTable() calls table_tuple_insert() with a bistate, to avoid clobbering
and polluting the buffers.

But heap_insert() then calls 
heap_prepare_insert() >
heap_toast_insert_or_update >
toast_tuple_externalize >
toast_save_datum >
heap_insert(toastrel, toasttup, mycid, options, NULL /* without bistate:( */);

I came up with this patch.  I'm not sure but maybe it should be implemented at
the tableam layer and not inside heap.  Maybe the BulkInsertState should have a
2nd strategy buffer for toast tables.

CREATE TABLE t(i int, a text, b text, c text,d text,e text,f text,g text);
INSERT INTO t SELECT 0, 
array_agg(a),array_agg(a),array_agg(a),array_agg(a),array_agg(a),array_agg(a) 
FROM generate_series(1,999)n,repeat(n::text,99)a,generate_series(1,99)b GROUP 
BY b;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;

ALTER TABLE t ALTER i TYPE smallint;
SELECT COUNT(1), relname, COUNT(1) FILTER(WHERE isdirty) FROM pg_buffercache b 
JOIN pg_class c ON c.oid=b.relfilenode GROUP BY 2 ORDER BY 1 DESC LIMIT 9;

Without this patch:
postgres=# SELECT COUNT(1), relname, COUNT(1) FILTER(WHERE isdirty) FROM 
pg_buffercache b JOIN pg_class c ON c.oid=b.relfilenode GROUP BY 2 ORDER BY 1 
DESC LIMIT 9;
 10283 | pg_toast_55759  |  8967

With this patch:
  1418 | pg_toast_16597  |  1418

-- 
Justin
From 2b156036856dcbeab00de819e5c6eff820b564cd Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 21 Jun 2022 22:28:06 -0500
Subject: [PATCH] WIP: use BulkInsertState for toast tuples, too

ci-os-only: linux
---
 src/backend/access/common/toast_internals.c | 16 ++--
 src/backend/access/heap/heapam.c| 14 --
 src/backend/access/heap/heaptoast.c | 11 +++
 src/backend/access/heap/rewriteheap.c   |  2 +-
 src/backend/access/table/toast_helper.c |  6 --
 src/include/access/heaptoast.h  |  4 +++-
 src/include/access/toast_helper.h   |  3 ++-
 src/include/access/toast_internals.h|  4 +++-
 8 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 576e585a89f..2640e6a1589 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -118,7 +118,8 @@ toast_compress_datum(Datum value, char cmethod)
  */
 Datum
 toast_save_datum(Relation rel, Datum value,
- struct varlena *oldexternal, int options)
+ struct varlena *oldexternal, int options,
+ BulkInsertState bistate)
 {
 	Relation	toastrel;
 	Relation   *toastidxs;
@@ -299,6 +300,10 @@ toast_save_datum(Relation rel, Datum value,
 	t_isnull[1] = false;
 	t_isnull[2] = false;
 
+	/* Release pin after main table, before switching to write to toast table */
+	if (bistate)
+		ReleaseBulkInsertStatePin(bistate);
+
 	/*
 	 * Split up the item into chunks
 	 */
@@ -321,7 +326,7 @@ toast_save_datum(Relation rel, Datum value,
 		memcpy(VARDATA(_data), data_p, chunk_size);
 		toasttup = heap_form_tuple(toasttupDesc, t_values, t_isnull);
 
-		heap_insert(toastrel, toasttup, mycid, options, NULL);
+		heap_insert(toastrel, toasttup, mycid, options, bistate);
 
 		/*
 		 * Create the index entry.  We cheat a little here by not using
@@ -358,6 +363,13 @@ toast_save_datum(Relation rel, Datum value,
 		data_p += chunk_size;
 	}
 
+	if (bistate)
+	{
+		table_finish_bulk_insert(toastrel, options); // XXX
+		/* Release pin after writing toast table before resuming writes to the main table */
+		ReleaseBulkInsertStatePin(bistate);
+	}
+
 	/*
 	 * Done - close toast relation and its indexes but keep the lock until
 	 * commit, so as a concurrent reindex done directly on the toast relation
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 74218510276..e46a6035068 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -73,7 +73,8 @@
 
 
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
-	 TransactionId xid, CommandId cid, int options);
+	 TransactionId xid, CommandId cid, int options,
+	 BulkInsertState bistate);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
   Buffer newbuf, HeapTuple oldtup,
   HeapTuple newtup, HeapTuple old_key_tuple,
@@ -2042,7 +2043,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	 * Note: below this point, heaptup is the data we actually intend to store
 	 * into the relation; tup is the caller's original untoasted data.
 	 */
-	heaptup = heap_prepare_insert(relation, tup, xid, cid, options);
+
+	heaptup = heap_prepare_insert(relation, tup, xid, cid, options, bistate);
 
 	/*
 	 * Find buffer to insert this tuple into.  If the page is all visible,
@@ -2212,7 +2214,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
  */
 static HeapTuple
 

Re: SYSTEM_USER reserved word implementation

2022-06-22 Thread Joe Conway

On 6/22/22 09:49, Tom Lane wrote:

"Drouvot, Bertrand"  writes:
Please find attached a patch proposal to make use of the SYSTEM_USER so 
that it returns the authenticated identity (if any) (aka authn_id in the 
Port struct).


On what grounds do you argue that that's the appropriate meaning of
SYSTEM_USER?



What else do you imagine it might mean?

Here is SQL Server interpretation for example:

https://docs.microsoft.com/en-us/sql/t-sql/functions/system-user-transact-sql?view=sql-server-ver16

And Oracle:
http://luna-ext.di.fc.ul.pt/oracle11g/timesten.112/e13070/ttsql257.htm#i1120532

  "SYSTEM_USER

  Returns the name of the current data store user
  as identified by the operating system."


Seems equivalent.


--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: pg15b1: FailedAssertion("val > base", File: "...src/include/utils/relptr.h", Line: 67, PID: 30485)

2022-06-22 Thread Robert Haas
On Wed, Jun 22, 2022 at 12:34 AM Thomas Munro  wrote:
> > For the record, the third idea proposed was to use 1 for the first
> > byte, so that 0 is reserved for NULL and works with memset(0).  Here's
> > an attempt at that.
>
> ... erm, though, duh, I forgot to adjust Assert(val > base).  One more time.

I like this idea and think this might have the side benefit of making
it harder to get away with accessing relptr_off directly.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2022-06-22 Thread Robert Haas
On Wed, Jun 22, 2022 at 12:28 AM Noah Misch  wrote:
> "Everything" isn't limited to PostgreSQL.  The Perl ABI exposes large structs
> to plperl; a field of type double could require the AIX user to rebuild Perl
> with the same compiler option.

Oh, that isn't so great, then.

> Here's how I rank the options, from most-preferred to least-preferred:
>
> 1. Put new eight-byte fields at the front of each catalog, when in doubt.
> 2. On systems where double alignment differs from int64 alignment, require
>NAMEDATALEN%8==0.  Upgrading to v16 would require dump/reload for AIX users
>changing NAMEDATALEN to conform to the new restriction.
> 3. Introduce new typalign values.  Upgrading to v16 would require dump/reload
>for all AIX users.
> 4. De-support AIX.
> 5. From above, "Somehow forcing values that are sometimes 4-byte aligned and
>sometimes 8-byte aligned to be 8-byte alignment on all platforms".
>Upgrading to v16 would require dump/reload for all AIX users.
> 6. Require -qalign=natural on AIX.  Upgrading to v16 would require dump/reload
>and possible system library rebuilds for all AIX users.
>
> I gather (1) isn't at the top of your ranking, or you wouldn't have written
> in.  What do you think of (2)?

(2) pleases me in the sense that it seems to inconvenience very few
people, perhaps no one, in order to avoid inconveniencing a larger
number of people. However, it doesn't seem sufficient. If I understand
correctly, even a catalog that includes no NameData column can have a
problem.

Regarding (1), it is my opinion that the only real value of typalign
is for system catalogs, and specifically that it lets you put the
fields in an order that is aesthetically pleasing rather than worrying
about alignment considerations. After all, if we just ordered the
fields by descending alignment requirement, we could get rid of
typalign altogether (at least, if we didn't care about backward
compatibility). User tables would get smaller because we'd get rid of
alignment padding, and I don't think we'd see much impact on
performance because, for user tables, we copy the values into a datum
array before doing anything interesting with them. So (1) seems to me
to be conceding that typalign is unfit for the only purpose it has.
Perhaps that's just how things are, but it doesn't seem like a good
way for things to be.
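
As a rough illustration of the padding point (invented table names; exact byte
counts vary, but the first layout comes out larger because pad bytes are
inserted before the int8 column):

CREATE TABLE pad_demo_mixed  (a int2, b int8, c int4);  -- int8 needs 8-byte alignment
CREATE TABLE pad_demo_sorted (b int8, c int4, a int2);  -- descending alignment, no padding
INSERT INTO pad_demo_mixed  VALUES (1, 2, 3);
INSERT INTO pad_demo_sorted VALUES (2, 3, 1);
SELECT pg_column_size(t.*) AS mixed_row_size  FROM pad_demo_mixed t;
SELECT pg_column_size(t.*) AS sorted_row_size FROM pad_demo_sorted t;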

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: SYSTEM_USER reserved word implementation

2022-06-22 Thread Tom Lane
"Drouvot, Bertrand"  writes:
> Please find attached a patch proposal to make use of the SYSTEM_USER so 
> that it returns the authenticated identity (if any) (aka authn_id in the 
> Port struct).

On what grounds do you argue that that's the appropriate meaning of
SYSTEM_USER?

regards, tom lane




Re: Unify DLSUFFIX on Darwin

2022-06-22 Thread Tom Lane
Peter Eisentraut  writes:
> macOS has traditionally used extension .dylib for shared libraries (used 
> at build time) and .so for dynamically loaded modules (used by 
> dlopen()).  This complicates the build system a bit.  Also, Meson uses 
> .dylib for both, so it would be worth unifying this in order to be able 
> to get equal build output.

> There doesn't appear to be any reason to use any particular extension 
> for dlopened modules, since dlopen() will accept anything and PostgreSQL 
> is well-factored to be able to deal with any extension.  Other software 
> packages that I have handy appear to be about 50/50 split on which 
> extension they use for their plugins.  So it seems possible to change 
> this safely.

Doesn't this amount to a fundamental ABI break for extensions?
Yesterday they had to ship foo.so, today they have to ship foo.dylib.

I'm not against the idea if we can avoid widespread extension
breakage, but that part seems like a problem.

regards, tom lane




Re: CREATE TABLE ( .. STORAGE ..)

2022-06-22 Thread Peter Eisentraut

On 29.03.22 22:28, Matthias van de Meent wrote:

As I'm new to the grammar files; would you know the difference between
`name` and `ColId`, and why you would change from one to the other in
ALTER COLUMN STORAGE?


The grammar says

name:   ColId   { $$ = $1; };

so it doesn't matter technically.

It seems we are using "name" mostly for names of objects, so I wouldn't 
use it here for storage or compression types.





Re: CREATE TABLE ( .. STORAGE ..)

2022-06-22 Thread Peter Eisentraut

On 15.06.22 16:51, Aleksander Alekseev wrote:

I noticed that cfbot is not entirely happy with the patch, so I rebased it.


I see that COMPRESSION and STORAGE now are handled slightly
differently in the grammar. Maybe we could standardize that a bit
more; so that we have only one `STORAGE [kind]` definition in the
grammar?

As I'm new to the grammar files; would you know the difference between
`name` and `ColId`, and why you would change from one to the other in
ALTER COLUMN STORAGE?


Good point, Matthias. I addressed this in 0002. Does it look better now?


In your patch, the documentation for CREATE TABLE says "SET STORAGE", 
but the actual syntax does not contain "SET".
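
That is, the shape under discussion is roughly this (the CREATE TABLE form is
only what the patch proposes, not a released feature):

CREATE TABLE t1 (a text STORAGE EXTERNAL);       -- proposed: no SET here
ALTER TABLE t1 ALTER COLUMN a SET STORAGE MAIN;  -- existing form, which does use SET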






SYSTEM_USER reserved word implementation

2022-06-22 Thread Drouvot, Bertrand

Hi hackers,

SYSTEM_USER is an SQL reserved word, as mentioned in [1], and is
currently not implemented.


Please find attached a patch proposal to make use of the SYSTEM_USER so 
that it returns the authenticated identity (if any) (aka authn_id in the 
Port struct).


Indeed in some circumstances, the authenticated identity is not the 
SESSION_USER and then the information is lost from the connection point 
of view (it could still be retrieved thanks to commit 9afffcb833 and 
log_connections set to on).


_Example 1, using the gss authentication._

Say we have this entry in pg_hba.conf:

host all all 0.0.0.0/0 gss map=mygssmap

and the related mapping in pg_ident.conf

mygssmap   /^(.*@.*)\.LOCAL$    mary

Then, connecting with a valid Kerberos Ticket that contains 
“bertrand@BDTFOREST.LOCAL” as the default principal that way: psql -U 
mary -h myhostname -d postgres,


we will get:

postgres=> select current_user, session_user;
 current_user | session_user
--+--
 mary | mary
(1 row)

While the SYSTEM_USER would produce the Kerberos principal:

postgres=> select system_user;
   system_user
--
bertrand@BDTFOREST.LOCAL
(1 row)

_Example 2, using the peer authentication._

Say we have this entry in pg_hba.conf:

local all john peer map=mypeermap

and the related mapping in pg_ident.conf

mypeermap postgres john

Then, connected locally as the system user postgres and connecting to the 
database that way: psql -U john -d postgres, we will get:


postgres=> select current_user, session_user;
 current_user | session_user
--+--
 john | john
(1 row)

While the SYSTEM_USER would produce the system user that requested the 
connection:


postgres=> select system_user;
 system_user
-
 postgres
(1 row)

Thanks to those examples we have seen some situations where the 
information related to the authenticated identity has been lost from the 
connection point of view (meaning it is not visible in the current_user or 
in the session_user).


The purpose of this patch is to make it visible through the SYSTEM_USER 
sql reserved word.


_Remarks: _

- In case port->authn_id is NULL then the patch is returning the 
SESSION_USER for the SYSTEM_USER. Perhaps it should return NULL instead.


- There is another thread [2] to expose port->authn_id to extensions and 
triggers thanks to a new API. This thread [2] leads to discussions about 
providing this information to the parallel workers too. While the new 
MyClientConnectionInfo being discussed in [2] could be useful to hold 
the client information that needs to be shared between the backend and 
any parallel workers, it does not seem to be needed in the case 
port->authn_id is exposed through SYSTEM_USER (like it is not for 
CURRENT_USER and SESSION_USER).


I will add this patch to the next commitfest.
I look forward to your feedback.

Bertrand

[1]: https://www.postgresql.org/docs/current/sql-keywords-appendix.html
[2]: 
https://www.postgresql.org/message-id/flat/793d990837ae5c06a558d58d62de9378ab525d83.camel%40vmware.com



diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 478a216dbb..07bb333a3b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -23660,6 +23660,20 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS 
t(ls,n);

   
 
+  
+   
+
+ system_user
+
+system_user
+name
+   
+   
+Returns the identity (if any) that the user presented during the
+authentication cycle, before they were assigned a database role.
+   
+  
+
   

 
diff --git a/src/backend/executor/execExprInterp.c 
b/src/backend/executor/execExprInterp.c
index e44ad68cda..63cf3d8bb7 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -2543,6 +2543,11 @@ ExecEvalSQLValueFunction(ExprState *state, ExprEvalStep 
*op)
*op->resvalue = session_user(fcinfo);
*op->resnull = fcinfo->isnull;
break;
+   case SVFOP_SYSTEM_USER:
+   InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, 
NULL, NULL);
+   *op->resvalue = system_user(fcinfo);
+   *op->resnull = fcinfo->isnull;
+   break;
case SVFOP_CURRENT_CATALOG:
InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, 
NULL, NULL);
*op->resvalue = current_database(fcinfo);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 969c9c158f..29dcc77724 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -842,7 +842,7 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
SEQUENCE SEQUENCES SERIALIZABLE SERVER SESSION SESSION_USER SET SETS 
SETOF
SHARE SHOW SIMILAR SIMPLE 

Re: Make COPY extendable in order to support Parquet and other formats

2022-06-22 Thread Aleksander Alekseev
Hi Ashutosh,

> IIUC, you want extensibility in FORMAT argument to COPY command
> https://www.postgresql.org/docs/current/sql-copy.html. Where the
> format is pluggable. That seems useful.
> Another option is to dump the data in csv format but use external
> utility to convert csv to parquet or whatever other format is. I
> understand that that's not going to be as efficient as dumping
> directly in the desired format.

Exactly. However, to clarify, I suspect this may be a bit more
involved than simply extending the FORMAT arguments.

This change per se will not be extremely useful. Currently nothing
prevents an extension author from iterating over a table using the
heap_open(), heap_getnext(), etc. API and dumping its content in any
format. The user will have to write "dump_table(foo, filename)"
instead of "COPY ..." but that's not a big deal.

The problem is that every new extension has to re-invent things like
figuring out the schema, the validation of the data, etc. If we could
do this in the core so that an extension author has to implement only
the minimal format-dependent list of callbacks that would be really
great. In order to make the interface practical though one will have
to implement a practical extension as well, for instance, a Parquet
one.

This being said, if it turns out that for some reason this is not
realistic to deliver, ending up with simply extending this part of the
syntax a bit should be fine too.
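
To put the end goal in user-visible terms, something along these lines
(purely hypothetical; neither the extension nor the format handler exists
today):

CREATE EXTENSION copy_parquet;  -- invented name for an extension providing a 'parquet' handler
COPY my_table TO '/tmp/my_table.parquet' (FORMAT parquet);
COPY my_table FROM '/tmp/my_table.parquet' (FORMAT parquet);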

-- 
Best regards,
Aleksander Alekseev




RE: Replica Identity check of partition table on subscriber

2022-06-22 Thread houzj.f...@fujitsu.com
On Wednesday, June 22, 2022 7:06 PM Amit Kapila  wrote:
> 
> On Wed, Jun 22, 2022 at 10:09 AM Amit Langote 
> wrote:
> >
> > On Wed, Jun 22, 2022 at 12:02 PM houzj.f...@fujitsu.com
> >  wrote:
> > > Since the patch has been committed. Attach the last patch to fix the
> memory leak.
> > >
> > > The bug exists on PG10 ~ PG15(HEAD).
> > >
> > > For HEAD,PG14,PG13, to fix the memory leak, I think we should use
> > > free_attrmap instead of pfree and release the no-longer-useful
> > > attrmap When rebuilding the map info.
> > >
> > > For PG12,PG11,PG10, we only need to add the code to release the
> > > no-longer-useful attrmap when rebuilding the map info. We can still
> > > use
> > > pfree() because the attrmap in back-branch is a single array like:
> > >
> > > entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));
> >
> > LGTM, thank you.
> >
> 
> LGTM as well. One thing I am not completely sure about is whether to make this
> change in PG10 for which the final release is in Nov?
> AFAICS, the leak can only occur after the relcache invalidation on the 
> subscriber
> which may or may not be a very frequent case. What do you guys think?
> 
> Personally, I feel it is good to fix it in all branches including PG10.
+1

Best regards,
Hou zj


Re: Replica Identity check of partition table on subscriber

2022-06-22 Thread Amit Langote
On Wed, Jun 22, 2022 at 8:05 PM Amit Kapila  wrote:
> On Wed, Jun 22, 2022 at 10:09 AM Amit Langote  wrote:
> > On Wed, Jun 22, 2022 at 12:02 PM houzj.f...@fujitsu.com
> >  wrote:
> > > Since the patch has been committed. Attach the last patch to fix the 
> > > memory leak.
> > >
> > > The bug exists on PG10 ~ PG15(HEAD).
> > >
> > > For HEAD,PG14,PG13, to fix the memory leak, I think we should use
> > > free_attrmap instead of pfree and release the no-longer-useful attrmap
> > > When rebuilding the map info.
> > >
> > > For PG12,PG11,PG10, we only need to add the code to release the
> > > no-longer-useful attrmap when rebuilding the map info. We can still use
> > > pfree() because the attrmap in back-branch is a single array like:
> > >
> > > entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));
> >
> > LGTM, thank you.
>
> LGTM as well. One thing I am not completely sure about is whether to
> make this change in PG10 for which the final release is in Nov?
> AFAICS, the leak can only occur after the relcache invalidation on the
> subscriber which may or may not be a very frequent case. What do you
> guys think?

Agree that the leak does not seem very significant, though...

> Personally, I feel it is good to fix it in all branches including PG10.

...yes, why not.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Make COPY extendable in order to support Parquet and other formats

2022-06-22 Thread Ashutosh Bapat
On Tue, Jun 21, 2022 at 3:26 PM Aleksander Alekseev
 wrote:

>
> In other words, personally I'm unaware of use cases when somebody
> needs a complete read/write FDW or TableAM implementation for formats
> like Parquet, ORC, etc. Also to my knowledge they are not particularly
> optimized for this.
>

IIUC, you want extensibility in the FORMAT argument to the COPY command
https://www.postgresql.org/docs/current/sql-copy.html, where the
format is pluggable. That seems useful.
Another option is to dump the data in csv format but use an external
utility to convert csv to parquet or whatever other format is needed. I
understand that that's not going to be as efficient as dumping
directly in the desired format.

-- 
Best Wishes,
Ashutosh Bapat




Unify DLSUFFIX on Darwin

2022-06-22 Thread Peter Eisentraut
macOS has traditionally used extension .dylib for shared libraries (used 
at build time) and .so for dynamically loaded modules (used by 
dlopen()).  This complicates the build system a bit.  Also, Meson uses 
.dylib for both, so it would be worth unifying this in order to be able 
to get equal build output.


There doesn't appear to be any reason to use any particular extension 
for dlopened modules, since dlopen() will accept anything and PostgreSQL 
is well-factored to be able to deal with any extension.  Other software 
packages that I have handy appear to be about 50/50 split on which 
extension they use for their plugins.  So it seems possible to change 
this safely.

From 291defe5f2aed76c608de3894d131eafa3eaf761 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 22 Jun 2022 13:04:23 +0200
Subject: [PATCH] Unify DLSUFFIX on Darwin

macOS has traditionally used extension .dylib for shared libraries
(used at build time) and .so for dynamically loaded modules (used by
dlopen()).  This complicates the build system a bit.  Also, Meson uses
.dylib for both, so it would be worth unifying this in order to be
able to get equal build output.

There doesn't appear to be any reason to use any particular extension
for dlopened modules, since dlopen() will accept anything and
PostgreSQL is well-factored to be able to deal with any extension.
Other software packages that I have handy appear to be about 50/50
split on which extension they use for their plugins.  So it seems
possible to change this safely.
---
 config/python.m4  | 15 +--
 configure | 15 +--
 src/Makefile.shlib|  2 --
 src/makefiles/Makefile.darwin |  2 +-
 src/template/darwin   |  2 ++
 5 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/config/python.m4 b/config/python.m4
index e500873ff3..b295ad3d3a 100644
--- a/config/python.m4
+++ b/config/python.m4
@@ -120,16 +120,11 @@ else
found_shlib=0
for d in "${python_libdir}" "${python_configdir}" /usr/lib64 /usr/lib
do
-   # Note: DLSUFFIX is for loadable modules, not shared
-   # libraries, so cannot be used here portably.  Just
-   # check all known possibilities.
-   for e in .so .dll .dylib .sl; do
-   if test -e "$d/lib${ldlibrary}$e"; then
-   python_libdir="$d"
-   found_shlib=1
-   break 2
-   fi
-   done
+   if test -e "$d/lib${ldlibrary}${DLSUFFIX}"; then
+   python_libdir="$d"
+   found_shlib=1
+   break 2
+   fi
done
# Some platforms (OpenBSD) require us to accept a bare versioned shlib
# (".so.n.n") as well. However, check this only after failing to find
diff --git a/configure b/configure
index 7dec6b7bf9..f8306a2999 100755
--- a/configure
+++ b/configure
@@ -10570,16 +10570,11 @@ else
found_shlib=0
for d in "${python_libdir}" "${python_configdir}" /usr/lib64 /usr/lib
do
-   # Note: DLSUFFIX is for loadable modules, not shared
-   # libraries, so cannot be used here portably.  Just
-   # check all known possibilities.
-   for e in .so .dll .dylib .sl; do
-   if test -e "$d/lib${ldlibrary}$e"; then
-   python_libdir="$d"
-   found_shlib=1
-   break 2
-   fi
-   done
+   if test -e "$d/lib${ldlibrary}${DLSUFFIX}"; then
+   python_libdir="$d"
+   found_shlib=1
+   break 2
+   fi
done
# Some platforms (OpenBSD) require us to accept a bare versioned shlib
# (".so.n.n") as well. However, check this only after failing to find
diff --git a/src/Makefile.shlib b/src/Makefile.shlib
index 551023c6fb..d0ec325bf1 100644
--- a/src/Makefile.shlib
+++ b/src/Makefile.shlib
@@ -118,7 +118,6 @@ endif
 ifeq ($(PORTNAME), darwin)
   ifdef soname
 # linkable library
-DLSUFFIX   = .dylib
 ifneq ($(SO_MAJOR_VERSION), 0)
   version_link = -compatibility_version $(SO_MAJOR_VERSION) 
-current_version $(SO_MAJOR_VERSION).$(SO_MINOR_VERSION)
 endif
@@ -127,7 +126,6 @@ ifeq ($(PORTNAME), darwin)
 shlib_major= lib$(NAME).$(SO_MAJOR_VERSION)$(DLSUFFIX)
   else
 # loadable module
-DLSUFFIX   = .so
 LINK.shared= $(COMPILER) -bundle -multiply_defined suppress
   endif
   BUILD.exports= $(AWK) '/^[^\#]/ {printf "_%s\n",$$1}' $< >$@
diff --git a/src/makefiles/Makefile.darwin b/src/makefiles/Makefile.darwin
index 4fc81c1584..45f253a5b4 100644
--- a/src/makefiles/Makefile.darwin
+++ b/src/makefiles/Makefile.darwin

Re: Replica Identity check of partition table on subscriber

2022-06-22 Thread Amit Kapila
On Wed, Jun 22, 2022 at 10:09 AM Amit Langote  wrote:
>
> On Wed, Jun 22, 2022 at 12:02 PM houzj.f...@fujitsu.com
>  wrote:
> > Since the patch has been committed, attaching the last patch to fix the
> > memory leak.
> >
> > The bug exists on PG10 ~ PG15(HEAD).
> >
> > For HEAD, PG14 and PG13, to fix the memory leak, I think we should use
> > free_attrmap instead of pfree and release the no-longer-useful attrmap
> > when rebuilding the map info.
> >
> > For PG12,PG11,PG10, we only need to add the code to release the
> > no-longer-useful attrmap when rebuilding the map info. We can still use
> > pfree() because the attrmap in back-branch is a single array like:
> >
> > entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));
>
> LGTM, thank you.
>

LGTM as well. One thing I am not completely sure about is whether to
make this change in PG10, for which the final release is in November.
AFAICS, the leak can only occur after a relcache invalidation on the
subscriber, which may or may not be a very frequent case. What do you
guys think?

Personally, I feel it is good to fix it in all branches including PG10.
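
(For archive readers, the fix being discussed amounts to roughly the
following sketch; entry and desc follow the snippet quoted above, the rest
is illustrative rather than the committed patch:

    /* When rebuilding the map info, release the no-longer-useful map first. */
    if (entry->attrmap)
        free_attrmap(entry->attrmap);   /* plain pfree() on PG12 and earlier */
    entry->attrmap = make_attrmap(desc->natts);
)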

-- 
With Regards,
Amit Kapila.




Re: Add header support to text format and matching feature

2022-06-22 Thread Michael Paquier
On Wed, Jun 22, 2022 at 12:22:01PM +0200, Peter Eisentraut wrote:
> The latest patch was posted by you, so I was deferring to you to commit it.
> Would you like me to do it?

OK.  As this is originally a feature you have committed, I originally
thought that you would take care of it, even if I sent a patch.  I'll
handle that tomorrow then, if that's fine for you, of course.  Happy
to help.
--
Michael


signature.asc
Description: PGP signature


NLS: Put list of available languages into LINGUAS files

2022-06-22 Thread Peter Eisentraut


This possible change was alluded to in the meson thread at [0].

The proposal is to move the list of available languages from nls.mk into 
a separate file called po/LINGUAS.  Advantages:


- It keeps the parts notionally managed by programmers (nls.mk)
  separate from the parts notionally managed by translators (LINGUAS).

- It's the standard practice recommended by the Gettext manual
  nowadays.

- The Meson build system also supports this layout (and of course
  doesn't know anything about our custom nls.mk), so this would enable
  sharing the list of languages between the two build systems.

(The MSVC build system currently finds all po files by globbing, so it 
is not affected by this change.)
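
To illustrate the shape of the change (the language list is shortened and
only an example):

    # before: src/bin/psql/nls.mk carries the list
    AVAIL_LANGUAGES = de es fr ja ru sv

    # after: nls.mk drops AVAIL_LANGUAGES, and src/bin/psql/po/LINGUAS contains
    de es fr ja ru sv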


In practice, the list of languages is updated mostly by means of the 
cp-po script that I use to update the translations before releases [1], 
and I have a small patch ready for that to adapt to this change.  (In 
any case, that wouldn't be needed until the beta of PG16.)



[0]: 
https://www.postgresql.org/message-id/bfcd5353-0fb3-a05c-6f62-164d98c56...@enterprisedb.com


[1]: 
https://git.postgresql.org/gitweb/?p=pgtranslation/admin.git;a=blob;f=cp-po;h=d4ae9285697ba110228b6e01c8339b1d0f8c3458;hb=HEAD
From 6191a75c8d56cd524c505b371cb82705b676f026 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 22 Jun 2022 12:38:29 +0200
Subject: [PATCH] NLS: Put list of available languages into LINGUAS files

This moves the list of available languages from nls.mk into a separate
file called po/LINGUAS.  Advantages:

- It keeps the parts notionally managed by programmers (nls.mk)
  separate from the parts notionally managed by translators (LINGUAS).

- It's the standard practice recommended by the Gettext manual
  nowadays.

- The Meson build system also supports this layout (and of course
  doesn't know anything about our custom nls.mk), so this would enable
  sharing the list of languages between the two build systems.

(The MSVC build system currently finds all po files by globbing, so it
is not affected by this change.)
---
 doc/src/sgml/nls.sgml  | 23 ++-
 src/backend/nls.mk |  1 -
 src/backend/po/LINGUAS |  1 +
 src/bin/initdb/nls.mk  |  1 -
 src/bin/initdb/po/LINGUAS  |  1 +
 src/bin/pg_amcheck/nls.mk  |  1 -
 src/bin/pg_amcheck/po/LINGUAS  |  1 +
 src/bin/pg_archivecleanup/nls.mk   |  1 -
 src/bin/pg_archivecleanup/po/LINGUAS   |  1 +
 src/bin/pg_basebackup/nls.mk   |  1 -
 src/bin/pg_basebackup/po/LINGUAS   |  1 +
 src/bin/pg_checksums/nls.mk|  1 -
 src/bin/pg_checksums/po/LINGUAS|  1 +
 src/bin/pg_config/nls.mk   |  1 -
 src/bin/pg_config/po/LINGUAS   |  1 +
 src/bin/pg_controldata/nls.mk  |  1 -
 src/bin/pg_controldata/po/LINGUAS  |  1 +
 src/bin/pg_ctl/nls.mk  |  1 -
 src/bin/pg_ctl/po/LINGUAS  |  1 +
 src/bin/pg_dump/nls.mk |  1 -
 src/bin/pg_dump/po/LINGUAS |  1 +
 src/bin/pg_resetwal/nls.mk |  1 -
 src/bin/pg_resetwal/po/LINGUAS |  1 +
 src/bin/pg_rewind/nls.mk   |  1 -
 src/bin/pg_rewind/po/LINGUAS   |  1 +
 src/bin/pg_test_fsync/nls.mk   |  1 -
 src/bin/pg_test_fsync/po/LINGUAS   |  1 +
 src/bin/pg_test_timing/nls.mk  |  1 -
 src/bin/pg_test_timing/po/LINGUAS  |  1 +
 src/bin/pg_upgrade/nls.mk  |  1 -
 src/bin/pg_upgrade/po/LINGUAS  |  1 +
 src/bin/pg_verifybackup/nls.mk |  1 -
 src/bin/pg_verifybackup/po/LINGUAS |  1 +
 src/bin/pg_waldump/nls.mk  |  1 -
 src/bin/pg_waldump/po/LINGUAS  |  1 +
 src/bin/psql/nls.mk|  1 -
 src/bin/psql/po/LINGUAS|  1 +
 src/bin/scripts/nls.mk |  1 -
 src/bin/scripts/po/LINGUAS |  1 +
 src/interfaces/ecpg/ecpglib/nls.mk |  1 -
 src/interfaces/ecpg/ecpglib/po/LINGUAS |  1 +
 src/interfaces/ecpg/preproc/nls.mk |  1 -
 src/interfaces/ecpg/preproc/po/LINGUAS |  1 +
 src/interfaces/libpq/nls.mk|  1 -
 src/interfaces/libpq/po/LINGUAS|  1 +
 src/nls-global.mk  |  6 +-
 src/pl/plperl/nls.mk   |  1 -
 src/pl/plperl/po/LINGUAS   |  1 +
 src/pl/plpgsql/src/nls.mk  |  1 -
 src/pl/plpgsql/src/po/LINGUAS  |  1 +
 src/pl/plpython/nls.mk |  1 -
 src/pl/plpython/po/LINGUAS |  1 +
 src/pl/tcl/nls.mk  |  1 -
 src/pl/tcl/po/LINGUAS  |  1 +
 54 files changed, 41 insertions(+), 40 deletions(-)
 create mode 100644 src/backend/po/LINGUAS
 create mode 100644 src/bin/initdb/po/LINGUAS
 create mode 100644 src/bin/pg_amcheck/po/LINGUAS
 create mode 100644 src/bin/pg_archivecleanup/po/LINGUAS
 create mode 100644 src/bin/pg_basebackup/po/LINGUAS
 create mode 100644 src/bin/pg_checksums/po/LINGUAS
 create mode 100644 

Re: Use fadvise in wal replay

2022-06-22 Thread Andrey Borodin



> On 22 Jun 2022, at 13:26, Pavel Borisov  wrote:
> 
> Then I'd guess that your speedup is due to speeding up the first several Mb's 
> in many files opened
I think in this case Thomas' approach of prefetching the next WAL segment
would do better. But Jakub observed the opposite results.

Best regards, Andrey Borodin.



Re: Use fadvise in wal replay

2022-06-22 Thread Pavel Borisov
On Wed, Jun 22, 2022 at 2:07 PM Andrey Borodin  wrote:

>
>
> > On 21 Jun 2022, at 20:52, Pavel Borisov  wrote:
> >
> > > On 21 Jun 2022, at 16:59, Jakub Wartak 
> wrote:
> > Oh, wow, your benchmarks show really impressive improvement.
> >
> > FWIW I was trying to speedup long sequential file reads in Postgres
> using fadvise hints. I've found no detectable improvements.
> > Then I've written 1Mb - 1Gb sequential read test with both fadvise
> POSIX_FADV_WILLNEED and POSIX_FADV_SEQUENTIAL in Linux.
> Did you drop caches?
>
Yes. I saw that nothing changed the read speed of a long file (50MB+).

> > The only improvement I've found was
> >
> > 1. when the size of read was around several Mb and fadvise len also
> around several Mb.
> > 2. when before fdavice and the first read there was a delay (which was
> supposedly used by OS for reading into prefetch buffer)
> That's the case of startup process: you read a xlog page, then redo
> records from this page.
>
Then I'd guess that your speedup comes from speeding up the first several
MB of each of the many files opened (and that the delay needed for kernel
prefetch has some other cause). That differs from the case in which I
tried to measure a speedup, and it could explain the speedup in your case.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Add header support to text format and matching feature

2022-06-22 Thread Peter Eisentraut



On 22.06.22 01:34, Michael Paquier wrote:

On Mon, Jun 20, 2022 at 09:03:23AM +0900, Michael Paquier wrote:

On Thu, Jun 16, 2022 at 09:24:56AM +0200, Peter Eisentraut wrote:

I don't feel very strongly about this.  It makes sense to stay consistent
with the existing COPY code.


Yes, my previous argument is based on consistency with the
surroundings.  I am not saying that this could not be made better, it
surely can, but I would recommend to tackle that in a separate patch,
and apply that to more areas than this specific one.


Peter, beta2 is planned for next week.  Do you think that you would be
able to address this open item by the end of this week?  If not, since I
have already looked at the proposed patch, I can jump in and help.


The latest patch was posted by you, so I was deferring to you to commit 
it.  Would you like me to do it?





Re: Use fadvise in wal replay

2022-06-22 Thread Andrey Borodin



> On 21 Jun 2022, at 20:52, Pavel Borisov  wrote:
> 
> > On 21 Jun 2022, at 16:59, Jakub Wartak  wrote:
> Oh, wow, your benchmarks show really impressive improvement.
> 
> FWIW I was trying to speedup long sequential file reads in Postgres using 
> fadvise hints. I've found no detectable improvements.
> Then I've written 1Mb - 1Gb sequential read test with both fadvise 
> POSIX_FADV_WILLNEED and POSIX_FADV_SEQUENTIAL in Linux.
Did you drop caches?

> The only improvement I've found was 
> 
> 1. when the size of read was around several Mb and fadvise len also around 
> several Mb. 
> 2. when before fdavice and the first read there was a delay (which was 
> supposedly used by OS for reading into prefetch buffer)
That's the case of the startup process: you read an xlog page, then redo
records from this page.
> 3. If I read sequential blocks I saw a speedup only on the first ones. The
> overall read speed of, say, a 1GB file remained unchanged no matter what.
> 
> I became convinced that if I read something long, the OS does the necessary
> speedups automatically (which is also in agreement with the fadvise
> manual/code comments).
> Could you please elaborate on how you got results with that big a
> difference? (Though I'm not against fadvise usage; at worst it is expected
> to be useless.)

FWIW, Kirill and I observed drastically reduced lag on a production server
when running the patched version. fadvise surely works :) The question is
how to use it optimally.
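
For reference, the kind of hint being discussed is roughly the following
(an illustrative sketch, not the attached patch; error handling omitted):

    #include <fcntl.h>

    /* Ask the kernel to read the just-opened WAL segment ahead of replay. */
    static void
    hint_wal_segment(int fd)
    {
        (void) fd;
    #ifdef POSIX_FADV_WILLNEED
        (void) posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
    #endif
    }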

Best regards, Andrey Borodin.



Re: Support logical replication of DDLs

2022-06-22 Thread vignesh C
On Tue, Jun 21, 2022 at 5:49 PM houzj.f...@fujitsu.com
 wrote:
>
> On Monday, June 20, 2022 11:32 AM houzj.f...@fujitsu.com 
>  wrote:
> >
> > On Saturday, June 18, 2022 3:38 AM Zheng Li  wrote:
> > > On Wed, Jun 15, 2022 at 12:00 AM Amit Kapila 
> > > wrote:
> > > >
> > > > On Wed, Jun 15, 2022 at 5:44 AM Zheng Li  wrote:
> > > > >
> > > > >
> > > > > While I agree that the deparser is needed to handle the potential
> > > > > syntax differences between the pub/sub, I think it's only relevant
> > > > > for the use cases where only a subset of tables in the database are
> > > > > replicated. For other use cases where all tables, functions and
> > > > > other objects need to be replicated, (for example, creating a
> > > > > logical replica for major version upgrade) there won't be any syntax
> > > > > difference to handle and the schemas are supposed to match exactly
> > > > > between the pub/sub. In other words the user seeks to create an
> > > > > identical replica of the source database and the DDLs should be
> > > > > replicated as is in this case.
> > > > >
> > > >
> > > > I think even for database-level replication we can't assume that
> > > > source and target will always have the same data in which case "Create
> > > > Table As ..", "Alter Table .. " kind of statements can't be replicated
> > > > as it is because that can lead to different results.
> > > "Create Table As .." is already handled by setting the skipData flag of 
> > > the
> > > statement parsetreee before replay:
> > >
> > > /*
> > > * Force skipping data population to avoid data inconsistency.
> > > * Data should be replicated from the publisher instead.
> > > */
> > > castmt->into->skipData = true;
> > >
> > > "Alter Table .. " that rewrites with volatile expressions can also be 
> > > handled
> > > without any syntax change, by enabling the table rewrite replication and
> > > converting the rewrite inserts to updates. ZJ's patch introduced this 
> > > solution.
> > > I've also adopted this approach in my latest patch
> > > 0012-Support-replication-of-ALTER-TABLE-commands-that-rew.patch
> > >
> > > > The other point
> > > > is tomorrow we can extend the database level option/syntax to exclude
> > > > a few objects (something like [1]) as well in which case we again need
> > > > to filter at the publisher level
> > >
> > > I think for such cases it's not full database replication and we could
> > > treat it as table-level DDL replication, i.e. use the deparser format.
> >
> > Hi,
> >
> > Here are some points in my mind about the two approaches discussed here.
> >
> > 1) search_path vs schema qualification
> >
> > Again, I still think it will bring more flexibility and security to schema
> > qualify the objects in the DDL command, as mentioned before[1].
> >
> > Besides, a schema-qualified DDL is also more appropriate for other use
> > cases (e.g. table-level replication). As the schema can be different
> > between pub/sub, it's easy to cause an unexpected and undetectable failure
> > if we just log the search_path.
> >
> > It makes more sense to me to have the same style of WAL log (schema
> > qualified) for both database-level and table-level replication, as it will
> > bring more flexibility.
> >
> >
> > > "Create Table As .." is already handled by setting the skipData flag of 
> > > the
> > > statement parsetreee before replay:
> >
> > 2) About the handling of CREATE TABLE AS:
> >
> > I think it's not an appropriate approach to set the skipData flag on the
> > subscriber, as it cannot handle the EXECUTE command in CTAS:
> >
> > CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DT');
> >
> > The prepared statement is a temporary object which we don't replicate. So
> > if you directly execute the original SQL on the subscriber, it will fail
> > even if you set skipData.
> >
> > I think it's difficult to make this work, as you need to handle the
> > create/drop of this prepared statement. And even if we extended the
> > subscriber's code to make it work, it doesn't seem like a standard and
> > elegant approach.
> >
> >
> > > "Alter Table .. " that rewrites with volatile expressions can also be 
> > > handled
> > > without any syntax change, by enabling the table rewrite replication and
> > > converting the rewrite inserts to updates. ZJ's patch introduced this 
> > > solution.
> >
> > 3) About the handling of ALTER TABLE rewrite.
> >
> > The approach I proposed before is based on the event trigger + deparser
> > approach. We were able to improve that approach, as we don't need to
> > replicate the rewrite in many cases. For example, we don't need to
> > replicate the rewrite DML if there is no volatile/mutable function. We
> > should check and filter these cases at the publisher (e.g. via the
> > deparser) instead of checking that at the subscriber.
> >
> > Besides, as discussed, we need to give a warning or error for the cases
> > where the DDL contains a volatile function which would be executed[2]. We
> > should check this at

Re: [PoC] Reducing planning time when tables have many partitions

2022-06-22 Thread Yuya Watari
Dear David,

Thank you for your comments on my patch. I really apologize for my
late response.

On Thu, Mar 24, 2022 at 11:03 AM David Rowley  wrote:
> I think a better way to solve this would be just to have a single hash
> table over all EquivalenceClasses that allows fast lookups of
> EquivalenceMember->em_expr.  I think there's no reason that a given
> Expr should appear in more than one non-merged EquivalenceClass. The
> EquivalenceClass a given Expr belongs to would need to be updated
> during the merge process.

Thank you for your idea. However, I think building a hash table whose
key is EquivalenceMember->em_expr does not work for this case.

What I am trying to optimize in this patch is the following code.

=
EquivalenceClass *ec = /* given */;

EquivalenceMember *em;
ListCell *lc;
foreach(lc, ec->ec_members)
{
em = (EquivalenceMember *) lfirst(lc);

/* predicate is bms_equal or bms_is_subset, etc */
if (!predicate(em))
continue;

/* The predicate satisfies */
do something...;
}
=

From my observation, the predicates above will be false in most cases
and the subsequent processes are not executed. My optimization is
based on this notion and utilizes hash tables to eliminate calls of
predicates.

If the predicate were "em->em_expr == something", the hash table whose
key is em_expr would be effective. However, the actual predicates are
not of this type but the following.

// Find EquivalenceMembers whose relids is equal to the given relids
(1) bms_equal(em->em_relids, relids)

// Find EquivalenceMembers whose relids is a subset of the given relids
(2) bms_is_subset(em->em_relids, relids)

Since these predicates match on em_relids rather than em_expr, we need
to build a hash table with em_relids as the key. If so, we can
drastically reduce the planning time for pattern (1).
Besides, by enumerating all subsets of relids, pattern (2) can be
optimized. The detailed algorithm is described in the first email.

I show an example of the pattern (1). The next code is in
src/backend/optimizer/path/equivclass.c. As can be seen from this
code, the foreach loop tries to find an EquivalenceMember whose
cur_em->em_relids is equal to rel->relids. If found, subsequent
processing will be performed.

== Before patched ==
List *
generate_implied_equalities_for_column(PlannerInfo *root,
   RelOptInfo *rel,
   ec_matches_callback_type callback,
   void *callback_arg,
   Relids prohibited_rels)
{
...

EquivalenceClass *cur_ec = (EquivalenceClass *)
list_nth(root->eq_classes, i);
EquivalenceMember *cur_em;
ListCell   *lc2;

cur_em = NULL;
foreach(lc2, cur_ec->ec_members)
{
cur_em = (EquivalenceMember *) lfirst(lc2);
if (bms_equal(cur_em->em_relids, rel->relids) &&
callback(root, rel, cur_ec, cur_em, callback_arg))
break;
cur_em = NULL;
}

if (!cur_em)
continue;

...
}
===

My patch modifies this code as follows. The em_foreach_relids_equals
is a newly defined macro that finds EquivalenceMember satisfying the
bms_equal. The macro looks up a hash table using rel->relids as a key.
This type of optimization cannot be achieved without using hash tables
whose key is em->em_relids.

== After patched ==
List *
generate_implied_equalities_for_column(PlannerInfo *root,
   RelOptInfo *rel,
   ec_matches_callback_type callback,
   void *callback_arg,
   Relids prohibited_rels)
{
...

EquivalenceClass *cur_ec = (EquivalenceClass *)
list_nth(root->eq_classes, i);
EquivalenceMember *cur_em;
EquivalenceMember *other_em;

cur_em = NULL;
em_foreach_relids_equals(cur_em, cur_ec, rel->relids)
{
Assert(bms_equal(cur_em->em_relids, rel->relids));
if (callback(root, rel, cur_ec, cur_em, callback_arg))
break;
cur_em = NULL;
}

if (!cur_em)
continue;

...
}
===

> We might not want to build the hash table for all queries.

I agree with you. Building a lot of hash tables will consume much
memory.  My idea for this problem is to let the hash table's key be a
pair of EquivalenceClass and Relids. However, this approach may
increase the lookup time of the hash table.
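
For example, such a composite key might look roughly like this (purely
illustrative, not taken from the attached patch):

    /* Which EquivalenceClass, and which em_relids, we are looking for. */
    typedef struct EquivalenceMemberKey
    {
        EquivalenceClass *ec;       /* compared by pointer identity */
        Relids      relids;         /* hashed with bms_hash_value() */
    } EquivalenceMemberKey;

    typedef struct EquivalenceMemberEntry
    {
        EquivalenceMemberKey key;
        List       *members;        /* EquivalenceMembers whose em_relids match */
    } EquivalenceMemberEntry;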

==

I noticed that the previous patch does not work with the current HEAD.
I attached the modified one to this email.

Additionally, I added my patch to the current commit fest [1].
[1] https://commitfest.postgresql.org/38/3701/

-- 
Best regards,
Yuya Watari
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 

Re: tablesync copy ignores publication actions

2022-06-22 Thread Peter Smith
On Wed, Jun 22, 2022 at 2:18 PM Amit Kapila  wrote:
>
> On Thu, Jun 16, 2022 at 6:07 AM Peter Smith  wrote:
>
> >
> > Thank you for your review comments. Those reported mistakes are fixed
> > in the attached patch v3.
> >
>
> This patch looks mostly good to me except for a few minor comments
> which are mentioned below. It is not very clear in which branch(es) we
> should commit this patch? As per my understanding, this is a
> pre-existing behavior but we want to document it because (a) It was
> not already documented, and (b) we followed it for row filters in
> PG-15 it seems that should be explained. So, we have the following
> options (a) commit it only for PG-15, (b) commit for PG-15 and
> backpatch the relevant sections, or (c) commit it when branch opens
> for PG-16. What do you or others think?

Even though this is a very old docs omission, AFAIK nobody ever raised
it as a problem before. It only became more important because of the
PG15 row-filters. So I think option (a) is ok.

>
> Few comments:
> ==
> 1.
> >
> -   particular event types.  By default, all operation types are replicated.
> -   (Row filters have no effect for TRUNCATE. See
> -   ).
> +   particular event types. By default, all operation types are replicated.
> +   These are DML operation limitations only; they do not affect the initial
> +   data synchronization copy.
> >
>
> Using limitations in the above sentence can be misleading. Can we
> change it to something like: "These publication specifications apply
> only for DML operations; they do ... ".
>

OK - modified as suggested.

> 2.
> + operations. The publication pub3b has a row filter.
>
> In the Examples section, you have used row filter whereas that section
> is later in the docs. So, it is better if you give reference to that
> section in the above sentence (see Section ...).
>

OK - added xref as suggested.

> 3.
> + 
> +  This parameter only affects DML operations. In particular, the
> +  subscription initial data synchronization does not take
> this parameter
> +  into account when copying existing table data.
> + 
>
> In the second sentence: "... subscription initial data synchronization
> ..." doesn't sound appropriate. Can we change it to something like:
> "In particular, the initial data synchronization (see Section ..) in
> logical replication does not take this parameter into account when
> copying existing table data."?
>

OK - modified and added xref as suggested.

~~

PSA patch v4 to address all the above review comments.
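
For anyone following the thread, the behaviour being documented boils down
to something like this (the connection string is only a placeholder):

    CREATE PUBLICATION pub_ins_only FOR TABLE t1 WITH (publish = 'insert');
    -- On the subscriber, the initial tablesync still copies *all* existing
    -- rows of t1; only the subsequent changes are limited to INSERTs.
    CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=src' PUBLICATION pub_ins_only;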

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v4-0001-PGDOCS-tablesync-ignores-publish-operation.patch
Description: Binary data


Re: Support load balancing in libpq

2022-06-22 Thread Jelte Fennema
I tried to stay in line with the naming of this same option in JDBC and
Npgsql, where it's called "loadBalanceHosts" and "Load Balance Hosts"
respectively. So, actually to be more in line it should be the option for 
libpq should be called "load_balance_hosts" (not "loadbalance" like 
in the previous patch). I attached a new patch with the name of the 
option changed to this.

I also don't think the name is misleading. Randomization of hosts will
automatically result in balancing the load across multiple hosts. This is
assuming more than a single connection is made using the connection
string, either on the same client node or on different client nodes, which
I think is a fair assumption to make. Also note that this patch does not
load balance queries, it load balances connections. This is because libpq
works at the connection level, not the query level, due to session-level
state.

I agree it is indeed fairly simplistic load balancing. But many dedicated load 
balancers often use simplistic load balancing too. Round-robin, random and
hash+modulo based load balancing are all very commonly used load balancer
strategies. Using this patch you should even be able to implement the 
weighted load balancing that you suggest, by supplying the same host + port 
pair multiple times in the list of hosts. 
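
For example, with such an option a connection string could look roughly like
this (illustrative only; the exact value syntax is whatever the final patch
settles on):

    postgresql://host1:5432,host1:5432,host2:5432/mydb?load_balance_hosts=true

where listing host1 twice would give it roughly two thirds of the new
connections.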

My preference would be to use load_balance_hosts for the option name.
However, if the name of the option becomes the main point of contention
I would be fine with changing the option to "randomize_hosts". I think 
in the end it comes down to what we want the name of the option to reflect:
1. load_balance_hosts reflects what you (want to) achieve by enabling it
2. randomize_hosts reflects how it is achieved


Jelte

0001-Support-load-balancing-in-libpq.patch
Description: 0001-Support-load-balancing-in-libpq.patch


Re: New Object Access Type hooks

2022-06-22 Thread Michael Paquier
On Mon, Apr 18, 2022 at 03:50:11PM +0900, Michael Paquier wrote:
> A removal from recomputeNamespacePath() implies an addition at the end
> of fetch_search_path() and fetch_search_path_array().  Perhaps an
> extra one in RangeVarGetCreationNamespace()?  The question is how much
> of these we want, for example the search hook would be called now even
> when doing relation-specific checks like RelationIsVisible() and the
> kind.

I have been playing with this issue, and if we want to minimize the
number of times the list of namespaces in activeSearchPath gets
checked through the search hook, it looks like this is going to
require an additional cached list of namespace OIDs filtered through
InvokeNamespaceSearchHook().  However, it is unclear to me how we can
guarantee that any of the code paths forcing a recomputation of
activeSearchPath are not used for a caching phase, so it looks rather
easy to mess up things and finish with a code path using an unfiltered
activeSearchPath.  The set of *IsVisible() routines should be fine, at
least.

In the end, I am not sure that it is a wise time to redesign this
area this close to beta2, so I would vote for leaving this issue aside for
now.  Another thing that we could do is to tweak the tests and silence
the part around OAT_NAMESPACE_SEARCH, which would increase the coverage
with installcheck, removing the need for ENCODING and NO_LOCALE.
--
Michael


signature.asc
Description: PGP signature


Re: Postgres perl module namespace

2022-06-22 Thread Noah Misch
On Tue, Apr 19, 2022 at 07:24:58PM -0400, Andrew Dunstan wrote:
> On 2022-04-19 Tu 18:39, Michael Paquier wrote:
> > +*generate_ascii_string = *TestLib::generate_ascii_string;
> > +*slurp_dir = *TestLib::slurp_dir;
> > +*slurp_file = *TestLib::slurp_file;
> >
> > I am not sure if it is possible and my perl-fu is limited in this
> > area, but could a failure be enforced when loading this path if a new
> > routine added in TestLib.pm is forgotten in this list?
> 
> Not very easily that I'm aware of, but maybe some superior perl wizard
> will know better.

One can alias the symbol table, like https://metacpan.org/pod/Package::Alias
does.  I'm attaching what I plan to use.  Today, check-world fails after

  sed -i 's/TestLib/PostgreSQL::Test::Utils/g; 
s/PostgresNode/PostgreSQL::Test::Cluster/g' **/*.pl

on REL_14_STABLE, because today's alias list is incomplete.  With this change,
the same check-world passes.
Author: Noah Misch 
Commit: Noah Misch 

For PostgreSQL::Test compatibility, alias entire package symbol tables.

Remove the need to edit back-branch-specific code sites when
back-patching the addition of a PostgreSQL::Test::Utils symbol.  Replace
per-symbol, incomplete alias lists.  Give old and new package names the
same EXPORT and EXPORT_OK semantics.  Back-patch to v10 (all supported
versions).

Reviewed by FIXME.

Discussion: https://postgr.es/m/FIXME

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm 
b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 12339c2..a855fbc 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1,9 +1,9 @@
 
 # Copyright (c) 2022, PostgreSQL Global Development Group
 
-# allow use of release 15+ perl namespace in older branches
-# just 'use' the older module name.
-# See PostgresNode.pm for function implementations
+# Allow use of release 15+ Perl package name in older branches, by giving that
+# package the same symbol table as the older package.  See PostgresNode::new
+# for behavior reacting to the class name.
 
 package PostgreSQL::Test::Cluster;
 
@@ -11,5 +11,8 @@ use strict;
 use warnings;
 
 use PostgresNode;
+BEGIN { *PostgreSQL::Test::Cluster:: = \*PostgresNode::; }
+
+use Exporter 'import';
 
 1;
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm 
b/src/test/perl/PostgreSQL/Test/Utils.pm
index bdbbd6e..e743bdf 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -1,48 +1,16 @@
 # Copyright (c) 2022, PostgreSQL Global Development Group
 
-# allow use of release 15+ perl namespace in older branches
-# just 'use' the older module name.
-# We export the same names as the v15 module.
-# See TestLib.pm for alias assignment that makes this all work.
+# Allow use of release 15+ Perl package name in older branches, by giving that
+# package the same symbol table as the older package.
 
 package PostgreSQL::Test::Utils;
 
 use strict;
 use warnings;
 
-use Exporter 'import';
-
 use TestLib;
+BEGIN { *PostgreSQL::Test::Utils:: = \*TestLib::; }
 
-our @EXPORT = qw(
-  generate_ascii_string
-  slurp_dir
-  slurp_file
-  append_to_file
-  check_mode_recursive
-  chmod_recursive
-  check_pg_config
-  dir_symlink
-  system_or_bail
-  system_log
-  run_log
-  run_command
-  pump_until
-
-  command_ok
-  command_fails
-  command_exit_is
-  program_help_ok
-  program_version_ok
-  program_options_handling_ok
-  command_like
-  command_like_safe
-  command_fails_like
-  command_checks_all
-
-  $windows_os
-  $is_msys2
-  $use_unix_sockets
-);
+use Exporter 'import';
 
 1;
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index be90963..5b8c7a9 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -149,6 +149,11 @@ of finding port numbers, registering instances for 
cleanup, etc.
 sub new
 {
my ($class, $name, $pghost, $pgport) = @_;
+
+   # Use release 15+ semantics when called under a release 15+ name.
+   return PostgresNode->get_new_node(@_[ 1 .. $#_ ])
+ if $class ne 'PostgresNode';
+
my $testname = basename($0);
$testname =~ s/\.[^.]+$//;
my $self = {
@@ -2796,18 +2801,4 @@ sub corrupt_page_checksum
 
 =cut
 
-# support release 15+ perl module namespace
-
-package PostgreSQL::Test::Cluster; ## no critic (ProhibitMultiplePackages)
-
-sub new
-{
-   shift; # remove class param from args
-   return PostgresNode->get_new_node(@_);
-}
-
-no warnings 'once';
-
-*get_free_port = *PostgresNode::get_free_port;
-
 1;
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index f3ee20a..610050e 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -979,46 +979,4 @@ sub command_checks_all
 
 =cut
 
-# support release 15+ perl module namespace
-
-package PostgreSQL::Test::Utils; ## no critic (ProhibitMultiplePackages)
-
-# we don't want to export anything here, but we want to support things called
-# 

Re: pg_page_repair: a tool/extension to repair corrupted pages in postgres with streaming/physical replication

2022-06-22 Thread Masahiko Sawada
Hi,

On Wed, Jun 22, 2022 at 2:44 PM RKN Sai Krishna
 wrote:
>
> Hi,
>
> Problem: Today when a data page is corrupted in the primary postgres with 
> physical replication (sync or async standbys), there seems to be no way to 
> repair it easily and we rely on PITR to recreate the postgres server or drop 
> the corrupted table (of course this is not an option for important customer 
> tables, but may be okay for some maintenance or temporary tables). PITR is 
> costly to do in a production environment oftentimes as it involves creation 
> of the full-blown postgres from the base backup and causing downtime for the 
> customers.
>
> Solution: Make use of the uncorrupted page present in sync or async standby. 
> The proposed tool/extension pg_page_repair (as we call it) can fetch the 
> uncorrupted page from sync or async standby and overwrite the corrupted one 
> on the primary. Yes, there will be a challenge in making sure that the WAL is 
> replayed completely and standby is up-to-date so that we are sure that stale 
> pages are not copied across. A simpler idea could be that the pg_page_repair 
> can wait until the standby replays/catches up with the primary's flush LSN 
> before fetching the uncorrupted page. A downside of this approach is that the 
> pg_page_repair waits for long or rather infinitely if the replication lag is 
> huge. As we all know that the replication lag is something a good postgres 
> solution will always monitor to keep it low, if true, the pg_page_repair is 
> guaranteed to not wait for longer. Another idea could be that the 
> pg_page_repair gets the base page from the standby and applies all the WAL 
> records pertaining to the corrupted page using the base page to get the 
> uncorrupted page. This requires us to pull the replay logic from the core to 
> pg_page_repair which isn't easy. Hence we propose to go with approach 1, but 
> open to discuss on approach 2 as well. We suppose that the solution proposed 
> in this thread holds good even for pages corresponding to indexes.

I'm interested in this topic and recalled I did some research on the
first idea while writing experimental code several years ago[1].

The corruption that can be fixed by this feature is mainly physical
corruption, for example corruption introduced by storage array cache
corruption, array firmware bugs, or filesystem bugs, is that right?
Logically corrupt blocks are much more likely to have been introduced as a
result of a failure or a bug in PostgreSQL, which would end up propagating
to physical standbys.

>
> Implementation Choices: pg_page_repair can either take the corrupted page 
> info (db id, rel id, block number etc.) or just a relation name and 
> automatically figure out the corrupted page using pg_checksums for instance 
> or just database name and automatically figure out all the corrupted pages. 
> It can either repair the corrupted pages online (only the corrupted table is 
> inaccessible, the server continues to run) or take downtime if there are many 
> corrupted pages.

Since the server must be shut down cleanly before running pg_checksums,
if we want to verify page checksums while the server is running we would
need the online checksum verification we discussed before[2].

>
> Future Scope: pg_page_repair can be integrated to the core so that the 
> postgres will repair the pages automatically without manual intervention.
>
> Other Solutions: We did consider an approach where the tool could obtain the 
> FPI from WAL and replay till the latest WAL record to repair the page. But 
> there could be limitations such as FPI and related WAL not being available in 
> primary/archive location.

How do we find the FPI of the corrupted page efficiently in the WAL? We
could scan WAL records backward, but that could take quite a long time.

Regards,

[1] https://github.com/MasahikoSawada/pgtools/tree/master/page_repair
[2] 
https://www.postgresql.org/message-id/CAOBaU_aVvMjQn%3Dge5qPiJOPMmOj5%3Dii3st5Q0Y%2BWuLML5sR17w%40mail.gmail.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Handle infinite recursion in logical replication setup

2022-06-22 Thread Peter Smith
Here are my review comments for the v22* patch set.


v22-0001


No comments. LGTM


v22-0002


2.1 doc/src/sgml/catalogs.sgml

+   Possible origin values are local or
+   any. The default is any.

IMO the word "Possible" here is giving a sense of vagueness.

SUGGESTION
The origin value must be either local or
any.

~~~

2.2 src/backend/commands/subscriptioncmds.c

@@ -265,6 +271,29 @@ parse_subscription_options(ParseState *pstate,
List *stmt_options,
  opts->specified_opts |= SUBOPT_DISABLE_ON_ERR;
  opts->disableonerr = defGetBoolean(defel);
  }
+ else if (IsSet(supported_opts, SUBOPT_ORIGIN) &&
+ strcmp(defel->defname, "origin") == 0)
+ {
+ if (IsSet(opts->specified_opts, SUBOPT_ORIGIN))
+ errorConflictingDefElem(defel, pstate);
+
+ opts->specified_opts |= SUBOPT_ORIGIN;
+
+ /*
+ * Even though "origin" parameter allows only "local" and "any"
+ * values, the "origin" parameter type is implemented as string
+ * type instead of boolean to extend the "origin" parameter to
+ * support filtering of origin name specified by the user in the
+ * later versions.
+ */
+ opts->origin = defGetString(defel);
+
+ if ((strcmp(opts->origin, LOGICALREP_ORIGIN_LOCAL) != 0) &&
+ (strcmp(opts->origin, LOGICALREP_ORIGIN_ANY) != 0))
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized origin value: \"%s\"", opts->origin));
+ }

I was wondering if it might be wise now to do a pfree(opts->origin)
here before setting the new option value, which overwrites the
strdup-ed default "any". OTOH maybe it is overkill to worry about the
tiny leak? I am not sure what the convention is for this.

~~~

2.3 src/backend/replication/pgoutput/pgoutput.c

@@ -1698,12 +1719,16 @@ pgoutput_message(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
 }

 /*
- * Currently we always forward.
+ * Return true if the data source (origin) is remote and user has requested
+ * only local data, false otherwise.
  */
 static bool
 pgoutput_origin_filter(LogicalDecodingContext *ctx,
 RepOriginId origin_id)

"user" -> "the user"

~~~

2.4 src/include/catalog/pg_subscription.h

+/*
+ * The subscription will request the publisher to send any changes regardless
+ * of their origin
+ */
+#define LOGICALREP_ORIGIN_ANY "any"

SUGGESTION (remove 'any'; add period)
The subscription will request the publisher to send changes regardless
of their origin.


v22-0003


3.1 Commit message

change 1) Checks and throws an error if 'copy_data = on' and 'origin =
local' but the publication tables were also subscribing from other publishers.

"also subscribing from other publishers." -> "also replicating from
other publishers."

~~~

3.2 doc/src/sgml/ref/alter_subscription.sgml

+ 
+  There is some interaction between the origin
+  parameter and copy_data parameter.

"and copy_data parameter." -> "and the copy_data parameter."

~~~

3.3 doc/src/sgml/ref/create_subscription.sgml

+ 
+  There is some interaction between the origin
+  parameter and copy_data parameter.

"and copy_data parameter." -> "and the copy_data parameter."

~~~

3.4 doc/src/sgml/ref/create_subscription.sgml

+ 
+  There is some interaction between the origin
+  parameter and copy_data parameter. Refer to the
+   for details.
+ 

"and copy_data parameter." -> "and the copy_data parameter."

~~~

3.5 doc/src/sgml/ref/create_subscription.sgml

@@ -383,6 +398,15 @@ CREATE SUBSCRIPTION subscription_name

+  
+   If the subscription is created with origin = local and
+   copy_data = true, it will check if the
publisher tables are
+   being subscribed to any other publisher and, if so, then throw an error to
+   prevent possible non-local data from being copied. The user can override
+   this check and continue with the copy operation by specifying
+   copy_data = force.
+  
+

For the part "it will check if the publisher tables are being
subscribed to any other publisher...", the "being subscribed to"
sounds a bit strange. Maybe it is better to reword this like you
already worded some of the code comments.

SUGGESTION
-> "... it will check if the publisher has subscribed to the same
table from other publishers and ..."
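
(For illustration, the interaction being documented is roughly the
following, with 'dbname=pub' only a placeholder:

    -- errors if the publisher already subscribes to the same table elsewhere
    CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=pub' PUBLICATION pub1
        WITH (origin = local, copy_data = true);
    -- overrides the check
    CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=pub' PUBLICATION pub1
        WITH (origin = local, copy_data = force);
)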

~~~

3.6 src/backend/commands/subscriptioncmds.c

+ /*
+ * Throw an error if the publisher has subscribed to the same table
+ * from some other publisher. We cannot differentiate between the
+ * local and non-local data that is present in the HEAP during the
+ * initial sync. Identification of local data can be done only from
+ * the WAL by using the origin id. XXX: For simplicity, we don't check
+ * whether the table has any data or not. If the table doesn't have
+ * any data then we don't need to distinguish between local and
+ * non-local data so we can avoid throwing error in that case.
+ */

I think the XXX part should be after a blank line so it is more
prominent, instead of being buried in the other text.

e.g.

+ /*
+ * Throw 

Re: Missing reference to pgstat_replslot.c in pgstat.c

2022-06-22 Thread Masahiko Sawada
On Wed, Jun 22, 2022 at 3:29 PM Drouvot, Bertrand  wrote:
>
> Hi hackers,
>
> I think there's a missing reference to pgstat_replslot.c in pgstat.c.
>
> Attached a tiny patch to fix it.

+1

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/