Re: Assert name/short_desc to prevent SHOW ALL segfault

2022-05-27 Thread Tom Lane
Michael Paquier  writes:
> And I've learnt today that we enforce a definition of __has_attribute
> at the top of c.h, and that we already rely on that.  So I agree that
> what you are doing in 0002 should be enough.  Should we wait until 16~
> opens for business though?  I don't see a strong argument to push
> forward with that now that we are in beta mode on HEAD.

Agreed.  This part isn't a bug fix.

regards, tom lane




Re: Assert name/short_desc to prevent SHOW ALL segfault

2022-05-27 Thread Michael Paquier
On Fri, May 27, 2022 at 10:43:17AM -0700, Nathan Bossart wrote:
> Makes sense.  Here's a new patch set.  0001 is the part intended for
> back-patching, and 0002 is the rest (i.e., adding pg_attribute_nonnull()).
> I switched to using __has_attribute to discover whether nonnull was

Okay, I have looked at 0001 this morning and applied it down to 12.
The change in GetConfigOptionByNum() is not required in 10 and 11, as
the strings of pg_show_all_settings() have begun to be translated in
12~.

> supported, as that seemed cleaner.  I didn't see any need for a new
> configure check, but maybe I am missing something.

And I've learnt today that we enforce a definition of __has_attribute
at the top of c.h, and that we already rely on that.  So I agree that
what you are doing in 0002 should be enough.  Should we wait until 16~
opens for business though?  I don't see a strong argument to push
forward with that now that we are in beta mode on HEAD.
--
Michael


signature.asc
Description: PGP signature


Re: SLRUs in the main buffer pool, redux

2022-05-27 Thread Thomas Munro
On Fri, May 27, 2022 at 11:24 PM Thomas Munro  wrote:
> Rebased, debugged and fleshed out a tiny bit more, but still with
> plenty of TODO notes and questions.  I will talk about this idea at
> PGCon, so I figured it'd help to have a patch that actually applies,
> even if it doesn't work quite right yet.  It's quite a large patch but
> that's partly because it removes a lot of lines...

FWIW, here are my PGCon slides about this:
https://speakerdeck.com/macdice/improving-the-slru-subsystem

There was a little bit of discussion on #pgcon-stream2 which I could
summarise as: can we figure out a way to keep parts of the CLOG pinned
so that backends don't have to do that for each lookup?  Then CLOG
checks become simple reads.  There may be some relation to the idea of
'nailing' btree root pages that I've heard of from a couple of people
now (with ProcSignalBarrier or something more fine grained along those
lines if you need to unnail anything).  Something to think about.

I'm also wondering if it would be possible to do "optimistic" pinning
instead for reads that normally need only a pin, using some kind of
counter scheme with read barriers to tell you if the page might have
been evicted after you read the data...
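
For illustration, a minimal sketch of the kind of counter scheme described above (the type and function names are hypothetical, not existing PostgreSQL API): the evicting backend bumps a per-buffer counter before reusing the page, and an unpinned reader copies the bytes and then rechecks the counter, falling back to a normal pin if it changed.

#include "postgres.h"
#include <string.h>
#include "port/atomics.h"

typedef struct OptimisticBuffer
{
	pg_atomic_uint64 eviction_count;	/* bumped by the evictor before the page is reused */
	char		data[8192];				/* the cached page itself */
} OptimisticBuffer;

/*
 * Copy bytes out of the buffer without pinning it.  Returns false if the
 * page might have been evicted concurrently, in which case the caller
 * should retry with a real pin.
 */
static bool
optimistic_read(OptimisticBuffer *buf, void *dst, size_t off, size_t len)
{
	uint64		before = pg_atomic_read_u64(&buf->eviction_count);

	pg_read_barrier();			/* counter read must happen before the data read */
	memcpy(dst, buf->data + off, len);
	pg_read_barrier();			/* data read must happen before the recheck */

	return before == pg_atomic_read_u64(&buf->eviction_count);
}

On the eviction side, the counter would be bumped and a pg_write_barrier() issued before the page contents are overwritten.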




Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

2022-05-27 Thread Ranier Vilela
On Fri, May 27, 2022 at 6:22 PM, Andres Freund wrote:

> Hi,
>
> On 2022-05-27 10:35:08 -0300, Ranier Vilela wrote:
> > On Thu, May 26, 2022 at 10:30 PM, Tomas Vondra <
> > tomas.von...@enterprisedb.com> wrote:
> >
> > > On 5/27/22 02:11, Ranier Vilela wrote:
> > > >
> > > > ...
> > > >
> > > > Here the results with -T 60:
> > >
> > > Might be a good idea to share your analysis / interpretation of the
> > > results, not just the raw data. After all, the change is being proposed
> > > by you, so do you think this shows the change is beneficial?
> > >
> > I think so, but the expectation has diminished.
> > I expected that the more connections, the better the performance.
> > And for both patch and head, this doesn't happen in tests.
> > Performance degrades with a greater number of connections.
>
> Your system has four CPUs. Once they're all busy, adding more connections
> won't improve performance. It'll just add more and more context switching,
> cache misses, and make the OS scheduler do more work.
>
conns   tps head
10  82365.634750
50  74593.714180
80  69219.756038
90  67419.574189
100 66613.771701
Yes it is quite disappointing that with 100 connections, tps loses to 10
connections.


>
>
>
> GetSnapshotData() isn't a bottleneck?
>
> I'd be surprised if it showed up in a profile on your machine with that
> workload in any sort of meaningful way. The snapshot reuse logic will
> always
> work - because there are no writes - and thus the only work that needs to
> be
> done is to acquire the ProcArrayLock briefly.  And because there is only a
> small number of cores, contention on the cacheline for that isn't a
> problem.
>
Thanks for sharing this.


>
>
> > > These results look much saner, but IMHO it also does not show any clear
> > > benefit of the patch. Or are you still claiming there is a benefit?
> > >
> > We agree that they are micro-optimizations.  However, I think they
> should be
> > considered micro-optimizations in inner loops, because all in
> procarray.c is
> > a hotpath.
>
> As explained earlier, I don't agree that they optimize anything - you're
> making some of the scalability behaviour *worse*, if it's changed at all.
>
>
> > The first objective, I believe, was achieved, with no performance
> > regression.
> > I agree, the gains are small, by the tests done.
>
> There are no gains.
>
IMHO, I must disagree.


>
>
> > But, IMHO, this is a good way, small gains turn into big gains in the
> end,
> > when applied to all code.
> >
> > Consider GetSnapShotData()
> > 1. Most of the time the snapshot is not null, so:
> > if (snapshot == NULL), will fail most of the time.
> >
> > With the patch:
> > if (snapshot->xip != NULL)
> > {
> > if (GetSnapshotDataReuse(snapshot))
> >  return snapshot;
> > }
> >
> > Most of the time the test is true and GetSnapshotDataReuse is not called
> > for new
> > snapshots.
> > count, subcount and  suboverflowed, will not be initialized, for all
> > snapshots.
>
> But that's irrelevant. There's only a few "new" snapshots in the life of a
> connection. You're optimizing something irrelevant.
>
IMHO, when GetSnapShotData() is the bottleneck, all is relevant.


>
>
> > 2. If snapshot is taken during recovery
> > The pgprocnos and ProcGlobal->subxidStates are not touched unnecessarily.
>
> That code isn't reached when in recovery?
>
Currently it is reached *even* when not in recovery.
With the patch, *only* is reached when in recovery.


>
> > 3. Calling GetSnapshotDataReuse() without first acquiring ProcArrayLock.
> > There's an agreement that this would be fine, for now.
>
> There's no such agreement at all. It's not correct.
>
Ok, but there is a chance it will work correctly.


>
> > Consider ComputeXidHorizons()
> > 1. ProcGlobal->statusFlags is touched before the lock.
>
> Hard to believe that'd have a measurable effect.
>
IMHO, anything you take out of the lock is a benefit.


>
>
> > 2. allStatusFlags[index] is not touched for all numProcs.
>
> I'd be surprised if the compiler couldn't defer that load on its own.
>
Better be sure of that, no?

regards,
Ranier Vilela


Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

2022-05-27 Thread Ranier Vilela
On Fri, May 27, 2022 at 6:08 PM, Andres Freund wrote:

> Hi,
>
> On 2022-05-27 03:30:46 +0200, Tomas Vondra wrote:
> > On 5/27/22 02:11, Ranier Vilela wrote:
> > > ./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres
> > >
> > > pgbench (15beta1)
> > > transaction type: 
> > > scaling factor: 1
> > > query mode: prepared
> > > number of clients: 100
> > > number of threads: 100
> > > maximum number of tries: 1
> > > duration: 60 s
> > >
> > > conns    tps head        tps patched
> > >
> > > 1       17126.326108    17792.414234
> > > 10      82068.123383    82468.334836
> > > 50      73808.731404    74678.839428
> > > 80      73290.191713    73116.553986
> > > 90      67558.483043    68384.906949
> > > 100     65960.982801    66997.793777
> > > 200     62216.011998    62870.243385
> > > 300     62924.225658    62796.157548
> > > 400     62278.099704    63129.555135
> > > 500     63257.930870    62188.825044
> > > 600     61479.890611    61517.913967
> > > 700     61139.354053    61327.898847
> > > 800     60833.663791    61517.913967
> > > 900     61305.129642    61248.336593
> > > 1000    60990.918719    61041.670996
> > >
> >
> > These results look much saner, but IMHO it also does not show any clear
> > benefit of the patch. Or are you still claiming there is a benefit?
>
> They don't look all that sane to me - isn't that way lower than one would
> expect?

Yes, quite disappointing.

Restricting both client and server to the same four cores, a
> thermically challenged older laptop I have around I get 150k tps at both 10
> and 100 clients.
>
And you can share the benchmark details? Hardware, postgres and pgbench,
please?

>
> Either way, I'd not expect to see any GetSnapshotData() scalability
> effects to
> show up on an "Intel® Core™ i5-8250U CPU Quad Core" - there's just not
> enough
> concurrency.
>
This means that our customers will not see any connections scalability with
PG15, using the simplest hardware?


> The correct pieces of these changes seem very unlikely to affect
> GetSnapshotData() performance meaningfully.
>
> To improve something like GetSnapshotData() you first have to come up with
> a
> workload that shows it being a meaningful part of a profile. Unless it is,
> performance differences are going to just be due to various forms of noise.
>
Actually in the profiles I got with perf, GetSnapShotData() didn't show up.

regards,
Ranier Vilela


RE: [PATCH] Support % wildcard in extension upgrade filenames

2022-05-27 Thread Regina Obe
> At PostGIS we've been thinking of ways to have better support, from
> PostgreSQL proper, to our upgrade strategy, which has always consisted in a
> SINGLE upgrade file good for upgrading from any older version.
> 
> The current lack of such support for EXTENSIONs forced us to install a lot of
> files on disk and we'd like to stop doing that at some point in the future.
> 
> The proposal is to support wildcards for versions encoded in the filename so
> that (for our case) a single wildcard could match "any version". I've been
> thinking about the '%' character for that, to resemble the wildcard used for
> LIKE.
> 
> Here's the proposal:
> https://lists.osgeo.org/pipermail/postgis-devel/2022-February/029500.html
> 
> A very very short (and untested) patch which might (or might not) support our
> case is attached.
> 
> The only problem with my proposal/patch would be the presence, in the wild,
> of PostgreSQL EXTENSIONs actually using the '%' character in their version
> strings, which is currently considered legit by PostgreSQL.
> 
> How do you feel about the proposal (which is wider than the patch)?
> 
> --strk;
> 
>   Libre GIS consultant/developer
>   https://strk.kbt.io/services.html
[Regina Obe] 

Just a heads up about the above, Sandro has added it as a commitfest item
which hopefully we can polish in time for PG16.
https://commitfest.postgresql.org/38/3654/

Does anyone think this is such a horrible idea that we should abandon all
hope?

The main impetus is that many extensions (postgis, pgRouting, and I'm sure
others) have over 300 extension script files that are exactly the same.
We'd like to reduce this footprint significantly.

strk said the patch is crappy so don't look at it just yet.  We'll work on
polishing it.  I'll review and provide docs for it.

Thanks,
Regina








Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

2022-05-27 Thread Andres Freund
Hi,

On 2022-05-27 10:35:08 -0300, Ranier Vilela wrote:
> On Thu, May 26, 2022 at 10:30 PM, Tomas Vondra <
> tomas.von...@enterprisedb.com> wrote:
>
> > On 5/27/22 02:11, Ranier Vilela wrote:
> > >
> > > ...
> > >
> > > Here the results with -T 60:
> >
> > Might be a good idea to share your analysis / interpretation of the
> > results, not just the raw data. After all, the change is being proposed
> > by you, so do you think this shows the change is beneficial?
> >
> I think so, but the expectation has diminished.
> I expected that the more connections, the better the performance.
> And for both patch and head, this doesn't happen in tests.
> Performance degrades with a greater number of connections.

Your system has four CPUs. Once they're all busy, adding more connections
won't improve performance. It'll just add more and more context switching,
cache misses, and make the OS scheduler do more work.



> GetSnapshotData() isn't a bottleneck?

I'd be surprised if it showed up in a profile on your machine with that
workload in any sort of meaningful way. The snapshot reuse logic will always
work - because there are no writes - and thus the only work that needs to be
done is to acquire the ProcArrayLock briefly.  And because there is only a
small number of cores, contention on the cacheline for that isn't a problem.


> > These results look much saner, but IMHO it also does not show any clear
> > benefit of the patch. Or are you still claiming there is a benefit?
> >
> We agree that they are micro-optimizations.  However, I think they should be
> considered micro-optimizations in inner loops, because all in procarray.c is
> a hotpath.

As explained earlier, I don't agree that they optimize anything - you're
making some of the scalability behaviour *worse*, if it's changed at all.


> The first objective, I believe, was achieved, with no performance
> regression.
> I agree, the gains are small, by the tests done.

There are no gains.


> But, IMHO, this is a good way, small gains turn into big gains in the end,
> when applied to all code.
>
> Consider GetSnapShotData()
> 1. Most of the time the snapshot is not null, so:
> if (snapshot == NULL), will fail most of the time.
>
> With the patch:
> if (snapshot->xip != NULL)
> {
> if (GetSnapshotDataReuse(snapshot))
>  return snapshot;
> }
>
> Most of the time the test is true and GetSnapshotDataReuse is not called
> for new
> snapshots.
> count, subcount and  suboverflowed, will not be initialized, for all
> snapshots.

But that's irrelevant. There's only a few "new" snapshots in the life of a
connection. You're optimizing something irrelevant.


> 2. If snapshot is taken during recovery
> The pgprocnos and ProcGlobal->subxidStates are not touched unnecessarily.

That code isn't reached when in recovery?


> 3. Calling GetSnapshotDataReuse() without first acquiring ProcArrayLock.
> There's an agreement that this would be fine, for now.

There's no such agreement at all. It's not correct.


> Consider ComputeXidHorizons()
> 1. ProcGlobal->statusFlags is touched before the lock.

Hard to believe that'd have a measurable effect.


> 2. allStatusFlags[index] is not touched for all numProcs.

I'd be surprised if the compiler couldn't defer that load on its own.

Greetings,

Andres Freund




Ignore heap rewrites for materialized views in logical replication

2022-05-27 Thread Euler Taveira
Hi,

While investigating an internal report, I concluded that it is a bug. The
reproducible test case is simple (check 0002) and it consists of a FOR ALL
TABLES publication and a non-empty materialized view on publisher. After the
setup, if you refresh the MV, you got the following message on the subscriber:

ERROR:  logical replication target relation "public.pg_temp_N" does not 
exist

That's because the commit 1a499c2520 (that fixes the heap rewrite for tables)
forgot to consider that materialized views can also create transient heaps and
they should also be skipped. The affected version is only 10 because 11
contains a different solution (commit 325f2ec555) that provides a proper fix
for the heap rewrite handling in logical decoding.

0001 is a patch to skip MV too. I attached 0002 to demonstrate the issue but it
doesn't seem appropriate to be included. The test was written to detect the
error and bail out. After this fix, it takes a considerable amount of time to
finish the test because it waits for a message that never arrives. Since nobody
reports this bug in 5 years and considering that version 10 will be EOL in 6
months, I don't think an additional test is crucial here.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 652efe45665d91f2f4ae865dba078fcaffdc0a17 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Fri, 27 May 2022 11:35:27 -0300
Subject: [PATCH v1 1/2] Ignore heap rewrites for materialized views in logical
 replication

If you have a publication that specifies FOR ALL TABLES clause, a REFRESH
MATERIALIZED VIEW can break your setup with the following message

ERROR:  logical replication target relation "public.pg_temp_NNN" does not exist

Commit 1a499c2520 introduces a heuristic to prevent that transient heaps (due
to table rewrites) are included for a FOR ALL TABLES publication. However, it
forgot to exclude materialized views (heap rewrites occurs when you execute
REFRESH MATERIALIZED VIEW). Since materialized views are not supported in
logical decoding, it is appropriate to filter them too.

As explained in the commit message, a more robust solution was included in
version 11 (commit 325f2ec555), hence only version 10 is affected.
---
 src/backend/replication/pgoutput/pgoutput.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index f96fde3df0..e03ff4a11e 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -540,7 +540,9 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 if (sscanf(relname, "pg_temp_%u%n", &u, &n) == 1 &&
 	relname[n] == '\0')
 {
-	if (get_rel_relkind(u) == RELKIND_RELATION)
+	char	relkind = get_rel_relkind(u);
+
+	if (relkind == RELKIND_RELATION || relkind == RELKIND_MATVIEW)
 		break;
 }
 			}
-- 
2.30.2

From 7ba28dea6800359ae631c1a132f8f6c6d418548b Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Fri, 27 May 2022 14:43:10 -0300
Subject: [PATCH v1 2/2] test: heap rewrite for materialized view in logical
 replication

---
 src/test/subscription/t/006_rewrite.pl | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/src/test/subscription/t/006_rewrite.pl b/src/test/subscription/t/006_rewrite.pl
index 5e3211aefa..e5e0fdabe7 100644
--- a/src/test/subscription/t/006_rewrite.pl
+++ b/src/test/subscription/t/006_rewrite.pl
@@ -3,7 +3,8 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 2;
+use Time::HiRes qw(usleep);
+use Test::More tests => 3;
 
 sub wait_for_caught_up
 {
@@ -26,6 +27,13 @@ my $ddl = "CREATE TABLE test1 (a int, b text);";
 $node_publisher->safe_psql('postgres', $ddl);
 $node_subscriber->safe_psql('postgres', $ddl);
 
+$ddl = "CREATE TABLE test2 (a int, b text);";
+$node_publisher->safe_psql('postgres', $ddl);
+$node_subscriber->safe_psql('postgres', $ddl);
+
+$node_publisher->safe_psql('postgres', q{INSERT INTO test2 (a, b) VALUES (10, 'ten'), (20, 'twenty');});
+$node_publisher->safe_psql('postgres', 'CREATE MATERIALIZED VIEW test3 AS SELECT a, b FROM test2;');
+
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
 my $appname   = 'encoding_test';
 
@@ -69,5 +77,21 @@ is($node_subscriber->safe_psql('postgres', q{SELECT a, b, c FROM test1}),
 3|three|33),
'data replicated to subscriber');
 
+$node_publisher->safe_psql('postgres', 'REFRESH MATERIALIZED VIEW test3;');
+
+my $max_attempts = 2 * $TestLib::timeout_default;
+my $attempts = 0;
+my $errmsg = qr/logical replication target relation.*does not exist/;
+{
+	while ($attempts < $max_attempts)
+	{
+		last if (slurp_file($node_subscriber->logfile) =~ $errmsg);
+
+		usleep(100_000);
+		$attempts++;
+	}
+}
+is($attempts, $max_attempts, 'heap rewrite for materialized view on subscriber');
+
 $node_subscriber->stop;
 $node_publisher->stop;
-- 
2.30.2



Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

2022-05-27 Thread Andres Freund
Hi,

On 2022-05-27 03:30:46 +0200, Tomas Vondra wrote:
> On 5/27/22 02:11, Ranier Vilela wrote:
> > ./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres
> > 
> > pgbench (15beta1)
> > transaction type: 
> > scaling factor: 1
> > query mode: prepared
> > number of clients: 100
> > number of threads: 100
> > maximum number of tries: 1
> > duration: 60 s
> > 
> > conns    tps head      tps patched
> > 
> > 1         17126.326108            17792.414234
> > 10       82068.123383            82468.334836
> > 50       73808.731404            74678.839428
> > 80       73290.191713            73116.553986
> > 90       67558.483043            68384.906949
> > 100     65960.982801            66997.793777
> > 200     62216.011998            62870.243385
> > 300     62924.225658            62796.157548
> > 400     62278.099704            63129.555135
> > 500     63257.930870            62188.825044
> > 600     61479.890611            61517.913967
> > 700     61139.354053            61327.898847
> > 800     60833.663791            61517.913967
> > 900     61305.129642            61248.336593
> > 1000   60990.918719            61041.670996
> > 
> 
> These results look much saner, but IMHO it also does not show any clear
> benefit of the patch. Or are you still claiming there is a benefit?

They don't look all that sane to me - isn't that way lower than one would
expect? Restricting both client and server to the same four cores, a
thermically challenged older laptop I have around I get 150k tps at both 10
and 100 clients.

Either way, I'd not expect to see any GetSnapshotData() scalability effects to
show up on an "Intel® Core™ i5-8250U CPU Quad Core" - there's just not enough
concurrency.

The correct pieces of these changes seem very unlikely to affect
GetSnapshotData() performance meaningfully.

To improve something like GetSnapshotData() you first have to come up with a
workload that shows it being a meaningful part of a profile. Unless it is,
performance differences are going to just be due to various forms of noise.

Greetings,

Andres Freund




Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~

2022-05-27 Thread Thomas Munro
On Fri, May 27, 2022 at 3:53 PM Michael Paquier  wrote:
> Windows 8 ends its support
> in 2023, it seems, so that sounds short even for PG16.

I guess you meant 8.1 here, and corresponding server release 2012 R2.
These will come to the end of their "extended" support phase in 2023,
before PG16 comes out.  If I understand correctly (and I'm not a
Windows user, I just googled this), they will start showing blue
full-screen danger-Will-Robinson alerts about viruses and malware.
Why would we have explicit support for that in a new release?  Do we
want people putting their users' data in such a system?  Can you go to
GDPR jail for that in Europe?  (Joking, I think).

We should go full Marie Kondo on EOL'd OSes that are not in our CI or
build farm, IMHO.




Re: "ERROR: latch already owned" on gharial

2022-05-27 Thread Tom Lane
Robert Haas  writes:
> On Fri, May 27, 2022 at 10:21 AM Tom Lane  wrote:
>> What I'd suggest is to promote that failure to elog(PANIC), which
>> would at least give us the PID and if we're lucky a stack trace.

> That proposed change is fine with me.

> As to the question of whether it's a real bug, nobody can prove
> anything unless we actually run it down.

Agreed, and I'll even grant your point that if it is an HPUX-specific
or IA64-specific bug, it is not worth spending huge amounts of time
to isolate.  The problem is that we don't know that.  What we do know
so far is that if it can occur elsewhere, it's rare --- so we'd better
be prepared to glean as much info as possible if we do get such a
failure.  Hence my thought of s/ERROR/PANIC/.  And I'd be in favor of
any other low-effort change we can make to instrument the case better.

regards, tom lane




Re: generic plans and "initial" pruning

2022-05-27 Thread Zhihong Yu
On Fri, May 27, 2022 at 1:10 AM Amit Langote 
wrote:

> On Mon, Apr 11, 2022 at 12:53 PM Zhihong Yu  wrote:
> > On Sun, Apr 10, 2022 at 8:05 PM Amit Langote 
> wrote:
> >> Sending v15 that fixes that to keep the cfbot green for now.
> >
> > Hi,
> >
> > +   /* RT index of the partitione table. */
> >
> > partitione -> partitioned
>
> Thanks, fixed.
>
> Also, I broke this into patches:
>
> 0001 contains the mechanical changes of moving PartitionPruneInfo out
> of Append/MergeAppend into a list in PlannedStmt.
>
> 0002 is the main patch to "Optimize AcquireExecutorLocks() by locking
> only unpruned partitions".
>
> --
> Thanks, Amit Langote
> EDB: http://www.enterprisedb.com

Hi,
In the description:

is made available to the actual execution via
PartitionPruneResult, made available along with the PlannedStmt by the

I think the second `made available` is redundant (can be omitted).

+ * Initial pruning is performed here if needed (unless it has already been
done
+ * by ExecDoInitialPruning()), and in that case only the surviving
subplans'

I wonder if there is a typo above - I don't find ExecDoInitialPruning
either in PG codebase or in the patches (except for this in the comment).
I think it should be ExecutorDoInitialPruning.

+* bit from it just above to prevent empty tail bits resulting in

I searched in the code base but didn't find mentioning of `empty tail bit`.
Do you mind explaining a bit about it ?

Cheers


Re: "ERROR: latch already owned" on gharial

2022-05-27 Thread Robert Haas
On Fri, May 27, 2022 at 10:21 AM Tom Lane  wrote:
> That's possible, certainly.  It's also possible that it's a real bug
> that so far has only manifested there for (say) timing reasons.
> The buildfarm is not so large that we can write off single-machine
> failures as being unlikely to hit in the real world.
>
> What I'd suggest is to promote that failure to elog(PANIC), which
> would at least give us the PID and if we're lucky a stack trace.

That proposed change is fine with me.

As to the question of whether it's a real bug, nobody can prove
anything unless we actually run it down. It's just a question of what
you think the odds are. Noah's PGCon talk a few years back on the long
tail of buildfarm failures convinced me (perhaps unintentionally) that
low-probability failures that occur only on obscure systems or
configurations are likely not worth running down, because while they
COULD be real bugs, a lot of them aren't, and the time it would take
to figure it out could be spent on other things - for instance, fixing
things that we know for certain are bugs. Spending 40 hours of
person-time on something with a 10% chance of being a bug in the
PostgreSQL code doesn't necessarily make sense to me, because while
you are correct that the buildfarm isn't that large, neither is the
developer community.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: suboverflowed subtransactions concurrency performance optimize

2022-05-27 Thread Peter Geoghegan
On Fri, May 27, 2022 at 11:59 AM Andres Freund  wrote:
> On 2022-05-27 11:48:45 -0700, Peter Geoghegan wrote:
> > I find it hard to believe that there wasn't even a cursory effort at
> > performance validation before this was committed, but that's what it
> > looks like.
>
> Yea. Imo this pretty clearly should be reverted. It has correctness issues,
> testing issues and we don't know whether it does anything useful.

It should definitely be reverted.

-- 
Peter Geoghegan




Re: Allowing REINDEX to have an optional name

2022-05-27 Thread Cary Huang
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Hello

The patch applies and tests fine and I think this patch has good intentions to 
prevent the default behavior of REINDEX DATABASE to cause a deadlock. However, 
I am not in favor of simply omitting the database name after DATABASE clause 
because of consistency. Almost all other queries involving the DATABASE clause 
require database name to be given following after. For example, ALTER DATABASE 
[dbname].  

Being able to omit database name for REINDEX DATABASE seems inconsistent to me.

The documentation states that REINDEX DATABASE only works on the current 
database, but it still requires the user to provide a database name and requires 
that it match the current database. Not a very useful option, is it? But 
it is still required of the user to stay consistent with other DATABASE 
clauses.

Maybe the best way is to keep the query clause as is (with the database name 
still required) and simply don’t let it reindex system catalog to prevent 
deadlock. At the end, give user a notification that system catalogs have not 
been reindexed, and tell them to use REINDEX SYSTEM instead.

Cary Huang
-
HighGo Software Canada
www.highgo.ca

Re: suboverflowed subtransactions concurrency performance optimize

2022-05-27 Thread Andres Freund
On 2022-05-27 11:48:45 -0700, Peter Geoghegan wrote:
> On Fri, May 27, 2022 at 8:55 AM Andres Freund  wrote:
> > > Anyway, how about if we clear this cache for subtrans whenever
> > > TransactionXmin is advanced and cachedFetchSubXid precedes it? The
> > > comments atop SubTransGetTopmostTransaction seem to state that we
> > > don't care about the exact topmost parent when the intermediate one
> > > precedes TransactionXmin. I think it should preserve the optimization
> > > because anyway for such cases there is a fast path in
> > > SubTransGetTopmostTransaction.
> >
> > There's not even a proof this does speed up anything useful! There's not a
> > single benchmark for the patch.
> 
> I find it hard to believe that there wasn't even a cursory effort at
> performance validation before this was committed, but that's what it
> looks like.

Yea. Imo this pretty clearly should be reverted. It has correctness issues,
testing issues and we don't know whether it does anything useful.




Re: suboverflowed subtransactions concurrency performance optimize

2022-05-27 Thread Peter Geoghegan
On Fri, May 27, 2022 at 8:55 AM Andres Freund  wrote:
> > Anyway, how about if we clear this cache for subtrans whenever
> > TransactionXmin is advanced and cachedFetchSubXid precedes it? The
> > comments atop SubTransGetTopmostTransaction seem to state that we
> > don't care about the exact topmost parent when the intermediate one
> > precedes TransactionXmin. I think it should preserve the optimization
> > because anyway for such cases there is a fast path in
> > SubTransGetTopmostTransaction.
>
> There's not even a proof this does speed up anything useful! There's not a
> single benchmark for the patch.

I find it hard to believe that there wasn't even a cursory effort at
performance validation before this was committed, but that's what it
looks like.

-- 
Peter Geoghegan




Re: Assert name/short_desc to prevent SHOW ALL segfault

2022-05-27 Thread Nathan Bossart
On Thu, May 26, 2022 at 11:01:44PM -0400, Tom Lane wrote:
> Michael Paquier  writes:
>> FWIW, I would be fine to backpatch the NULL handling for short_desc,
>> while treating the addition of nonnull as a HEAD-only change.
> 
> Yeah, sounds about right to me.  My guess is that we will need
> a configure check for nonnull, but perhaps I'm wrong.

Makes sense.  Here's a new patch set.  0001 is the part intended for
back-patching, and 0002 is the rest (i.e., adding pg_attribute_nonnull()).
I switched to using __has_attribute to discover whether nonnull was
supported, as that seemed cleaner.  I didn't see any need for a new
configure check, but maybe I am missing something.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From c2760bf3db292fb55b6448f22eaeee38656083b5 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 27 May 2022 09:48:51 -0700
Subject: [PATCH v4 1/2] Properly handle NULL short descriptions for custom
 variables.

If a NULL short description is specified in one of the
DefineCustomXXXVariable functions, SHOW ALL will segfault.  This
change teaches SHOW ALL to properly handle NULL short descriptions.

Back-patch to all supported versions.

Reported by: Steve Chavez
Author: Steve Chavez
Reviewed by: Nathan Bossart, Michael Paquier, Andred Freund, Tom Lane
Discussion: https://postgr.es/m/CAGRrpzY6hO-Kmykna_XvsTv8P2DshGiU6G3j8yGao4mk0CqjHA%40mail.gmail.com
---
 src/backend/utils/misc/guc.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8e9b71375c..55d41ae7d6 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -9780,7 +9780,16 @@ ShowAllGUCConfig(DestReceiver *dest)
 			isnull[1] = true;
 		}
 
-		values[2] = PointerGetDatum(cstring_to_text(conf->short_desc));
+		if (conf->short_desc)
+		{
+			values[2] = PointerGetDatum(cstring_to_text(conf->short_desc));
+			isnull[2] = false;
+		}
+		else
+		{
+			values[2] = PointerGetDatum(NULL);
+			isnull[2] = true;
+		}
 
 		/* send it to dest */
 		do_tup_output(tstate, values, isnull);
@@ -9792,7 +9801,8 @@ ShowAllGUCConfig(DestReceiver *dest)
 			pfree(setting);
 			pfree(DatumGetPointer(values[1]));
 		}
-		pfree(DatumGetPointer(values[2]));
+		if (conf->short_desc)
+			pfree(DatumGetPointer(values[2]));
 	}
 
 	end_tup_output(tstate);
@@ -10002,7 +10012,7 @@ GetConfigOptionByNum(int varnum, const char **values, bool *noshow)
 	values[3] = _(config_group_names[conf->group]);
 
 	/* short_desc */
-	values[4] = _(conf->short_desc);
+	values[4] = conf->short_desc != NULL ? _(conf->short_desc) : NULL;
 
 	/* extra_desc */
 	values[5] = conf->long_desc != NULL ? _(conf->long_desc) : NULL;
-- 
2.25.1
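
For context, a minimal sketch of the kind of extension code that hits the crash fixed by 0001 above (the extension and variable names are hypothetical; DefineCustomBoolVariable is the real GUC API): registering a custom variable with a NULL short_desc, after which SHOW ALL would dereference the null pointer.

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

static bool my_flag = false;

void		_PG_init(void);

void
_PG_init(void)
{
	/* a NULL short_desc is what ShowAllGUCConfig() previously failed to handle */
	DefineCustomBoolVariable("my_ext.flag",
							 NULL,	/* short_desc */
							 NULL,	/* long_desc */
							 &my_flag,
							 false,
							 PGC_USERSET,
							 0,
							 NULL, NULL, NULL);
}
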

>From bd05e7092f45f416476cc02a21f8d0bb069a313f Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 27 May 2022 10:10:09 -0700
Subject: [PATCH v4 2/2] Introduce pg_attribute_nonnull() and use it for
 DefineCustomXXXVariable().

pg_attribute_nonnull() can be used to generate compiler warnings
when a function is called with the specified arguments set to NULL.
An empty argument list indicates that all pointer arguments must
not be NULL.  pg_attribute_nonnull() only works for compilers that
support the nonnull function attribute.  If nonnull is not
supported, pg_attribute_nonnull() has no effect.

In addition to introducing pg_attribute_nonnull(), this change uses
it for the DefineCustomXXXVariable() functions to generate warnings
when the "name" and "valueAddr" arguments are set to NULL.

Idea by: Andres Freund
Author: Nathan Bossart
Reviewed by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20220525061739.ur7x535vtzyzkmqo%40alap3.anarazel.de
---
 src/include/c.h | 11 +++
 src/include/utils/guc.h | 10 +-
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/src/include/c.h b/src/include/c.h
index 4f16e589b3..0c64557c62 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -144,6 +144,17 @@
 #define pg_attribute_no_sanitize_alignment()
 #endif
 
+/*
+ * pg_attribute_nonnull means the compiler should warn if the function is called
+ * with the listed arguments set to NULL.  If no arguments are listed, the
+ * compiler should warn if any pointer arguments are set to NULL.
+ */
+#if __has_attribute (nonnull)
+#define pg_attribute_nonnull(...) __attribute__((nonnull(__VA_ARGS__)))
+#else
+#define pg_attribute_nonnull(...)
+#endif
+
 /*
  * Append PG_USED_FOR_ASSERTS_ONLY to definitions of variables that are only
  * used in assert-enabled builds, to avoid compiler warnings about unused
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index efcbad7842..be691c5e9c 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -303,7 +303,7 @@ extern void DefineCustomBoolVariable(const char *name,
 	 int flags,
 	 GucBoolCheckHook check_hook,
 	 GucBoolAssignHook assign_hook,
-	 GucShowHook show_hook);
+	 
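
For illustration, a minimal sketch of how the new macro is meant to be used (set_option and its callers are hypothetical; pg_attribute_nonnull() is the macro added above): annotate the declaration, and compilers that support the nonnull attribute will warn about calls passing NULL in the listed argument positions.

#include "postgres.h"			/* pulls in c.h, which defines pg_attribute_nonnull() */

/* arguments 1 and 2 must not be NULL */
extern void set_option(const char *name, int *valueAddr) pg_attribute_nonnull(1, 2);

void
set_option(const char *name, int *valueAddr)
{
	*valueAddr = (name[0] != '\0');
}

void
misuse(void)
{
	set_option(NULL, NULL);		/* GCC/Clang emit -Wnonnull warnings here */
}
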

Re: postgres and initdb not working inside docker

2022-05-27 Thread Roffild

Only in an ideal world are all standards observed...

Docker has different standards inside.

$ ls -l /home/neo/
drwxr-xr-x2 pgsql  pgsql   8192 May 27 10:37 data
drwxr-sr-x2 pgsql  pgsql   4096 May 24 09:35 data2

/home/pgsql/data - mounted volume.

Therefore, the permissions have changed to drwxr-xr-x

$ mkdir /home/pgsql/data/pgtest
$ ls -l /home/pgsql/data
drwxr-xr-x2 pgsql  pgsql  0 May 27 11:08 pgtest

$ chmod 700 /home/pgsql/data/pgtest
$ ls -l /home/pgsql/data
drwxr-xr-x2 pgsql  pgsql  0 May 27 11:08 pgtest

Oops...

If it's a regular "data2" folder and there is no "read_only: true" flag 
for the container:

$ mkdir /home/pgsql/data2/pgtest
$ chmod 700 /home/pgsql/data2/pgtest
$ ls -l /home/pgsql/data2
drwx--2 pgsql  pgsql   4096 May 27 11:19 pgtest


Roffild writes:

postgres and initdb not working inside docker.
chmod 755 always for a mounted volume inside docker.


This patch will never be accepted.  You don't need it if you take the
standard advice[1] that the Postgres data directory should not itself
be a mount point.  Instead, make a subdirectory in the mounted volume,
and that can have the ownership and permissions that the server expects.

regards, tom lane

[1] https://www.postgresql.org/message-id/12168.1312921709%40sss.pgh.pa.us





Re: Support logical replication of DDLs

2022-05-27 Thread Masahiko Sawada
On Fri, May 27, 2022 at 7:19 AM Zheng Li  wrote:
>
> Hi Masahiko,
>
> > Thank you for updating the patches!
> >
> > I've not looked at these patches in-depth yet but with this approach,
> > what do you think we can handle the DDL syntax differences between
> > major versions? DDL syntax or behavior could be changed by future
> > changes and I think we need to somehow deal with the differences. For
>
> > example, if the user uses logical replication for major version
> > upgrade, the publisher is older than the subscriber. We might have to
> > rewrite the DDL before applying to the subscriber because the DDL
> > executed on the publisher no longer work on a new PostgreSQL version
>
> I don't think we will allow this kind of situation to happen in the
> first place for
> backward compatibility.

It seems like a big limitation to me.

> If a DDL no longer works on a new version of
> PostgreSQL, the user will have to change the application code as well.
> So even if it happens for
> whatever reason, we could either
> 1. fail the apply worker and let the user fix such DDL because they'll
> have to fix the application code anyway when this happens.

Once the apply worker received the DDL, if the DDL doesn't work on the
subscriber, it will enter an infinite loop until the problem is fixed.
If the failure is due to a syntax error, how does the user fix it?

> 2. add guard rail logic in the apply worker to automatically fix such
> DDL if possible, knowing the version of the source and target. Similar
> logic must have been implemented for pg_dump/restore/upgrade.

If I'm not missing something, there is no such implementation in
pg_dump/restore/upgrade. When we use pg_dump/pg_restore for major
version upgrades, we usually use the newer version pg_dump to fetch
objects from the older version server, then restore the objects by
using the newer version pg_restore.

>
> > or we might have to add some options to the DDL before the application
> > in order to keep the same behavior. This seems to require a different
> > solution from what the patch does for the problem you mentioned such
>
> > as "DDL involving multiple tables where only some tables are
> > replicated”.
>
> First of all, this case can only happen when the customer chooses to
> only replicate a subset of the tables in a database in which case
> table level DDL replication is chosen instead of database level DDL
> replication (where all tables
> and DDLs are replicated). I think the solution would be:
> 1. make best effort to detect such DDLs on the publisher and avoid
> logging of such DDLs in table level DDL replication.

I think it's better to support this case.

> 2. apply worker will fail to replay such command due to missing
> objects if such DDLs didn't get filtered on the publisher for some
> reason. This should be rare and I think it's OK even if it happens,
> we'll find out
> why and fix it.

I'm not sure it's rare since replicating a subset of tables is a
common use case of logical replication. But even if we want to go this
way I think we should consider how to fix it at this stage, otherwise
we will end up redesigning it.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: selectivity function

2022-05-27 Thread Greg Hennessy

On Thu, May 26, 2022 at 3:10 PM Tom Lane  wrote:

> Can you do anything useful with attaching selectivity estimates
> to the functions it references, instead?

I may have been going down a bad path before. The function I'm
working to improve has five arguments, the last being "degrees", which
is the match radius. Obviously a larger match radius  should cause more
matches.

For a small value of a match radius (0.005 degrees):

q3c_test=# explain (analyze, buffers) select * from test as a, test1 as 
b where q3c_join(a.ra,a.dec,b.ra,b.dec,.005);

QUERY PLAN
Nested Loop  (cost=92.28..22787968818.00 rows=5 width=32) (actual 
time=7.799..10758.566 rows=31 loops=1)

  Buffers: shared hit=8005684
  ->  Seq Scan on test a  (cost=0.00..15406.00 rows=100 width=16) 
(actual time=0.008..215.570 rows=100 loops=1)

    Buffers: shared hit=5406
  ->  Bitmap Heap Scan on test1 b  (cost=92.28..22785.45 rows=250 
width=16) (actual time=0.009..0.009 rows=0 loops=100)


(note: I deleted some of the output, since I think I'm keeping the 
important bits)


So, the cost of the query is calculated as 2e10, where it expects five rows,
found 31, and a hot cache of reading 8 million units of disk space, I'd have
to check the fine manual to remind myself of the units of that.

When I do the same sort of query on a much larger match radius (5 deg) I 
get:
q3c_test=# explain (analyze, buffers) select * from test as a, test1 as 
b where q3c_join(a.ra,a.dec,b.ra,b.dec,5);

QUERY PLAN
Nested Loop  (cost=92.28..22787968818.00 rows=4766288 width=32) (actual 
time=0.086..254995.691 rows=38051626 loops=1)

  Buffers: shared hit=104977026
  ->  Seq Scan on test a  (cost=0.00..15406.00 rows=100 width=16) 
(actual time=0.008..261.425 rows=100 loops=1)

    Buffers: shared hit=5406
  ->  Bitmap Heap Scan on test1 b  (cost=92.28..22785.45 rows=250 
width=16) (actual time=0.053..0.247 rows=38 loops=100)


The "total cost" is the same identical 2e10, this time the number of 
rows expected
is 4.7 million, the number of rows delivered is 38 million (so the 
calculation is off
by a factor of 8, I'm not sure that is important), but the io is now 104 
million units.

So while we are doing a lot more IO, and dealing with a lot more rows, the
calculated cost is identical. That seems strange to me. Is that a normal 
thing?
Is it possible that the cost calculation isn't including the selectivity 
calculation?
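
Following up on Tom's suggestion quoted above, the usual mechanism for attaching a selectivity estimate to a function such as q3c_join() is a planner support function. Below is a rough sketch under assumed names (q3c_join_support is hypothetical; SupportRequestSelectivity is the real request type from nodes/supportnodes.h), estimating the selectivity from a constant match radius:

#include "postgres.h"

#include <math.h>

#include "fmgr.h"
#include "nodes/pg_list.h"
#include "nodes/primnodes.h"
#include "nodes/supportnodes.h"
#include "utils/selfuncs.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(q3c_join_support);

/*
 * Planner support function: when the match radius is a constant, derive a
 * selectivity estimate from the fraction of the sphere covered by the cap.
 */
Datum
q3c_join_support(PG_FUNCTION_ARGS)
{
	Node	   *rawreq = (Node *) PG_GETARG_POINTER(0);
	Node	   *ret = NULL;

	if (IsA(rawreq, SupportRequestSelectivity))
	{
		SupportRequestSelectivity *req = (SupportRequestSelectivity *) rawreq;
		Node	   *radius_arg = (Node *) llast(req->args);	/* last argument: radius in degrees */

		if (IsA(radius_arg, Const) && !((Const *) radius_arg)->constisnull)
		{
			double		radius = DatumGetFloat8(((Const *) radius_arg)->constvalue);
			double		frac = (1.0 - cos(radius * M_PI / 180.0)) / 2.0;

			req->selectivity = Max(frac, 1.0e-10);
			CLAMP_PROBABILITY(req->selectivity);
			ret = (Node *) req;	/* tell the planner we produced an estimate */
		}
	}

	PG_RETURN_POINTER(ret);
}

It would then be attached with something like CREATE OR REPLACE FUNCTION q3c_join(...) ... SUPPORT q3c_join_support, after which the planner's row estimate (and with it the join cost) would scale with the radius rather than staying fixed.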


Greg






Re: suboverflowed subtransactions concurrency performance optimize

2022-05-27 Thread Andres Freund
Hi,

On 2022-05-27 15:44:39 +0530, Amit Kapila wrote:
> On Thu, May 26, 2022 at 12:53 PM Michael Paquier  wrote:
> >
> > On Tue, May 24, 2022 at 04:52:50PM -0700, Andres Freund wrote:
> >
> > > 2) xid wraparound. There's nothing forcing 
> > > SubTransGetTopmostTransaction() to
> > >be called regularly, so even if a backend isn't idle, the cache could 
> > > just
> > >get more and more outdated until hitting wraparound
> >
> > Hence, you mean that the non-regularity of the call makes it more
> > exposed to an inconsistent result after a wraparound?
> >
> 
> Won't in theory the similar cache in transam.c is also prone to
> similar behavior?

It's not quite the same risk, because there we are likely to actually hit the
cache regularly. Whereas quite normal workloads might not hit this cache for
days on end.


> Anyway, how about if we clear this cache for subtrans whenever
> TransactionXmin is advanced and cachedFetchSubXid precedes it? The
> comments atop SubTransGetTopmostTransaction seem to state that we
> don't care about the exact topmost parent when the intermediate one
> precedes TransactionXmin. I think it should preserve the optimization
> because anyway for such cases there is a fast path in
> SubTransGetTopmostTransaction.

There's not even a proof this does speed up anything useful! There's not a
single benchmark for the patch.

Greetings,

Andres Freund




Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-27 Thread Tom Lane
Yura Sokolov  writes:
> On Tue, 24/05/2022 at 17:39 -0700, Andres Freund wrote:
>> A variation on your patch would be to only store the offset to the block
>> header - that should always fit into 32bit (huge allocations being their own
>> block, which is why this wouldn't work for storing an offset to the
>> context).

> I'm +1 for this.

Given David's results in the preceding message, I don't think I am.
A scheme like this would add more arithmetic and at least one more
indirection to GetMemoryChunkContext(), and we already know that
adding even a test-and-branch there has measurable cost.  (I wonder
if using unlikely() on the test would help?  But it's not unlikely
in a generation-context-heavy use case.)  There would also be a good
deal of complication and ensuing slowdown created by the need for
oversize chunks to be a completely different kind of animal with a
different header.

I'm also not very happy about this:

> And with this change every memory context kind can have same header:

IMO that's a bug not a feature.  It puts significant constraints on how
context types can be designed.

regards, tom lane




Re: "ERROR: latch already owned" on gharial

2022-05-27 Thread Tom Lane
Robert Haas  writes:
> On Fri, May 27, 2022 at 7:55 AM Thomas Munro  wrote:
>> Thanks.  Hmm.  So far it's always a parallel worker.  The best idea I
>> have is to include the ID of the mystery PID in the error message and
>> see if that provides a clue next time.

> ... Even if we find a bug in PostgreSQL,
> it's likely to be a bug that only matters on systems nobody cares
> about.

That's possible, certainly.  It's also possible that it's a real bug
that so far has only manifested there for (say) timing reasons.
The buildfarm is not so large that we can write off single-machine
failures as being unlikely to hit in the real world.

What I'd suggest is to promote that failure to elog(PANIC), which
would at least give us the PID and if we're lucky a stack trace.

regards, tom lane




Re: "ERROR: latch already owned" on gharial

2022-05-27 Thread Robert Haas
On Fri, May 27, 2022 at 7:55 AM Thomas Munro  wrote:
> Thanks.  Hmm.  So far it's always a parallel worker.  The best idea I
> have is to include the ID of the mystery PID in the error message and
> see if that provides a clue next time.

What I'm inclined to do is get gharial and anole removed from the
buildfarm. anole was set up by Heikki in 2011. I don't know when
gharial was set up, or by whom. I don't think anyone at EDB cares
about these machines any more, or has any interest in maintaining
them. I think the only reason they're still running is that, just by
good fortune, they haven't fallen over and died yet. The hardest part
of getting them taken out of the buildfarm is likely to be finding
someone who has a working username and password to log into them and
take the jobs out of the crontab.

If someone really cares about figuring out what's going on here, it's
probably possible to get someone who is an EDB employee access to the
box to chase it down. But I'm having a hard time understanding what
value we get out of that given that the machines are running an
11-year-old compiler version on discontinued hardware on a
discontinued operating system. Even if we find a bug in PostgreSQL,
it's likely to be a bug that only matters on systems nobody cares
about.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

2022-05-27 Thread Ranier Vilela
On Thu, May 26, 2022 at 10:30 PM, Tomas Vondra <
tomas.von...@enterprisedb.com> wrote:

> On 5/27/22 02:11, Ranier Vilela wrote:
> >
> > ...
> >
> > Here the results with -T 60:
>
> Might be a good idea to share your analysis / interpretation of the
> results, not just the raw data. After all, the change is being proposed
> by you, so do you think this shows the change is beneficial?
>
I think so, but the expectation has diminished.
I expected that the more connections, the better the performance.
And for both patch and head, this doesn't happen in tests.
Performance degrades with a greater number of connections.
GetSnapshotData() isn't a bottleneck?


>
> > Linux Ubuntu 64 bits
> > shared_buffers = 128MB
> >
> > ./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres
> >
> > pgbench (15beta1)
> > transaction type: 
> > scaling factor: 1
> > query mode: prepared
> > number of clients: 100
> > number of threads: 100
> > maximum number of tries: 1
> > duration: 60 s
> >
> > conns    tps head        tps patched
> >
> > 1       17126.326108    17792.414234
> > 10      82068.123383    82468.334836
> > 50      73808.731404    74678.839428
> > 80      73290.191713    73116.553986
> > 90      67558.483043    68384.906949
> > 100     65960.982801    66997.793777
> > 200     62216.011998    62870.243385
> > 300     62924.225658    62796.157548
> > 400     62278.099704    63129.555135
> > 500     63257.930870    62188.825044
> > 600     61479.890611    61517.913967
> > 700     61139.354053    61327.898847
> > 800     60833.663791    61517.913967
> > 900     61305.129642    61248.336593
> > 1000    60990.918719    61041.670996
> >
>
> These results look much saner, but IMHO it also does not show any clear
> benefit of the patch. Or are you still claiming there is a benefit?
>
We agree that they are micro-optimizations.
However, I think they should be considered micro-optimizations in inner
loops,
because all in procarray.c is a hotpath.
The first objective, I believe, was achieved, with no performance
regression.
I agree, the gains are small, by the tests done.
But, IMHO, this is a good way, small gains turn into big gains in the end,
when applied to all code.

Consider GetSnapShotData()
1. Most of the time the snapshot is not null, so:
if (snapshot == NULL), will fail most of the time.

With the patch:
if (snapshot->xip != NULL)
{
if (GetSnapshotDataReuse(snapshot))
 return snapshot;
}

Most of the time the test is true and GetSnapshotDataReuse is not called
for new
snapshots.

count, subcount and  suboverflowed, will not be initialized, for all
snapshots.

2. If snapshot is taken during recovery
The pgprocnos and ProcGlobal->subxidStates are not touched unnecessarily.
Only if is not suboverflowed.
Skipping all InvalidTransactionId, mypgxactoff, backends doing logical
decoding,
and XID is >= xmax.

3. Calling GetSnapshotDataReuse() without first acquiring ProcArrayLock.
There's an agreement that this would be fine, for now.

Consider ComputeXidHorizons()
1. ProcGlobal->statusFlags is touched before the lock.
2. allStatusFlags[index] is not touched for all numProcs.

All changes were made with the aim of avoiding or postponing unnecessary
work.


> BTW it's generally a good idea to do multiple runs and then use the
> average and/or median. Results from a single may be quite noisy.
>
> >
> > Linux Ubuntu 64 bits
> > shared_buffers = 2048MB
> >
> > ./pgbench -M prepared -c $conns -j $conns -S -n -U postgres
> >
> > pgbench (15beta1)
> > transaction type: 
> > scaling factor: 1
> > query mode: prepared
> > number of clients: 100
> > number of threads: 100
> > maximum number of tries: 1
> > number of transactions per client: 10
> >
> > conns   tps head        tps patched
> >
> > 1       2918.004085     3211.303789
> > 10      12262.415696    15540.015540
> > 50      13656.724571    16701.182444
> > 80      14338.202348    16628.559551
> > 90      16597.510373    16835.016835
> > 100     17706.775793    16607.433487
> > 200     16877.067441    16426.969799
> > 300     16942.260775    16319.780662
> > 400     16794.514911    16155.023607
> > 500     16598.502151    16051.106724
> > 600     16717.935001    16007.171213
> > 700     16651.204834    16004.353184
> > 800     16467.546583    16834.591719
> > 900     16588.241149    16693.902459
> > 1000    16564.985265    16936.952195
> >
>
> I think we've agreed these results are useless.
>
> >
> > Linux Ubuntu 64 bits
> > shared_buffers = 2048MB
> >
> > ./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres
> >
> > pgbench (15beta1)
> > transaction type: 
> > scaling factor: 1
> > query mode: prepared
> > number of clients: 100
> > number of threads: 100
> > 

Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-27 Thread Yura Sokolov
On Tue, 24/05/2022 at 17:39 -0700, Andres Freund wrote:
> 
> A variation on your patch would be to only store the offset to the block
> header - that should always fit into 32bit (huge allocations being their own
> block, which is why this wouldn't work for storing an offset to the
> context). With a bit of care that'd allow aset.c to half it's overhead, by
> using 4 bytes of space for all non-huge allocations.  Of course, it'd increase
> the cost of pfree() of small allocations, because AllocSetFree() currently
> doesn't need to access the block for those. But I'd guess that'd be outweighed
> by the reduced memory usage.

I'm +1 for this.

And with this change every memory context kind can have same header:

typedef struct MemoryChunk {
#ifdef MEMORY_CONTEXT_CHECKING
Size   requested_size;
#endif
uint32 encoded_size;/* encoded allocation size */
uint32 offset_to_block; /* backward offset to block header */
}

Allocated size always could be encoded into uint32 since it is rounded
for large allocations (I believe, large allocations certainly rounded
to at least 4096 bytes):

encoded_size = size < (1u<<31) ? size : (1u<<31)|(size>>12);
/* and reverse */
size = (encoded_size >> 31) ? ((Size)(encoded_size & 0x7fffffff) << 12) :
   (Size)encoded_size;
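
As a self-contained illustration of that encoding (a standalone sketch, not the memory-context code itself), assuming huge allocations are always multiples of 4096 bytes and a 64-bit size_t:

#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* sizes below 2^31 are stored verbatim; larger ones set the top bit and drop the low 12 bits */
static uint32_t
encode_size(size_t size)
{
	if (size < ((uint32_t) 1 << 31))
		return (uint32_t) size;
	assert(size % 4096 == 0);	/* huge chunks are page-rounded */
	return ((uint32_t) 1 << 31) | (uint32_t) (size >> 12);
}

static size_t
decode_size(uint32_t encoded)
{
	if (encoded >> 31)
		return (size_t) (encoded & 0x7fffffffu) << 12;
	return encoded;
}

int
main(void)
{
	assert(decode_size(encode_size(64)) == 64);
	assert(decode_size(encode_size((size_t) 8 << 30)) == (size_t) 8 << 30);
	return 0;
}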

There is a glitch with Aset since it currently reuses `aset` pointer
for freelist link. With such change this link had to be encoded in
chunk-body itself instead of header. I was confused with this, since
there are valgrind hooks, and I was not sure how to change it (I'm
not good at valgrind hooks). But after thinking more about I believe
it is doable.


regards

---

Yura Sokolov





Re: pg_upgrade test writes to source directory

2022-05-27 Thread Peter Eisentraut

On 26.05.22 22:52, Justin Pryzby wrote:

On Fri, May 27, 2022 at 05:43:04AM +0900, Michael Paquier wrote:

On Thu, May 26, 2022 at 04:36:47PM +0200, Peter Eisentraut wrote:

I chose TESTOUTDIR because it corresponds to the tmp_check directory, so
that the output files of the pg_upgrade run are removed when the test
artifacts are cleaned up.  When using TESTDIR, the pg_upgrade output files
end up in the build directory, which is less bad than the source directory,
but still not ideal.


Where does the choice of TESTOUTDIR come from?  I am a bit surprised
by this choice, to be honest, because there is no trace of it in the
buildfarm client or the core code.  TESTDIR, on the other hand, points
to tmp_check/ if not set.  It gets set it in vcregress.pl and
Makefile.global.in.


It looks like Peter working on top of the meson branch.
TESTOUTDIR is not yet in master.


Ooops, yeah. :)

I think you can just chdir to ${PostgreSQL::Test::Utils::tmp_check}.




Re: enable/disable broken for statement triggers on partitioned tables

2022-05-27 Thread Amit Langote
On Fri, May 27, 2022 at 5:11 PM Peter Eisentraut
 wrote:
> On 24.05.22 23:23, Zhihong Yu wrote:
> > Hi,
> >
> >  AT_EnableTrig,  /* ENABLE TRIGGER name */
> > +   AT_EnableTrigRecurse,   /* internal to commands/tablecmds.c */
> >  AT_EnableAlwaysTrig,/* ENABLE ALWAYS TRIGGER name */
> > +   AT_EnableAlwaysTrigRecurse, /* internal to commands/tablecmds.c */
> >
> > Is it better to put the new enum's at the end of the AlterTableType?
> >
> > This way the numeric values for existing ones don't change.
>
> That's a concern if backpatching.  Otherwise, it's better to put them
> like shown in the patch.

Agreed.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: doc: CREATE FOREIGN TABLE .. PARTITION OF .. DEFAULT

2022-05-27 Thread Amit Langote
On Fri, May 27, 2022 at 7:15 PM Etsuro Fujita  wrote:
> On Fri, May 27, 2022 at 1:58 AM Robert Haas  wrote:
> > Committed, except I adjusted the v11 version so that the CREATE
> > FOREIGN TABLE documentation would match the CREATE TABLE documentation
> > in that branch.
>
> I think we should fix the syntax synopsis in the Parameters section
> of the CREATE FOREIGN TABLE reference page as well.

Oops, good catch.

>  Attached is a patch for that.

Thank you.

I think we should also rewrite the description to match the CREATE
TABLE's text, as in the attached updated patch.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


Further-fix-CREATE-FOREIGN-TABLE-synopsis_v2.patch
Description: Binary data


Re: allow building trusted languages without the untrusted versions

2022-05-27 Thread Peter Eisentraut

On 24.05.22 22:58, Nathan Bossart wrote:

FWIW this was my original thinking.  I can choose to build/install
extensions separately, but when it comes to PL/Tcl and PL/Perl, you've
got to build the trusted and untrusted stuff at the same time, and the
untrusted symbols remain even if you remove the control file and
installation scripts.  Of course, this isn't a complete solution for
removing the ability to do any sort of random file system access, though.


This only makes sense to me if you install directly from the source tree 
to your production installation.  Presumably, there is usually a 
packaging step in between.  And you can decide at that point which files 
to install or not to install.





Re: Prevent writes on large objects in read-only transactions

2022-05-27 Thread Laurenz Albe
On Fri, 2022-05-27 at 15:30 +0900, Yugo NAGATA wrote:
> Currently, lo_creat(e), lo_import, lo_unlink, lowrite, lo_put,
> and lo_from_bytea are allowed even in read-only transactions.
> By using them, pg_largeobject and pg_largeobject_metadata can
> be modified in read-only transactions, and the effects remain
> after the transaction has finished. This is unacceptable
> behaviour, isn't it?

+1

Yours,
Laurenz Albe




Re: "ERROR: latch already owned" on gharial

2022-05-27 Thread Thomas Munro
On Thu, May 26, 2022 at 2:35 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > On a more practical note, I don't have access to the BF database right
> > now.  Would you mind checking if "latch already owned" has occurred on
> > any other animals?
>
> Looking back 6 months, these are the only occurrences of that string
> in failed tests:
>
>  sysname | branch |  snapshot   | stage  |
>  l
> -++-++---
>  gharial | HEAD   | 2022-04-28 23:37:51 | Check  | 2022-04-28 
> 18:36:26.981 MDT [22642:1] ERROR:  latch already owned
>  gharial | HEAD   | 2022-05-06 11:33:11 | IsolationCheck | 2022-05-06 
> 10:10:52.727 MDT [7366:1] ERROR:  latch already owned
>  gharial | HEAD   | 2022-05-24 06:31:31 | IsolationCheck | 2022-05-24 
> 02:44:51.850 MDT [13089:1] ERROR:  latch already owned
> (3 rows)

Thanks.  Hmm.  So far it's always a parallel worker.  The best idea I
have is to include the mystery PID in the error message and see if
that provides a clue next time.
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 78c6a89271..07b8273a7d 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -402,6 +402,8 @@ InitSharedLatch(Latch *latch)
 void
 OwnLatch(Latch *latch)
 {
+	pid_t		previous_owner;
+
 	/* Sanity checks */
 	Assert(latch->is_shared);
 
@@ -410,8 +412,11 @@ OwnLatch(Latch *latch)
 	Assert(selfpipe_readfd >= 0 && selfpipe_owner_pid == MyProcPid);
 #endif
 
-	if (latch->owner_pid != 0)
-		elog(ERROR, "latch already owned");
+	previous_owner = latch->owner_pid;
+	if (previous_owner != 0)
+		elog(ERROR,
+			 "latch already owned by PID %lu",
+			 (unsigned long) previous_owner);
 
 	latch->owner_pid = MyProcPid;
 }


Re: doc: CREATE FOREIGN TABLE .. PARTITION OF .. DEFAULT

2022-05-27 Thread Etsuro Fujita
On Fri, May 27, 2022 at 1:58 AM Robert Haas  wrote:
> Committed, except I adjusted the v11 version so that the CREATE
> FOREIGN TABLE documentation would match the CREATE TABLE documentation
> in that branch.

I think we should fix the syntax synopsis in the Parameters section
of the CREATE FOREIGN TABLE reference page as well.  Attached is a
patch for that.

Sorry for being late for this.

Best regards,
Etsuro Fujita


Further-fix-CREATE-FOREIGN-TABLE-synopsis.patch
Description: Binary data


Re: suboverflowed subtransactions concurrency performance optimize

2022-05-27 Thread Amit Kapila
On Thu, May 26, 2022 at 12:53 PM Michael Paquier  wrote:
>
> On Tue, May 24, 2022 at 04:52:50PM -0700, Andres Freund wrote:
>
> > 2) xid wraparound. There's nothing forcing SubTransGetTopmostTransaction() 
> > to
> >be called regularly, so even if a backend isn't idle, the cache could 
> > just
> >get more and more outdated until hitting wraparound
>
> Hence, you mean that the non-regularity of the call makes it more
> exposed to an inconsistent result after a wraparound?
>

Won't, in theory, the similar cache in transam.c also be prone to
similar behavior?

Anyway, how about if we clear this cache for subtrans whenever
TransactionXmin is advanced and cachedFetchSubXid precedes it? The
comments atop SubTransGetTopmostTransaction seem to state that we
don't care about the exact topmost parent when the intermediate one
precedes TransactionXmin. I think this should preserve the optimization,
because there is a fast path in SubTransGetTopmostTransaction for such
cases anyway.
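
Just to make that concrete, something like the following very rough
sketch is what I have in mind (not the actual patch; I'm assuming the
one-entry cache variables are named cachedFetchSubXid and
cachedFetchTopmostXid as in the patch under discussion):

/*
 * Hypothetical helper: call wherever TransactionXmin is advanced, to
 * drop the cached subtrans lookup once it can no longer matter.
 */
static void
MaybeInvalidateSubTransCache(void)
{
    if (TransactionIdIsValid(cachedFetchSubXid) &&
        TransactionIdPrecedes(cachedFetchSubXid, TransactionXmin))
    {
        cachedFetchSubXid = InvalidTransactionId;
        cachedFetchTopmostXid = InvalidTransactionId;
    }
}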

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-05-27 Thread Peter Smith
On Fri, May 27, 2022 at 5:04 PM shiy.f...@fujitsu.com
 wrote:
>
> On Wed, May 25, 2022 7:55 PM vignesh C  wrote:
> >
> > The attached v16 patch has the changes for the same.
> >
>
> Thanks for updating the patch.
>
> Some comments for the document in 0002 patch.
>
> 1.
> +   
> +Lock the required tables in node1 and
> +node2 till the setup is completed.
> +   
> +
> +   
> +Create a publication in node1:
> +
> +node1=# CREATE PUBLICATION pub_node1 FOR TABLE t1;
> +CREATE PUBLICATION
> +
>
> If the table is locked in the very beginning, we will not be able to
> create the publication (because the locks conflict). Maybe we should
> switch the order of creating the publication and locking the tables
> here.
>

I agree. It seems some of the locking instructions in the earlier
sections 31.11.1 - 31.11.4 contradict the later "generic" steps given
in "31.11.5. Generic steps to add a new node to the existing set of
nodes". I'm assuming the generic steps are the "correct" steps.

e.g. the generic steps say to take the lock on the new node's tables
AFTER creating the publication on the new node.
e.g. the generic steps say do NOT take a table lock on the node you
are (first) joining to.

~~~

Furthermore, the generic steps are describing attaching a new N+1th
node to some (1 ... N) other nodes.

So I think in the TWO-node case (section 31.11.1) node2 should be
treated as the "new" node that you are attaching to the "first" node1.
IMO the section 31.11.1 example should be reversed so that it obeys
the "generic" pattern.
e.g. it should be doing the CREATE PUBLICATION pub_node2 first (since
that is the "new" node).

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Remove useless tests about TRUNCATE on foreign table

2022-05-27 Thread Yugo NAGATA
Hello,

I found that tests for TRUNCATE on foreign tables are left
in the foreign_data regression test. Now that TRUNCATE on foreign
tables is allowed, I think the tests should be removed.

Currently, the result of these tests is
 "ERROR:  foreign-data wrapper "dummy" has no handler",
but that is just because the foreign table has no handler,
not due to TRUNCATE.

The patch is attached.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index 5bf03680d2..33505352cc 100644
--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -1822,11 +1822,6 @@ Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')
 Inherits: fd_pt1
 
--- TRUNCATE doesn't work on foreign tables, either directly or recursively
-TRUNCATE ft2;  -- ERROR
-ERROR:  foreign-data wrapper "dummy" has no handler
-TRUNCATE fd_pt1;  -- ERROR
-ERROR:  foreign-data wrapper "dummy" has no handler
 DROP TABLE fd_pt1 CASCADE;
 NOTICE:  drop cascades to foreign table ft2
 -- IMPORT FOREIGN SCHEMA
@@ -2047,11 +2042,6 @@ ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1);   -- ERROR
 ERROR:  child table is missing constraint "fd_pt2chk1"
 ALTER FOREIGN TABLE fd_pt2_1 ADD CONSTRAINT fd_pt2chk1 CHECK (c1 > 0);
 ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1);
--- TRUNCATE doesn't work on foreign tables, either directly or recursively
-TRUNCATE fd_pt2_1;  -- ERROR
-ERROR:  foreign-data wrapper "dummy" has no handler
-TRUNCATE fd_pt2;  -- ERROR
-ERROR:  foreign-data wrapper "dummy" has no handler
 DROP FOREIGN TABLE fd_pt2_1;
 DROP TABLE fd_pt2;
 -- foreign table cannot be part of partition tree made of temporary
diff --git a/src/test/regress/sql/foreign_data.sql b/src/test/regress/sql/foreign_data.sql
index 9dfff66f54..eefb860adc 100644
--- a/src/test/regress/sql/foreign_data.sql
+++ b/src/test/regress/sql/foreign_data.sql
@@ -746,10 +746,6 @@ ALTER TABLE fd_pt1 RENAME CONSTRAINT fd_pt1chk3 TO f2_check;
 \d+ fd_pt1
 \d+ ft2
 
--- TRUNCATE doesn't work on foreign tables, either directly or recursively
-TRUNCATE ft2;  -- ERROR
-TRUNCATE fd_pt1;  -- ERROR
-
 DROP TABLE fd_pt1 CASCADE;
 
 -- IMPORT FOREIGN SCHEMA
@@ -833,10 +829,6 @@ ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1);   -- ERROR
 ALTER FOREIGN TABLE fd_pt2_1 ADD CONSTRAINT fd_pt2chk1 CHECK (c1 > 0);
 ALTER TABLE fd_pt2 ATTACH PARTITION fd_pt2_1 FOR VALUES IN (1);
 
--- TRUNCATE doesn't work on foreign tables, either directly or recursively
-TRUNCATE fd_pt2_1;  -- ERROR
-TRUNCATE fd_pt2;  -- ERROR
-
 DROP FOREIGN TABLE fd_pt2_1;
 DROP TABLE fd_pt2;
 


Re: enable/disable broken for statement triggers on partitioned tables

2022-05-27 Thread Peter Eisentraut



On 24.05.22 23:23, Zhihong Yu wrote:

Hi,

     AT_EnableTrig,              /* ENABLE TRIGGER name */
+   AT_EnableTrigRecurse,       /* internal to commands/tablecmds.c */
     AT_EnableAlwaysTrig,        /* ENABLE ALWAYS TRIGGER name */
+   AT_EnableAlwaysTrigRecurse, /* internal to commands/tablecmds.c */

Is it better to put the new enums at the end of the AlterTableType?

This way the numeric values for existing ones don't change.


That's a concern if backpatching.  Otherwise, it's better to put them
as shown in the patch.





RE: bogus: logical replication rows/cols combinations

2022-05-27 Thread houzj.f...@fujitsu.com
On Friday, May 27, 2022 1:54 PM Justin Pryzby  wrote:
> 
> On Fri, May 27, 2022 at 11:17:00AM +0530, Amit Kapila wrote:
> > On Tue, May 24, 2022 at 11:03 AM houzj.f...@fujitsu.com
>  wrote:
> > >
> > > On Friday, May 20, 2022 11:06 AM Amit Kapila 
> wrote:
> > >
> > > Thanks for pointing it out. Here is the new version patch which add this
> version check.
> >
> > I have added/edited a few comments and ran pgindent. The attached
> > looks good to me. I'll push this early next week unless there are more
> > comments/suggestions.
> 
> A minor doc review.
> Note that I also sent some doc comments at
> 20220519120724.go19...@telsasoft.com.
> 
> +  lists among publications in which case ALTER
> PUBLICATION
> +  command will be successful but later the WalSender in publisher
> + or the
> 
> COMMA in which
> 
> remove "command" ?
> 
> s/in publisher/on the publisher/
> 
> +   Subscription having several publications in which the same table has been
> +   published with different column lists is not supported.
> 
> Either "Subscriptions having .. are not supported"; or, "A subscription 
> having ..
> is not supported".

Thanks for the comments. Here is a new version of the patch set, which fixes these.

Best regards,
Hou zj


0001-language-fixes-on-HEAD-from-Justin.patch
Description: 0001-language-fixes-on-HEAD-from-Justin.patch


v6-0001-Prohibit-combining-publications-with-different-co.patch
Description:  v6-0001-Prohibit-combining-publications-with-different-co.patch


RE: Handle infinite recursion in logical replication setup

2022-05-27 Thread shiy.f...@fujitsu.com
On Wed, May 25, 2022 7:55 PM vignesh C  wrote:
> 
> The attached v16 patch has the changes for the same.
> 

Thanks for updating the patch.

Some comments for the document in 0002 patch.

1.
+   
+Lock the required tables in node1 and
+node2 till the setup is completed.
+   
+
+   
+Create a publication in node1:
+
+node1=# CREATE PUBLICATION pub_node1 FOR TABLE t1;
+CREATE PUBLICATION
+

If the table is locked in the very beginning, we will not be able to create the
publication (because the locks conflict). Maybe we should switch the order
of creating the publication and locking the tables here.

2.
In the case of "Adding a new node when data is present in the new node", we need
to truncate table t1 in node3, but the truncate operation would be blocked
because the table has been locked before. Maybe we need some changes for that.

Regards,
Shi yu



Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

2022-05-27 Thread Kyotaro Horiguchi
At Fri, 27 May 2022 02:01:27 +, "Imseih (AWS), Sami"  
wrote in 
> After further research, we found the following.
> 
> Testing on 13.6 with the attached patch, we see
> that the missingContrecPtr is being incorrectly
> set on the standby and the promotion in the TAP
> test fails.
> 
> Per the comments in xlog.c, the
> missingContrecPtr should not be set when
> in standby mode.
> 
> /*
>  * When not in standby mode we find that WAL ends in an incomplete
>  * record, keep track of that record.  After recovery is done,
>  * we'll write a record to indicate downstream WAL readers that
>  * that portion is to be ignored.
>  */
> if (!StandbyMode &&
> !XLogRecPtrIsInvalid(xlogreader->abortedRecPtr))
> {
> abortedRecPtr = xlogreader->abortedRecPtr;
> missingContrecPtr = xlogreader->missingContrecPtr;
> elog(LOG, "missingContrecPtr == %ld", missingContrecPtr);
> }
> 
> If StandbyModeRequested is checked instead, which
> checks for the presence of a standby signal file,
> the missingContrecPtr is not set on the
> standby and the test succeeds.
> 
> if (!StandbyModeRequested &&
> !XLogRecPtrIsInvalid(xlogreader->abortedRecPtr))
> {
> abortedRecPtr = xlogreader->abortedRecPtr;
> missingContrecPtr = xlogreader->missingContrecPtr;
> elog(LOG, "missingContrecPtr == %ld", missingContrecPtr);
> }
> 
> If this is a bug, as it appears to be, the original patch
> to resolve this issue is not needed and the ideal fix
> is to ensure that a standby does not set
> missingContrecPtr.
> 
> Would like to see what others think.

Due to the lack of information, I don't have a clear idea of what is
happening here. If that change "fixed" the "issue", that seems to mean
that crash recovery passed an immature contrecord (@6/A860) but
recovery did not stop at that point and continued until 10/B1FA3D88.  One
possibility is that crash recovery ended at the immature contrecord, then
the server moved to archive recovery mode and was able to continue
from the same (aborted) LSN on the archived WAL. That would mean 6/A8 in
pg_wal had somehow been removed after being archived, or that segment 6/A7
(not A8) in pg_xlog and in the archive are on different histories, which
seems to me to be exactly what we wanted to prevent with
XLOG_OVERWRITE_CONTRECORD.  Didn't you use the same archive content
for repeated recovery testing?

Also, entering crash recovery means the cluster had no idea of how far
it should recover to reach consistency; that information is contained
in the backup label and control file.  That could happen when a crashed
primary is reused as-is as the new standby of the new primary.

So.. I'd like to hear exactly what you did in your testing.

When standby mode is requested and crash recovery fails with an immature
contrecord, I think we shouldn't continue recovery. But I'm not sure
we need to explicitly reject that case.  Further study is needed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Prevent writes on large objects in read-only transactions

2022-05-27 Thread Yugo NAGATA
Hello,

Currently, lo_creat(e), lo_import, lo_unlink, lowrite, lo_put,
and lo_from_bytea are allowed even in read-only transactions.
By using them, pg_largeobject and pg_largeobject_metadata can
be modified in read-only transactions, and the effects remain
after the transaction has finished. This is unacceptable
behaviour, isn't it?

Also, when these functions are used during recovery, they fail,
but the messages output are not user-friendly, like:

 postgres=# select lo_creat(42);
 ERROR:  cannot assign OIDs during recovery

 postgres=# select lo_create(42);
 ERROR:  cannot assign TransactionIds during recovery

 postgres=# select lo_unlink(16389);
 ERROR:  cannot acquire lock mode AccessExclusiveLock on database objects while 
recovery is in progress
 HINT:  Only RowExclusiveLock or less can be acquired on database objects 
during recovery.
 

So, I would like to propose explicitly preventing such write operations
on large objects in read-only transactions, like:

 postgres=# SELECT lo_create(42);
 ERROR:  cannot execute lo_create in a read-only transaction

The patch is attached.


Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/backend/libpq/be-fsstubs.c b/src/backend/libpq/be-fsstubs.c
index 5804532881..2ffdba53d6 100644
--- a/src/backend/libpq/be-fsstubs.c
+++ b/src/backend/libpq/be-fsstubs.c
@@ -245,6 +245,8 @@ be_lo_creat(PG_FUNCTION_ARGS)
 {
 	Oid			lobjId;
 
+	PreventCommandIfReadOnly("lo_creat");
+
 	lo_cleanup_needed = true;
 	lobjId = inv_create(InvalidOid);
 
@@ -256,6 +258,8 @@ be_lo_create(PG_FUNCTION_ARGS)
 {
 	Oid			lobjId = PG_GETARG_OID(0);
 
+	PreventCommandIfReadOnly("lo_create");
+
 	lo_cleanup_needed = true;
 	lobjId = inv_create(lobjId);
 
@@ -306,6 +310,8 @@ be_lo_unlink(PG_FUNCTION_ARGS)
 {
 	Oid			lobjId = PG_GETARG_OID(0);
 
+	PreventCommandIfReadOnly("lo_unlink");
+
 	/*
 	 * Must be owner of the large object.  It would be cleaner to check this
 	 * in inv_drop(), but we want to throw the error before not after closing
@@ -368,6 +374,8 @@ be_lowrite(PG_FUNCTION_ARGS)
 	int			bytestowrite;
 	int			totalwritten;
 
+	PreventCommandIfReadOnly("lowrite");
+
 	bytestowrite = VARSIZE_ANY_EXHDR(wbuf);
 	totalwritten = lo_write(fd, VARDATA_ANY(wbuf), bytestowrite);
 	PG_RETURN_INT32(totalwritten);
@@ -413,6 +421,8 @@ lo_import_internal(text *filename, Oid lobjOid)
 	LargeObjectDesc *lobj;
 	Oid			oid;
 
+	PreventCommandIfReadOnly("lo_import");
+
 	/*
 	 * open the file to be read in
 	 */
@@ -539,6 +549,8 @@ lo_truncate_internal(int32 fd, int64 len)
 {
 	LargeObjectDesc *lobj;
 
+	PreventCommandIfReadOnly("lo_truncate");
+
 	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
 		ereport(ERROR,
 (errcode(ERRCODE_UNDEFINED_OBJECT),
@@ -815,6 +827,8 @@ be_lo_from_bytea(PG_FUNCTION_ARGS)
 	LargeObjectDesc *loDesc;
 	int			written PG_USED_FOR_ASSERTS_ONLY;
 
+	PreventCommandIfReadOnly("lo_from_bytea");
+
 	lo_cleanup_needed = true;
 	loOid = inv_create(loOid);
 	loDesc = inv_open(loOid, INV_WRITE, CurrentMemoryContext);
@@ -837,6 +851,8 @@ be_lo_put(PG_FUNCTION_ARGS)
 	LargeObjectDesc *loDesc;
 	int			written PG_USED_FOR_ASSERTS_ONLY;
 
+	PreventCommandIfReadOnly("lo_put");
+
 	lo_cleanup_needed = true;
 	loDesc = inv_open(loOid, INV_WRITE, CurrentMemoryContext);
 
diff --git a/src/test/regress/expected/largeobject.out b/src/test/regress/expected/largeobject.out
index 8e32de04e6..14344edfba 100644
--- a/src/test/regress/expected/largeobject.out
+++ b/src/test/regress/expected/largeobject.out
@@ -506,6 +506,39 @@ SELECT lo_create(2121);
 (1 row)
 
 COMMENT ON LARGE OBJECT 2121 IS 'testing comments';
+-- Test writes on large objects in read-only transactions
+START TRANSACTION READ ONLY;
+SELECT lo_create(42);
+ERROR:  cannot execute lo_create in a read-only transaction
+ROLLBACK;
+START TRANSACTION READ ONLY;
+SELECT lo_creat(42);
+ERROR:  cannot execute lo_creat in a read-only transaction
+ROLLBACK;
+START TRANSACTION READ ONLY;
+SELECT lo_unlink(42);
+ERROR:  cannot execute lo_unlink in a read-only transaction
+ROLLBACK;
+START TRANSACTION READ ONLY;
+SELECT lowrite(42, 'x');
+ERROR:  cannot execute lowrite in a read-only transaction
+ROLLBACK;
+START TRANSACTION READ ONLY;
+SELECT lo_import(:'filename');
+ERROR:  cannot execute lo_import in a read-only transaction
+ROLLBACK;
+START TRANSACTION READ ONLY;
+SELECT lo_truncate(42, 0);
+ERROR:  cannot execute lo_truncate in a read-only transaction
+ROLLBACK;
+START TRANSACTION READ ONLY;
+SELECT lo_from_bytea(0, 'x');
+ERROR:  cannot execute lo_from_bytea in a read-only transaction
+ROLLBACK;
+START TRANSACTION READ ONLY;
+SELECT lo_put(42, 0, 'x');
+ERROR:  cannot execute lo_put in a read-only transaction
+ROLLBACK;
 -- Clean up
 DROP TABLE lotest_stash_values;
 DROP ROLE regress_lo_user;
diff --git a/src/test/regress/sql/largeobject.sql b/src/test/regress/sql/largeobject.sql
index fd3518889e..8b1a2b70c5 100644
--- a/src/test/regress/sql/largeobject.sql
+++ b/src/test/regress/sql/largeobject.sql
@@ -271,6 +271,39 @@ SELECT 

Re: doc phrase: "inheritance child"

2022-05-27 Thread Amit Langote
On Wed, May 25, 2022 at 1:30 PM Ashutosh Bapat
 wrote:
>
> -   If true, the stats include inheritance child columns, not just the
> +   If true, the stats include child tables, not just the
>
> We are replacing columns with tables; is that intentional?
>
> Partitioned tables do not have their own stats, it's just aggregated 
> partition stats.
> ...
> -   If true, the stats include inheritance child columns, not just the
> +   If true, the stats include child childs, not just the
> values in the specified relation
>
>   
> @@ -13152,7 +13152,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts 
> ppx
> inherited bool
>
>
> -   If true, this row includes inheritance child columns, not just the
> +   If true, this row includes child tables, not just the
> values in the specified table
>
>   
>
> Replacing inheritance child "column" with "tables", is that intentional?

I was a bit confused by these too, though perhaps the original text is
not as clear as it could be?  Would the following be a good rewrite:

If true, the stats cover the contents not only of the specified table,
but also of its child tables or partitions.  (If the table is a
partitioned table, which by itself contains no data, the stats cover
only the contents of its partitions.)

Although maybe the parenthetical is unnecessary.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com