Re: making relfilenodes 56 bits

2022-07-25 Thread Dilip Kumar
On Mon, Jul 25, 2022 at 9:51 PM Ashutosh Sharma  wrote:
>
> Hi,
>
> As oid and relfilenumber are linked with each other, I still see that if the 
> oid value reaches the threshold limit, we are unable to create a table with 
> storage. For example I set FirstNormalObjectId to 4294967294 (one value less 
> than the range limit of 2^32 -1 = 4294967295). Now when I try to create a 
> table, the CREATE TABLE command gets stuck because it is unable to find the 
> OID for the comp type although it can find a new relfilenumber.
>

First of all, if the OID value reaches the maximum OID, it should wrap
around to FirstNormalObjectId and find a new non-conflicting OID.
Since in your case FirstNormalObjectId is 4294967294 and the maximum
OID is 4294967295, there is no scope for wraparound: you can create at
most one object, and once you have created it there are no unused OIDs
left.  The current patch does not attempt to do anything about that.

Now to the problem we are trying to solve with the 56-bit
relfilenode.  We are not trying to raise the limit on the number of
objects the system can create beyond 4294967294.  What we are trying
to solve is the reuse of the same on-disk filenames for different
objects.  Also notice that relfilenodes can be consumed much faster
than OIDs, so the chances of wraparound are higher: you can
truncate/rewrite the same relation multiple times, and that relation
will keep the same OID but will consume multiple relfilenodes.
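
As an aside, this is easy to observe with a quick experiment (a
minimal illustration; the table name is arbitrary):

-- The table's OID stays fixed, but every rewrite assigns a new relfilenode.
CREATE TABLE t (a int);
SELECT 't'::regclass::oid AS table_oid, pg_relation_filenode('t') AS filenode;
TRUNCATE t;
SELECT 't'::regclass::oid AS table_oid, pg_relation_filenode('t') AS filenode;
-- same table_oid, different filenode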

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-25 Thread Amit Kapila
On Tue, Jul 26, 2022 at 7:00 AM Masahiko Sawada  wrote:
>
> On Mon, Jul 25, 2022 at 7:57 PM shiy.f...@fujitsu.com
>  wrote:
> >
> > Hi,
> >
> > I did some performance test for the master branch patch (based on v6 patch) to
> > see if the bsearch() added by this patch will cause any overhead.
>
> Thank you for doing performance tests!
>
> >
> > I tested them three times and took the average.
> >
> > The results are as follows, and attach the bar chart.
> >
> > case 1
> > -
> > No catalog modifying transaction.
> > Decode 800k pgbench transactions. (8 clients, 100k transactions per client)
> >
> > master  7.5417
> > patched 7.4107
> >
> > case 2
> > -
> > There's one catalog modifying transaction.
> > Decode 100k/500k/1M transactions.
> >
> >         100k    500k    1M
> > master  0.0576  0.1491  0.4346
> > patched 0.0586  0.1500  0.4344
> >
> > case 3
> > -
> > There are 64 catalog modifying transactions.
> > Decode 100k/500k/1M transactions.
> >
> >         100k    500k    1M
> > master  0.0600  0.1666  0.4876
> > patched 0.0620  0.1653  0.4795
> >
> > (Because the result of case 3 shows that there is an overhead of about 3% in the
> > case decoding 100k transactions with 64 catalog modifying transactions, I
> > tested the next run of 100k xacts with or without catalog modifying
> > transactions, to see if it affects subsequent decoding.)
> >
> > case 4.1
> > -
> > After the test steps in case 3 (64 catalog modifying transactions, decode 100k
> > transactions), run 100k xacts and then decode.
> >
> > master  0.3699
> > patched 0.3701
> >
> > case 4.2
> > -
> > After the test steps in case 3 (64 catalog modifying transactions, decode 100k
> > transactions), run 64 DDLs(without checkpoint) and 100k xacts, then decode.
> >
> > master  0.3687
> > patched 0.3696
> >
> > Summary of the tests:
> > After applying this patch, there is an overhead of about 3% in the case of
> > decoding 100k transactions with 64 catalog modifying transactions. This is an extreme
> > case, so maybe it's okay.
>
> Yes. If we're worried about the overhead and doing bsearch() is the
> cause, probably we can try simplehash instead of the array.
>

I am not sure if we need to go that far for this extreme corner
case. Let's first try your idea below.

> An improvement idea is that we pass the parsed->xinfo down to
> SnapBuildXidHasCatalogChanges(), and then return from that function
> before doing bsearch() if the parsed->xinfo doesn't have
> XACT_XINFO_HAS_INVALS. That would save calling bsearch() for
> non-catalog-modifying transactions. Is it worth trying?
>

I think this is worth trying and this might reduce some of the
overhead as well in the case presented by Shi-San.
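
For reference, a rough sketch of that idea could look like the
following (hypothetical function signature and field names; the real
patch may well differ):

static bool
SnapBuildXidHasCatalogChanges(SnapBuild *builder, TransactionId xid,
							  uint32 xinfo)
{
	if (ReorderBufferXidHasCatalogChanges(builder->reorder, xid))
		return true;

	/*
	 * If the commit record carried no invalidation messages, this xid
	 * cannot be among the catalog-modifying transactions restored from the
	 * serialized snapshot, so skip the bsearch() entirely.
	 */
	if ((xinfo & XACT_XINFO_HAS_INVALS) == 0)
		return false;

	/* catchange_xids/catchange_xcnt stand in for the patch's sorted array */
	return bsearch(&xid, builder->catchange_xids, builder->catchange_xcnt,
				   sizeof(TransactionId), xidComparator) != NULL;
}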

-- 
With Regards,
Amit Kapila.




Re: Cygwin cleanup

2022-07-25 Thread Thomas Munro
On Tue, Jul 26, 2022 at 4:34 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > 3.  You can't really run PostgreSQL on Cygwin for real, because its
> > implementation of signals does not have reliable signal masking, so
> > unsubtle and probably also subtle breakage occurs.  That was reported
> > upstream by Noah years ago, but they aren't working on a fix.
> > lorikeet shows random failures, and presumably any CI system will do
> > the same...
>
> If that's an accurate statement, shouldn't we just drop Cygwin support?

This thread rejected the idea last time around:

https://www.postgresql.org/message-id/flat/136712b0-0619-5619-4634-0f0286acaef7%402ndQuadrant.com

lorikeet still shows the issue.  Failures often involve assertions
about PMSignalState or mq->mq_sender.  Hmm, it's running Cygwin 3.2.0
(March 2021) and the latest release is 3.3.5, so it's remotely
possible that it's been fixed recently.  Maybe that'd be somewhere in
here, but it's not jumping out:

https://github.com/cygwin/cygwin/commits/master/winsup/cygwin/signal.cc

(Oooh, another implementation of signalfd...)




Re: Allow file inclusion in pg_hba and pg_ident files

2022-07-25 Thread Michael Paquier
On Tue, Jul 26, 2022 at 01:04:02PM +0800, Julien Rouhaud wrote:
> It doesn't have much impact most of the time.  The filename is reported if
> there's an IO error while reading the already opened correct file.  The real
> problem is that if the hba_file and ident_file are stored in different
> directories, any secondary file (@filename) in pg_ident.conf would be searched
> for in the wrong directory.  With the pending file inclusion patchset, the
> problem is immediately visible as the view reports the wrong file name.

Oops, obviously.  I'll go and fix that as that's on me.

> Simple fix attached.  I'll add a v15 open item shortly.

Thanks.  Better not to lose track of it.
--
Michael




Re: Allow file inclusion in pg_hba and pg_ident files

2022-07-25 Thread Julien Rouhaud
Hi,

On Mon, Mar 28, 2022 at 04:22:32PM +0900, Michael Paquier wrote:
> On Mon, Mar 28, 2022 at 04:20:07PM +0900, Michael Paquier wrote:
> > See the attached, for reference, but it would fail with EXEC_BACKEND
> > on WIN32.
> 
> Ditto.

While working on the full regression test coverage for the file inclusion
thing, I discovered an embarrassing typo in the pg_ident_file_mapping
infrastructure, which was using the hba file name rather than the ident file
name in one of the calls.

It doesn't have much impact most of the time.  The filename is reported if
there's an IO error while reading the already opened correct file.  The real
problem is that if the hba_file and ident_file are stored in different
directories, any secondary file (@filename) in pg_ident.conf would be searched
for in the wrong directory.  With the pending file inclusion patchset, the
problem is immediately visible as the view reports the wrong file name.

Simple fix attached.  I'll add a v15 open item shortly.
From d36010e7fec78b3ea25255767b8a2478f85fd325 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Tue, 26 Jul 2022 12:52:35 +0800
Subject: [PATCH v1] Fix fill_ident_view incorrect usage of HbaFileName

Thinko introduced in a2c84990bea.

Backpatch to 15, as the original commit.

Author: Julien Rouhaud
---
 src/backend/utils/adt/hbafuncs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/hbafuncs.c b/src/backend/utils/adt/hbafuncs.c
index 598259718c..9e5794071c 100644
--- a/src/backend/utils/adt/hbafuncs.c
+++ b/src/backend/utils/adt/hbafuncs.c
@@ -512,7 +512,7 @@ fill_ident_view(Tuplestorestate *tuple_store, TupleDesc tupdesc)
 				 errmsg("could not open usermap file \"%s\": %m",
 						IdentFileName)));
 
-	linecxt = tokenize_auth_file(HbaFileName, file, &ident_lines, DEBUG3);
+	linecxt = tokenize_auth_file(IdentFileName, file, &ident_lines, DEBUG3);
 	FreeFile(file);
 
 	/* Now parse all the lines */
-- 
2.37.0



Re: Introduce wait_for_subscription_sync for TAP tests

2022-07-25 Thread Amit Kapila
On Tue, Jul 26, 2022 at 7:07 AM Masahiko Sawada  wrote:
>
> Hi,
>
> In tap tests for logical replication, we have the following code in many 
> places:
>
> $node_publisher->wait_for_catchup('tap_sub');
> my $synced_query =
>   "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
> IN ('r', 's');";
> $node_subscriber->poll_query_until('postgres', $synced_query)
>   or die "Timed out while waiting for subscriber to synchronize data";
>
> Also, we sometimes forget to check one of them, as we fixed in commit
> 1f50918a6fb02207d151e7cb4aae4c36de9d827c.
>
> I think we can have a new function to wait for all subscriptions to
> synchronize data. The attached patch introduces a new function,
> wait_for_subscription_sync(). With this function, we can replace the
> above code with a single call as follows:
>
> $node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
>

+1. This removes quite a bit of code from various tests and will make
it easier to write future tests.

Few comments/questions:

1.
-$node_publisher->wait_for_catchup('mysub1');
-
-# Also wait for initial table sync to finish.
-my $synced_query =
-  "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
IN ('r', 's');";
-$node_subscriber->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
-
 # Also wait for initial table sync to finish.
-$node_subscriber->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'mysub1');

It seems to me without your patch there is an extra poll in the above
test. If so, we can probably remove that in a separate patch?

2.
+# wait for the replication to catchup if required.
+if (defined($publisher))
+{
+ croak 'subscription name must be specified' unless defined($subname);
+ $publisher->wait_for_catchup($subname, 'replay');
+}
+
+# then, wait for all table states to be ready.
+print "Waiting for all subscriptions in \"$name\" to synchronize data\n";
+my $query = qq[SELECT count(1) = 0
+ FROM pg_subscription_rel
+ WHERE srsubstate NOT IN ('r', 's');];
+$self->poll_query_until($dbname, $query)
+ or croak "timed out waiting for subscriber to synchronize data";

In the tests, I noticed that a few places did wait_for_catchup after
the subscription check, and other places did that check before it, as
you have it here. Ideally, I think wait_for_catchup should come after
confirming the initial sync is over, because without the initial sync
the publisher node won't be completely in sync with the subscriber.
What do you think?

3. In the code quoted in the previous point, why did you pass the
second parameter as 'replay' when we have not used that in the tests
otherwise?

-- 
With Regards,
Amit Kapila.




Re: fairywren hung in pg_basebackup tests

2022-07-25 Thread Noah Misch
On Mon, Jul 25, 2022 at 11:35:12AM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Mon, Jul 25, 2022 at 09:44:21AM -0400, Andrew Dunstan wrote:
> >> Perhaps we should have a guard in system_or_bail() and/or system_log()
> >> which bails if some element of @_ is undefined.
> 
> +1, seeing how hard this is to diagnose.
> 
> > That would be reasonable.  Also reasonable to impose some long timeout, 
> > maybe
> > 10x or 100x PG_TEST_TIMEOUT_DEFAULT, on calls to those functions.
> 
> Why would it need to be more than PG_TEST_TIMEOUT_DEFAULT?

We run some long commands, like the parallel_schedule runs.  Those currently
use plain system(), but they probably should have used system_log() from a
logging standpoint.  If they had, PG_TEST_TIMEOUT_DEFAULT would have been too
short.  One could argue that anything that slow should declare its intent to
be that slow, but that argument is getting into the territory of a policy
change rather than a backstop for clearly-unintended longevity.
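
For reference, the guard being discussed could be as simple as
something along these lines (an illustrative sketch, not the actual
PostgreSQL::Test::Utils code):

use Test::More;

sub system_or_bail
{
	my @cmd = @_;

	# Bail out immediately instead of hanging if a caller passed undef.
	BAIL_OUT("system_or_bail called with an undefined argument")
	  if grep { !defined $_ } @cmd;

	system(@cmd) == 0
	  or BAIL_OUT("command \"@cmd\" failed with exit status $?");
	return;
}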




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Peter Smith
On Tue, Jul 26, 2022 at 2:09 PM Amit Kapila  wrote:
>
> On Tue, Jul 26, 2022 at 5:04 AM Peter Smith  wrote:
> >
> > On Mon, Jul 25, 2022 at 7:33 PM vignesh C  wrote:
> > >
> > > On Mon, Jul 25, 2022 at 12:58 PM Peter Smith  
> > > wrote:
> > > >
> > > > Firstly, I have some (case-sensitivity) questions about the previous
> > > > patch which was already pushed [1].
> > > >
> > > > Q1. create_subscription docs
> > > >
> > > > I did not understand why the docs refer to slot_name = NONE, yet the
> > > > newly added option says origin = none/any. I think that saying origin
> > > > = NONE/ANY would be more consistent with the existing usage of NONE in
> > > > this documentation.
> > >
> > > Both NONE and none are ok in the case of origin, if you want I can
> > > change it to NONE/ANY in case of origin to keep it consistent.
> > >
> >
> > I preferred the special origin values should be documented as NONE/ANY
> > for better consistency, but let's see what others think about it.
> >
> > There will also be associated minor changes needed for a few
> > error/hint messages.
> >
>
> I am not really sure how much we gain by maintaining consistency with
> slot_name because if due to this we have to change the error messages
> as well then it can create an inconsistency with reserved origin
> names. Consider message: DETAIL:  Origin names "any", "none", and
> names starting with "pg_" are reserved. Now, if we change this to
> "ANY", "NONE" in the above message, it will look a bit odd as "pg_"
> starts with lower case letters.
>

Sure, the message looks a bit odd with the quotes as you wrote it
above, but I would not suggest changing it that way - I was thinking
more like below (which is similar to the style the slot_name messages
use):

CURRENT
DETAIL:  Origin names "any", "none", and names starting with "pg_" are reserved.

SUGGESTED
DETAIL:  Origin names ANY, NONE, and names starting with "pg_" are reserved.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: making relfilenodes 56 bits

2022-07-25 Thread Amul Sul
On Fri, Jul 22, 2022 at 4:21 PM vignesh C  wrote:
>
> On Wed, Jul 20, 2022 at 4:57 PM Dilip Kumar  wrote:
> >
> > On Mon, Jul 18, 2022 at 4:51 PM Dilip Kumar  wrote:
> > >
> > > I was doing some more testing by setting the FirstNormalRelFileNumber
> > > to a high value(more than 32 bits) I have noticed a couple of problems
> > > there e.g. relpath is still using OIDCHARS macro which says max
> > > relfilenumber file name can be only 10 character long which is no
> > > longer true.  So there we need to change this value to 20 and also
> > > need to carefully rename the macros and other variable names used for
> > > this purpose.
> > >
> > > Similarly there was some issue in macro in buf_internal.h while
> > > fetching the relfilenumber.  So I will relook into all those issues
> > > and repost the patch soon.
> >
> > I have fixed these existing issues and there was also some issue in
> > pg_dump.c which was creating problems in upgrading to the same version
> > while using a higher range of the relfilenumber.
> >
> > There was also an issue where the user table from the old cluster's
> > relfilenode could conflict with the system table of the new cluster.
> > As a solution currently for system table object (while creating
> > storage first time) we are keeping the low range of relfilenumber,
> > basically we are using the same relfilenumber as OID so that during
> > upgrade the normal user table from the old cluster will not conflict
> > with the system tables in the new cluster.  But with this solution
> > Robert told me (in off list chat) a problem that in future if we want
> > to make relfilenumber completely unique within a cluster by
> > implementing the CREATEDB differently then we can not do that as we
> > have created fixed relfilenodes for the system tables.
> >
> > I am not sure what exactly we can do to avoid that because even if we
> > do something  to avoid that in the new cluster the old cluster might
> > be already using the non-unique relfilenode so after upgrading the new
> > cluster will also get those non-unique relfilenode.
>
> Thanks for the patch, my comments from the initial review:
> 1) Since we have changed the macros to inline functions, should we
> change the function names similar to the other inline functions in the
> same file like: ClearBufferTag, InitBufferTag & BufferTagsEqual:
> -#define BUFFERTAGS_EQUAL(a,b) \
> -( \
> -   RelFileLocatorEquals((a).rlocator, (b).rlocator) && \
> -   (a).blockNum == (b).blockNum && \
> -   (a).forkNum == (b).forkNum \
> -)
> +static inline void
> +CLEAR_BUFFERTAG(BufferTag *tag)
> +{
> +   tag->rlocator.spcOid = InvalidOid;
> +   tag->rlocator.dbOid = InvalidOid;
> +   tag->rlocator.relNumber = InvalidRelFileNumber;
> +   tag->forkNum = InvalidForkNumber;
> +   tag->blockNum = InvalidBlockNumber;
> +}
>
> 2) We could move this macros along with the other macros at the top of the 
> file:
> +/*
> + * The freeNext field is either the index of the next freelist entry,
> + * or one of these special values:
> + */
> +#define FREENEXT_END_OF_LIST   (-1)
> +#define FREENEXT_NOT_IN_LIST   (-2)
>
> 3) typo thn should be then:
> + * can raise it as necessary if we end up with more mapped relations. For
> + * now, we just pick a round number that is modestly larger thn the expected
> + * number of mappings.
> + */
>

A few more typos in the 0004 patch as well:

the a value
interger
previosly
currenly

> 4) There is one whitespace issue:
> git am v10-0004-Widen-relfilenumber-from-32-bits-to-56-bits.patch
> Applying: Widen relfilenumber from 32 bits to 56 bits
> .git/rebase-apply/patch:1500: space before tab in indent.
> (relfilenumber; \
> warning: 1 line adds whitespace errors.
>
> Regards,
> Vignesh
>

Regards,
Amul




Re: Cygwin cleanup

2022-07-25 Thread Tom Lane
Thomas Munro  writes:
> 3.  You can't really run PostgreSQL on Cygwin for real, because its
> implementation of signals does not have reliable signal masking, so
> unsubtle and probably also subtle breakage occurs.  That was reported
> upstream by Noah years ago, but they aren't working on a fix.
> lorikeet shows random failures, and presumably any CI system will do
> the same...

If that's an accurate statement, shouldn't we just drop Cygwin support?
Now that we have a native Windows build, it's hard to see how any live
user would prefer to use the Cygwin build.

regards, tom lane




Cygwin cleanup

2022-07-25 Thread Thomas Munro
Hi,

Continuing a discussion started over at [1].  Moving this to a new
thread so that other thread can focus on Unix cleanup, and both
threads can get CI coverage...

1.  In a few places, it is alleged that both __CYGWIN__ and WIN32
might be defined at the same time.  Do you think we should try to get
rid of that possibility?  I understand that we have to have a few
tests for __CYGWIN__ here and there, because eg file permissions don't
work quite right and there's not much we can do about that.  But it
seems a bit unhelpful if we also have to worry about a more-or-less
POSIX-ish build taking WIN32 paths at uncertain times if we forget to
defend against that, or wonder why some places are not consistent.

A quick recap of the three flavours of Windows platform we have to
handle, as I understand it:

 * MSVC: Windowsy toolchain, Windowsy C
   * custom perl scripts instead of configure
   * msbuild instead of make
   * MSVC compiler
   * Windows C APIs
   * we provide our own emulation of some POSIX C APIs on top
 * MSYS: Unixy toolchain, Windowsy C
   * configure (portname = "win32")
   * make
   * GCC compiler
   * Windows C APIs
   * we provide our own emulation of some POSIX C APIs on top
 * Cygwin: Unixy toolchain, Unixy C
   * configure (portname = "cygwin")
   * make
   * GCC compiler
   * POSIX C APIs (emulations provided by the Cygwin runtime libraries)

(The configure/make part will be harmonised by the Meson project.)

The macro WIN32 is visibly defined by something in/near msbuild in
MSVC builds: /D WIN32 is right here in the build transcripts (whereas
the compiler defines _WIN32; good compiler).  I am not sure how
exactly it is first defined in MSYS builds; I suspect that MSYS gcc
might define it itself, but I don't have access to MSYS to check.  As
for Cygwin, the only translation unit where I could find both
__CYGWIN__ and WIN32 defined is dirmod.c, and that results from
including  and ultimately  (even though WIN32
isn't defined yet at that time).  I couldn't understand why we do
that, but I probably didn't read enough commit history.  The purpose
of dirmod.c on Cygwin today is only to wrap otherwise pure POSIX code
in retry loops to handle those spurious EACCES errors due to NT
sharing violations, so there is no need for that.

Proposal: let's make it a programming rule that we don't allow
definitions from Windows headers to leak into Cygwin translation
units, preferably by never including them, or if we really must, let's
grant specific exemptions in an isolated and well documented way.  We
don't seem to need any such exemptions currently.  Places where we
currently worry about the contradictory macros could become
conditional #error directives instead.
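
For instance, the sort of guard that proposal has in mind could be as
simple as the following (illustrative only):

#if defined(__CYGWIN__) && defined(WIN32)
#error "Windows headers must not leak into Cygwin translation units"
#endif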

2.  To make it possible to test any of that, you either need a working
Windows+Cygwin setup, or working CI.  I'm a salty old Unix hacker so I
opted for the latter, and I also hope this will eventually be useful
to others.  Unfortunately I haven't figured out how to get everything
working yet, so some of the check-world tests are failing.  Clues most
welcome!

The version I'm posting here is set to run always, so that cfbot will
show it alongside others.  But I would imagine that if we got a
committable-quality version of this, it'd probably be opt-in, so you'd
have to say "ci-os-only: cygwin", or "ci-os-only: cygwin, windows" etc
in a commit to your private github account to ask for it (or maybe
we'll come up with a way to tell cfbot we want the full works of CI
checks; the same decision will come up for MSYS, OpenBSD and NetBSD CI
support that my colleague is working on).  There are other things to
fix too, including abysmal performance; see commit message.

3.  You can't really run PostgreSQL on Cygwin for real, because its
implementation of signals does not have reliable signal masking, so
unsubtle and probably also subtle breakage occurs.  That was reported
upstream by Noah years ago, but they aren't working on a fix.
lorikeet shows random failures, and presumably any CI system will do
the same...  I even wondered about putting our own magic entry/exit
macros into signal handlers, that would use atomics to implement a
second level of signal masking (?!) but that's an uncommonly large
bandaid for a defective platform...  and trying to fix Cygwin itself
is a rabbithole too far for me.

4.  When building with Cygwin GCC 11.3 you get a bunch of warnings
that don't show up on other platforms, seemingly indicating that it
interprets -Wimplicit-fallthrough=3 differently.  Huh?

[1] 
https://www.postgresql.org/message-id/CA%2BhUKGKZ_FjkBnjGADk%2Bpa2g4oKDcG8%3DSE5V23sPTP0EELfyzQ%40mail.gmail.com
From 8f300f5b804d5fd2268709d40e31b52c86d6799c Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 25 Jul 2022 23:05:10 +1200
Subject: [PATCH 1/2] WIP CI support for Cygwin.

XXX Doesn't get all the way through yet...

XXX Needs some --with-X options

XXX This should use a canned Docker image with all the right packages
installed

XXX We would never 

Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Amit Kapila
On Tue, Jul 26, 2022 at 5:04 AM Peter Smith  wrote:
>
> On Mon, Jul 25, 2022 at 7:33 PM vignesh C  wrote:
> >
> > On Mon, Jul 25, 2022 at 12:58 PM Peter Smith  wrote:
> > >
> > > Firstly, I have some (case-sensitivity) questions about the previous
> > > patch which was already pushed [1].
> > >
> > > Q1. create_subscription docs
> > >
> > > I did not understand why the docs refer to slot_name = NONE, yet the
> > > newly added option says origin = none/any. I think that saying origin
> > > = NONE/ANY would be more consistent with the existing usage of NONE in
> > > this documentation.
> >
> > Both NONE and none are ok in the case of origin, if you want I can
> > change it to NONE/ANY in case of origin to keep it consistent.
> >
>
> I preferred the special origin values should be documented as NONE/ANY
> for better consistency, but let's see what others think about it.
>
> There will also be associated minor changes needed for a few
> error/hint messages.
>

I am not really sure how much we gain by maintaining consistency with
slot_name because if due to this we have to change the error messages
as well then it can create an inconsistency with reserved origin
names. Consider message: DETAIL:  Origin names "any", "none", and
names starting with "pg_" are reserved. Now, if we change this to
"ANY", "NONE" in the above message, it will look a bit odd as "pg_"
starts with lower case letters.

--
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Amit Kapila
On Tue, Jul 26, 2022 at 7:48 AM Peter Smith  wrote:
>
> On Tue, Jul 26, 2022 at 11:43 AM Jonathan S. Katz  
> wrote:
> >
> ...
> > That said, this introduces a new restriction for this particular
> > scenario that doesn't exist on other scenarios. Instead, I would
> > advocate we document how to correctly set up the two-way replication
> > scenario (which we have a draft!), document the warnings around the
> > conflicts, perhaps include Vignesh's instructions on how to remediate a
> > conflict on initial sync, and consider throwing a WARNING as you suggested.
> >
> > Thoughts?
>
> Perhaps a WARNING can be useful if the SUBSCRIPTION was created with
> enabled=false because then the user still has a chance to reconsider,
>

Agreed, but I think in that case, when the user enables it, we need to
ensure that we won't allow replicating remote data (during the initial
sync). If this is really required/preferred, it can be done as a
separate enhancement.

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Amit Kapila
On Tue, Jul 26, 2022 at 7:13 AM Jonathan S. Katz  wrote:
>
> On 7/25/22 4:54 AM, vignesh C wrote:
> >
> > Let's take a simple case to understand why copy_data = force is
> > required to replicate between two primaries for table t1 which has
> > data as given below:
>
> > step4 - Node-2:
> > Create Subscription sub2 Connection '' Publication
> > pub1_2 with (origin = none, copy_data=on);
> > If we had allowed the create subscription to be successful with
> > copy_data = on. After this the data will be something like this:
> > Node-1:
> > 1, 2, 3, 4, 5, 6, 7, 8
> >
> > Node-2:
> > 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8
> >
> > So, you can see that data on Node-2 (5, 6, 7, 8) is duplicated. In
> > case, table t1 has a unique key, it will lead to a unique key
> > violation and replication won't proceed.
> >
> > To avoid this we will throw an error:
> > ERROR:  could not replicate table "public.t1"
> > DETAIL:  CREATE/ALTER SUBSCRIPTION with origin = none and copy_data =
> > on is not allowed when the publisher has subscribed same table.
> > HINT:  Use CREATE/ALTER SUBSCRIPTION with copy_data = off/force.
>
> Thanks for the example. I agree that it is fairly simple to reproduce.
>
> I understand that "copy_data = force" is meant to protect a user from
> hurting themself. I'm not convinced that this is the best way to do so.
>
> For example today I can subscribe to multiple publications that write to
> the same table. If I have a primary key on that table, and two of the
> subscriptions try to write an identical ID, we conflict. We don't have
> any special flags or modes to guard against that from happening, though
> we do have documentation on conflicts and managing them.
>
> AFAICT the same issue with "copy_data" also exists in the above scenario
> too, even without the "origin" attribute.
>

That's true, but in that scenario there is no parameter like origin =
NONE which indicates that constraint violations or duplicate data
problems won't occur due to replication. In the current case, I think
the situation is different because the user has specifically asked not
to replicate any remote data by specifying origin = NONE, which should
be dealt with differently. Note that current users or their setups
won't see any difference/change unless they specify the new origin
parameter as NONE.
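
(For illustration, the subscription that bypasses the check would be
created roughly like this, per the syntax proposed in the patch; the
connection string is elided as in Vignesh's example:)

CREATE SUBSCRIPTION sub2 CONNECTION '' PUBLICATION pub1_2
    WITH (origin = none, copy_data = force);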

-- 
With Regards,
Amit Kapila.




Re: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?

2022-07-25 Thread Kyotaro Horiguchi
At Mon, 25 Jul 2022 14:21:38 +0530, Bharath Rupireddy 
 wrote in 
> Hm. I think we must take this opportunity to clean it up. You are
> right, we don't need to parse the label file contents (just like we
> used to do previously after reading it from the file) in
> do_pg_backup_stop(), instead we can just pass a structure. Also,
> do_pg_backup_stop() isn't modifying any labelfile contents, but using
> startxlogfilename, startpoint and backupfrom from the labelfile
> contents. I think this information can easily be passed as a single
> structure. In fact, I might think a bit more here and wrap label_file,
> tblspc_map_file to a single structure something like below and pass it
> across the functions.
> 
> typedef struct BackupState
> {
> StringInfo label_file;
> StringInfo tblspc_map_file;
> char startxlogfilename[MAXFNAMELEN];
> XLogRecPtr startpoint;
> char backupfrom[20];
> } BackupState;
> 
> This way, the code is more readable, structured and we can remove 2
> sscanf() calls, 2 "invalid data in file" errors, 1 strchr() call, 1
> strstr() call. Only thing is that it creates code diff from the
> previous PG versions which is fine IMO. If okay, I'm happy to prepare
> a patch.
> 
> Thoughts?

It is more or less what I had in mind, but it seems that we don't
need StringInfo there, or we should avoid it to signal that the
strings are not editable.
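
(For example, something along these lines; just a sketch using plain
pointers instead of StringInfo, and the actual member set may differ:)

typedef struct BackupState
{
	const char *labelfile;		/* immutable copy of backup_label contents */
	const char *tblspcmapfile;	/* immutable copy of tablespace_map contents */
	char		startxlogfilename[MAXFNAMELEN];
	XLogRecPtr	startpoint;
	char		backupfrom[20];
} BackupState;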

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Peter Smith
On Tue, Jul 26, 2022 at 11:43 AM Jonathan S. Katz  wrote:
>
...
> That said, this introduces a new restriction for this particular
> scenario that doesn't exist on other scenarios. Instead, I would
> advocate we document how to correctly set up the two-way replication
> scenario (which we have a draft!), document the warnings around the
> conflicts, perhaps include Vignesh's instructions on how to remediate a
> conflict on initial sync, and consider throwing a WARNING as you suggested.
>
> Thoughts?

Perhaps a WARNING can be useful if the SUBSCRIPTION was created with
enabled=false because then the user still has a chance to reconsider,
but otherwise, I don't see what good a warning does if the potentially
harmful initial copy is going to proceed anyway; isn't that like
putting a warning sign at the bottom of a cliff?

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Fix annotations nextFullXid

2022-07-25 Thread Fujii Masao




On 2022/07/23 14:01, Zhang Mingli wrote:

Hi,

VariableCacheData.nextFullXid was renamed to nextXid in commit 
https://github.com/postgres/postgres//commit/fea10a64340e529805609126740a540c8f9daab4 


Fix the annotations for less confusion.


Thanks for the patch! LGTM. I will commit it.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Remove useless arguments in ReadCheckpointRecord().

2022-07-25 Thread Fujii Masao




On 2022/07/26 9:42, Kyotaro Horiguchi wrote:

At Sun, 24 Jul 2022 22:40:16 -0400, Tom Lane  wrote in

Fujii Masao  writes:

On 2022/07/22 17:31, Kyotaro Horiguchi wrote:

I believed that it is recommended to move to the style not having the
outmost parens.  That style has been introduced by e3a87b4991.



I read the commit log, but I'm not sure yet if it's really recommended to 
remove extra parens even from the existing calls to errmsg(). Removing extra 
parens can interfere with back-patching of the changes around those errmsg(), 
can't it?


Right, so I wouldn't be in a hurry to change existing calls.  If you're
editing an ereport call for some other reason, it's fine to remove the
excess parens in it, because you're creating a backpatch hazard there
anyway.  But otherwise, I think such changes are make-work in themselves
and risk creating more make-work for somebody else later.


So, I meant to propose removing extra parens only for errmsg()'s where the
message string is being edited.  Is it fine by that criterion?


Even in that case, removing parens may interfere with back-patching. For
example, imagine that wasShutdown is changed to be set to true in the
following code and that change is back-patched to v15. If we modify only
the log message in the following errmsg() and leave the parens around it,
git cherry-pick of the wasShutdown change to v15 would complete
successfully. But if we remove the parens, git cherry-pick would fail.

		ereport(FATAL,
				(errmsg("could not locate required checkpoint record"),
				 errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
						 "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
						 "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
						 DataDir, DataDir, DataDir)));
		wasShutdown = false;	/* keep compiler quiet */

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: predefined role(s) for VACUUM and ANALYZE

2022-07-25 Thread Kyotaro Horiguchi
At Tue, 26 Jul 2022 10:47:12 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> WARNING:  skipping "pg_statistic" --- only superusers, roles with privileges 
> of pg_vacuum_analyze, or the database owner can vacuum it
> WARNING:  skipping "pg_type" --- only superusers, roles with privileges of 
> pg_vacuum_analyze, or the database owner can vacuum it
> 

> WARNING:  skipping "user_mappings" --- only table or database owner can 
> vacuum it

By the way, the last error above disappears after granting
pg_vacuum_analyze to the role. Is there a reason that message is left
alone?  And if I specify the view directly, I get the following
message.

postgres=> vacuum information_schema.user_mappings;
WARNING:  skipping "user_mappings" --- cannot vacuum non-tables or special 
system tables

So, "VACUUM;" does something wrong? Or is it the designed behavior?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: optimize lookups in snapshot [sub]xip arrays

2022-07-25 Thread Zhang Mingli




> On Jul 26, 2022, at 03:04, Nathan Bossart  wrote:
>> 
> From the discussion thus far, it seems there is interest in optimizing
> [sub]xip lookups, so I'd like to spend some time moving it forward.  I
> think the biggest open question is which approach to take.  Both the SIMD
> and hash table approaches seem viable, but I think I prefer the SIMD
> approach at the moment (see the last paragraph of quoted text for the
> reasons).  What do folks think?
> 
> -- 
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
> 
> 

+1, I’m not familiar with SIMD, will try to review this patch.


Regards,
Zhang Mingli






Re: predefined role(s) for VACUUM and ANALYZE

2022-07-25 Thread Kyotaro Horiguchi
At Mon, 25 Jul 2022 09:40:49 -0700, Nathan Bossart  
wrote in 
> On Mon, Jul 25, 2022 at 12:58:36PM +0530, Bharath Rupireddy wrote:
> > Thanks. I'm personally happy with more granular levels of control (as
> > we don't have to give full superuser access to just run a few commands
> > or maintenance operations) for various postgres commands. The only
> > concern is that we might eventually end up with many predefined roles
> > (perhaps one predefined role per command), spreading all around the
> > code base and it might be difficult for the users to digest all of the
> > roles in. It will be great if we can have some sort of rules or
> > methods to define a separate role for a command.
> 
> Yeah, in the future, I could see this growing to a couple dozen predefined
> roles.  Given they are relatively inexpensive and there are already 12 of
> them, I'm personally not too worried about the list becoming too unwieldy.
> Another way to help users might be to create additional aggregate
> predefined roles (like pg_monitor) for common combinations.

I agree with the need for that execution control, but I worry a
little about how many similar roles will come in the future (though it
doesn't seem like that many?).  I didn't think so when pg_checkpoint
was introduced, though.  That being said, since we're going to control
the execution of maintenance-ish commands via predefined roles, this
is fine by that criterion.

One arguable point would be whether we need to restrict the target
relations that Bob can vacuum/analyze. If we need that, the new
predefined role is not sufficient and we would need new syntax for it.

GRANT EXECUTION COMMAND VACUUM ON TABLE rel1 TO bob.
GRANT EXECUTION COMMAND VACUUM ON TABLES OWNED BY alice TO bob.
GRANT EXECUTION COMMAND VACUUM ON ALL TABLES OWNED BY alice TO bob.

However, one problem with such syntax is that it cannot cover
relations created in the future.

So, considering that aspect, I would finally agree to the proposal
here. (In short +1 for what the patch does.)
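
(By contrast, the patch's approach needs only a single,
relation-independent grant, for example:)

GRANT pg_vacuum_analyze TO bob;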


About the patch, it seems fine as a whole except for the change in the
error messages.

-  (errmsg("skipping \"%s\" --- only superuser can analyze it",
+  (errmsg("skipping \"%s\" --- only superusers and roles with "
+  "privileges of pg_vacuum_analyze can analyze it",

The message looks a bit too verbose or lengthy, especially when many
relations are rejected.

WARNING:  skipping "pg_statistic" --- only superusers, roles with privileges of 
pg_vacuum_analyze, or the database owner can vacuum it
WARNING:  skipping "pg_type" --- only superusers, roles with privileges of 
pg_vacuum_analyze, or the database owner can vacuum it

WARNING:  skipping "user_mappings" --- only table or database owner can vacuum 
it
VACUUM

Couldn't we simplify the message?  For example "skipping \"%s\" ---
insufficient privileges".  We could add a detailed (not a DETAIL:)
message at the end to cover all of the skipped relations, but that may
be too much.


By the way the patch splits an error message into several parts but
that later makes it harder to search for the message in the tree.  *I*
would suggest not splitting message strings.


# I refrain from suggesting removing the parens surrounding errmsg() :p

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Peter Smith
Here are some review comments for the patch v38-0002:

==

 - terminology

There seems to be an inconsistent alternation between the terms
"primaries" and "nodes"... For example "Setting replication between
two primaries" versus "Adding a new node..." (instead of "Adding a new
primary..."?). I have included suggested minor rewording changes in
the review comments below, but please check in case I missed something.
Because I suggested changes to some titles, you may also want to
change the section ids.

~~~

1. Commit message

The documentation was recently modified to remove the term
"bidirectional replication" and replace it all with "replication
between primaries", so this commit message (and also the patch name
itself) should be similarly modified.

~~~

2.
+   
+Replication between primaries is useful for creating a multi-master
+database environment for replicating write operations performed by any of
+the member nodes. The steps to create replication between primaries in
+various scenarios are given below. Note: User is responsible for designing
+their schemas in a way to minimize the risk of conflicts. See
+ for the details of logical
+replication conflicts. The logical replication restrictions applies to
+the replication between primaries also. See
+ for the details of
+logical replication restrictions.
+   

2a.
"User" -> "The user"

2b.
"The logical replication restrictions applies to..." --> "The logical
replication restrictions apply to..."

2c.
These are important notes. Instead of just being part of the text
blurb, perhaps these should be rendered as SGML  (or put them
both in a single  if you want)

~~~

3. Setting replication between two primaries

+   Setting replication between two primaries
+   
+The following steps demonstrate how to setup replication between two
+primaries when there is no table data present on both nodes
+node1 and node2:
+   

SUGGESTED
The following steps demonstrate how to set up replication between two
primaries (node1 and node2) when there is no table data present on
both nodes:

~~~

4.
+   
+Now the replication setup between two primaries node1
+and node2 is complete. Any incremental changes from
+node1 will be replicated to node2,
+and any incremental changes from node2 will be
+replicated to node1.
+   

"between two primaries" -> "between primaries"

~~~

5. Adding a new node when there is no table data on any of the nodes

SUGGESTION (title)
Adding a new primary when there is no table data on any of the primaries

~~~

6.
+   
+The following steps demonstrate adding a new node node3
+to the existing node1 and node2 when
+there is no t1 data on any of the nodes. This requires

SUGGESTION
The following steps demonstrate adding a new primary (node3) to the
existing primaries (node1 and node2) when there is no t1 data on any
of the nodes.

~~~

7. Adding a new node when table data is present on the existing nodes

SUGGESTION (title)
Adding a new primary when table data is present on the existing primaries

~~~

8.
+
+ The following steps demonstrate adding a new node node3
+ which has no t1 data to the existing
+ node1 and node2 where
+ t1 data is present. This needs similar steps; the only

SUGGESTION
The following steps demonstrate adding a new primary (node3) that has
no t1 data to the existing primaries (node1 and node2) where t1 data
is present.

~~~

9. Adding a new node when table data is present on the new node

SUGGESTION (title)
Adding a new primary that has existing table data

~~~

10.
+   
+
+ Adding a new node when table data is present on the new node is not
+ supported.
+
+   

SUGGESTION
Adding a new primary that has existing table data is not supported.

~~~

11. Generic steps for adding a new node to an existing set of primaries

SUGGESTION (title)
Generic steps for adding a new primary to an existing set of primaries

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Jonathan S. Katz

On 7/25/22 4:54 AM, vignesh C wrote:

On Sun, Jul 24, 2022 at 10:21 PM Jonathan S. Katz  wrote:


On 7/22/22 12:47 AM, Amit Kapila wrote:

On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz  wrote:



BTW, do you have any opinion on the idea of the first remaining patch
where we accomplish two things: a) Checks and throws an error if
'copy_data = on' and 'origin = none' but the publication tables were
also replicated from other publishers. b) Adds 'force' value for
copy_data parameter to allow copying in such a case. The primary
reason for this patch is to avoid loops or duplicate data in the
initial phase. We can't skip copying based on origin as we can do
while replicating changes from WAL. So, we detect that the publisher
already has data from some other node and doesn't allow replication
unless the user uses the 'force' option for copy_data.


In general, I agree with the patch; but I'm not sure why we are calling
"copy_data = force" in this case and how it varies from "on". I
understand the goal is to prevent the infinite loop, but is there some
technical restriction why we can't set "origin = none, copy_data = on"
and have this work (and apologies if I missed that upthread)?


Let's take a simple case to understand why copy_data = force is
required to replicate between two primaries for table t1 which has
data as given below:



step4 - Node-2:
Create Subscription sub2 Connection '' Publication
pub1_2 with (origin = none, copy_data=on);
If we had allowed the create subscription to be successful with
copy_data = on. After this the data will be something like this:
Node-1:
1, 2, 3, 4, 5, 6, 7, 8

Node-2:
1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8

So, you can see that data on Node-2 (5, 6, 7, 8) is duplicated. In
case, table t1 has a unique key, it will lead to a unique key
violation and replication won't proceed.

To avoid this we will throw an error:
ERROR:  could not replicate table "public.t1"
DETAIL:  CREATE/ALTER SUBSCRIPTION with origin = none and copy_data =
on is not allowed when the publisher has subscribed same table.
HINT:  Use CREATE/ALTER SUBSCRIPTION with copy_data = off/force.


Thanks for the example. I agree that it is fairly simple to reproduce.

I understand that "copy_data = force" is meant to protect a user from 
hurting themself. I'm not convinced that this is the best way to do so.


For example today I can subscribe to multiple publications that write to 
the same table. If I have a primary key on that table, and two of the 
subscriptions try to write an identical ID, we conflict. We don't have 
any special flags or modes to guard against that from happening, though 
we do have documentation on conflicts and managing them.


AFAICT the same issue with "copy_data" also exists in the above scenario 
too, even without the "origin" attribute. However, I think this case is 
more noticeable for "origin=none" because we currently default 
"copy_data" to "true" and in this case data can be copied in two 
directions.


That said, this introduces a new restriction for this particular 
scenario that doesn't exist on other scenarios. Instead, I would 
advocate we document how to correctly set up the two-way replication 
scenario (which we have a draft!), document the warnings around the 
conflicts, perhaps include Vignesh's instructions on how to remediate a 
conflict on initial sync, and consider throwing a WARNING as you suggested.


Thoughts?

Thanks,

Jonathan




Introduce wait_for_subscription_sync for TAP tests

2022-07-25 Thread Masahiko Sawada
Hi,

In tap tests for logical replication, we have the following code in many places:

$node_publisher->wait_for_catchup('tap_sub');
my $synced_query =
  "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
IN ('r', 's');";
$node_subscriber->poll_query_until('postgres', $synced_query)
  or die "Timed out while waiting for subscriber to synchronize data";

Also, we sometimes forget to check one of them, as we fixed in commit
1f50918a6fb02207d151e7cb4aae4c36de9d827c.

I think we can have a new function to wait for all subscriptions to
synchronize data. The attached patch introduces a new function,
wait_for_subscription_sync(). With this function, we can replace the
above code with a single call as follows:

$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
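
(The core of such a helper could look roughly like the sketch below;
the names used are assumptions, see the attached patch for the real
implementation:)

sub wait_for_subscription_sync
{
	my ($self, $publisher, $subname, $dbname) = @_;

	$dbname = 'postgres' unless defined $dbname;

	# First wait for replication to catch up if a publisher node was given.
	if (defined $publisher)
	{
		croak 'subscription name must be specified' unless defined $subname;
		$publisher->wait_for_catchup($subname);
	}

	# Then wait for all table states to be ready.
	my $query =
	  "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
	$self->poll_query_until($dbname, $query)
	  or croak 'timed out waiting for subscriber to synchronize data';
	return;
}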

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v1-0001-Introduce-wait_for_subscription_sync-for-TAP-test.patch
Description: Binary data


Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-25 Thread Masahiko Sawada
On Mon, Jul 25, 2022 at 7:57 PM shiy.f...@fujitsu.com
 wrote:
>
> Hi,
>
> I did some performance test for the master branch patch (based on v6 patch) to
> see if the bsearch() added by this patch will cause any overhead.

Thank you for doing performance tests!

>
> I tested them three times and took the average.
>
> The results are as follows, and attach the bar chart.
>
> case 1
> -
> No catalog modifying transaction.
> Decode 800k pgbench transactions. (8 clients, 100k transactions per client)
>
> master  7.5417
> patched 7.4107
>
> case 2
> -
> There's one catalog modifying transaction.
> Decode 100k/500k/1M transactions.
>
>         100k    500k    1M
> master  0.0576  0.1491  0.4346
> patched 0.0586  0.1500  0.4344
>
> case 3
> -
> There are 64 catalog modifying transactions.
> Decode 100k/500k/1M transactions.
>
>         100k    500k    1M
> master  0.0600  0.1666  0.4876
> patched 0.0620  0.1653  0.4795
>
> (Because the result of case 3 shows that there is an overhead of about 3% in the
> case decoding 100k transactions with 64 catalog modifying transactions, I
> tested the next run of 100k xacts with or without catalog modifying
> transactions, to see if it affects subsequent decoding.)
>
> case 4.1
> -
> After the test steps in case 3 (64 catalog modifying transactions, decode 100k
> transactions), run 100k xacts and then decode.
>
> master  0.3699
> patched 0.3701
>
> case 4.2
> -
> After the test steps in case 3 (64 catalog modifying transactions, decode 100k
> transactions), run 64 DDLs(without checkpoint) and 100k xacts, then decode.
>
> master  0.3687
> patched 0.3696
>
> Summary of the tests:
> After applying this patch, there is an overhead of about 3% in the case of
> decoding 100k transactions with 64 catalog modifying transactions. This is an extreme
> case, so maybe it's okay.

Yes. If we're worried about the overhead and doing bsearch() is the
cause, probably we can try simplehash instead of the array.

An improvement idea is that we pass the parsed->xinfo down to
SnapBuildXidHasCatalogChanges(), and then return from that function
before doing bsearch() if the parsed->xinfo doesn't have
XACT_XINFO_HAS_INVALS. That would save calling bsearch() for
non-catalog-modifying transactions. Is it worth trying?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Make name optional in CREATE STATISTICS

2022-07-25 Thread Michael Paquier
On Mon, Jul 25, 2022 at 12:55:54PM +0200, Alvaro Herrera wrote:
> Agreed.  I think you already have the query for that elsewhere in the
> test, so it's just a matter of copying it from there.

I actually already wrote most of it in 2cbc3c1, and I just needed to
extend things a bit to detect the OID changes :)

So, applied, with all the extra tests.
--
Michael




Re: log_line_prefix: make it possible to add the search_path

2022-07-25 Thread Lukas Fittl
On Mon, Jul 25, 2022 at 12:38 AM Pierre Ducroquet 
wrote:

> usecase by not showing the schema, one of them being log_line_prefix.
> It is possible to work around this using the application_name, but a
> mistake
> on the application side would be fatal, while the search_path would still
> indicate the real tables used in a query.
>

I'm assuming this is mostly referring to STATEMENT log lines and other
situations where the original query is output (e.g. auto_explain).

+1 on the benefit of solving this (I've had this use case before), but I
think we can keep this more specific than a general log_line_prefix option.
The search_path isn't relevant to any log line that doesn't reference a
query, since e.g. autovacuum log output fully qualifies its relation names,
and many other common log lines have nothing to do with tables or queries.

What if we instead had something like this, as an extra CONTEXT (or DETAIL)
log line:

LOG: duration: 4079.697 ms execute :
SELECT * FROM x WHERE y = $1 LIMIT $2
DETAIL: parameters: $1 = 'long string', $2 = '1'
CONTEXT: settings: search_path = 'my_tenant_schema, "$user", public'

That way you could determine that the slow query was affecting the "x"
table in "my_tenant_schema".

This log output would be controlled by a new GUC, e.g.
"log_statement_search_path" with three settings: (1) never, (2)
non_default, (3) always.

The default would be "never" (same as today). "non_default" would output
the search path when a SET has modified it in the current session (and so
we couldn't infer it from the config or the role/database overrides).
"always" would always output the search path for statement-related log
lines.
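
(In postgresql.conf that might eventually look like the following;
purely hypothetical, since no such GUC exists today:)

log_statement_search_path = 'non_default'   # never | non_default | always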

Thanks,
Lukas

-- 
Lukas Fittl


Re: Remove useless arguments in ReadCheckpointRecord().

2022-07-25 Thread Kyotaro Horiguchi
At Sun, 24 Jul 2022 22:40:16 -0400, Tom Lane  wrote in 
> Fujii Masao  writes:
> > On 2022/07/22 17:31, Kyotaro Horiguchi wrote:
> >> I believed that it is recommended to move to the style not having the
> >> outmost parens.  That style has been introduced by e3a87b4991.
> 
> > I read the commit log, but I'm not sure yet if it's really recommended to 
> > remove extra parens even from the existing calls to errmsg(). Removing 
> > extra parens can interfere with back-patching of the changes around those 
> > errmsg(), can't it?
> 
> Right, so I wouldn't be in a hurry to change existing calls.  If you're
> editing an ereport call for some other reason, it's fine to remove the
> excess parens in it, because you're creating a backpatch hazard there
> anyway.  But otherwise, I think such changes are make-work in themselves
> and risk creating more make-work for somebody else later.

So, I meant to propose removing extra parens only for errmsg()'s where the
message string is being edited.  Is it fine by that criterion?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2022-07-25 Thread Zhihong Yu
On Mon, Jul 25, 2022 at 4:39 PM David Rowley  wrote:

> On Fri, 22 Jul 2022 at 21:33, Richard Guo  wrote:
> > I can see this problem with
> > the query below:
> >
> > select max(b order by b), max(a order by a) from t group by a;
> >
> > When processing the first aggregate, we compose the 'currpathkeys' as
> > {a, b} and mark this aggregate in 'aggindexes'. When it comes to the
> > second aggregate, we compose its pathkeys as {a} and decide that it is
> > not stronger than 'currpathkeys'. So the second aggregate is not
> > recorded in 'aggindexes'. As a result, we fail to mark aggpresorted for
> > the second aggregate.
>
> Yeah, you're right. I have a missing check to see if currpathkeys are
> better than the pathkeys for the current aggregate. In your example
> case we'd have still processed the 2nd aggregate the old way instead
> of realising we could take the new pre-sorted path for faster
> processing.
>
> I've adjusted that in the attached to make it properly include the
> case where currpathkeys are better.
>
> Thanks for the review.
>
> David
>
Hi,

sort order the the planner chooses is simply : there is duplicate `the`

+   /* mark this aggregate is covered by 'currpathkeys'
*/

is covered by -> as covered by

Cheers


Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2022-07-25 Thread David Rowley
On Fri, 22 Jul 2022 at 21:33, Richard Guo  wrote:
> I can see this problem with
> the query below:
>
> select max(b order by b), max(a order by a) from t group by a;
>
> When processing the first aggregate, we compose the 'currpathkeys' as
> {a, b} and mark this aggregate in 'aggindexes'. When it comes to the
> second aggregate, we compose its pathkeys as {a} and decide that it is
> not stronger than 'currpathkeys'. So the second aggregate is not
> recorded in 'aggindexes'. As a result, we fail to mark aggpresorted for
> the second aggregate.

Yeah, you're right. I have a missing check to see if currpathkeys are
better than the pathkeys for the current aggregate. In your example
case we'd have still processed the 2nd aggregate the old way instead
of realising we could take the new pre-sorted path for faster
processing.

I've adjusted that in the attached to make it properly include the
case where currpathkeys are better.

Thanks for the review.

David
From 1ebfac68080b10181086b4c9c9dcb3e3e1582cdb Mon Sep 17 00:00:00 2001
From: David Rowley 
Date: Mon, 3 May 2021 16:31:29 +1200
Subject: [PATCH v8] Add planner support for ORDER BY aggregates

ORDER BY aggregates have, since implemented in Postgres, been executed by
always performing a sort in nodeAgg.c to sort the tuples in the current
group into the correct order before calling the transition function on the
sorted results.  This was not great as often there might be an index that
could have provided pre-sorted input and allowed the transition functions
to be called as the rows come in, rather than having to store them in a
tuplestore in order to sort them later.

Here we get the planner on-board with picking a plan that provides
pre-sorted inputs to ORDER BY aggregates.

Since there can be many ORDER BY aggregates in any given query level, it's
very possible that we can't find an order that suits all aggregates.  The
sort order the the planner chooses is simply the one that suits the most
aggregate functions.  We take the most strictly sorted variation of each
order and see how many aggregate functions can use that, then we try again
with the order of the remaining aggregates to see if another order would
suit more aggregate functions.  For example:

SELECT agg(a ORDER BY a),agg2(a ORDER BY a,b) ...

would request the sort order to be {a, b} because {a} is a subset of the
sort order of {a,b}, but;

SELECT agg(a ORDER BY a),agg2(a ORDER BY c) ...

would just pick a plan ordered by {a}.

SELECT agg(a ORDER BY a),agg2(a ORDER BY b),agg3(a ORDER BY b) ...

would choose to order by {b} since two aggregates suit that vs just one
that requires order by {a}.

Discussion: 
https://postgr.es/m/CAApHDvpHzfo92%3DR4W0%2BxVua3BUYCKMckWAmo-2t_KiXN-wYH%3Dw%40mail.gmail.com
---
 .../postgres_fdw/expected/postgres_fdw.out|  46 +--
 src/backend/executor/execExpr.c   |  52 ++-
 src/backend/executor/execExprInterp.c | 102 ++
 src/backend/executor/nodeAgg.c|  34 +-
 src/backend/jit/llvm/llvmjit_expr.c   |  48 +++
 src/backend/jit/llvm/llvmjit_types.c  |   2 +
 src/backend/optimizer/path/pathkeys.c |  44 ++-
 src/backend/optimizer/plan/planagg.c  |   2 +-
 src/backend/optimizer/plan/planner.c  | 306 --
 src/backend/optimizer/prep/prepagg.c  |   7 +-
 src/include/executor/execExpr.h   |  17 +
 src/include/executor/nodeAgg.h|   9 +
 src/include/nodes/pathnodes.h |  16 +-
 src/include/nodes/primnodes.h |   7 +
 src/include/optimizer/paths.h |   4 +-
 src/test/regress/expected/aggregates.out  |  80 -
 .../regress/expected/partition_aggregate.out  | 118 +++
 src/test/regress/expected/sqljson.out |  12 +-
 src/test/regress/expected/tuplesort.out   |  40 +--
 src/test/regress/sql/aggregates.sql   |  43 +++
 20 files changed, 845 insertions(+), 144 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index ade797159d..a42e16e8d2 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3295,15 +3295,18 @@ create operator class my_op_class for type int using 
btree family my_op_family a
 -- extension yet.
 explain (verbose, costs off)
 select array_agg(c1 order by c1 using operator(public.<^)) from ft2 where c2 = 
6 and c1 < 100 group by c2;
- QUERY PLAN
 
-
+QUERY PLAN 
   
+--
  GroupAggregate
Output: array_agg(c1 ORDER BY c1 USING <^ NULLS LAST), c2
Group Key: ft2.c2
-   ->  Foreign Scan 

Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Peter Smith
On Mon, Jul 25, 2022 at 7:33 PM vignesh C  wrote:
>
> On Mon, Jul 25, 2022 at 12:58 PM Peter Smith  wrote:
> >
> > Firstly, I have some (case-sensitivity) questions about the previous
> > patch which was already pushed [1].
> >
> > Q1. create_subscription docs
> >
> > I did not understand why the docs refer to slot_name = NONE, yet the
> > newly added option says origin = none/any. I think that saying origin
> > = NONE/ANY would be more consistent with the existing usage of NONE in
> > this documentation.
>
> Both NONE and none are ok in the case of origin, if you want I can
> change it to NONE/ANY in case of origin to keep it consistent.
>

I'd prefer the special origin values to be documented as NONE/ANY
for better consistency, but let's see what others think about it.

There will also be associated minor changes needed for a few
error/hint messages.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Peter Smith
On Mon, Jul 25, 2022 at 6:54 PM Amit Kapila  wrote:
>
> On Mon, Jul 25, 2022 at 12:58 PM Peter Smith  wrote:
> >
...
> >
> > Q2. parse_subscription_options
> >
> > Similarly, in the code (parse_subscription_options), I did not
> > understand why the checks for special name values are implemented
> > differently:
> >
> > The new 'origin' code is using pg_strcmpcase to check special values
> > (none/any), and the old 'slot_name' code uses case-sensitive strcmp to
> > check the special value (none).
> >
>
> We have a restriction for slot_names for lower case letters (aka
> "Replication slot names may only contain lower case letters, numbers,
> and the underscore character.") whereas there is no such restriction
> in origin name, that's why the check is different. So, if you try with
> slot_name = 'NONE', you will get the error.
>

2022-07-26 09:06:06.380 AEST [3630] STATEMENT:  create subscription
mysub connection 'host=localhost' publication mypub with
(slot_name='None', enabled=false, create_slot=false);
ERROR:  replication slot name "None" contains invalid character
HINT:  Replication slot names may only contain lower case letters,
numbers, and the underscore character.

You are right. Thanks for the explanation.

(Aside: Probably that error message wording ought to say "contains
invalid characters" instead of "contains invalid character")

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Proposal to provide the facility to set binary format output for specific OID's per session

2022-07-25 Thread Dave Cramer
Hi Sehrope,


On Mon, 25 Jul 2022 at 17:22, Sehrope Sarkuni  wrote:

> Idea here makes sense and I've seen this brought up repeatedly on the JDBC
> lists.
>
> Does the driver need to be aware that this SET command was executed? I'm
> wondering what happens if an end user executes this with an OID the driver
> does not actually know how to handle.
>
I suppose there would be a failure to read the attribute correctly.

>
> > + Oid *tmpOids = palloc(length+1);
> > ...
> > + tmpOids = repalloc(tmpOids, length+1);
>
> These should be: sizeof(Oid) * (length + 1)
>

Yes they should, thanks!

>
> Also, I think you need to specify an explicit context via
> MemoryContextAlloc or the allocated memory will be in the default context
> and released at the end of the command.
>

Also good catch

Thanks,

Dave

>


Re: Proposal to provide the facility to set binary format output for specific OID's per session

2022-07-25 Thread Sehrope Sarkuni
Idea here makes sense and I've seen this brought up repeatedly on the JDBC
lists.

Does the driver need to be aware that this SET command was executed? I'm
wondering what happens if an end user executes this with an OID the driver
does not actually know how to handle.

> + Oid *tmpOids = palloc(length+1);
> ...
> + tmpOids = repalloc(tmpOids, length+1);

These should be: sizeof(Oid) * (length + 1)

Also, I think you need to specify an explicit context via
MemoryContextAlloc or the allocated memory will be in the default context
and released at the end of the command.
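Roughly, the corrected allocations might look like this (TopMemoryContext is
just a placeholder here -- which context is actually appropriate depends on
how long the list needs to live):

    /* allocate in a context that outlives the current command */
    Oid *tmpOids = (Oid *) MemoryContextAlloc(TopMemoryContext,
                                              sizeof(Oid) * (length + 1));
    ...
    /* and size the repalloc in Oids as well */
    tmpOids = (Oid *) repalloc(tmpOids, sizeof(Oid) * (length + 1));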

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/


Re: Pluggable toaster

2022-07-25 Thread Nikita Malakhov
Hi hackers!

Here is the patch set rebased onto master as of 22.07. I had some trouble
rebasing it due to conflicts, so it took some time.
I've made some corrections according to Matthias' and Aleksander's comments,
though not all of them, because some require refactoring of very old code and
would take much more effort. I keep those recommended corrections in mind and
am already working on them, but they will require extensive testing and some
more work, so they will be presented later, in the next iteration or the next
patch - these are the optimization of heap AM with respect to
table_relation_fetch_toast_slice, the double-index problem, and the ongoing
work to straighten out code related to TOAST functionality.
It's quite a task because, as I mentioned before, this code was scattered over
Heap AM, and the reference implementation of TOAST is very tightly intertwined
with Heap AM itself. The default toaster uses Heap AM storage, so it is
unlikely that it will be possible to fully detach it from Heap.

However, I've made some more refactoring, removed empty sources, corrected
code according
to naming conventions, and extended README.toastapi document.

0002_toaster_interface_v10 contains TOAST API with Dummy toaster as an
example (but I would,
as recommended, remove Dummy toaster and provide it as an extension), and
default Toaster
was left as-is (reference implementation).

0003_toaster_default_v9 implements reference TOAST as Default Toaster via
TOAST API,
so Heap AM calls Toast only via API, and does not have direct calls to
Toast functionality.

0004_toaster_snapshot_v8 continues the refactoring and has some important
changes (a new part on TOAST API extensibility - the virtual functions
table - was added to README.toastapi).

Also, I'll provide the documentation package corrected according to Matthias'
remarks later, in the next patch set.

Please check attached patch set.
Also, GIT branch with this patch resides here:
https://github.com/postgrespro/postgres/tree/toasterapi_clean

Thank all reviewers for feedback!

On Sat, Jul 23, 2022 at 10:15 AM Nikita Malakhov  wrote:

> Hi hackers!
>
> Matthias, thank you very much for your feedback!
> Sorry, I forgot to attach files.
> Attaching here, but they are for the commit tagged "15beta2", I am
> currently
> rebasing this branch onto the actual master and will provide rebased
> version,
> with some corrections according to your feedback, in a day or two.
>
> >Indexes cannot index toasted values, so why would the toaster oid be
> >interesting for index storage properties?
>
> Here Teodor might correct me. Toast tables are indexed, and Heap TOAST
> mechanics accesses Toasted tuples by index, isn't it the case?
>
> >Moving toasters seems quite expensive when compared to just index
> >checks. When you have many toasters, but only a few hot ones, this
> >currently will move the "cold" toasters around a lot. Why not use a
> >stack instead (or alternatively, a 'zipper' (or similar) data
> >structure), where the hottest toaster is on top, so that we avoid
> >larger memmove calls?
>
> This is a reasonable remark, we'll consider it for the next iteration. Our
> reason
> is that we think there won't be a lot of custom Toasters, most likely
> fewer than a dozen, for the most complex/heavy datatypes, so we haven't
> considered these expenses.
>
> >To the best of my knowledge varlena-pointers are unaligned; and this
> >is affirmed by the comment immediately under varattrib_1b_e. Assuming
> >alignment to 16 bits should/would be incorrect in some of the cases.
> >This is handled for normal varatt_external values by memcpy-ing the
> >value into local (aligned) fields before using them, but that doesn't
> >seem to be the case for varatt_custom?
>
> Alignment correction seemed reasonable for us because structures are
> anyway aligned in memory, so when we use 1 and 2-byte fields along
> with 4-byte it may create a lot of padding. Am I wrong? Also, correct
> alignment somewhat simplifies access to structure fields.
>
> >0003: (re-implement default toaster using toaster interface)
> >I see significant changes to the dummy toaster (created in 0002),
> >could those be moved to patch 0002 in the next iteration?
> Will do.
>
> >detoast.c and detoast.h are effectively empty after this patch (only
> >imports and commented-out code remain), please fully remove them
> >instead - that saves on patch diff size.
> Will do.
>
> About the location of toast_ functions: these functions are part of Heap
> TOAST mechanics, and they were scattered among other Heap internals
> sources. I've tried to gather them and put them in more straight order,
> but
> this work is not fully finished yet and will take some time. Working on it.
>
> I'll check if table_relation_fetch_toast_slice could be removed, thanks for
> the remark.
>
> toast_helper - done, will be provided in rebased version.
>
> toast_internals - this one is an internal part of TOAST implemented in
> Heap AM, but I'll try to straighten it out as much as I could.
>
> naming 

Re: log_line_prefix: make it possible to add the search_path

2022-07-25 Thread Tom Lane
Alvaro Herrera  writes:
> Maybe your idea of allowing arbitrary GUCs is not a bad one, something
> like
>   %{search_path}G
> (where we add a letter at the end just so we can add other things in the
> future that aren't GUCs.)

I'm pretty uncomfortable about the amount of code that could potentially
be reached during an error logging attempt if we do that.

regards, tom lane




Re: Unprivileged user can induce crash by using an SUSET param in PGOPTIONS

2022-07-25 Thread Tom Lane
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?=  writes:
> Tom Lane  writes:
>> I wonder if it'd be a good idea to convert
>> auto_explain's TAP test to load auto_explain via session_preload_libraries
>> instead of shared_preload_libraries, and then pass in the settings for
>> each test via PGOPTIONS instead of constantly rewriting postgresql.conf.

> That whole config-file rewriting did feel a bit icky when I added more
> tests recently, but I completely forgot about PGOPTIONS and -c.
> Something like the attached is indeed much nicer.

Thanks!  I added a test to verify the permissions-checking issue
and pushed it.

regards, tom lane




Re: Expose Parallelism counters planned/execute in pg_stat_statements

2022-07-25 Thread Anthony Sotolongo

On 23-07-22 00:03, Julien Rouhaud wrote:

Hi,

On Fri, Jul 22, 2022 at 02:11:35PM -0400, Anthony Sotolongo wrote:

On 22-07-22 12:08, Julien Rouhaud wrote:

With your current patch it only says if the plan and execution had parallelism
enabled, but not if it could actually use with parallelism at all.  It gives
some information, but it's not that useful on its own.

The original idea of this patch was to identify when some of the
circumstances occurred under which it was impossible to execute that plan
in parallel at execution time, as mentioned in the documentation at [1].

For example:

Due to differences in client configuration, the execution behavior can
differ, and that can affect performance, as you can see in the execution
plans below:


 From psql

     ->  Gather Merge  (cost=779747.43..795700.62 rows=126492
width=40) (actual time=1109.515..1472.369 rows=267351 loops=1)
       Output: t.entity_node_id, t.configuration_id,
t.stream_def_id, t.run_type_id, t.state_datetime, (PARTIAL count(1))
       Workers Planned: 6
       Workers Launched: 6
       ->  Partial GroupAggregate (cost=778747.33..779327.09
rows=21082 width=40) (actual time=889.129..974.028 rows=38193 loops=7)

 From jdbc (from dbeaver)

     ->  Gather Merge  (cost=779747.43..795700.62 rows=126492
width=40) (actual time=4383.576..4385.856 rows=398 loops=1)
       Output: t.entity_node_id, t.configuration_id,
t.stream_def_id, t.run_type_id, t.state_datetime, (PARTIAL count(1))
       Workers Planned: 6
       Workers Launched: 0
       ->  Partial GroupAggregate (cost=778747.33..779327.09
rows=21082 width=40) (actual time=4383.574..4385.814 rows=398 loops=1)

This example was  discussed also at this Thread [2]

With these PSS counters it will be easy to identify when some of these
causes are happening.

I agree it can be hard to identify, but I don't think that your proposed
approach is enough to be able to do so.  There's no guarantee of an exact 1:1
mapping between planning and execution, so you could totally see the same value
for parallel_planned and parallel_exec and still have the dbeaver behavior
happening.

If you want to be able to distinguish "plan was parallel but execution was
forced to disable it" from "plan wasn't parallel, so was the execution", you
need some specific counters for both situations.


Thanks for your time and feedback. Yes, we were missing some details, so
we need to rethink some points before continuing.
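For illustration only, the per-situation counters Julien describes might look
roughly like this in pg_stat_statements (the field names here are invented,
they are not from any posted patch):

    typedef struct Counters
    {
        /* ... existing pg_stat_statements counters ... */
        int64       parallel_planned;    /* plans that included parallelism */
        int64       parallel_launched;   /* executions that launched workers */
        int64       parallel_degraded;   /* parallel plan, but ran with no workers */
    } Counters;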








Re: optimize lookups in snapshot [sub]xip arrays

2022-07-25 Thread Nathan Bossart
On Sat, Jul 16, 2022 at 08:59:57PM -0700, Nathan Bossart wrote:
> On Fri, Jul 15, 2022 at 01:08:57PM -0700, Andres Freund wrote:
>> I wonder if we additionally / alternatively could use a faster method of
>> searching the array linearly, e.g. using SIMD.
> 
> I looked into using SIMD.  The patch is attached, but it is only intended
> for benchmarking purposes and isn't anywhere close to being worth serious
> review.  There may be a simpler/better way to implement the linear search,
> but this seemed to work well.  Overall, SIMD provided a decent improvement.
> I had to increase the number of writers quite a bit in order to demonstrate
> where the hash tables began winning.  Here are the numbers:
> 
> writers head simd hash
> 256 663  632  694
> 512 530  618  637
> 768 489  544  573
> 1024364  508  562
> 2048185  306  485
> 4096146  197  441
> 
> While it is unsurprising that the hash tables perform the best, there are a
> couple of advantages to SIMD that might make that approach worth
> considering.  For one, there's really no overhead (i.e., you don't need to
> sort the array or build a hash table), so we can avoid picking an arbitrary
> threshold and just have one code path.  Also, a SIMD implementation for a
> linear search through an array of integers could likely be easily reused
> elsewhere.

From the discussion thus far, it seems there is interest in optimizing
[sub]xip lookups, so I'd like to spend some time moving it forward.  I
think the biggest open question is which approach to take.  Both the SIMD
and hash table approaches seem viable, but I think I prefer the SIMD
approach at the moment (see the last paragraph of quoted text for the
reasons).  What do folks think?
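For anyone who hasn't opened the benchmark patch, the general shape of a SIMD
linear search over the [sub]xip array is roughly the following.  This is an
SSE2 sketch written for illustration, not the attached patch (which would
also need a portable scalar fallback for non-x86 builds):

    #include <emmintrin.h>
    #include <stdbool.h>
    #include <stdint.h>

    static bool
    xid_in_array(uint32_t xid, const uint32_t *xip, int nxip)
    {
        int         i = 0;
        __m128i     key = _mm_set1_epi32((int) xid);

        /* compare four entries at a time */
        for (; i + 4 <= nxip; i += 4)
        {
            __m128i     chunk = _mm_loadu_si128((const __m128i *) &xip[i]);
            __m128i     cmp = _mm_cmpeq_epi32(chunk, key);

            if (_mm_movemask_epi8(cmp) != 0)
                return true;
        }

        /* scalar tail */
        for (; i < nxip; i++)
            if (xip[i] == xid)
                return true;

        return false;
    }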

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: very long record lines in expanded psql output

2022-07-25 Thread Andrew Dunstan


On 2021-10-03 Su 16:03, Platon Pronko wrote:
>> Hi,
>>
>> +   pg_log_error("\\pset: allowed xheader_width values
>> are full
>> (default), column, page, or an number specifying exact width.");
>>
>>   an number specifying -> a number specifying
>>
>> Cheers
>>
>
> Fixed, attaching the updated patch. Thank you!
>
>

Committed. There were a couple of bits missing, which I filled in, and I
changed the behaviour when the option is given without an argument, so
that instead of resetting it the current value is shown, similarly to
how most pset options work.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Proposal to provide the facility to set binary format output for specific OID's per session

2022-07-25 Thread Joe Conway

On 7/25/22 10:07, Jack Christensen wrote:
The advantage is to be able to use the binary format with only a single 
network round trip in cases where prepared statements are not possible. 
e.g. when using PgBouncer. Using the simple protocol with this patch 
lets users of pgx (the Go driver mentioned above) and PgBouncer use the 
binary format. The performance gains can be significant especially with 
types such as timestamptz that are very slow to parse.


As far as only sending binary types that the client can understand, the 
client driver would call `set format_binary` at the beginning of the 
session.


+1 makes a lot of sense to me.

Dave please add this to the open commitfest (202209)

--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row

2022-07-25 Thread Hamid Akhtar
On Fri, 1 Jul 2022 at 13:01, Drouvot, Bertrand  wrote:

> Hi,
> On 6/30/22 10:24 AM, Hamid Akhtar wrote:
>
> On Mon, 27 Jun 2022 at 15:52, Bharath Rupireddy <
> bharath.rupireddyforpostg...@gmail.com> wrote:
>
>> On Mon, Jun 27, 2022 at 1:40 PM Drouvot, Bertrand 
>> wrote:
>> >
>> > Hi,
>> >
>> > On 6/27/22 9:31 AM, Hamid Akhtar wrote:
>> >
>> >
>> > Hello Hackers,
>> >
>> > While working on one of my blogs on the B-Tree indexes, I needed to
>> look at a range of B-Tree page statistics. So the goto solution was to use
>> pageinspect. However, reviewing stats for multiple pages meant issuing
>> multiple queries.
>>
>> +1 to improve the API.
>>
> I think it makes sense too.
>
> But what about the other pageinspect functions that also use a single
> blkno as a parameter? Shouldn't the patch also take care of them?
>
> I've started working on that. But it's going to be a much bigger change
with a lot of code refactoring.

So, taking this one step at a time, IMHO, this patch is good to be reviewed
now.


> Regards,
>
> Bertrand
>


Re: explain analyze rows=%.0f

2022-07-25 Thread Hamid Akhtar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   not tested
Documentation:not tested

LGTM

Re: predefined role(s) for VACUUM and ANALYZE

2022-07-25 Thread Nathan Bossart
On Mon, Jul 25, 2022 at 12:58:36PM +0530, Bharath Rupireddy wrote:
> Thanks. I'm personally happy with more granular levels of control (as
> we don't have to give full superuser access to just run a few commands
> or maintenance operations) for various postgres commands. The only
> concern is that we might eventually end up with many predefined roles
> (perhaps one predefined role per command), spreading all around the
> code base and it might be difficult for the users to digest all of the
> roles in. It will be great if we can have some sort of rules or
> methods to define a separate role for a command.

Yeah, in the future, I could see this growing to a couple dozen predefined
roles.  Given they are relatively inexpensive and there are already 12 of
them, I'm personally not too worried about the list becoming too unwieldy.
Another way to help users might be to create additional aggregate
predefined roles (like pg_monitor) for common combinations.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: making relfilenodes 56 bits

2022-07-25 Thread Ashutosh Sharma
Hi,

As oid and relfilenumber are linked with each other, I still see that if
the oid value reaches the threshold limit, we are unable to create a table
with storage. For example I set FirstNormalObjectId to 4294967294 (one
value less than the range limit of 2^32 -1 = 4294967295). Now when I try to
create a table, the CREATE TABLE command gets stuck because it is unable to
find the OID for the comp type although it can find a new relfilenumber.

postgres=# create table t1(a int);
CREATE TABLE

postgres=# select oid, reltype, relfilenode from pg_class where relname =
't1';
oid |  reltype   | relfilenode
++-
 4294967295 | 4294967294 |  10
(1 row)

postgres=# create table t2(a int);
^CCancel request sent
ERROR:  canceling statement due to user request

creation of t2 table gets stuck as it is unable to find a new oid.
Basically, the point that I am trying to make here is that even though we
will be able to find a new relfilenumber by increasing the relfilenumber
size, commands like the above will still not execute if the oid value (of 32
bits) has reached the threshold limit.

--
With Regards,
Ashutosh Sharma.



On Wed, Jul 20, 2022 at 4:57 PM Dilip Kumar  wrote:

> On Mon, Jul 18, 2022 at 4:51 PM Dilip Kumar  wrote:
> >
> > I was doing some more testing by setting the FirstNormalRelFileNumber
> > to a high value (more than 32 bits) and I have noticed a couple of
> > problems there, e.g. relpath is still using the OIDCHARS macro, which
> > says a relfilenumber file name can be at most 10 characters long, which
> > is no longer true.  So there we need to change this value to 20 and also
> > need to carefully rename the macros and other variable names used for
> > this purpose.
> >
> > Similarly there was some issue in macro in buf_internal.h while
> > fetching the relfilenumber.  So I will relook into all those issues
> > and repost the patch soon.
>
> I have fixed these existing issues and there was also some issue in
> pg_dump.c which was creating problems in upgrading to the same version
> while using a higher range of the relfilenumber.
>
> There was also an issue where a user table's relfilenode from the old
> cluster could conflict with a system table of the new cluster.  As a
> solution, currently for system table objects (while creating their storage
> for the first time) we are keeping to the low range of relfilenumbers,
> basically using the same relfilenumber as the OID, so that during upgrade
> a normal user table from the old cluster will not conflict with the system
> tables in the new cluster.  But with this solution Robert pointed out to
> me (in an off-list chat) a problem: if in the future we want to make
> relfilenumbers completely unique within a cluster by implementing CREATEDB
> differently, then we cannot do that, as we have created fixed relfilenodes
> for the system tables.
>
> I am not sure what exactly we can do to avoid that, because even if we
> do something to avoid it in the new cluster, the old cluster might
> already be using the non-unique relfilenodes, so after upgrading the new
> cluster will also get those non-unique relfilenodes.
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Pre-allocating WAL files

2022-07-25 Thread Nathan Bossart
On Thu, Jul 14, 2022 at 11:34:07AM -0700, Nathan Bossart wrote:
> It's now been over a year since I first posted a patch in this thread, and
> I still sense very little interest for this feature.  I intend to mark it
> as Withdrawn at the end of this commitfest.

Done.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: allow specifying action when standby encounters incompatible parameter settings

2022-07-25 Thread Nathan Bossart
On Mon, Jul 18, 2022 at 03:17:10PM -0700, Nathan Bossart wrote:
> Given this feedback, I intend to mark the associated commitfest entry as
> Withdrawn at the conclusion of the current commitfest.

Done.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: log_line_prefix: make it possible to add the search_path

2022-07-25 Thread Alvaro Herrera
On 2022-Jul-25, Pierre wrote:

> On Monday, July 25, 2022 11:52:41 AM CEST Alvaro Herrera wrote:

> > It seems that this would be too noisy to be truly usable.  What if we
> > emitted a log line when the variable changed, and the value that's in
> > use when the connection starts?
> 
> Then the log files would be filled by these messages, only to be able to make 
> use of the few slow queries that are logged during the day.

Ah, yeah, that's not useful for that case ...

> Or it would need to be a log before each slow query. I'm not sure how
> well it would work.

... and this would probably be prohibitively complex to implement and
use, as well as too slow for the high traffic case.


Maybe your idea of allowing arbitrary GUCs is not a bad one, something
like
  %{search_path}G
(where we add a letter at the end just so we can add other things in the
future that aren't GUCs.)

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: fairywren hung in pg_basebackup tests

2022-07-25 Thread Andrew Dunstan


On 2022-07-25 Mo 11:24, Thomas Munro wrote:
> On Tue, Jul 26, 2022 at 3:08 AM Tom Lane  wrote:
>> I wrote:
>>> Right, so the "glob" failed to find anything.  Seeing that this test
>>> is new as of 534472375, which postdates fairywren's last successful
>>> run, I'd guess that the "glob" needs adjustment for msys path names.
> The test added by 534472375 is at the end, hundreds of lines later
> than the one that appears to be failing.


Right.


>
>> Hmm ... an alternative theory is that the test is fine, and what
>> it's telling us is that get_dirent_type() is still wrong on msys.
>> Would that end in this symptom?
> Hmm, possibly yes (if it sees a non-symlink, it'll skip it).   If
> someone can run the test on an msys system, perhaps they could put a
> debugging elog() into the code modified by 9d3444dc to log d_name and
> the d_type that is returned?  I'm struggling to understand why msys
> would change the answer though.



I have no idea either. The link exists and it is a junction. I'll see
about logging details.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Refactoring postgres_fdw/connection.c

2022-07-25 Thread Fujii Masao

Hi,

When reviewing the postgres_fdw parallel-abort patch [1], I found that
there is quite a bit of duplicated code in postgres_fdw/connection.c,
which seems to make it harder to review patches changing connection.c.
So I'd like to remove such duplicated code and refactor the functions
in connection.c. I've attached the following three patches.

There are two functions, pgfdw_get_result() and pgfdw_get_cleanup_result(),
to get a query result. They have almost the same code, call PQisBusy(),
WaitLatchOrSocket(), PQconsumeInput() and PQgetResult() in the loop,
but only pgfdw_get_cleanup_result() allows its callers to specify the timeout.
0001 patch transforms pgfdw_get_cleanup_result() to the common function
to get a query result and makes pgfdw_get_result() use it instead of
its own (duplicate) code. The patch also renames pgfdw_get_cleanup_result()
to pgfdw_get_result_timed().

pgfdw_xact_callback() and pgfdw_subxact_callback() have similar code to
issue COMMIT or RELEASE SAVEPOINT commands. 0002 patch adds the common function,
pgfdw_exec_pre_commit(), for that purpose, and changes those functions
so that they use the common one.

pgfdw_finish_pre_commit_cleanup() and pgfdw_finish_pre_subcommit_cleanup()
have similar code to wait for the results of COMMIT or RELEASE SAVEPOINT 
commands.
0003 patch adds the common function, pgfdw_finish_pre_commit(), for that 
purpose,
and replaces those functions with the common one.
That is, pgfdw_finish_pre_commit_cleanup() and 
pgfdw_finish_pre_subcommit_cleanup()
are no longer necessary and 0003 patch removes them.
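As a rough illustration of the 0001 idea (a sketch of the intent, not the
patch itself -- the assumptions being that endtime = 0 disables the timeout
and that a true return value means connection trouble), pgfdw_get_result()
essentially becomes a thin wrapper:

    PGresult *
    pgfdw_get_result(PGconn *conn, const char *query)
    {
        PGresult   *result = NULL;

        if (pgfdw_get_result_timed(conn, 0, &result, NULL))
            pgfdw_report_error(ERROR, NULL, conn, false, query);

        return result;
    }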

[1] https://commitfest.postgresql.org/38/3392/

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

From aa115d03880968c2e5bab68415e06e17a337a45b Mon Sep 17 00:00:00 2001
From: Fujii Masao 
Date: Mon, 25 Jul 2022 17:25:24 +0900
Subject: [PATCH 1/3] Refactor pgfdw_get_result() and
 pgfdw_get_cleanup_result().

---
 contrib/postgres_fdw/connection.c | 125 +++---
 1 file changed, 47 insertions(+), 78 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c 
b/contrib/postgres_fdw/connection.c
index 939d114f02..cbee285480 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -108,8 +108,8 @@ static void pgfdw_reset_xact_state(ConnCacheEntry *entry, 
bool toplevel);
 static bool pgfdw_cancel_query(PGconn *conn);
 static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 bool 
ignore_errors);
-static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
-
PGresult **result, bool *timed_out);
+static bool pgfdw_get_result_timed(PGconn *conn, TimestampTz endtime,
+  PGresult 
**result, bool *timed_out);
 static void pgfdw_abort_cleanup(ConnCacheEntry *entry, bool toplevel);
 static void pgfdw_finish_pre_commit_cleanup(List *pending_entries);
 static void pgfdw_finish_pre_subcommit_cleanup(List *pending_entries,
@@ -799,53 +799,12 @@ pgfdw_exec_query(PGconn *conn, const char *query, 
PgFdwConnState *state)
 PGresult *
 pgfdw_get_result(PGconn *conn, const char *query)
 {
-   PGresult   *volatile last_res = NULL;
-
-   /* In what follows, do not leak any PGresults on an error. */
-   PG_TRY();
-   {
-   for (;;)
-   {
-   PGresult   *res;
-
-   while (PQisBusy(conn))
-   {
-   int wc;
-
-   /* Sleep until there's something to do */
-   wc = WaitLatchOrSocket(MyLatch,
-  
WL_LATCH_SET | WL_SOCKET_READABLE |
-  
WL_EXIT_ON_PM_DEATH,
-  
PQsocket(conn),
-  -1L, 
PG_WAIT_EXTENSION);
-   ResetLatch(MyLatch);
-
-   CHECK_FOR_INTERRUPTS();
-
-   /* Data available in socket? */
-   if (wc & WL_SOCKET_READABLE)
-   {
-   if (!PQconsumeInput(conn))
-   pgfdw_report_error(ERROR, NULL, 
conn, false, query);
-   }
-   }
-
-   res = PQgetResult(conn);
-   if (res == NULL)
-   break;  /* query is complete */
+   PGresult   *result = NULL;
 
-   PQclear(last_res);
- 

Re: fairywren hung in pg_basebackup tests

2022-07-25 Thread Tom Lane
Noah Misch  writes:
> On Mon, Jul 25, 2022 at 09:44:21AM -0400, Andrew Dunstan wrote:
>> Perhaps we should have a guard in system_or_bail() and/or system_log()
>> which bails if some element of @_ is undefined.

+1, seeing how hard this is to diagnose.

> That would be reasonable.  Also reasonable to impose some long timeout, maybe
> 10x or 100x PG_TEST_TIMEOUT_DEFAULT, on calls to those functions.

Why would it need to be more than PG_TEST_TIMEOUT_DEFAULT?

regards, tom lane




Re: Unprivileged user can induce crash by using an SUSET param in PGOPTIONS

2022-07-25 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> I wonder if it'd be a good idea to convert
> auto_explain's TAP test to load auto_explain via session_preload_libraries
> instead of shared_preload_libraries, and then pass in the settings for
> each test via PGOPTIONS instead of constantly rewriting postgresql.conf.

That whole config-file rewriting did feel a bit icky when I added more
tests recently, but I completely forgot about PGOPTIONS and -c.
Something like the attached is indeed much nicer.

- ilmari

From c9d6850e89ff95df1034c13ca9c10aaf1cf0f02f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Mon, 25 Jul 2022 16:23:09 +0100
Subject: [PATCH] Use PGOPTIONS and session_preload_libraries in auto_explain
 tests

Instead of constantly rewriting postgresql.conf.  Also change from
shared_preload_libraries to session_preload_libraries to give some
coverage of extensions providing custom GUCs.
---
 contrib/auto_explain/t/001_auto_explain.pl | 24 +++---
 1 file changed, 3 insertions(+), 21 deletions(-)

diff --git a/contrib/auto_explain/t/001_auto_explain.pl b/contrib/auto_explain/t/001_auto_explain.pl
index 1d952fb54d..b86761919b 100644
--- a/contrib/auto_explain/t/001_auto_explain.pl
+++ b/contrib/auto_explain/t/001_auto_explain.pl
@@ -16,38 +16,20 @@ sub query_log
 	my ($node, $sql, $params) = @_;
 	$params ||= {};
 
-	if (keys %$params)
-	{
-		for my $key (keys %$params)
-		{
-			$node->append_conf('postgresql.conf', "$key = $params->{$key}\n");
-		}
-		$node->reload;
-	}
+	local $ENV{PGOPTIONS} = join " ", map { "-c $_=$params->{$_}" } keys %$params;
 
 	my $log= $node->logfile();
 	my $offset = -s $log;
 
 	$node->safe_psql("postgres", $sql);
 
-	my $log_contents = slurp_file($log, $offset);
-
-	if (keys %$params)
-	{
-		for my $key (keys %$params)
-		{
-			$node->adjust_conf('postgresql.conf', $key, undef);
-		}
-		$node->reload;
-	}
-
-	return $log_contents;
+	return slurp_file($log, $offset);
 }
 
 my $node = PostgreSQL::Test::Cluster->new('main');
 $node->init;
 $node->append_conf('postgresql.conf',
-	"shared_preload_libraries = 'auto_explain'");
+	"session_preload_libraries = 'auto_explain'");
 $node->append_conf('postgresql.conf', "auto_explain.log_min_duration = 0");
 $node->append_conf('postgresql.conf', "auto_explain.log_analyze = on");
 $node->start;
-- 
2.30.2



Re: fairywren hung in pg_basebackup tests

2022-07-25 Thread Thomas Munro
On Tue, Jul 26, 2022 at 3:08 AM Tom Lane  wrote:
> I wrote:
> > Right, so the "glob" failed to find anything.  Seeing that this test
> > is new as of 534472375, which postdates fairywren's last successful
> > run, I'd guess that the "glob" needs adjustment for msys path names.

The test added by 534472375 is at the end, hundreds of lines later
than the one that appears to be failing.

> Hmm ... an alternative theory is that the test is fine, and what
> it's telling us is that get_dirent_type() is still wrong on msys.
> Would that end in this symptom?

Hmm, possibly yes (if it sees a non-symlink, it'll skip it).   If
someone can run the test on an msys system, perhaps they could put a
debugging elog() into the code modified by 9d3444dc to log d_name and
the d_type that is returned?  I'm struggling to understand why msys
would change the answer though.
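Something as simple as the following, dropped into the relevant loop, should
be enough (the variable names here are assumed, just for illustration):

    elog(LOG, "dirent \"%s\" -> get_dirent_type() = %d",
         de->d_name, (int) get_dirent_type(fullpath, de, false, ERROR));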




Re: fairywren hung in pg_basebackup tests

2022-07-25 Thread Noah Misch
On Mon, Jul 25, 2022 at 09:44:21AM -0400, Andrew Dunstan wrote:
> On 2022-07-24 Su 15:10, Noah Misch wrote:
> > On Sun, Jul 24, 2022 at 12:55:56PM -0400, Tom Lane wrote:
> >> Andrew Dunstan  writes:
> >>> fairywren (msys2 animal) is currently hung in the pg_basebackup tests.
> >>> Here's the bottom of the regress log. I don't have further info as yet,
> >>> but can dig is someone has a suggestion.
> >> Hm, what's with the "Use of uninitialized value" warnings?
> > The warnings are sequelae of:
> >
> >>> [13:11:33.593](0.000s) not ok 97 - one tablespace tar was created
> From that, it follows that $tblspc_tars[0] is undef at:
> >
> > PostgreSQL::Test::Utils::system_or_bail($tar, 'xf', $tblspc_tars[0],
> > '-C', $repTsDir);
> >
> >>> # Running: C:/Windows/System32/tar xf  -C
> >>> C:\tools\nmsys64\home\pgrunner\bf\root\REL_15_STABLE\pgsql.build\src\bin\pg_basebackup\tmp_check\tmp_test_jVCb/tblspc1replica
> 
> Perhaps we should have a guard in system_or_bail() and/or system_log()
> which bails if some element of @_ is undefined.

That would be reasonable.  Also reasonable to impose some long timeout, maybe
10x or 100x PG_TEST_TIMEOUT_DEFAULT, on calls to those functions.




Re: [PATCH v1] strengthen backup history filename check

2022-07-25 Thread Junwang Zhao
On Mon, Jul 25, 2022 at 7:39 PM Bharath Rupireddy
 wrote:
>
> On Mon, Jul 25, 2022 at 5:01 PM Junwang Zhao  wrote:
> >
> > This patch makes the backup history filename check more tight.
>
> Can you please elaborate a bit on the issue with existing
> IsBackupHistoryFileName(), if there's any?
>

There are two calls of this function, in `CleanupBackupHistory` and
`SetWALFileNameForCleanup`. There seems to be no issue with the existing
IsBackupHistoryFileName(), since the creation of the backup history file
makes sure it is of the format "%08X%08X%08X.%08X.backup".

The patch just makes `IsBackupHistoryFileName()` match
`BackupHistoryFileName()` more closely, and thus easier to understand.

> Also, the patch does have hard coded numbers [1] which isn't good from
> a readability perspective, adding macros and/or comments would help
> here.

I'm not sure using macros is a good idea here, because I noticed
`IsTLHistoryFileName()` also uses some hard-coded numbers [1].

[1]
static inline bool
IsTLHistoryFileName(const char *fname)
{
return (strlen(fname) == 8 + strlen(".history") &&
strspn(fname, "0123456789ABCDEF") == 8 &&
strcmp(fname + 8, ".history") == 0);
}

>
> [1]
>  static inline bool
>  IsBackupHistoryFileName(const char *fname)
>  {
> - return (strlen(fname) > XLOG_FNAME_LEN &&
> + return (strlen(fname) == XLOG_FNAME_LEN + 9 + strlen(".backup") &&
>   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN &&
> - strcmp(fname + strlen(fname) - strlen(".backup"), ".backup") == 0);
> + strspn(fname + XLOG_FNAME_LEN + 1, "0123456789ABCDEF") == 8 &&
> + strcmp(fname + XLOG_FNAME_LEN + 9, ".backup") == 0);
>  }
>
> Regards,
> Bharath Rupireddy.



-- 
Regards
Junwang Zhao




Re: Cleaning up historical portability baggage

2022-07-25 Thread Andrew Dunstan


On 2022-07-25 Mo 10:35, Thomas Munro wrote:
> On Sun, Jul 24, 2022 at 12:23 PM Tom Lane  wrote:
>> After looking through these briefly, I'm pretty concerned about
>> whether this won't break our Cygwin build in significant ways.
>> For example, lorikeet reports "HAVE_SETSID 1", a condition that
>> you want to replace with !WIN32.  The question here is whether
>> or not WIN32 is defined in a Cygwin build.  I see some places
>> in our code that believe it is not, but others that believe that
>> it is --- and the former ones are mostly like
>> #if defined(__CYGWIN__) || defined(WIN32)
>> which means they wouldn't actually fail if they are wrong about that.
> I spent a large chunk of today figuring out how to build PostgreSQL
> under Cygwin/GCC on CI.  My method for answering this question was to
> put the following on the end of 192 .c files that contain the pattern
> /#if.*WIN32/:
>
> +
> +#if defined(WIN32) && defined(__CYGWIN__)
> +#pragma message "contradiction"
> +#endif
>
> Only one of them printed that message: dirmod.c.  The reason is that
> it goes out of its way to include Windows headers:
>
> #if defined(WIN32) || defined(__CYGWIN__)
> #ifndef __CYGWIN__
> #include 
> #else
> #include 
> #include 
> #endif
> #endif
>
> The chain  ->  ->  leads to WIN32 here:
>
> https://github.com/mirror/mingw-w64/blob/master/mingw-w64-headers/include/minwindef.h#L15
>
> I'm left wondering if we should de-confuse matters by ripping out all
> the checks and comments that assume that this problem is more
> widespread, and then stick a big notice about it in dirmod.c, to
> contain this Jekyll/Hyde situation safely inside about 8 feet of
> concrete.


Clearly it's something we've been aware of before, port.h has:


* Note: Some CYGWIN includes might #define WIN32.
 */
#if defined(WIN32) && !defined(__CYGWIN__)
#include "port/win32_port.h"
#endif


I can test any patches you want on lorikeet.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: fairywren hung in pg_basebackup tests

2022-07-25 Thread Tom Lane
I wrote:
> Right, so the "glob" failed to find anything.  Seeing that this test
> is new as of 534472375, which postdates fairywren's last successful
> run, I'd guess that the "glob" needs adjustment for msys path names.

Hmm ... an alternative theory is that the test is fine, and what
it's telling us is that get_dirent_type() is still wrong on msys.
Would that end in this symptom?

regards, tom lane




Re: fairywren hung in pg_basebackup tests

2022-07-25 Thread Tom Lane
Andrew Dunstan  writes:
> On 2022-07-24 Su 15:10, Noah Misch wrote:
>> On Sun, Jul 24, 2022 at 12:55:56PM -0400, Tom Lane wrote:
>>> Andrew Dunstan  writes:
 fairywren (msys2 animal) is currently hung in the pg_basebackup tests.
 Here's the bottom of the regress log. I don't have further info as yet,
 but can dig if someone has a suggestion.

>>> Hm, what's with the "Use of uninitialized value" warnings?

>> The warnings are sequelae of:
>>> [13:11:33.593](0.000s) not ok 97 - one tablespace tar was created
>> From that, it follows that $tblspc_tars[0] is undef at:
>> PostgreSQL::Test::Utils::system_or_bail($tar, 'xf', $tblspc_tars[0],
>> '-C', $repTsDir);

Right, so the "glob" failed to find anything.  Seeing that this test
is new as of 534472375, which postdates fairywren's last successful
run, I'd guess that the "glob" needs adjustment for msys path names.

regards, tom lane




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread vignesh C
On Mon, Jul 25, 2022 at 2:24 PM vignesh C  wrote:
>
> On Sun, Jul 24, 2022 at 10:21 PM Jonathan S. Katz  
> wrote:
> >
> > On 7/22/22 12:47 AM, Amit Kapila wrote:
> > > On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz  
> > > wrote:
> >
> > >> 1. I'm concerned by calling this "Bidirectional replication" in the docs
> > >> that we are overstating the current capabilities. I think this is
> > >> accentuated int he opening paragraph:
> > >>
> > >> ==snip==
> > >>Bidirectional replication is useful for creating a multi-master 
> > >> database
> > >>environment for replicating read/write operations performed by any of 
> > >> the
> > >>member nodes.
> > >> ==snip==
> > >>
> > >> For one, we're not replicating reads, we're replicating writes. Amongst
> > >> the writes, at this point we're only replicating DML. A reader could
> > >> think that deploying can work for a full bidirectional solution.
> > >>
> > >> (Even if we're aspirationally calling this section "Bidirectional
> > >> replication", that does make it sound like we're limited to two nodes,
> > >> when we can support more than two).
> > >>
> > >
> > > Right, I think the system can support N-Way replication.
> >
> > I did some more testing of the feature, i.e. doing 3-node and 4-node
> > tests. While logical replication today can handle replicating between
> > multiple nodes (N-way), the "origin = none" does require setting up
> > subscribers between each of the nodes.
> >
> > For example, if I have 4 nodes A, B, C, D and I want to replicate the
> > same table between all of them, I need to set up subscribers between all
> > of them (A<=>B, A<=>C, A<=>D, B<=>C, B<=>D, C<=>D). However, each node
> > can replicate between each other in a way that's convenient (vs. having
> > to do something funky with partitions) so this is still a big step forward.
> >
> > This is a long way of saying that I do think it's fair to say we support
> > "N-way" replication so long as you are set up in a mesh (vs. a ring,
> > e.g. A=>B=>C=>D=>A).
> >
> > > Among the above "Replicating changes between primaries" sounds good to
> > > me or simply "Replication between primaries". As this is a sub-section
> > > on the Logical Replication page, I feel it is okay to not use Logical
> > > in the title.
> >
> > Agreed, I think that's fine.
> >
> > >> At a minimum, I think we should reference the documentation we have in
> > >> the logical replication section on conflicts. We may also want to advise
> > >> that a user is responsible for designing their schemas in a way to
> > >> minimize the risk of conflicts.
> > >>
> > >
> > > This sounds reasonable to me.
> > >
> > > One more point about docs, it appears to be added as the last
> > > sub-section on the Logical Replication page. Is there a reason for
> > > doing so? I feel this should be third sub-section after describing
> > > Publication and Subscription.
> >
> > When I first reviewed, I had not built the docs. Did so on this pass.
> >
> > I agree with the positioning argument, i.e. it should go after
> > "Subscription" in the table of contents -- but it makes me question a
> > couple of things:
> >
> > 1. The general ordering of the docs
> > 2. How we describe that section (more on that in a sec)
> > 3. If "row filters" should be part of "subscription" instead of its own
> > section.
> >
> > If you look at the current order, "Quick setup" is the last section; one
> > would think the "quick" portion goes first :) Given a lot of this is for
> > the current docs, I may start a separate discussion on -docs for this part.
> >
> > For the time being, I agree it should be moved to the section after
> > "Subscription".
> >
> > I think what this section describes is "Configuring Replication Between
> > Nodes" as it covers a few different scenarios.
> >
> > I do think we need to iterate on these docs -- the examples with the
> > commands are generally OK and easy to follow, but a few things I noticed:
> >
> > 1. The general description of the section needs work. We may want to
> > refine the description of the use cases, and in the warning, link to
> > instructions on how to take backups.
> > 2. We put the "case not supported" in the middle, not at the end.
> > 3. The "generic steps for adding a new node..." section uses a
> > convention for steps that is not found in the docs. We also don't
> > provide an example for this section, and this is the most complicated
> > scenario to set up.
> >
> > I may be able to propose some suggestions in a few days.
> >
> > > BTW, do you have any opinion on the idea of the first remaining patch
> > > where we accomplish two things: a) Checks and throws an error if
> > > 'copy_data = on' and 'origin = none' but the publication tables were
> > > also replicated from other publishers. b) Adds 'force' value for
> > > copy_data parameter to allow copying in such a case. The primary
> > > reason for this patch is to avoid loops or duplicate data in the
> > > initial phase. We can't skip copying 

[Patch] Fix bounds check in trim_array()

2022-07-25 Thread Martin Kalcher

Hi,

while working on something else I encountered a bug in the trim_array()
function. The bounds check fails for empty arrays without any
dimensions. It reads the size of the non-existing first dimension to
determine the array's length.


  select trim_array('{}'::int[], 10);
  
   {}

  select trim_array('{}'::int[], 100);
  ERROR:  number of elements to trim must be between 0 and 64

The attached patch fixes that check.

Martin

From b6173a8f8f94cddd5347db482b8e4480c0e546e7 Mon Sep 17 00:00:00 2001
From: Martin Kalcher 
Date: Mon, 25 Jul 2022 16:26:14 +0200
Subject: [PATCH] Fix bounds check in trim_array()

The bounds check in trim_array() failed for empty arrays without any
dimension. It read the size of the first dimension without checking its
existence.
---
 src/backend/utils/adt/arrayfuncs.c   | 2 +-
 src/test/regress/expected/arrays.out | 2 ++
 src/test/regress/sql/arrays.sql  | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index fb167f226a..0342ebf4fd 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -6839,7 +6839,7 @@ trim_array(PG_FUNCTION_ARGS)
 {
 	ArrayType  *v = PG_GETARG_ARRAYTYPE_P(0);
 	int			n = PG_GETARG_INT32(1);
-	int			array_length = ARR_DIMS(v)[0];
+	int			array_length = (ARR_NDIM(v) > 0) ? ARR_DIMS(v)[0] : 0;
 	int16		elmlen;
 	bool		elmbyval;
 	char		elmalign;
diff --git a/src/test/regress/expected/arrays.out b/src/test/regress/expected/arrays.out
index ce6f3a65f9..97920f38c2 100644
--- a/src/test/regress/expected/arrays.out
+++ b/src/test/regress/expected/arrays.out
@@ -2445,3 +2445,5 @@ SELECT trim_array(ARRAY[1, 2, 3], -1); -- fail
 ERROR:  number of elements to trim must be between 0 and 3
 SELECT trim_array(ARRAY[1, 2, 3], 10); -- fail
 ERROR:  number of elements to trim must be between 0 and 3
+SELECT trim_array(ARRAY[]::int[], 1); -- fail
+ERROR:  number of elements to trim must be between 0 and 0
diff --git a/src/test/regress/sql/arrays.sql b/src/test/regress/sql/arrays.sql
index f774faf856..791af5c0ce 100644
--- a/src/test/regress/sql/arrays.sql
+++ b/src/test/regress/sql/arrays.sql
@@ -754,3 +754,4 @@ FROM
 
 SELECT trim_array(ARRAY[1, 2, 3], -1); -- fail
 SELECT trim_array(ARRAY[1, 2, 3], 10); -- fail
+SELECT trim_array(ARRAY[]::int[], 1); -- fail
-- 
2.37.1



Re: Cleaning up historical portability baggage

2022-07-25 Thread Thomas Munro
On Sun, Jul 24, 2022 at 12:23 PM Tom Lane  wrote:
> After looking through these briefly, I'm pretty concerned about
> whether this won't break our Cygwin build in significant ways.
> For example, lorikeet reports "HAVE_SETSID 1", a condition that
> you want to replace with !WIN32.  The question here is whether
> or not WIN32 is defined in a Cygwin build.  I see some places
> in our code that believe it is not, but others that believe that
> it is --- and the former ones are mostly like
> #if defined(__CYGWIN__) || defined(WIN32)
> which means they wouldn't actually fail if they are wrong about that.

I spent a large chunk of today figuring out how to build PostgreSQL
under Cygwin/GCC on CI.  My method for answering this question was to
put the following on the end of 192 .c files that contain the pattern
/#if.*WIN32/:

+
+#if defined(WIN32) && defined(__CYGWIN__)
+#pragma message "contradiction"
+#endif

Only one of them printed that message: dirmod.c.  The reason is that
it goes out of its way to include Windows headers:

#if defined(WIN32) || defined(__CYGWIN__)
#ifndef __CYGWIN__
#include 
#else
#include 
#include 
#endif
#endif

The chain  ->  ->  leads to WIN32 here:

https://github.com/mirror/mingw-w64/blob/master/mingw-w64-headers/include/minwindef.h#L15

I'm left wondering if we should de-confuse matters by ripping out all
the checks and comments that assume that this problem is more
widespread, and then stick a big notice about it in dirmod.c, to
contain this Jekyll/Hyde situation safely inside about 8 feet of
concrete.

I'll respond to your other complaints with new patches tomorrow.




Re: Unprivileged user can induce crash by using an SUSET param in PGOPTIONS

2022-07-25 Thread Tom Lane
Nathan Bossart  writes:
> Given all this, I think I'm inclined for the new argument.

Pushed like that then (after a bit more fooling with the comments).

I haven't done anything about a test case.  We can't rely on plperl
getting built, and even if we could, it doesn't have any TAP-style
tests so it'd be hard to get it to test this scenario.  However,
I do see that we're not testing session_preload_libraries anywhere,
which seems bad.  I wonder if it'd be a good idea to convert
auto_explain's TAP test to load auto_explain via session_preload_libraries
instead of shared_preload_libraries, and then pass in the settings for
each test via PGOPTIONS instead of constantly rewriting postgresql.conf.

regards, tom lane




Re: Proposal to provide the facility to set binary format output for specific OID's per session

2022-07-25 Thread Jack Christensen
On Mon, Jul 25, 2022 at 4:57 AM Dave Cramer  wrote:

>
> Dave Cramer
>
>
> On Sun, 24 Jul 2022 at 23:02, Kyotaro Horiguchi 
> wrote:
>
>> At Fri, 22 Jul 2022 11:00:18 -0400, Dave Cramer 
>> wrote in
>> > As a proof of concept I provide the attached patch which implements the
>> > ability to specify which oids will be returned in binary format per
>> > session.
>> ...
>> > Both the JDBC driver and the go driver can exploit this change with no
>> > changes. I haven't confirmed if other drivers would work without
>> changes.
>>
>> I'm not sure about the needs of that, but binary exchange format is
>> not the one that can be turned on ignoring the peer's capability.
>
> I'm not sure what this means. The client is specifying which types it
> wants in binary format.
>
>>   If
>> JDBC driver wants some types be sent in binary format, it seems to be
>> able to be specified in bind message.
>>
> To be clear it's not just the JDBC client; the original idea came from the
> author of go driver.
> And yes you can specify it in the bind message but you have to specify it
> in *every* bind message which pretty much negates any advantage you might
> get out of binary format due to the extra round trip.
>
> Regards,
> Dave
>
>>
>> regards.
>>
>> --
>> Kyotaro Horiguchi
>> NTT Open Source Software Center
>>
>
The advantage is to be able to use the binary format with only a single
network round trip in cases where prepared statements are not possible.
e.g. when using PgBouncer. Using the simple protocol with this patch lets
users of pgx (the Go driver mentioned above) and PgBouncer use the binary
format. The performance gains can be significant especially with types such
as timestamptz that are very slow to parse.

As far as only sending binary types that the client can understand, the
client driver would call `set format_binary` at the beginning of the
session.
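To make the round-trip point concrete: with the extended protocol today, a
driver has to request binary results on every single call, e.g. via libpq's
resultFormat argument (this is just an illustration of the per-call cost the
proposal avoids, not part of the patch):

    /* every statement has to ask for binary results individually */
    PGresult *res = PQexecParams(conn, "SELECT now()",
                                 0,        /* nParams */
                                 NULL, NULL, NULL, NULL,
                                 1);       /* resultFormat: 1 = binary */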

Jack Christensen


Re: Perform streaming logical transactions by background workers and parallel apply

2022-07-25 Thread Amit Kapila
On Fri, Jul 22, 2022 at 8:26 AM wangw.f...@fujitsu.com
 wrote:
>
> On Tues, Jul 19, 2022 at 10:29 AM I wrote:
> > Attach the news patches.
>
> Not able to apply patches cleanly because the change in HEAD (366283961a).
> Therefore, I rebased the patch based on the changes in HEAD.
>
> Attach the new patches.
>

Few comments on 0001:
==
1.
-   substream bool
+   substream char
   
   
-   If true, the subscription will allow streaming of in-progress
-   transactions
+   Controls how to handle the streaming of in-progress transactions:
+   f = disallow streaming of in-progress transactions,
+   t = spill the changes of in-progress transactions to
+   disk and apply at once after the transaction is committed on the
+   publisher,
+   p = apply changes directly using a background worker

Shouldn't the description of 'p' be something like: apply changes
directly using a background worker, if available, otherwise, it
behaves the same as 't'

2.
Note that if an error happens when
+  applying changes in a background worker, the finish LSN of the
+  remote transaction might not be reported in the server log.

Is there any case where the finish LSN can be reported when applying via
a background worker? If not, then we should use 'won't' instead of
'might not'.

3.
+#define PG_LOGICAL_APPLY_SHM_MAGIC 0x79fb2447 // TODO Consider change

It is better to change this as the same magic number is used by
PG_TEST_SHM_MQ_MAGIC

4.
+ /* Ignore statistics fields that have been updated. */
+ s.cursor += IGNORE_SIZE_IN_MESSAGE;

Can we change the comment to: "Ignore statistics fields that have been
updated by the main apply worker."? Would it be better to name the
define "SIZE_STATS_MESSAGE"?

5.
+/* Apply Background Worker main loop */
+static void
+LogicalApplyBgwLoop(shm_mq_handle *mqh, volatile ApplyBgworkerShared *shared)
{
...
...

+ apply_dispatch();
+
+ if (ConfigReloadPending)
+ {
+ ConfigReloadPending = false;
+ ProcessConfigFile(PGC_SIGHUP);
+ }
+
+ MemoryContextSwitchTo(oldctx);
+ MemoryContextReset(ApplyMessageContext);

We should not process the config file under ApplyMessageContext. You
should switch context before processing the config file. See other
similar usages in the code.
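
In other words, something like this reordering (only a sketch to illustrate
the point, assuming oldctx is the context that was saved before switching to
ApplyMessageContext):

			/* leave the per-message context before doing unrelated work */
			MemoryContextSwitchTo(oldctx);
			MemoryContextReset(ApplyMessageContext);

			if (ConfigReloadPending)
			{
				ConfigReloadPending = false;
				ProcessConfigFile(PGC_SIGHUP);
			}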

6.
+/* Apply Background Worker main loop */
+static void
+LogicalApplyBgwLoop(shm_mq_handle *mqh, volatile ApplyBgworkerShared *shared)
{
...
...
+ MemoryContextSwitchTo(oldctx);
+ MemoryContextReset(ApplyMessageContext);
+ }
+
+ MemoryContextSwitchTo(TopMemoryContext);
+ MemoryContextReset(ApplyContext);
...
}

I don't see the need to reset ApplyContext here as we don't do
anything in that context here.

-- 
With Regards,
Amit Kapila.




Re: fairywren hung in pg_basebackup tests

2022-07-25 Thread Andrew Dunstan


On 2022-07-24 Su 15:10, Noah Misch wrote:
> On Sun, Jul 24, 2022 at 12:55:56PM -0400, Tom Lane wrote:
>> Andrew Dunstan  writes:
>>> fairywren (msys2 animal) is currently hung in the pg_basebackup tests.
>>> Here's the bottom of the regress log. I don't have further info as yet,
>>> but can dig if someone has a suggestion.
>> Hm, what's with the "Use of uninitialized value" warnings?
> The warnings are sequelae of:
>
>>> [13:11:33.593](0.000s) not ok 97 - one tablespace tar was created
> From that, it follows that $tblspc_tars[0] is undef at:
>
>   PostgreSQL::Test::Utils::system_or_bail($tar, 'xf', $tblspc_tars[0],
>   '-C', $repTsDir);
>
>>> # Running: C:/Windows/System32/tar xf  -C
>>> C:\tools\nmsys64\home\pgrunner\bf\root\REL_15_STABLE\pgsql.build\src\bin\pg_basebackup\tmp_check\tmp_test_jVCb/tblspc1replica



Perhaps we should have a guard in system_or_bail() and/or system_log()
which bails if some element of @_ is undefined.




> I can confirm that Windows tar hangs when invoked that way.  For preventing
> the hang, the test file could die() or skip the tar-program-using section
> after failing 'one tablespace tar was created'.  That still leaves a question
> about why pg_basebackup didn't make the tablespace tar file.  I would check
> 010_pg_basebackup_main.log for clues about that.



The same thing has happened again on HEAD. I don't see anything in that
file that gives any clue.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: log_line_prefix: make it possible to add the search_path

2022-07-25 Thread Pierre
On Monday, July 25, 2022 11:52:41 AM CEST Alvaro Herrera wrote:
> On 2022-Jul-25, Pierre Ducroquet wrote:
> > This is great for performance, but several tools are lacking around this
> > usecase by not showing the schema, one of them being log_line_prefix.
> > 
> > The attached patch implements this, using %S. I've not written the
> > documentation yet, since I'm not sure this would be acceptable as is, or
> > if a more "generic" method should be used (I thought of %{name} to fetch
> > an arbitrary GUC, but did not implement due to a lack of need for that
> > feature)
> It seems that this would be too noisy to be truly usable.  What if we
> emitted a log line when the variable changed, and the value that's in
> use when the connection starts?

Then the log files would be filled with these messages, only to be able to make 
use of the few slow queries that are logged during the day. Or a line would need 
to be logged before each slow query. I'm not sure how well either would work.






Re: COPY FROM FORMAT CSV FORCE_NULL(*) ?

2022-07-25 Thread Andrew Dunstan


On 2022-07-25 Mo 00:18, jian he wrote:
> Hi, there.
>
> copy force null git commit
> 
> didn't attach a discussion link. So I don't know if it's already been
> discussed.
>
> Currently it seems you cannot do
>     COPY forcetest  FROM STDIN WITH (FORMAT csv,  FORCE_NULL(*));
>
> can we have FORCE_NULL(*)? Since We already have FORCE_QUOTE(*). 
>

We only started adding discussion links in later years. Here's a link to
the original discussion.





Offhand I don't see why we shouldn't have this. Someone interested
enough would need to submit a patch.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

2022-07-25 Thread Bharath Rupireddy
On Thu, May 5, 2022 at 2:57 PM Bharath Rupireddy
 wrote:
>
> On Fri, Apr 29, 2022 at 4:02 PM Alvaro Herrera  
> wrote:
> >
> > If I read the code right, this patch emits logs when
> > pg_logical_slot_get_changes and pg_replication_slot_advance SQL
> > functions are called.  Is this desirable/useful, for the use case you
> > stated at start of thread?  I think it is most likely pointless.  If you
> > get rid of those, then the only acquisitions that would log messages are
> > those in StartReplication and StartLogicalReplication.  So I wonder if
> > it would be better to leave the API of ReplicationSlotAcquire() alone,
> > and instead make StartReplication and StartLogicalReplication
> > responsible for those messages.
> >
> > I didn't look at the release-side messages you're adding, but I suppose
> > it should be symmetrical.

Here's the v6 patch, a much simpler one - no changes to any of the
existing function APIs. Please see the sample logs at [1]. There's a
bit of duplicate code in the v6 patch; if the overall approach looks
okay, I can remove that too in the next version of the patch.

Thoughts?

[1]
2022-07-25 12:30:14.847 UTC [152873] LOG:  acquired physical
replication slot "foo"
2022-07-25 12:30:20.878 UTC [152873] LOG:  released physical
replication slot "foo"

2022-07-25 12:49:18.023 UTC [168738] LOG:  acquired logical
replication slot "bar"
2022-07-25 12:49:28.105 UTC [168738] LOG:  released logical
replication slot "bar"
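
For example, the duplicated block could be funnelled through a small helper
along these lines (just a sketch, not what the attached patch does):

static void
ReplicationSlotReleaseWithLog(void)
{
	bool		is_physical = SlotIsPhysical(MyReplicationSlot);
	char		slotname[NAMEDATALEN];

	/* remember the name before the slot is released */
	strlcpy(slotname, NameStr(MyReplicationSlot->data.name), NAMEDATALEN);

	ReplicationSlotRelease();

	if (is_physical)
		ereport(log_replication_commands ? LOG : DEBUG3,
				(errmsg("released physical replication slot \"%s\"",
						slotname)));
	else
		ereport(log_replication_commands ? LOG : DEBUG3,
				(errmsg("released logical replication slot \"%s\"",
						slotname)));
}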

Regards,
Bharath Rupireddy.
From c57cfd8dd27eaf9e0c17c47769cb24b8ab961116 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Mon, 25 Jul 2022 12:51:26 +
Subject: [PATCH v6] Add LOG messages when replication slots become active and
 inactive

These logs will be extremely useful on production servers to debug
and analyze inactive replication slot issues.
---
 doc/src/sgml/config.sgml|  7 +++---
 src/backend/replication/slot.c  | 21 
 src/backend/replication/walsender.c | 39 +
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e2d728e0c4..8bf3c2ab53 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7461,9 +7461,10 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
   
   

-Causes each replication command to be logged in the server log.
-See  for more information about
-replication command. The default value is off.
+Causes each replication command and related activity to be logged in
+the server log. See  for more
+information about replication command. The default value is
+off.
 Only superusers and users with the appropriate SET
 privilege can change this setting.

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 850b74936f..43563bec7e 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -179,8 +179,29 @@ ReplicationSlotShmemExit(int code, Datum arg)
 {
 	/* Make sure active replication slots are released */
 	if (MyReplicationSlot != NULL)
+	{
+		bool is_physical;
+		char slotname[NAMEDATALEN] = {0};
+
+		is_physical = SlotIsPhysical(MyReplicationSlot);
+		strcpy(slotname, NameStr(MyReplicationSlot->data.name));
+
 		ReplicationSlotRelease();
 
+		if (is_physical)
+		{
+			ereport(log_replication_commands ? LOG : DEBUG3,
+	(errmsg("released physical replication slot \"%s\"",
+			slotname)));
+		}
+		else
+		{
+			ereport(log_replication_commands ? LOG : DEBUG3,
+	(errmsg("released logical replication slot \"%s\"",
+			slotname)));
+		}
+	}
+
 	/* Also cleanup all the temporary slots. */
 	ReplicationSlotCleanup();
 }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3a86786cc3..ebd34316e2 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -322,8 +322,29 @@ WalSndErrorCleanup(void)
 		wal_segment_close(xlogreader);
 
 	if (MyReplicationSlot != NULL)
+	{
+		bool is_physical;
+		char slotname[NAMEDATALEN] = {0};
+
+		is_physical = SlotIsPhysical(MyReplicationSlot);
+		strcpy(slotname, NameStr(MyReplicationSlot->data.name));
+
 		ReplicationSlotRelease();
 
+		if (is_physical)
+		{
+			ereport(log_replication_commands ? LOG : DEBUG3,
+	(errmsg("released physical replication slot \"%s\"",
+			slotname)));
+		}
+		else
+		{
+			ereport(log_replication_commands ? LOG : DEBUG3,
+	(errmsg("released logical replication slot \"%s\"",
+			slotname)));
+		}
+	}
+
 	ReplicationSlotCleanup();
 
 	replication_active = false;
@@ -704,6 +725,10 @@ StartReplication(StartReplicationCmd *cmd)
 	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 	 errmsg("cannot use a logical replication slot for physical replication")));
 
+		ereport(log_replication_commands ? LOG : DEBUG3,
+(errmsg("acquired physical replication slot \"%s\"",
+		cmd->slotname)));
+
 	

Re: Custom tuplesorts for extensions

2022-07-25 Thread Pavel Borisov
>
> Thank you for catching this.  Fixed in the revision attached.
>
> Testing subsets of patchsets in cfbot looks like a good idea to me.
> However, I'm not sure if we always require subsets to be consistent.
>

Hi, hackers!

I've looked through the new v6 of the patchset and find it OK. When applying
only 0001+0002, I don't see warnings anymore. The build and tests are successful
and cfbot also looks good. I've marked the patch as RfC.

-- 
Best regards,
Pavel Borisov


Re: Typo in misc_sanity.sql?

2022-07-25 Thread Japin Li


On Mon, 25 Jul 2022 at 19:39, Tom Lane  wrote:
> Japin Li  writes:
>> I found the misc_sanity has a SQL to check system catalogs that
>> do not have primary keys, however, in current exceptions it says
>> pg_depend, pg_shdepend don't have a unique key.
>
> As indeed they do not:
>
> regression=# \d pg_depend
>   Table "pg_catalog.pg_depend"
>    Column    |  Type   | Collation | Nullable | Default 
> -------------+---------+-----------+----------+---------
>  classid     | oid     |           | not null | 
>  objid       | oid     |           | not null | 
>  objsubid    | integer |           | not null | 
>  refclassid  | oid     |           | not null | 
>  refobjid    | oid     |           | not null | 
>  refobjsubid | integer |           | not null | 
>  deptype     | "char"  |           | not null | 
> Indexes:
>     "pg_depend_depender_index" btree (classid, objid, objsubid)
>     "pg_depend_reference_index" btree (refclassid, refobjid, refobjsubid)
>
> regression=# \d pg_shdepend
>             Table "pg_catalog.pg_shdepend"
>    Column   |  Type   | Collation | Nullable | Default 
> ------------+---------+-----------+----------+---------
>  dbid       | oid     |           | not null | 
>  classid    | oid     |           | not null | 
>  objid      | oid     |           | not null | 
>  objsubid   | integer |           | not null | 
>  refclassid | oid     |           | not null | 
>  refobjid   | oid     |           | not null | 
>  deptype    | "char"  |           | not null | 
> Indexes:
>     "pg_shdepend_depender_index" btree (dbid, classid, objid, objsubid), tablespace "pg_global"
>     "pg_shdepend_reference_index" btree (refclassid, refobjid), tablespace "pg_global"
> Tablespace: "pg_global"
>

Yeah, they do not have unique keys; however, here we check primary keys. So,
IMO, the description in the exceptions should say they do not have primary keys
rather than that they do not have unique keys.

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: [PATCH v1] strengthen backup history filename check

2022-07-25 Thread Bharath Rupireddy
On Mon, Jul 25, 2022 at 5:01 PM Junwang Zhao  wrote:
>
> This patch makes the backup history filename check tighter.

Can you please elaborate a bit on the issue with the existing
IsBackupHistoryFileName(), if there's any?

Also, the patch has hard-coded numbers [1], which isn't good from
a readability perspective; adding macros and/or comments would help
here.

[1]
 static inline bool
 IsBackupHistoryFileName(const char *fname)
 {
- return (strlen(fname) > XLOG_FNAME_LEN &&
+ return (strlen(fname) == XLOG_FNAME_LEN + 9 + strlen(".backup") &&
  strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN &&
- strcmp(fname + strlen(fname) - strlen(".backup"), ".backup") == 0);
+ strspn(fname + XLOG_FNAME_LEN + 1, "0123456789ABCDEF") == 8 &&
+ strcmp(fname + XLOG_FNAME_LEN + 9, ".backup") == 0);
 }
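
For example, something along these lines (an untested sketch; the macro names
are made up) would keep the arithmetic readable:

/* backup history file: 24 hex chars, '.', 8 hex chars, ".backup" */
#define BACKUP_HISTORY_OFFSET_LEN	8
#define BACKUP_HISTORY_SUFFIX		".backup"

static inline bool
IsBackupHistoryFileName(const char *fname)
{
	return (strlen(fname) == XLOG_FNAME_LEN + 1 + BACKUP_HISTORY_OFFSET_LEN +
				strlen(BACKUP_HISTORY_SUFFIX) &&
			strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN &&
			fname[XLOG_FNAME_LEN] == '.' &&
			strspn(fname + XLOG_FNAME_LEN + 1,
				   "0123456789ABCDEF") == BACKUP_HISTORY_OFFSET_LEN &&
			strcmp(fname + XLOG_FNAME_LEN + 1 + BACKUP_HISTORY_OFFSET_LEN,
				   BACKUP_HISTORY_SUFFIX) == 0);
}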

Regards,
Bharath Rupireddy.




Re: Typo in misc_sanity.sql?

2022-07-25 Thread Tom Lane
Japin Li  writes:
> I found the misc_sanity has a SQL to check system catalogs that
> do not have primary keys, however, in current exceptions it says
> pg_depend, pg_shdepend don't have a unique key.

As indeed they do not:

regression=# \d pg_depend
  Table "pg_catalog.pg_depend"
   Column    |  Type   | Collation | Nullable | Default 
-------------+---------+-----------+----------+---------
 classid     | oid     |           | not null | 
 objid       | oid     |           | not null | 
 objsubid    | integer |           | not null | 
 refclassid  | oid     |           | not null | 
 refobjid    | oid     |           | not null | 
 refobjsubid | integer |           | not null | 
 deptype     | "char"  |           | not null | 
Indexes:
    "pg_depend_depender_index" btree (classid, objid, objsubid)
    "pg_depend_reference_index" btree (refclassid, refobjid, refobjsubid)

regression=# \d pg_shdepend
            Table "pg_catalog.pg_shdepend"
   Column   |  Type   | Collation | Nullable | Default 
------------+---------+-----------+----------+---------
 dbid       | oid     |           | not null | 
 classid    | oid     |           | not null | 
 objid      | oid     |           | not null | 
 objsubid   | integer |           | not null | 
 refclassid | oid     |           | not null | 
 refobjid   | oid     |           | not null | 
 deptype    | "char"  |           | not null | 
Indexes:
    "pg_shdepend_depender_index" btree (dbid, classid, objid, objsubid), tablespace "pg_global"
    "pg_shdepend_reference_index" btree (refclassid, refobjid), tablespace "pg_global"
Tablespace: "pg_global"

Your proposed wording seems to give strictly less information,
so I do not see why it's an improvement.

regards, tom lane




[PATCH v1] strengthen backup history filename check

2022-07-25 Thread Junwang Zhao
This patch makes the backup history filename check tighter.

-- 
Regards
Junwang Zhao


0001-strengthen-backup-history-filename-check.patch
Description: Binary data


Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2022-07-25 Thread Matthias van de Meent
On Wed, 13 Jul 2022 at 07:54, Michael Paquier  wrote:
>
> On Mon, Jul 11, 2022 at 02:26:46PM +0200, Matthias van de Meent wrote:
> > Thanks for reviewing.
>
> I think that v6 is over-engineered because there should be no need to
> add a check in xlogreader.c as long as the origin of the problem is
> blocked, no?  And the origin here is when the record is assembled.  At
> least this is the cleanest solution for HEAD, but not in the
> back-branches if we'd care about doing something with records already
> generated, and I am not sure that we need to care about other things
> than HEAD, TBH.

I would prefer it if we fixed the "cannot catch up to the primary
because of an oversized WAL record" issue in back-branches too. Rather
than failing to recover after a failure or breaking replication streams,
I'd rather be unable to write the single offending WAL record and
break at most one transaction.

> So it seems to me that there is no need to create a
> XLogRecMaxLength which is close to a duplicate of
> DecodeXLogRecordRequiredSpace().
>
> @@ -519,7 +549,7 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
>XLogRecPtr *fpw_lsn, int *num_fpi, bool *topxid_included)
>
>  {
> XLogRecData *rdt;
> -   uint32  total_len = 0;
> +   uint64  total_len = 0;
> This has no need to change.
>
> My suggestion from upthread was close to what you proposed, but I had
> in mind something simpler, as of:
>
> +   /*
> +* Ensure that xlogreader.c can read the record.
> +*/
> +   if (unlikely(!AllocSizeIsValid(DecodeXLogRecordRequiredSpace(total_len
> +   elog(ERROR, "too much WAL data");

Huh, yeah, I hadn't thought of that, but that's much simpler indeed.

> This would be the amount of data allocated by the WAL reader when it
> is possible to allocate an oversized record, related to the business
> of the circular buffer depending on if the read is blocking or not.

Yes, I see your point.

> Among the two problems to solve at hand, the parts where the APIs are
> changed and made more robust with unsigned types and where block data
> is not overflowed with its 16-byte limit are committable, so I'd like
> to do that first (still need to check its performance with some micro
> benchmark on XLogRegisterBufData()).

> The second part to block the
> creation of the assembled record is simpler, now
> DecodeXLogRecordRequiredSpace() would make the path a bit hotter,
> though we could inline it in the worst case?

I think that would be better for performance, yes.
DecodeXLogRecordRequiredSpace will already be optimized to just a
single addition by any of `-O[123]`, so keeping this indirection is
quite expensive (relative to the operation being performed).

As for your patch:

> +XLogRegisterData(char *data, uint32 len)
> {
> XLogRecData *rdata;
>
> Assert(begininsert_called);
>
> -if (num_rdatas >= max_rdatas)
> +if (unlikely(num_rdatas >= max_rdatas))
> elog(ERROR, "too much WAL data");
> rdata = [num_rdatas++];

XLogRegisterData is designed to be called multiple times for each
record, and this allows the user of the API to overflow the internal
mainrdata_len field if we don't check that the field does not exceed
the maximum record length (or overflow the 32-bit field). As such, I'd
still want a len-check in that function.
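
Roughly, what I have in mind is something like this (sketch only; whatever
limit XLogRecordAssemble() ends up enforcing, here called XLogRecMaxLength,
should be the authoritative number):

void
XLogRegisterData(char *data, uint32 len)
{
	XLogRecData *rdata;

	Assert(begininsert_called);

	if (unlikely(num_rdatas >= max_rdatas))
		elog(ERROR, "too much WAL data");

	/*
	 * Guard the accumulated main-data length as well: repeated calls must
	 * not overflow mainrdata_len or exceed what a decodable record can hold.
	 */
	if (unlikely(len > XLogRecMaxLength - mainrdata_len))
		elog(ERROR, "too much WAL data");

	rdata = &rdatas[num_rdatas++];

	rdata->data = data;
	rdata->len = len;

	mainrdata_last->next = rdata;
	mainrdata_last = rdata;

	mainrdata_len += len;
}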

I'll send an updated patch by tomorrow.

- Matthias




Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory

2022-07-25 Thread Bharath Rupireddy
On Mon, Apr 25, 2022 at 5:17 PM Bharath Rupireddy
 wrote:
>
> On Mon, Apr 25, 2022 at 6:38 AM Michael Paquier  wrote:
> >
> > On Fri, Apr 22, 2022 at 07:17:37PM +0530, Bharath Rupireddy wrote:
> > > Right. We find enough disk space and go to write and suddenly the
> > > write operations fail for some reason or the VM crashes because of a
> > > reason other than disk space. I think the foolproof solution is to
> > > figure out the available disk space before prepadding or compressing
> > > and also use the
> > > write-first-to-temp-file-and-then-rename-it-to-original-file as
> > > proposed in the earlier patches in this thread.
> >
> > Yes, what would count here is only the amount of free space in a
> > partition.  The total amount of space available becomes handy once you
> > begin introducing things like percentage-based quota policies for the
> > disk when archiving.  The free amount of space could be used to define
> > a policy based on the maximum number of bytes you need to leave
> > around, as well, but this is not perfect science as this depends of
> > what FSes decide to do underneath.  There are a couple of designs
> > possible here.  When I had to deal with my upthread case I have chosen
> > one as I had no need to worry only about Linux, it does not mean that
> > this is the best choice that would fit with the long-term community
> > picture.  This comes down to how much pg_receivewal should handle
> > automatically, and how it should handle it.
>
> Thanks. I'm not sure why we are just thinking of crashes due to
> out-of-disk space. Figuring out free disk space before writing a huge
> file (say a WAL file) is a problem in itself to the core postgres as
> well, not just pg_receivewal.
>
> I think we are off-track a bit here. Let me illustrate what's the
> whole problem is and the idea:
>
> If the node/VM on which pg_receivewal runs, goes down/crashes or fails
> during write operation while padding the target WAL file (the .partial
> file) with zeros, the unfilled target WAL file ((let me call this file
> a partially padded .partial file) will be left over and subsequent
> reads/writes to that it will fail with "write-ahead log file \"%s\"
> has %zd bytes, should be 0 or %d" error which requires manual
> intervention to remove it. In a service, this manual intervention is
> what we would like to avoid. Let's not much bother right now for
> compressed file writes (for now at least) as they don't have a
> prepadding phase.
>
> The proposed solution is to make the prepadding atomic - prepad the
> .partial file as .partial.tmp name and after the prepadding
> rename (durably if sync option is chosen for pg_receivewal) to
> .partial. Before prepadding  .partial.tmp, delete the
> .partial.tmp if it exists.
>
> The above problem isn't unique to pg_receivewal alone, pg_basebackup
> too uses CreateWalDirectoryMethod and dir_open_for_write via
> ReceiveXlogStream.
>
> IMHO, pg_receivewal checking for available disk space before writing
> any file should better be discussed separately?

Here's the v3 patch after rebasing.

I just would like to reiterate the issue the patch is trying to solve:
At times (when no space is left on the device or when the process is
taken down forcibly (VM/container crash)), there can be leftover
uninitialized .partial files (note that padding, i.e. filling 16MB WAL
files with all zeros, is done in non-compression cases), due to which
pg_receivewal fails to come up after the crash. To address this, the
proposed patch makes the padding of 16MB WAL files atomic (first write a
.temp file, pad it and durably rename it to the original .partial
file, ready to be filled with received WAL records). This approach is
similar to how core postgres achieves atomicity while creating a new
WAL file (see how XLogFileInitInternal() creates an xlogtemp.%d file
first and then how InstallXLogFileSegment() durably renames it to the
original WAL file).
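
In pseudocode, the prepadding path then looks roughly like this (a simplified
sketch; the real patch goes through the walmethods callbacks and honours the
--synchronous option):

	/* targetpath is the ".partial" file we ultimately want to create */
	char		tmppath[MAXPGPATH];
	char		zerobuf[XLOG_BLCKSZ] = {0};
	int			fd;

	snprintf(tmppath, sizeof(tmppath), "%s.tmp", targetpath);

	/* discard any leftover temp file from a previous crash, then recreate */
	unlink(tmppath);
	fd = open(tmppath, O_WRONLY | O_CREAT | O_EXCL, 0600);

	/* pre-pad with zeros up to the full segment size */
	for (int written = 0; written < WalSegSz; written += XLOG_BLCKSZ)
	{
		if (write(fd, zerobuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
		{
			/* report the error, unlink tmppath and bail out */
		}
	}

	/* make the padded contents durable, then expose the file atomically */
	fsync(fd);
	close(fd);
	if (rename(tmppath, targetpath) != 0)
	{
		/* report the error */
	}
	/* with --synchronous, also fsync the directory containing the file */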

Thoughts?

Regards,
Bharath Rupireddy.
From f57d4cc3d3750e9104989a38450a91beea78aafd Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Mon, 25 Jul 2022 11:02:38 +
Subject: [PATCH v3] Avoid partially-padded files under
 pg_receivewal/pg_basebackup dirs

pg_receivewal and pg_basebackup can have partially-padded files
in their target directories when a failure occurs while padding.
The failure can be because of no space left on device (which the
user can later increase/clean up to make some free disk space) or
host VM/server crashed.

To address the above problem, first create a temporary file, pad
it and then rename it to the target file.
---
 src/bin/pg_basebackup/walmethods.c | 99 +++---
 1 file changed, 90 insertions(+), 9 deletions(-)

diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index e90aa0ba37..083861e217 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -82,6 +82,8 @@ typedef struct DirectoryMethodFile
 #define 

Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2022-07-25 Thread Heikki Linnakangas

On 13/07/2022 08:54, Michael Paquier wrote:

I think that v6 is over-engineered because there should be no need to
add a check in xlogreader.c as long as the origin of the problem is
blocked, no?  And the origin here is when the record is assembled.  At
least this is the cleanest solution for HEAD, but not in the
back-branches if we'd care about doing something with records already
generated, and I am not sure that we need to care about other things
than HEAD, TBH.  So it seems to me that there is no need to create a
XLogRecMaxLength which is close to a duplicate of
DecodeXLogRecordRequiredSpace().

@@ -519,7 +549,7 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr *fpw_lsn, int *num_fpi, bool *topxid_included)

  {
 XLogRecData *rdt;
-   uint32  total_len = 0;
+   uint64  total_len = 0;
This has no need to change.

My suggestion from upthread was close to what you proposed, but I had
in mind something simpler, as of:

+   /*
+* Ensure that xlogreader.c can read the record.
+*/
+   if (unlikely(!AllocSizeIsValid(DecodeXLogRecordRequiredSpace(total_len
+   elog(ERROR, "too much WAL data");

This would be the amount of data allocated by the WAL reader when it
is possible to allocate an oversized record, related to the business
of the circular buffer depending on if the read is blocking or not.


The way this is written, it would change whenever we add/remove fields 
in DecodedBkpBlock, for example. That's fragile; if you added a field in 
a back-branch, you could accidentally make the new minor version unable 
to read maximum-sized WAL records generated with an older version. I'd 
like the maximum to be more explicit.


How large exactly is the maximum size that this gives? I'd prefer to set 
the limit conservatively to 1020 MB, for example, with a compile-time 
static assertion that 
AllocSizeIsValid(DecodeXLogRecordRequiredSpace(1020 MB)).
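
For example (just a sketch; the exact constant and error wording are open):

/*
 * Cap on the size of an assembled WAL record, chosen so that a record of
 * this size can still be allocated and decoded by xlogreader.c, i.e.
 * DecodeXLogRecordRequiredSpace(XLogRecordMaxSize) stays below MaxAllocSize.
 */
#define XLogRecordMaxSize	(1020 * 1024 * 1024)	/* 1020 MB */

	/* in XLogRecordAssemble(), once total_len is known */
	if (unlikely(total_len > XLogRecordMaxSize))
		ereport(ERROR,
				(errmsg_internal("oversized WAL record"),
				 errdetail_internal("WAL record would be %llu bytes (of maximum %u bytes)",
									(unsigned long long) total_len,
									XLogRecordMaxSize)));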



Among the two problems to solve at hand, the parts where the APIs are
changed and made more robust with unsigned types and where block data
is not overflowed with its 16-byte limit are committable, so I'd like
to do that first (still need to check its performance with some micro
benchmark on XLogRegisterBufData()).


+1. I'm not excited about adding the "unlikely()" hints, though. We have 
a pg_attribute_cold hint in ereport(); that should be enough.


- Heikki




RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-25 Thread shiy.f...@fujitsu.com
Hi,

I did some performance test for the master branch patch (based on v6 patch) to
see if the bsearch() added by this patch will cause any overhead.

I tested them three times and took the average.

The results are as follows, and attach the bar chart.

case 1
-
No catalog modifying transaction.
Decode 800k pgbench transactions. (8 clients, 100k transactions per client)

master  7.5417
patched 7.4107

case 2
-
There's one catalog modifying transaction.
Decode 100k/500k/1M transactions.

        100k    500k    1M
master  0.0576  0.1491  0.4346
patched 0.0586  0.1500  0.4344

case 3
-
There are 64 catalog modifying transactions.
Decode 100k/500k/1M transactions.

        100k    500k    1M
master  0.0600  0.1666  0.4876
patched 0.0620  0.1653  0.4795

(Because the result of case 3 shows that there is a overhead of about 3% in the
case decoding 100k transactions with 64 catalog modifying transactions, I
tested the next run of 100k xacts with or without catalog modifying
transactions, to see if it affects subsequent decoding.)

case 4.1
-
After the test steps in case 3 (64 catalog modifying transactions, decode 100k
transactions), run 100k xacts and then decode.

master  0.3699
patched 0.3701

case 4.2
-
After the test steps in case 3 (64 catalog modifying transactions, decode 100k
transactions), run 64 DDLs(without checkpoint) and 100k xacts, then decode.

master  0.3687
patched 0.3696

Summary of the tests:
After applying this patch, there is an overhead of about 3% in the case of decoding
100k transactions with 64 catalog modifying transactions. This is an extreme
case, so maybe it's okay. Cases 4.1 and 4.2 show that the patch has no
effect on subsequent decoding. In other cases, there are no significant
differences.

For your information, here are the parameters specified in postgresql.conf in
the test.

shared_buffers = 8GB
checkpoint_timeout = 30min
max_wal_size = 20GB
min_wal_size = 10GB
autovacuum = off

Regards,
Shi yu


Re: Make name optional in CREATE STATISTICS

2022-07-25 Thread Alvaro Herrera
On 2022-Jul-25, Michael Paquier wrote:

> On Mon, Jul 25, 2022 at 11:49:50AM +0200, Alvaro Herrera wrote:
> > On 2022-Jul-23, Michael Paquier wrote:

> >> By the way, it seems that 83011ce also broke the case of "REINDEX
> >> DATABASE CONCURRENTLY", where the parser missed the addition of a
> >> DefElem for "concurrently" in this case.
> > 
> > Wow.
> 
> For this one, we have a gap in the test, actually.  It seems to me
> that we'd better make sure that the OID of the indexes rebuilt
> concurrently is changed.  There is a REINDEX DATABASE CONCURRENTLY
> already in the TAP tests, and the only thing that would be needed for
> the job is an extra query that compares the OID saved before the
> reindex with the one in the catalogs after the fact..

Agreed.  I think you already have the query for that elsewhere in the
test, so it's just a matter of copying it from there.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"No tengo por qué estar de acuerdo con lo que pienso"
 (Carlos Caszeli)




Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

2022-07-25 Thread Andrey Borodin



> On 25 July 2022, at 14:29, Bharath Rupireddy wrote:
> 
> Hm, after thinking for a while, I tend to agree with the above
> approach - meaning, query cancel interrupt processing can completely
> be disabled in SyncRepWaitForLSN() and process proc die interrupt
> immediately, this approach requires no GUC as opposed to the proposed
> v1 patch upthread.
A GUC was proposed here [0] to maintain compatibility with the previous
behaviour, but I think that having no GUC here is fine too, provided we do not
allow cancellation of unreplicated backends, of course.


>> 
>> And yes, we need additional complexity - but in some other place. 
>> Transaction can also be locally committed in presence of a server crash. But 
>> this another difficult problem. Crashed server must not allow data queries 
>> until LSN of timeline end is successfully replicated to 
>> synchronous_standby_names.
> 
> Hm, that needs to be done anyways. How about doing as proposed
> initially upthread [1]? Also, quoting the idea here [2].
> 
> Thoughts?
> 
> [1] 
> https://www.postgresql.org/message-id/CALj2ACUrOB59QaE6=jf2cfayv1mr7fzd8tr4ym5+oweyg1s...@mail.gmail.com
> [2] 2) Wait for sync standbys to catch up upon restart after the crash or
> in the next txn after the old locally committed txn was canceled. One
> way to achieve this is to let the backend, that's making the first
> connection, wait for sync standbys to catch up in ClientAuthentication
> right after successful authentication. However, I'm not sure this is
> the best way to do it at this point.


I think that, ideally, the startup process should not allow read-only connections in 
CheckRecoveryConsistency() until WAL is replicated to the quorum, at least up 
to the LSN of the timeline end.

Thanks!

[0] https://commitfest.postgresql.org/34/2402/





Re: Make name optional in CREATE STATISTICS

2022-07-25 Thread Michael Paquier
On Mon, Jul 25, 2022 at 11:49:50AM +0200, Alvaro Herrera wrote:
> On 2022-Jul-23, Michael Paquier wrote:
>> As the problem comes down to the fact that INDEX/TABLE, SCHEMA and
>> DATABASE/SYSTEM need to handle names for different object types each,
>> I think that we could do something like the attached, removing one
>> block on the way at the cost of an extra parser node.
> 
> Yeah, looks good.  I propose to also test the error for reindexing a
> different database, which is currently uncovered, as attached.

Good idea.

>> By the way, it seems that 83011ce also broke the case of "REINDEX
>> DATABASE CONCURRENTLY", where the parser missed the addition of a
>> DefElem for "concurrently" in this case.
> 
> Wow.

For this one, we have a gap in the test, actually.  It seems to me
that we'd better make sure that the OID of the indexes rebuilt
concurrently is changed.  There is a REINDEX DATABASE CONCURRENTLY
already in the TAP tests, and the only thing that would be needed for
the job is an extra query that compares the OID saved before the
reindex with the one in the catalogs after the fact..
--
Michael




Re: make -C libpq check fails obscurely if tap tests are disabled

2022-07-25 Thread Alvaro Herrera
On 2022-Jul-25, Alvaro Herrera wrote:

> Anyway, I tried a revert of 1bd201214965 -- I ended up with the
> attached.


-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
From 387835c04436bc59b9c82e32d9259c74d68ceabf Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 25 Jul 2022 12:26:49 +0200
Subject: [PATCH] Try to undo 1bd201214965

---
 src/Makefile.global.in | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 14fdd4ef7b..d100137228 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -882,19 +882,14 @@ endif
 # using $(eval).  It will set up a target so that it recurses into a
 # given subdirectory.  For the tree-wide all/install/check/installcheck cases,
 # ensure we do our one-time tasks before recursing (see targets above).
-# Note that to avoid a nasty bug in make 3.80,
-# this function was written to not use any complicated constructs (like
-# multiple targets on a line) and also not contain any lines that expand
-# to more than about 200 bytes.  This is why we make it apply to just one
-# subdirectory at a time, rather than to a list of subdirectories.
 # $1: target name, e.g., all
 # $2: subdir name
 # $3: target to run in subdir, usually same as $1
 define _create_recursive_target
-.PHONY: $(1)-$(2)-recurse
-$(1): $(1)-$(2)-recurse
-$(1)-$(2)-recurse: $(if $(filter all install check installcheck, $(3)), submake-generated-headers) $(if $(filter check, $(3)), temp-install)
-	$$(MAKE) -C $(2) $(3)
+.PHONY: $(patsubst %,$(1)-%-recursive,$(2))
+$(1): $(patsubst %,$(1)-%-recursive,$(2))
+$(2:%=$(1)-%-recursive):
+	$$(MAKE) -C $$(patsubst $(1)-%-recursive,%,$$@) $(3)
 endef
 # Note that the use of $$ on the last line above is important; we want
 # $(MAKE) to be evaluated when the rule is run, not when the $(eval) is run
@@ -905,7 +900,7 @@ endef
 # $1: targets to make recursive (defaults to list of standard targets)
 # $2: list of subdirs (defaults to SUBDIRS variable)
 # $3: target to run in subdir (defaults to current element of $1)
-recurse = $(foreach target,$(if $1,$1,$(standard_targets)),$(foreach subdir,$(if $2,$2,$(SUBDIRS)),$(eval $(call _create_recursive_target,$(target),$(subdir),$(if $3,$3,$(target))
+recurse = $(foreach target,$(if $1,$1,$(standard_targets)),$(eval $(call _create_recursive_target,$(target),$(if $2,$2,$(SUBDIRS)),$(if $3,$3,$(target)
 
 # If a makefile's list of SUBDIRS varies depending on configuration, then
 # any subdirectories excluded from SUBDIRS should instead be added to
-- 
2.30.2



Re: make -C libpq check fails obscurely if tap tests are disabled

2022-07-25 Thread Alvaro Herrera
On 2022-Jul-22, Tom Lane wrote:

> Barring objections, I'll push the attached patch.  I suppose we
> could undo whatever dumbing-down was done in _create_recursive_target,
> but is it worth troubling with?

Excellent, many thanks.  I tried to get Make 3.80 built here, to no
avail.  I have updated that to 3.81, but still haven't found a way past
the automake phase.

Anyway, I tried a revert of 1bd201214965 -- I ended up with the
attached.  However, while a serial compile fails, parallel ones fail
randomly, and apparently because two submakes compete in building
libpq.a and each deletes the other's file.  What I think this is saying
is that the 3.80-induced wording of that function limits concurrency of
the generated recursive rules, which prevents the problem from
occurring; and if we were to fix that bug we would probably end up with
more concurrency.

Here's the bottom of the 'make -j8' log:

rm -f libpq.a
ranlib libpq.a
ranlib: 'libpq.a': No such file
make[5]: *** [/pgsql/source/master/src/Makefile.shlib:261: libpq.a] Error 1
make[5]: *** Waiting for unfinished jobs
ar crs libpq.a fe-auth-scram.o fe-connect.o fe-exec.o fe-lobj.o fe-misc.o 
fe-print.o fe-protocol3.o fe-secure.o fe-trace.o legacy-pqsignal.o 
libpq-events.o pqexpbuffer.o fe-auth.o fe-secure-common.o fe-secure-openssl.o
ranlib libpq.a
touch libpq.a
rm -f libpq.so.5
ln -s libpq.so.5.16 libpq.so.5
rm -f libpq.so
ln -s libpq.so.5.16 libpq.so
touch libpq-refs-stamp
rm -f libpq.so.5
ln -s libpq.so.5.16 libpq.so.5
rm -f libpq.so
ln -s libpq.so.5.16 libpq.so
rm -f libpq.so.5
ln -s libpq.so.5.16 libpq.so.5
touch libpq-refs-stamp
rm -f libpq.so
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 
-Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv 
-fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g 
-O2 -pthread -D_REENTRANT -D_THREAD_SAFE -fPIC -shared -Wl,-soname,libecpg.so.6 
-Wl,--version-script=exports.list -o libecpg.so.6.16  connect.o data.o 
descriptor.o error.o execute.o memory.o misc.o prepare.o sqlda.o typename.o 
-L../../../../src/port -L../../../../src/common -L../pgtypeslib -lpgtypes 
-L../../../../src/common -lpgcommon_shlib -L../../../../src/port -lpgport_shlib 
-L../../../../src/interfaces/libpq -lpq  -L/usr/lib/llvm-11/lib  
-Wl,--as-needed -Wl,-rpath,'/pgsql/install/master/lib',--enable-new-dtags  -lm 
ln -s libpq.so.5.16 libpq.so
rm -f libecpg.a
make[4]: *** [../../../../src/Makefile.global:618: submake-libpq] Error 2
ar crs libecpg.a connect.o data.o descriptor.o error.o execute.o memory.o 
misc.o prepare.o sqlda.o typename.o
make[3]: *** [Makefile:17: all-ecpglib-recursive] Error 2
make[3]: *** Waiting for unfinished jobs
ranlib libecpg.a
touch libecpg.a
rm -f libecpg.so.6
ln -s libecpg.so.6.16 libecpg.so.6
rm -f libecpg.so
ln -s libecpg.so.6.16 libecpg.so
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 
-Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv 
-fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g 
-O2 -pthread -D_REENTRANT -D_THREAD_SAFE -fPIC -shared 
-Wl,-soname,libecpg_compat.so.3 -Wl,--version-script=exports.list -o 
libecpg_compat.so.3.16  informix.o -L../../../../src/port 
-L../../../../src/common -L../ecpglib -lecpg -L../pgtypeslib -lpgtypes 
-L../../../../src/common -lpgcommon_shlib -L../../../../src/port -lpgport_shlib 
-L../../../../src/interfaces/libpq -lpq  -L/usr/lib/llvm-11/lib  
-Wl,--as-needed -Wl,-rpath,'/pgsql/install/master/lib',--enable-new-dtags  -lm 
rm -f libecpg_compat.a
ar crs libecpg_compat.a informix.o
ranlib libecpg_compat.a
touch libecpg_compat.a
rm -f libecpg_compat.so.3
ln -s libecpg_compat.so.3.16 libecpg_compat.so.3
rm -f libecpg_compat.so
ln -s libecpg_compat.so.3.16 libecpg_compat.so
make[2]: *** [Makefile:17: all-ecpg-recursive] Error 2
make[1]: *** [Makefile:42: all-interfaces-recursive] Error 2
make: *** [GNUmakefile:11: all-src-recursive] Error 2


-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Los dioses no protegen a los insensatos.  Éstos reciben protección de
otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)




Re: Collect ObjectAddress for ATTACH DETACH PARTITION to use in event trigger

2022-07-25 Thread Michael Paquier
On Mon, Jul 25, 2022 at 09:25:07AM +, houzj.f...@fujitsu.com wrote:
> BTW, while reviewing it, I found there are some more subcommands that the
> get_altertable_subcmdtypes() doesn't handle(e.g., ADD/DROP/SET IDENTITY and 
> re ADD
> STAT). Shall we fix them all while on it ?
> 
> Attach a minor patch to fix those which is based on the v2 patch set.

@@ -300,6 +300,18 @@ get_altertable_subcmdinfo(PG_FUNCTION_ARGS)
[ ... ]
default:
strtype = "unrecognized";
break;

Removing the "default" clause would help here as we would get compiler
warnings if there is anything missing.  One way to do things is to set
strtype to NULL before going through the switch and have a failsafe as
some commands are internal so they may not be worth adding to the
output.
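
In other words, something of this shape (abridged):

	const char *strtype = NULL;

	switch (subcmd->subtype)
	{
		case AT_AddColumn:
			strtype = "ADD COLUMN";
			break;
		case AT_AddIdentity:
			strtype = "ADD IDENTITY";
			break;
		/* ... one case per AlterTableType, and no default: clause ... */
	}

	/*
	 * With no default clause the compiler warns when a new AlterTableType is
	 * not handled here, while internal-only subcommands can still fall back
	 * to this failsafe.
	 */
	if (strtype == NULL)
		strtype = "unrecognized";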
--
Michael




Re: Proposal to provide the facility to set binary format output for specific OID's per session

2022-07-25 Thread Dave Cramer
Dave Cramer


On Sun, 24 Jul 2022 at 23:02, Kyotaro Horiguchi 
wrote:

> At Fri, 22 Jul 2022 11:00:18 -0400, Dave Cramer 
> wrote in
> > As a proof of concept I provide the attached patch which implements the
> > ability to specify which oids will be returned in binary format per
> > session.
> ...
> > Both the JDBC driver and the go driver can exploit this change with no
> > changes. I haven't confirmed if other drivers would work without changes.
>
> I'm not sure about the needs of that, but binary exchange format is
> not the one that can be turned on ignoring the peer's capability.

I'm not sure what this means. The client is specifying which types it wants
in binary format.

>   If
> JDBC driver wants some types be sent in binary format, it seems to be
> able to be specified in bind message.
>
To be clear it's not just the JDBC client; the original idea came from the
author of go driver.
And yes you can specify it in the bind message but you have to specify it
in *every* bind message which pretty much negates any advantage you might
get out of binary format due to the extra round trip.
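
For reference, this is roughly what the per-query route looks like with libpq
today: the result format has to be requested on every Bind, e.g. via
PQexecParams(), and it is all-or-nothing for the result columns:

	/* conn is an established PGconn; ask for every result column of this
	 * one query in binary format */
	PGresult   *res = PQexecParams(conn,
								   "SELECT now()",
								   0,	/* no parameters */
								   NULL, NULL, NULL, NULL,
								   1);	/* resultFormat: 1 = binary */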

Regards,
Dave

>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>


Re: log_line_prefix: make it possible to add the search_path

2022-07-25 Thread Alvaro Herrera
On 2022-Jul-25, Pierre Ducroquet wrote:

> This is great for performance, but several tools are lacking around this 
> usecase by not showing the schema, one of them being log_line_prefix.

> The attached patch implements this, using %S. I've not written the 
> documentation yet, since I'm not sure this would be acceptable as is, or if a 
> more "generic" method should be used (I thought of %{name} to fetch an 
> arbitrary GUC, but did not implement due to a lack of need for that feature)

It seems that this would be too noisy to be truly usable.  What if we
emitted a log line when the variable changed, and the value that's in
use when the connection starts?

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)




Re: Make name optional in CREATE STATISTICS

2022-07-25 Thread Alvaro Herrera
On 2022-Jul-23, Michael Paquier wrote:

> As the problem comes down to the fact that INDEX/TABLE, SCHEMA and
> DATABASE/SYSTEM need to handle names for different object types each,
> I think that we could do something like the attached, removing one
> block on the way at the cost of an extra parser node.

Yeah, looks good.  I propose to also test the error for reindexing a
different database, which is currently uncovered, as attached.

> By the way, it seems that 83011ce also broke the case of "REINDEX
> DATABASE CONCURRENTLY", where the parser missed the addition of a
> DefElem for "concurrently" in this case.

Wow.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index dfb0d85d66..f9037761f9 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -9115,8 +9115,8 @@ ReindexStmt:
 	ReindexStmt *n = makeNode(ReindexStmt);
 
 	n->kind = REINDEX_OBJECT_SCHEMA;
-	n->name = $5;
 	n->relation = NULL;
+	n->name = $5;
 	n->params = $2;
 	if ($4)
 		n->params = lappend(n->params,
@@ -9126,9 +9126,10 @@ ReindexStmt:
 			| REINDEX opt_reindex_option_list reindex_target_all opt_concurrently opt_single_name
 {
 	ReindexStmt *n = makeNode(ReindexStmt);
+
 	n->kind = $3;
-	n->name = $5;
 	n->relation = NULL;
+	n->name = $5;
 	n->params = $2;
 	if ($4)
 		n->params = lappend(n->params,
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 801b396398..6cd57e3eaa 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2529,6 +2529,9 @@ ERROR:  cannot reindex system catalogs concurrently
 -- Warns about catalog relations
 REINDEX SCHEMA CONCURRENTLY pg_catalog;
 WARNING:  cannot reindex system catalogs concurrently, skipping all
+-- Not the current database
+REINDEX DATABASE not_current_database;
+ERROR:  can only reindex the currently open database
 -- Check the relation status, there should not be invalid indexes
 \d concur_reindex_tab
  Table "public.concur_reindex_tab"
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 4b75790e47..a3738833b2 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1076,6 +1076,8 @@ REINDEX (CONCURRENTLY) SYSTEM postgres; -- ditto
 REINDEX (CONCURRENTLY) SYSTEM;  -- ditto
 -- Warns about catalog relations
 REINDEX SCHEMA CONCURRENTLY pg_catalog;
+-- Not the current database
+REINDEX DATABASE not_current_database;
 
 -- Check the relation status, there should not be invalid indexes
 \d concur_reindex_tab


Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread vignesh C
On Mon, Jul 25, 2022 at 12:58 PM Peter Smith  wrote:
>
> Firstly, I have some (case-sensitivity) questions about the previous
> patch which was already pushed [1].
>
> Q1. create_subscription docs
>
> I did not understand why the docs refer to slot_name = NONE, yet the
> newly added option says origin = none/any. I think that saying origin
> = NONE/ANY would be more consistent with the existing usage of NONE in
> this documentation.

Both NONE and none are ok in the case of origin; if you want, I can
change it to NONE/ANY for origin to keep it consistent.

Regards,
Vignesh




Re: Collect ObjectAddress for ATTACH DETACH PARTITION to use in event trigger

2022-07-25 Thread Alvaro Herrera
On 2022-Jul-25, houzj.f...@fujitsu.com wrote:

> BTW, while reviewing it, I found there are some more subcommands that the
> get_altertable_subcmdtypes() doesn't handle(e.g., ADD/DROP/SET IDENTITY and 
> re ADD
> STAT). Shall we fix them all while on it ?
> 
> Attach a minor patch to fix those which is based on the v2 patch set.

Yeah, I suppose these are all commands that were added after the last
serious round of event trigger hacking, so it would be good to have
everything on board.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)




Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

2022-07-25 Thread Bharath Rupireddy
On Tue, May 10, 2022 at 5:55 PM Andrey Borodin  wrote:
>
> > On 10 May 2022, at 12:59, Bharath Rupireddy wrote:
> >
> > If okay, I can make the GUC behave this way - value 0 existing
> > behaviour i.e. no wait for sync repl ack, just process query cancels
> > and proc die interrupts immediately; value -1 wait unboundedly for the
> > ack; value > 0 wait for specified milliseconds for the ack.
> +1 if we make -1 and 0 only valid values.
>
> > query cancels or proc die interrupts
>
> Please note, that typical HA tool would need to handle query cancels and proc 
> die interrupts differently.

Hm, after thinking for a while, I tend to agree with the above
approach - meaning, query cancel interrupt processing can be completely
disabled in SyncRepWaitForLSN() while proc die interrupts are processed
immediately; this approach requires no GUC, as opposed to the proposed
v1 patch upthread.

However, it's good to see what other hackers think about this.

> When the network is partitioned and somewhere standby is promoted you 
> definitely want infinite wait for cancels. Yet once upon a time you want to 
> shutdown postgres without coredump - thus proc die needs to be processed.

Agree.

On Mon, May 9, 2022 at 4:39 PM Andrey Borodin  wrote:
>
> > On 26 Apr 2022, at 11:26, Laurenz Albe  wrote:
> >
> > Is it worth adding additional complexity that is not a complete solution?
>
> Its not additional complexity. It is removing additional complexity that made 
> sync rep interruptible. (But I'm surely talking not about GUCs like 
> synchronous_replication_naptime_before_cancel - wait of sync rep must be 
> indefinite until synchrous_commit\synchronous_standby_names are satisfied )
>
> And yes, we need additional complexity - but in some other place. Transaction 
> can also be locally committed in presence of a server crash. But this another 
> difficult problem. Crashed server must not allow data queries until LSN of 
> timeline end is successfully replicated to synchronous_standby_names.

Hm, that needs to be done anyways. How about doing as proposed
initially upthread [1]? Also, quoting the idea here [2].

Thoughts?

[1] 
https://www.postgresql.org/message-id/CALj2ACUrOB59QaE6=jf2cfayv1mr7fzd8tr4ym5+oweyg1s...@mail.gmail.com
[2] 2) Wait for sync standbys to catch up upon restart after the crash or
in the next txn after the old locally committed txn was canceled. One
way to achieve this is to let the backend, that's making the first
connection, wait for sync standbys to catch up in ClientAuthentication
right after successful authentication. However, I'm not sure this is
the best way to do it at this point.

Regards,
Bharath Rupireddy.




RE: Collect ObjectAddress for ATTACH DETACH PARTITION to use in event trigger

2022-07-25 Thread houzj.f...@fujitsu.com
On Saturday, July 23, 2022 6:58 PM Michael Paquier  wrote:
> 
> On Sat, Jul 23, 2022 at 05:44:28PM +0900, Michael Paquier wrote:
> > Changing get_altertable_subcmdtypes() to return a set of rows made of
> > (subcommand, object description) is what I actually meant upthread as
> > it feels natural given a CollectedCommand in input, and as
> > pg_event_trigger_ddl_commands() only gives access to a set of
> > CollectedCommands.  This is also a test module so there is no issue in
> > changing the existing function definitions.
> >
> > But your point would be to have a new function that takes in input a
> > CollectedATSubcmd, returning back the object address or its
> > description?  How would you make sure that a subcommand maps to a
> > correct object address?
> 
> FWIW, I was thinking about something among the lines of 0002 on top of Hou's
> patch.

Thanks for the patches. The patches look good to me.

BTW, while reviewing it, I found there are some more subcommands that
get_altertable_subcmdtypes() doesn't handle (e.g., ADD/DROP/SET IDENTITY and
re-ADD STAT). Shall we fix them all while we're at it?

Attach a minor patch to fix those which is based on the v2 patch set.

Best regards,
Hou zj




v2-0003-Add-support-for-some-missed-commands-in-test_ddl_dep_patch
Description:  v2-0003-Add-support-for-some-missed-commands-in-test_ddl_dep_patch


Re: Add last failed connection error message to pg_stat_wal_receiver

2022-07-25 Thread Michael Paquier
On Mon, Jul 25, 2022 at 12:19:40PM +0530, Bharath Rupireddy wrote:
> Thanks a lot Cary for reviewing. It will be great if you can add
> yourself as a reviewer and set the status accordingly in the CF entry
> here - https://commitfest.postgresql.org/38/3666/.

Hmm.  This stands for the connection error, but there are other things
that could cause a failure down the road, like an incorrect system
ID or a TLI-related report, so that seems a bit limited to me?
--
Michael




Re: Collect ObjectAddress for ATTACH DETACH PARTITION to use in event trigger

2022-07-25 Thread Michael Paquier
On Mon, Jul 25, 2022 at 08:42:18AM +0530, Amit Kapila wrote:
> What I intended to say is similar to what you have done in the patch
> but in a new function. OTOH, your point that it is okay to change
> function signature/name in the test module seems reasonable to me.

Thanks.  Let's go with the function change then.  As introduced
originally in b488c58, it returns an array that just gets unnested
once, so I'd like to think that it had better have been a SRF from the
start, but things are what they are.
--
Michael




Password reset link / 'less' does not exit in psql version 13.4

2022-07-25 Thread Michael J. Baars
Hello,

I have two very simple questions:

1) I have an account at postgresql.org, but a link to a 'forgot password' seems 
to be missing on the login page. I have my password stored only on an old 
Fedora 32 computer. To change the password
when logged in, you need to supply the old password. In short, I have no way to 
migrate this postgresql.org account to my new Fedora 35 and Fedora 36 
computers. What can be done about this?

2) I have three psql clients running, a version 12.6, a version 13.4 and a 
version 14.3. Until now a 'select * from table;' showed the output in 'less' or 
something alike and exited from 'less' when
the output was complete. Both version 12.6 and version 13.4 work that way. 
Version 14.3 does not exit from 'less' when the output is complete. Did anyone 
notice this already?

Best regards,
Mischa Baars.





Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Amit Kapila
On Mon, Jul 25, 2022 at 12:58 PM Peter Smith  wrote:
>
> Firstly, I have some (case-sensitivity) questions about the previous
> patch which was already pushed [1].
>
> Q1. create_subscription docs
>
> I did not understand why the docs refer to slot_name = NONE, yet the
> newly added option says origin = none/any. I think that saying origin
> = NONE/ANY would be more consistent with the existing usage of NONE in
> this documentation.
>
> ~~~
>
> Q2. parse_subscription_options
>
> Similarly, in the code (parse_subscription_options), I did not
> understand why the checks for special name values are implemented
> differently:
>
> The new 'origin' code is using pg_strcmpcase to check special values
> (none/any), and the old 'slot_name' code uses case-sensitive strcmp to
> check the special value (none).
>

We have a restriction on slot names allowing only lower case letters (aka
"Replication slot names may only contain lower case letters, numbers,
and the underscore character."), whereas there is no such restriction
on origin names; that's why the check is different. So, if you try
slot_name = 'NONE', you will get an error.

-- 
With Regards,
Amit Kapila.




Re: SLRUs in the main buffer pool, redux

2022-07-25 Thread Heikki Linnakangas

On 25/07/2022 09:54, Heikki Linnakangas wrote:

In RecordTransactionCommit(), we enter a critical section, and then call
TransactionIdCommitTree() to update the CLOG pages. That now involves a
call to ReadBuffer_common(), which in turn calls
ResourceOwnerEnlargeBuffers(). That can fail, because it might require
allocating memory, which is forbidden in a critical section. I ran into
an assertion about that with "make check" when I was playing around with
a heavily modified version of this patch. Haven't seen it with your
original one, but I believe that's just luck.

Calling ResourceOwnerEnlargeBuffers() before entering the critical
section would probably fix that, although I'm a bit worried about having
the Enlarge call so far away from the point where it's needed.


Oh I just saw that you had a comment about that in the patch and had 
hacked around it. Anyway, calling ResourceOwnerEnlargeBuffers() might be 
a solution. Or switch to a separate "CriticalResourceOwner" that's 
guaranteed to have enough pre-allocated space, before entering the 
critical section.
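
For example (a sketch; where exactly the call should live is the open
question):

	/*
	 * Make sure the resource owner can remember at least one more buffer
	 * without having to allocate memory once we are in the critical section.
	 */
	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);

	START_CRIT_SECTION();
	/* ... TransactionIdCommitTree() may now read and dirty CLOG buffers ... */
	END_CRIT_SECTION();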


- Heikki




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread vignesh C
On Sun, Jul 24, 2022 at 10:21 PM Jonathan S. Katz  wrote:
>
> On 7/22/22 12:47 AM, Amit Kapila wrote:
> > On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz  
> > wrote:
>
> >> 1. I'm concerned by calling this "Bidirectional replication" in the docs
> >> that we are overstating the current capabilities. I think this is
> >> accentuated int he opening paragraph:
> >>
> >> ==snip==
> >>Bidirectional replication is useful for creating a multi-master database
> >>environment for replicating read/write operations performed by any of 
> >> the
> >>member nodes.
> >> ==snip==
> >>
> >> For one, we're not replicating reads, we're replicating writes. Amongst
> >> the writes, at this point we're only replicating DML. A reader could
> >> think that deploying can work for a full bidirectional solution.
> >>
> >> (Even if we're aspirationally calling this section "Bidirectional
> >> replication", that does make it sound like we're limited to two nodes,
> >> when we can support more than two).
> >>
> >
> > Right, I think the system can support N-Way replication.
>
> I did some more testing of the feature, i.e. doing 3-node and 4-node
> tests. While logical replication today can handle replicating between
> multiple nodes (N-way), the "origin = none" does require setting up
> subscribers between each of the nodes.
>
> For example, if I have 4 nodes A, B, C, D and I want to replicate the
> same table between all of them, I need to set up subscribers between all
> of them (A<=>B, A<=>C, A<=>D, B<=>C, B<=>D, C<=>D). However, each node
> can replicate between each other in a way that's convenient (vs. having
> to do something funky with partitions) so this is still a big step forward.
>
> This is a long way of saying that I do think it's fair to say we support
> "N-way" replication so long as you are set up in a mesh (vs. a ring,
> e.g. A=>B=>C=>D=>A).
>
> > Among the above "Replicating changes between primaries" sounds good to
> > me or simply "Replication between primaries". As this is a sub-section
> > on the Logical Replication page, I feel it is okay to not use Logical
> > in the title.
>
> Agreed, I think that's fine.
>
> >> At a minimum, I think we should reference the documentation we have in
> >> the logical replication section on conflicts. We may also want to advise
> >> that a user is responsible for designing their schemas in a way to
> >> minimize the risk of conflicts.
> >>
> >
> > This sounds reasonable to me.
> >
> > One more point about docs, it appears to be added as the last
> > sub-section on the Logical Replication page. Is there a reason for
> > doing so? I feel this should be third sub-section after describing
> > Publication and Subscription.
>
> When I first reviewed, I had not built the docs. Did so on this pass.
>
> I agree with the positioning argument, i.e. it should go after
> "Subscription" in the table of contents -- but it makes me question a
> couple of things:
>
> 1. The general ordering of the docs
> 2. How we describe that section (more on that in a sec)
> 3. If "row filters" should be part of "subscription" instead of its own
> section.
>
> If you look at the current order, "Quick setup" is the last section; one
> would think the "quick" portion goes first :) Given a lot of this is for
> the current docs, I may start a separate discussion on -docs for this part.
>
> For the time being, I agree it should be moved to the section after
> "Subscription".
>
> I think what this section describes is "Configuring Replication Between
> Nodes" as it covers a few different scenarios.
>
> I do think we need to iterate on these docs -- the examples with the
> commands are generally OK and easy to follow, but a few things I noticed:
>
> 1. The general description of the section needs work. We may want to
> refine the description of the use cases, and in the warning, link to
> instructions on how to take backups.
> 2. We put the "case not supported" in the middle, not at the end.
> 3. The "generic steps for adding a new node..." section uses a
> convention for steps that is not found in the docs. We also don't
> provide an example for this section, and this is the most complicated
> scenario to set up.
>
> I may be able to propose some suggestions in a few days.
>
> > BTW, do you have any opinion on the idea of the first remaining patch
> > where we accomplish two things: a) Checks and throws an error if
> > 'copy_data = on' and 'origin = none' but the publication tables were
> > also replicated from other publishers. b) Adds 'force' value for
> > copy_data parameter to allow copying in such a case. The primary
> > reason for this patch is to avoid loops or duplicate data in the
> > initial phase. We can't skip copying based on origin as we can do
> > while replicating changes from WAL. So, we detect that the publisher
> > already has data from some other node and doesn't allow replication
> > unless the user uses the 'force' option for copy_data.
>
> In general, I agree with the patch; but 

Re: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?

2022-07-25 Thread Bharath Rupireddy
On Thu, Jul 21, 2022 at 2:33 PM Kyotaro Horiguchi
 wrote:
>
> At Wed, 20 Jul 2022 17:09:09 +0530, Bharath Rupireddy 
>  wrote in
> > Hi,
> >
> > After the commit [1], is it correct to say errmsg("invalid data in file
> > \"%s\"", BACKUP_LABEL_FILE))); in do_pg_backup_stop() when we hold the
> > contents in backend global memory, not actually reading from backup_label
> > file? However, it is correct to say that in read_backup_label.
> >
> > IMO, we can either say "invalid backup_label contents found" or we can be
> > more descriptive and say "invalid "START WAL LOCATION" line found in
> > backup_label content" and "invalid "BACKUP FROM" line found in
> > backup_label content" and so on.
> >
> > Thoughts?
>
> Previously there was the case where the "char *labelfile" was loaded
> from a file, but currently it is always a string built by the process.
> In that sense, nowadays it is a kind of internal error, which I think
> is not supposed to be exposed to users.
>
> So I think we can leave the code alone to avoid back-patching
> obstacles. But if we decided to change the code around, I'd like to
> change the string into a C struct, so that we don't need to parse it.

Hm. I think we should take this opportunity to clean it up. You are
right: we don't need to parse the label file contents in
do_pg_backup_stop() (the way we used to after reading them from the
file); instead we can just pass a structure. Also, do_pg_backup_stop()
doesn't modify the labelfile contents, it only uses startxlogfilename,
startpoint and backupfrom from them, so this information can easily be
passed as a single structure. In fact, I might go a bit further here and
wrap label_file and tblspc_map_file into a single structure, something
like the one below, and pass it across the functions.

typedef struct BackupState
{
    StringInfo  label_file;         /* backup_label file contents */
    StringInfo  tblspc_map_file;    /* tablespace_map file contents */
    char        startxlogfilename[MAXFNAMELEN]; /* WAL file of the start location */
    XLogRecPtr  startpoint;         /* backup start LSN */
    char        backupfrom[20];     /* "primary" or "standby" */
} BackupState;

This way, the code is more readable and better structured, and we can
remove 2 sscanf() calls, 2 "invalid data in file" errors, 1 strchr()
call and 1 strstr() call. The only downside is that it creates a code
diff from previous PG versions, which is fine IMO. If that sounds okay,
I'm happy to prepare a patch.
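
As a rough illustration only (hypothetical code, not a patch; the
function signature here is just for the example), the stop side could
then read the already-parsed fields directly instead of re-parsing the
label text:

static void
do_pg_backup_stop(BackupState *state, bool waitforarchive)
{
    /*
     * No sscanf()/strchr()/strstr() over the label text needed any more;
     * the start location and backup source are already parsed.
     */
    XLogRecPtr  startpoint = state->startpoint;
    bool        started_on_standby = (strcmp(state->backupfrom, "standby") == 0);

    /* ... rest of the stop logic ... */
}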

Thoughts?

Regards,
Bharath Rupireddy.




Re: Typo in misc_sanity.sql?

2022-07-25 Thread Bharath Rupireddy
On Mon, Jul 25, 2022 at 1:24 PM Japin Li  wrote:
>
>
> Hi, hackers
>
> I found that misc_sanity.sql has a query that checks for system catalogs
> that do not have primary keys; however, the current exceptions comment
> says pg_depend and pg_shdepend don't have a unique key.
>
> Should we fix it?

Indeed. There's a clear difference between primary key and unique key.

The patch LGTM.

Regards,
Bharath Rupireddy.




Re: Column Filtering in Logical Replication

2022-07-25 Thread Peter Smith
On Sat, Dec 11, 2021 at 12:24 AM Alvaro Herrera  wrote:
>
> On 2021-Dec-10, Peter Eisentraut wrote:
>
...
>
> > There was no documentation, so I wrote a bit (patch 0001).  It only touches
> > the CREATE PUBLICATION and ALTER PUBLICATION pages at the moment.  There was
> > no mention in the Logical Replication chapter that warranted updating.
> > Perhaps we should revisit that chapter at the end of the release cycle.
>
> Thanks.  I hadn't looked at the docs yet, so I'll definitely take this.
>

Was this documentation ever written?

My assumption was that for PG15 there might be a whole new section
added to Chapter 31 [1] for describing "Column Lists" (i.e. the Column
List equivalent of the "Row Filters" section)

--
[1] https://www.postgresql.org/docs/15/logical-replication.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Typo in misc_sanity.sql?

2022-07-25 Thread Japin Li

Hi, hackers

I found that misc_sanity.sql has a query that checks for system catalogs
that do not have primary keys; however, the current exceptions comment
says pg_depend and pg_shdepend don't have a unique key.

Should we fix it?

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

diff --git a/src/test/regress/expected/misc_sanity.out b/src/test/regress/expected/misc_sanity.out
index a57fd142a9..509b4c470b 100644
--- a/src/test/regress/expected/misc_sanity.out
+++ b/src/test/regress/expected/misc_sanity.out
@@ -65,7 +65,7 @@ ORDER BY 1, 2;
 -- system catalogs without primary keys
 --
 -- Current exceptions:
--- * pg_depend, pg_shdepend don't have a unique key
+-- * pg_depend, pg_shdepend don't have a primary key
 SELECT relname
 FROM pg_class
 WHERE relnamespace = 'pg_catalog'::regnamespace AND relkind = 'r'
diff --git a/src/test/regress/sql/misc_sanity.sql b/src/test/regress/sql/misc_sanity.sql
index 2c0f87a651..a83c4aa5df 100644
--- a/src/test/regress/sql/misc_sanity.sql
+++ b/src/test/regress/sql/misc_sanity.sql
@@ -56,7 +56,7 @@ ORDER BY 1, 2;
 -- system catalogs without primary keys
 --
 -- Current exceptions:
--- * pg_depend, pg_shdepend don't have a unique key
+-- * pg_depend, pg_shdepend don't have a primary key
 SELECT relname
 FROM pg_class
 WHERE relnamespace = 'pg_catalog'::regnamespace AND relkind = 'r'

