Re: BufferAlloc: don't take two simultaneous locks

2022-01-21 Thread Andrey Borodin



> On 21 Dec 2021, at 10:23, Yura Sokolov wrote:
> 
> 

Hi Yura!

I've taken a look at the patch. The idea seems reasonable to me:
clearing/evicting the old buffer and placing the new one seem to be different units of
work, so there is no need to couple both partition locks together. And the claimed
performance impact is fascinating! Though I haven't verified it yet.

At first glance, the API change in BufTable does not seem obvious to me. Is void
*oldelem actually a BufferTag * or maybe a BufferLookupEnt *? What if we would like
to use or manipulate oldelem in the future?

And the name BufTableFreeDeleted() confuses me a bit. You know, in C we usually 
free(), but in C++ we delete [], and here we do both... Just to be sure.

Thanks!

Best regards, Andrey Borodin.





Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-21 Thread Shruthi Gowda
On Sat, Jan 22, 2022 at 12:17 AM Robert Haas  wrote:
>
> On Fri, Jan 21, 2022 at 8:40 AM Shruthi Gowda  wrote:
> > From what I see in the code, template0 and postgres are the last
> > things that get created in initdb phase. The system OIDs that get
> > assigned to these DBs vary from release to release. At present, the
> > system assigned OIDs of template0 and postgres are 13679 and 13680
> > respectively. I feel it would be safe to assign 16000 and 16001 for
> > template0 and postgres respectively from the unpinned object OID range
> > 12000 - 16383. In the future, even if the initdb unpinned objects
> > reach the range of 16000 issues can only arise if initdb() creates
> > another system-created database for which the system assigns these
> > reserved OIDs (16000, 16001).
>
> It doesn't seem safe to me to rely on that. We don't know what could
> happen in the future if the number of built-in objects increases.
> Looking at the lengthy comment on this topic in transam.h, I see that
> there are three ranges:
>
> 1-9999 manually assigned OIDs
> 10000-11999 OIDs assigned by genbki.pl
> 12000-16384 OIDs assigned to unpinned objects post-bootstrap
>
> It seems to me that what this comment is saying is that OIDs in the
> second and third categories are doled out by counters. Therefore, we
> can't know which of those OIDs will get used, or how many of them will
> get used, or which objects will get which OIDs. Therefore, I think we
> should go back to the approach that you were using for template0 and
> handle both that database and postgres using that method. That is,
> assign an OID manually, and make sure unused_oids knows that it should
> be counted as already used.

Agreed. In the latest patch, the template0 and postgres OIDs are fixed
to the unused manually assigned OIDs 4 and 5 respectively. These OIDs are
no longer listed as unused OIDs.
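
For reference, this is easy to confirm on a cluster initdb'd with the patch
applied:

-- with the patch, these two rows should report the manually assigned OIDs 4 and 5
SELECT oid, datname FROM pg_database
WHERE datname IN ('template0', 'postgres');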


Regards,
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com


v13-0001-pg_upgrade-perserve-database-OID-patch.patch
Description: Binary data


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-21 Thread Shruthi Gowda
On Sat, Jan 22, 2022 at 12:27 AM Tom Lane  wrote:
>
> Robert Haas  writes:
> > It seems to me that what this comment is saying is that OIDs in the
> > second and third categories are doled out by counters. Therefore, we
> > can't know which of those OIDs will get used, or how many of them will
> > get used, or which objects will get which OIDs. Therefore, I think we
> > should go back to the approach that you were using for template0 and
> > handle both that database and postgres using that method. That is,
> > assign an OID manually, and make sure unused_oids knows that it should
> > be counted as already used.
>
> Indeed.  If you're going to manually assign OIDs to these databases,
> do it honestly, and put them into the range intended for that purpose.
> Trying to take short-cuts is just going to cause trouble down the road.

Understood. I will rework the patch accordingly. Thanks

Regards,
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2022-01-21 Thread David G. Johnston
On Fri, Jan 21, 2022 at 10:30 PM Amit Kapila 
wrote:

> On Fri, Jan 21, 2022 at 10:00 PM David G. Johnston
>  wrote:
> >
> > On Fri, Jan 21, 2022 at 4:55 AM Amit Kapila 
> wrote:
> >>
> >> Apart from this, I have changed a few comments and ran pgindent. Do
> >> let me know what you think of the changes?
> >
> >
> > The paragraph describing ALTER SUBSCRIPTION SKIP seems unnecessarily
> repetitive.  Consider:
> > """
> > Skips applying all changes of the specified remote transaction, whose
> value should be obtained from pg_stat_subscription_workers.last_error_xid.
> >
>
> Here, you can also say that the value can be found from server logs as
> well.
>

subscriber's server logs, right?  I would agree that adding that for
completeness is warranted.


> > Then change the subskipxid column description to be:
> > """
> > ID of the transaction whose changes are to be skipped.  It is 0 when
> there are no pending skips.  This is set by issuing ALTER SUBSCRIPTION SKIP
> and resets back to 0 when the identified transactions passes through the
> subscription stream and is successfully ignored.
> > """
> >
>
> Users can manually reset it by specifying NONE, so that should be
> covered in the above text, otherwise, looks good.
>

I agree with incorporating "reset" into the paragraph somehow - does not
have to mention NONE, just that ALTER SUBSCRIPTION SKIP (not a family
friendly abbreviation...) is what does it.
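
To make the set/reset cycle concrete, the flow being documented is roughly the
following (subscription name invented, xid taken from the example upthread;
column and option spellings per the patch as I understand it):

-- find the transaction the apply worker keeps failing on
SELECT subname, last_error_xid FROM pg_stat_subscription_workers;

-- record it as the transaction to skip (sets pg_subscription.subskipxid)
ALTER SUBSCRIPTION mysub SKIP (xid = 716);

-- changed your mind before it was applied? reset the pending skip
ALTER SUBSCRIPTION mysub SKIP (xid = NONE);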


> > I don't understand why/how ", if a valid transaction ID;" comes into
> play (how would we know whether it is valid, or if we do ALTER SUBSCRIPTION
> SKIP should prohibit the invalid value from being chosen).
> >
>
> What do you mean by invalid value here? Is it the value lesser than
> FirstNormalTransactionId or a value that is of the non-error
> transaction? For the former, we already have a check in the patch and
> for later we can't identify it with any certainty because the error
> stats are collected by the stats collector.
>

The original proposal qualifies the non-zero transaction id in
subskipxid as being a "valid transaction ID" and that invalid ones (which
is how "otherwise" is interpreted given the "valid" qualification preceding
it) are shown as 0.  As an end-user that makes me wonder what it means for
a transaction ID to be invalid.  My point is that dropping the mention of
"valid transaction ID" avoids that and lets the reader operate with an
understanding that things should "just work".  If I see a non-zero in the
column I have a pending skip and if I see zero I do not.  My wording
assumes it is that simple.  If it isn't I would need some clarity as to why
it is not in order to write something I could read and understand from my
inexperienced user-centric point-of-view.

I get that I may provide a transaction ID that is invalid such that the
system could never see it (or at least not for a long while) - say we
error on transaction 102 and I typo it as 1002 or 101.  But I would expect
either an error where I make the typo or the numbers 1002 or 101 to appear
on the table.  I would not expect my 101 typo to result in a 0 appearing on
the table (and if it does so today I argue that is a POLA violation).
Thus, "if a valid transaction ID" from the original text just doesn't make
sense to me.

In typical usage it would seem strange to allow a skip to be recorded if
there is no existing error in the subscription.  Should we (do we, haven't
read the code) warn in that situation?

*Or, why even force them to specify a number instead of just saying SKIP
and if there is a current error we skip its transaction, otherwise we warn
them that nothing happened because there is no last error.*

Additionally, the description for pg_stat_subscription_workers should
describe what happens once the transaction represented by last_error_xid
has either been successfully processed or skipped.  Does this "last error"
stick around until another error happens (which is hopefully very rare) or
does it reset to blanks?  Seems like it should reset, which really makes
this more of an "active_error" instead of a "last_error".  This system is
linear, we are stuck until this error is resolved, making it active.


> > I'm against mentioning subtransactions in the skip_option description.
> >
>
> We have mentioned that because currently, we don't support it but in
> the future one can come up with an idea to support it. What problem do
> you see with it?
>

If you ever get around to implementing the feature then by all means add
it.  My main issue is that we basically never talk about subtransactions in
the user-facing documentation and it doesn't seem desirable to do so here.
Knowing that a whole transaction is skipped is all I need to care about as
a user.  I believe that no users will be asking "what about subtransactions
(savepoints)" but by mentioning it less experienced ones will now have
something to be curious about that they really do not need to be.


>
> > The Logical Replication page changes provide good content 

Re: Refactoring of compression options in pg_basebackup

2022-01-21 Thread Michael Paquier
On Fri, Jan 21, 2022 at 09:57:41AM -0500, Robert Haas wrote:
> Thanks. One thing I just noticed is that the enum we're using here is
> called WalCompressionMethod. But we're not compressing WAL. We're
> compressing tarfiles of the data directory.

Also, walmethods.h is perhaps not the best place for this enum either,
especially if you plan to use it in pg_basebackup for the
server-side compression.  One idea is to rename the enum to
DataCompressionMethod and move it into a new header, like common.h, as
in the attached.
--
Michael
From 3d2aec3abdfa8d2c6faf639c1cbaaf73d9064f11 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sat, 22 Jan 2022 14:43:19 +0900
Subject: [PATCH] Move declaration in new header for pg_basebackup

---
 src/bin/pg_basebackup/common.h| 26 ++
 src/bin/pg_basebackup/pg_basebackup.c |  5 +++--
 src/bin/pg_basebackup/pg_receivewal.c |  6 +++---
 src/bin/pg_basebackup/walmethods.c| 12 ++--
 src/bin/pg_basebackup/walmethods.h| 15 ---
 src/tools/pgindent/typedefs.list  |  2 +-
 6 files changed, 43 insertions(+), 23 deletions(-)
 create mode 100644 src/bin/pg_basebackup/common.h

diff --git a/src/bin/pg_basebackup/common.h b/src/bin/pg_basebackup/common.h
new file mode 100644
index 00..abad3b25ca
--- /dev/null
+++ b/src/bin/pg_basebackup/common.h
@@ -0,0 +1,26 @@
+/*-
+ *
+ * common.h
+ *
+ * Common declarations shared across all tools of src/bin/pg_basebackup/.
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/bin/pg_basebackup/common.h
+ *
+ *-
+ */
+
+#ifndef BASEBACKUP_COMMON_H
+#define BASEBACKUP_COMMON_H
+
+/* Types of compression supported */
+typedef enum
+{
+	COMPRESSION_GZIP,
+	COMPRESSION_LZ4,
+	COMPRESSION_NONE
+} DataCompressionMethod;
+
+#endif			/* BASEBACKUP_COMMON_H */
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index d5b0ade10d..2c66e4fd87 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -29,6 +29,7 @@
 
 #include "access/xlog_internal.h"
 #include "bbstreamer.h"
+#include "common.h"
 #include "common/file_perm.h"
 #include "common/file_utils.h"
 #include "common/logging.h"
@@ -123,7 +124,7 @@ static bool showprogress = false;
 static bool estimatesize = true;
 static int	verbose = 0;
 static int	compresslevel = 0;
-static WalCompressionMethod compressmethod = COMPRESSION_NONE;
+static DataCompressionMethod compressmethod = COMPRESSION_NONE;
 static IncludeWal includewal = STREAM_WAL;
 static bool fastcheckpoint = false;
 static bool writerecoveryconf = false;
@@ -943,7 +944,7 @@ parse_max_rate(char *src)
  * from the parsed results.
  */
 static void
-parse_compress_options(char *src, WalCompressionMethod *methodres,
+parse_compress_options(char *src, DataCompressionMethod *methodres,
 	   int *levelres)
 {
 	char	   *sep;
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index ccb215c398..9f2e633c80 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -52,7 +52,7 @@ static bool do_drop_slot = false;
 static bool do_sync = true;
 static bool synchronous = false;
 static char *replication_slot = NULL;
-static WalCompressionMethod compression_method = COMPRESSION_NONE;
+static DataCompressionMethod compression_method = COMPRESSION_NONE;
 static XLogRecPtr endpos = InvalidXLogRecPtr;
 
 
@@ -114,7 +114,7 @@ usage(void)
  */
 static bool
 is_xlogfilename(const char *filename, bool *ispartial,
-WalCompressionMethod *wal_compression_method)
+DataCompressionMethod *wal_compression_method)
 {
 	size_t		fname_len = strlen(filename);
 	size_t		xlog_pattern_len = strspn(filename, "0123456789ABCDEF");
@@ -285,7 +285,7 @@ FindStreamingStart(uint32 *tli)
 	{
 		uint32		tli;
 		XLogSegNo	segno;
-		WalCompressionMethod wal_compression_method;
+		DataCompressionMethod wal_compression_method;
 		bool		ispartial;
 
 		if (!is_xlogfilename(dirent->d_name,
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index f74bd13315..b87afec5a6 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -49,7 +49,7 @@
 typedef struct DirectoryMethodData
 {
 	char	   *basedir;
-	WalCompressionMethod compression_method;
+	DataCompressionMethod compression_method;
 	int			compression_level;
 	bool		sync;
 	const char *lasterrstring;	/* if set, takes precedence over lasterrno */
@@ -561,7 +561,7 @@ dir_get_file_size(const char *pathname)
 	return statbuf.st_size;
 }
 
-static WalCompressionMethod
+static DataCompressionMethod
 dir_compression_method(void)
 {
 	return dir_data->compression_method;
@@ -608,7 +608,7 @@ dir_finish(void)
 
 WalWriteMethod *
 

Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread David G. Johnston
On Fri, Jan 21, 2022 at 5:14 PM James Coleman  wrote:

>
> > Really?  That's horrid, because that's directly useful advice.
>
> Remedied, but rewritten a bit to better fit with the new style/goal of
> that tip).
>
> Version 3 is attached.
>
>
Coming back to this after a respite, I think the tip needs to be moved just
like everything else.  For much the same reason (though this may only be a
personal bias): I know which SQL commands do the various things that DDL
encompasses (especially the basics like adding a column), so the DDL
section is really just a tutorial-like chapter that I will generally forget
about, because I go straight to the official source, which is the SQL
Command Reference.  My future self would want the tip to show up there.  If
we put the tip after the existing paragraph that starts "Adding a column
with a volatile DEFAULT or changing the type of an existing column...", the
need to specify an example function in the tip goes away - though maybe it
should be moved to the notes paragraph instead: "with a volatile DEFAULT
(e.g., clock_timestamp()) or changing the type of an existing column..."
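
For what it's worth, the distinction the tip is trying to teach boils down to
this (column names are just for illustration):

-- non-volatile default: evaluated once at ALTER TABLE time and stored
-- in the catalog, so no table rewrite is needed
ALTER TABLE products ADD COLUMN added_at timestamptz DEFAULT now();

-- volatile default: every existing row needs its own value, so the
-- whole table is rewritten
ALTER TABLE products ADD COLUMN touched_at timestamptz DEFAULT clock_timestamp();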

David J.


Re: Skipping logical replication transactions on subscriber side

2022-01-21 Thread Amit Kapila
On Fri, Jan 21, 2022 at 10:00 PM David G. Johnston
 wrote:
>
> On Fri, Jan 21, 2022 at 4:55 AM Amit Kapila  wrote:
>>
>> Apart from this, I have changed a few comments and ran pgindent. Do
>> let me know what you think of the changes?
>
>
> The paragraph describing ALTER SUBSCRIPTION SKIP seems unnecessarily 
> repetitive.  Consider:
> """
> Skips applying all changes of the specified remote transaction, whose value 
> should be obtained from pg_stat_subscription_workers.last_error_xid.
>

Here, you can also say that the value can be found from server logs as well.

>
> While this will result in avoiding the last error on the
> subscription, thus allowing it to resume working.  See "link to a more
> holistic description in the Logical Replication chapter" for
> alternative means of resolving subscription errors.  Removing an
> entire transaction from the history of a table should be considered a
> last resort as it can leave the system in a very inconsistent state.
>
> Note, this feature will not accept transactions prepared under two-phase 
> commit.
>
> This command sets pg_subscription.subskipxid field upon issuance and the 
> system clears the same field upon seeing and successfully skipped the 
> identified transaction.  Issuing this command again while a skipped 
> transaction is pending replaces the existing transaction with the new one.
> """
>

The proposed text sounds better to me except for a minor change as
suggested above.

> Then change the subskipxid column description to be:
> """
> ID of the transaction whose changes are to be skipped.  It is 0 when there 
> are no pending skips.  This is set by issuing ALTER SUBSCRIPTION SKIP and 
> resets back to 0 when the identified transactions passes through the 
> subscription stream and is successfully ignored.
> """
>

Users can manually reset it by specifying NONE, so that should be
covered in the above text, otherwise, looks good.

> I don't understand why/how ", if a valid transaction ID;" comes into play 
> (how would we know whether it is valid, or if we do ALTER SUBSCRIPTION SKIP 
> should prohibit the invalid value from being chosen).
>

What do you mean by invalid value here? Is it the value lesser than
FirstNormalTransactionId or a value that is of the non-error
transaction? For the former, we already have a check in the patch and
for later we can't identify it with any certainty because the error
stats are collected by the stats collector.

> I'm against mentioning subtransactions in the skip_option description.
>

We have mentioned that because currently, we don't support it but in
the future one can come up with an idea to support it. What problem do
you see with it?

> The Logical Replication page changes provide good content overall but I 
> dislike going into detail about how to perform conflict resolution in the 
> third paragraph and then summarize the various forms of conflict resolution 
> in the newly added forth.  Maybe re-work things like:
>
> 1. Logical replication behaves...
> 2. A conflict will produce...details can be found in places...
> 3. Resolving conflicts can be done by...
> 4. (split and reworded) If choosing to simply skip the offending transaction 
> you take the pg_stat_subscription_worker.last_error_xid value (716 in the 
> example above) and provide it while executing ALTER SUBSCRIPTION SKIP...
> 5. (split and reworded) Prior to v15 ALTER SUBSCRIPTION SKIP was not 
> available and instead you had to use the pg_replication_origin_advance() 
> function...
>
> Don't just list out two options for the user to perform the same action.  
> Tell a story about why we felt compelled to add ALTER SYSTEM SKIP and why 
> either the function is now deprecated or is useful given different 
> circumstances (the former seems likely).
>

Personally, I don't see much value in the split (especially giving
context like "Prior to v15 ..."), but I do see value in specifying the
circumstances where each of the options could be useful.

-- 
With Regards,
Amit Kapila.




Re: A test for replay of regression tests

2022-01-21 Thread Thomas Munro
On Sat, Jan 22, 2022 at 8:48 AM Andres Freund  wrote:
> Unfortunately we don't quite seem there yet:

And another way to fail:

pg_dump: error: query failed: ERROR:  canceling statement due to
conflict with recovery

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dangomushi=2022-01-22%2003%3A06%3A42

Probably needs hot_standby_feedback on.  Will adjust this soon.

One more failure seen in today's crop was a "stats" failure on
seawasp, which must be the well known pre-existing problem.  (Probably
just needs someone to rewrite the stats subsystem to use shared memory
instead of UDP).




Re: row filtering for logical replication

2022-01-21 Thread Amit Kapila
On Fri, Jan 21, 2022 at 8:19 PM Alvaro Herrera  wrote:
>
> If ATTACH PARTITION or CREATE TABLE .. PARTITION OF don't let you
> specify replica identity, I suspect it's because both partitioning and
> logical replication were developed in parallel, and neither gave too
> much thought to the other.
>

I think the reason CREATE TABLE .. syntax form doesn't have a way to
specify RI is that we need to have an index for RI. Consider the below
example:

CREATE TABLE parent (a int primary key, b int not null, c varchar)
PARTITION BY RANGE(a);
CREATE TABLE child PARTITION OF parent FOR VALUES FROM (0) TO (250);
CREATE UNIQUE INDEX b_index on child(b);
ALTER TABLE child REPLICA IDENTITY using INDEX b_index;


Here, the parent table's replica identity is the primary key (the
default) and the child table's replica identity is b_index. If we want,
we could come up with some syntax to combine these steps and allow
specifying the replica identity during the second step (CREATE ...
PARTITION OF), as sketched below, but I'm not sure we have a convincing
reason for this feature per se.
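
Purely for illustration, a combined form would have to look something like
this (hypothetical syntax, not accepted by the server), which also shows the
difficulty: the identity index would have to exist before it can be named.

-- hypothetical, NOT valid syntax today
CREATE TABLE child PARTITION OF parent FOR VALUES FROM (0) TO (250)
    REPLICA IDENTITY USING INDEX b_index;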

>
> I suspect that a better way to attack this problem is to let ALTER TABLE
> ... ATTACH PARTITION and CREATE TABLE .. PARTITION OF specify a replica
> identity as necessary.
>
> My suggestion is to avoid painting us into a corner from which it will
> be impossible to get out later.
>

Apart from the above reason, here we are just following the current
model of how update/delete behaves w.r.t. RI. I think in the future we
can also think of lifting some of the restrictions related to RI for
filters if we find a good way to get column values that are not in the
WAL. We have discussed this previously in this thread and thought that
it is sensible to have an RI restriction for updates/deletes, as the
patch does for the first version.

I am not against inventing new syntax for the row/column filter
patches, but there doesn't seem to be a very convincing reason for it,
and there is a good chance that we won't be able to accomplish that
for the current version.

-- 
With Regards,
Amit Kapila.




Re: Schema variables - new implementation for Postgres 15

2022-01-21 Thread Julien Rouhaud
Hi,

On Fri, Jan 21, 2022 at 09:23:34PM +0100, Pavel Stehule wrote:
> 
> On Wed, Jan 19, 2022 at 9:01, Julien Rouhaud wrote:
> >
> > Also, the pg_dump handling emits a COLLATION option for session variables
> > even
> > for default collation, while it should only emit it if the collation is
> > not the
> > type's default collation.  As a reference, for attributes the SQL used is:
> >
> >  "CASE WHEN a.attcollation
> > <> t.typcollation "
> >  "THEN a.attcollation ELSE
> > 0 END AS attcollation,\n"
> >
> 
> Isn't it a different issue? I don't see filtering DEFAULT_COLLATION_OID in
> pg_dump code. But this case protects against a redundant COLLATE clause,
> and for consistency, this check should be done for variables too.

Yes, sorry, my message was a bit ambiguous: for all native collatable types
the "default" collation is the type's default collation, so I thought that the
code extract would make it clear enough.

In any case your fix is exactly what I had in mind so it's perfect, thanks!

> I am sending updated patches

Thanks a lot!  I will try to review them over the weekend.




Re: fairywren is generating bogus BASE_BACKUP commands

2022-01-21 Thread Thomas Munro
On Sat, Jan 22, 2022 at 3:55 PM Robert Haas  wrote:
> On Fri, Jan 21, 2022 at 5:35 PM Andrew Dunstan  wrote:
> > # See https://www.msys2.org/wiki/Porting/#filesystem-namespaces
> > local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;
> > Probably in this case just setting it to 'server:' would do the trick.
>
> Oh, thanks for the tip. Do you want to push a commit that does that,
> or ... should I do it?

Just a thought:  Would it prevent the magic path translation and all
just work if the path were already in Windows form?  So, if we did
just this change at the top:

-my $tempdir = PostgreSQL::Test::Utils::tempdir;
+my $tempdir = 
PostgreSQL::Test::Utils::perl2host(PostgreSQL::Test::Utils::tempdir);




Re: How to get started with contribution

2022-01-21 Thread vrund shah
Respected Sir\Mam
This year I am planning to take part in GSOC 2022 in the PostgreSQL
organization.

Regards
Vrund V Shah

On Sat, Jan 22, 2022 at 8:17 AM vrund shah  wrote:

> Respected  Sir\Mam
>
> I am already using PostgreSQL for my college purpose and for learning SQL.
> I have learned SQL from udemy courses with instructor Jose Portilla. and I
> am well aware of PostgreSQL and PGAdmin.
>
> Regards
> Vrund V Shah
>
> On Sat, Jan 22, 2022 at 8:12 AM vrund shah  wrote:
>
>> Thank you for your valuable guidance.
>> I will surely look at the links and if have any queries then I will
>> contact you.
>>
>> regards
>> Vrund V Shah
>>
>> On Sat, Jan 22, 2022 at 2:23 AM Stephen Frost  wrote:
>>
>>> Greetings,
>>>
>>> * Tomas Vondra (tomas.von...@enterprisedb.com) wrote:
>>> > On 1/21/22 21:28, Stephen Frost wrote:
>>> > >* vrund v shah (vrund3...@gmail.com) wrote:
>>> > >>I am Vrund V Shah, a computer science undergrad. I have just
>>> completed my
>>> > >>3^rd semester at G H Patel College of Engineering & Technology.
>>> I am new
>>> > >>to open source contribution but I am well aware of C/C++, SQL
>>> and I will
>>> > >>learn Python before the end of the first week of February. I
>>> would love to
>>> > >>contribute to your organization but don’t know how!!
>>> > >>
>>> > >>Could you please guide me on how and from where to start?
>>> > >
>>> > >I'd suggest you start with patch reviews if you're interested in
>>> working
>>> > >on the core PostgreSQL server code.  Information on that is available
>>> > >here:
>>> > >
>>> > >https://wiki.postgresql.org/wiki/Reviewing_a_Patch
>>> > >
>>> >
>>> > Yeah, that's what I recommend people who ask me this question.
>>> >
>>> > However, that wiki page is more about the process than about "what" to
>>> do,
>>> > so my advice to the OP would be to first go to the current CF [1] and
>>> look
>>> > for patches that would be genuinely useful for him/her (e.g. because of
>>> > work). And do the review by following the wiki page.
>>>
>>> Yeah.  There's also this:
>>>
>>> https://wiki.postgresql.org/wiki/Developer_FAQ
>>>
>>> where the first topic is about getting involved in PG development, and
>>> there's:
>>>
>>> https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
>>>
>>> which covers a bit more about mailing lists and such.
>>>
>>> Thanks,
>>>
>>> Stephen
>>>
>>


Re: Proposal: allow database-specific role memberships

2022-01-21 Thread Kenaniah Cerny
Thanks for the feedback.

I have attached an alternate version of the v5 patch that incorporates the
suggested changes to the documentation and DRYs up some of the acl.c code
for comparison. As for the databaseOid / InvalidOid parameter, I'm open to
any suggestions for how to make that even cleaner, but am currently at a
loss as to how that would look.

CI is showing a failure to run pg_dump on just the Linux - Debian Bullseye
job (https://cirrus-ci.com/task/5265343722553344). Does anyone have any
ideas as to where I should look in order to debug that?

Kenaniah

On Fri, Jan 21, 2022 at 3:04 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Fri, Jan 21, 2022 at 3:12 PM Kenaniah Cerny  wrote:
>
>> The latest rebased version of the patch is attached.
>>
>
> As I was just reminded, we tend to avoid specifying specific PostgreSQL
> versions in our documentation.  We just say what the current version does.
> Here, the note sentences at lines 62 and 63 don't follow documentation
> norms on that score and should just be removed.  The last two sentences
> belong in the main description body, not a note.  Thus the whole note goes
> away.
>
> I don't think I really appreciated the value this feature would have when
> combined with the predefined roles like pg_read_all_data and
> pg_write_all_data.
>
> I suppose I don't really appreciate the warning about SUPERUSER, etc...or
> at least why this warning is somehow specific to the per-database version
> of role membership.  If this warning is desirable it should be worded to
> apply to role membership in general - and possibly proposed as a separate
> patch for consideration.
>
> I didn't dive deeply, but I think we now have at least three places in the
> acl.c code where, after setting memlist from the system cache, we perform
> nearly identical for loops to generate the final roles_list.  Possibly this
> needs a refactor first so that you can introduce the per-database stuff more
> succinctly.  Basically, the vast majority of this commit is just adding
> InvalidOid and databaseOid all over the place - with a few minor code
> changes to accommodate the new arguments.  The acl.c code should be made to
> follow the same pattern after that refactor.
>
> David J.
>
>


database-role-memberships-v5-alternate.patch
Description: Binary data


Re: How to get started with contribution

2022-01-21 Thread vrund shah
Respected  Sir\Mam

I am already using PostgreSQL for my college purpose and for learning SQL.
I have learned SQL from udemy courses with instructor Jose Portilla. and I
am well aware of PostgreSQL and PGAdmin.

Regards
Vrund V Shah

On Sat, Jan 22, 2022 at 8:12 AM vrund shah  wrote:

> Thank you for your valuable guidance.
> I will surely look at the links and if have any queries then I will
> contact you.
>
> regards
> Vrund V Shah
>
> On Sat, Jan 22, 2022 at 2:23 AM Stephen Frost  wrote:
>
>> Greetings,
>>
>> * Tomas Vondra (tomas.von...@enterprisedb.com) wrote:
>> > On 1/21/22 21:28, Stephen Frost wrote:
>> > >* vrund v shah (vrund3...@gmail.com) wrote:
>> > >>I am Vrund V Shah, a computer science undergrad. I have just
>> completed my
>> > >>3^rd semester at G H Patel College of Engineering & Technology. I
>> am new
>> > >>to open source contribution but I am well aware of C/C++, SQL and
>> I will
>> > >>learn Python before the end of the first week of February. I
>> would love to
>> > >>contribute to your organization but don’t know how!!
>> > >>
>> > >>Could you please guide me on how and from where to start?
>> > >
>> > >I'd suggest you start with patch reviews if you're interested in
>> working
>> > >on the core PostgreSQL server code.  Information on that is available
>> > >here:
>> > >
>> > >https://wiki.postgresql.org/wiki/Reviewing_a_Patch
>> > >
>> >
>> > Yeah, that's what I recommend people who ask me this question.
>> >
>> > However, that wiki page is more about the process than about "what" to
>> do,
>> > so my advice to the OP would be to first go to the current CF [1] and
>> look
>> > for patches that would be genuinely useful for him/her (e.g. because of
>> > work). And do the review by following the wiki page.
>>
>> Yeah.  There's also this:
>>
>> https://wiki.postgresql.org/wiki/Developer_FAQ
>>
>> where the first topic is about getting involved in PG development, and
>> there's:
>>
>> https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
>>
>> which covers a bit more about mailing lists and such.
>>
>> Thanks,
>>
>> Stephen
>>
>


Re: fairywren is generating bogus BASE_BACKUP commands

2022-01-21 Thread Robert Haas
On Fri, Jan 21, 2022 at 5:35 PM Andrew Dunstan  wrote:
> # See https://www.msys2.org/wiki/Porting/#filesystem-namespaces
> local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;
> Probably in this case just setting it to 'server:' would do the trick.

Oh, thanks for the tip. Do you want to push a commit that does that,
or ... should I do it?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2022-01-21 Thread Amit Kapila
On Fri, Jan 21, 2022 at 7:23 PM Peter Eisentraut
 wrote:
>
> On 21.01.22 04:08, Masahiko Sawada wrote:
> >> I think the superuser check in AlterSubscription() might no longer be
> >> appropriate.  Subscriptions can now be owned by non-superusers.  Please
> >> check that.
> >
> > IIUC we don't allow non-superuser to own the subscription yet. We
> > still have the following superuser checks:
> >
> > In CreateSubscription():
> >
> >  if (!superuser())
> >  ereport(ERROR,
> >  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> >   errmsg("must be superuser to create subscriptions")));
> >
> > and in AlterSubscriptionOwner_internal();
> >
> >  /* New owner must be a superuser */
> >  if (!superuser_arg(newOwnerId))
> >  ereport(ERROR,
> >  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> >   errmsg("permission denied to change owner of
> > subscription \"%s\"",
> >  NameStr(form->subname)),
> >   errhint("The owner of a subscription must be a 
> > superuser.")));
> >
> > Also, doing superuser check here seems to be consistent with
> > pg_replication_origin_advance() which is another way to skip
> > transactions and also requires superuser permission.
>
> I'm referring to commit a2ab9c06ea15fbcb2bfde570986a06b37f52bcca.  You
> still have to be superuser to create a subscription, but you can change
> the owner to a nonprivileged user and it will observe table permissions
> on the subscriber.
>
> Assuming my understanding of that commit is correct, I think it would be
> sufficient in your patch to check that the current user is the owner of
> the subscription.
>

Won't we already do that for the Alter Subscription command, which means
nothing special needs to be done for this? However, it seems to me
that the idea we are trying to follow here is that, as this option can
lead to data inconsistency, it is good to allow only superusers to
specify it. The owner of the subscription can be changed to a
non-superuser as well, in which case I think it won't be a good idea to
allow this option. OTOH, if we think it is okay to allow such an
option to users that don't have superuser privileges, then I think
allowing it to the owner of the subscription makes sense to me.

-- 
With Regards,
Amit Kapila.




Re: How to get started with contribution

2022-01-21 Thread vrund shah
Thank you for your valuable guidance.
I will surely look at the links and if have any queries then I will contact
you.

regards
Vrund V Shah

On Sat, Jan 22, 2022 at 2:23 AM Stephen Frost  wrote:

> Greetings,
>
> * Tomas Vondra (tomas.von...@enterprisedb.com) wrote:
> > On 1/21/22 21:28, Stephen Frost wrote:
> > >* vrund v shah (vrund3...@gmail.com) wrote:
> > >>I am Vrund V Shah, a computer science undergrad. I have just
> completed my
> > >>3^rd semester at G H Patel College of Engineering & Technology. I
> am new
> > >>to open source contribution but I am well aware of C/C++, SQL and
> I will
> > >>learn Python before the end of the first week of February. I would
> love to
> > >>contribute to your organization but don’t know how!!
> > >>
> > >>Could you please guide me on how and from where to start?
> > >
> > >I'd suggest you start with patch reviews if you're interested in working
> > >on the core PostgreSQL server code.  Information on that is available
> > >here:
> > >
> > >https://wiki.postgresql.org/wiki/Reviewing_a_Patch
> > >
> >
> > Yeah, that's what I recommend people who ask me this question.
> >
> > However, that wiki page is more about the process than about "what" to
> do,
> > so my advice to the OP would be to first go to the current CF [1] and
> look
> > for patches that would be genuinely useful for him/her (e.g. because of
> > work). And do the review by following the wiki page.
>
> Yeah.  There's also this:
>
> https://wiki.postgresql.org/wiki/Developer_FAQ
>
> where the first topic is about getting involved in PG development, and
> there's:
>
> https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
>
> which covers a bit more about mailing lists and such.
>
> Thanks,
>
> Stephen
>


Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread James Coleman
On Fri, Jan 21, 2022 at 5:38 PM Tom Lane  wrote:
>
> "David G. Johnston"  writes:
> > You've removed the "constraint verification scan" portion of this.
>
> Indeed, because that's got nothing to do with adding a new column
> (per se; adding a constraint along with the column is a different
> can of worms).

Yeah. Initially I'd thought I'd wanted it there, but by explicitly
linking people to the ALTER TABLE docs for more details (I've made
that a link now too) I'm now inclined to agree that tightly focusing
the tip is better form.

> > Re-reading this, the recommendation:
>
> > - However, if the default value is volatile (e.g.,
> > - clock_timestamp())
> > - each row will need to be updated with the value calculated at the time
> > - ALTER TABLE is executed. To avoid a potentially
> > - lengthy update operation, particularly if you intend to fill the
> > column
> > - with mostly nondefault values anyway, it may be preferable to add the
> > - column with no default, insert the correct values using
> > - UPDATE, and then add any desired default as
> > described
> > - below.
>
> > has now been completely removed from the documentation.
>
> Really?  That's horrid, because that's directly useful advice.

Remedied, but rewritten a bit to better fit with the new style/goal of
that tip.

Version 3 is attached.

James Coleman
From ffca825ca27cffc70c7eb39385545a76fa0d9e2d Mon Sep 17 00:00:00 2001
From: James Coleman 
Date: Fri, 24 Sep 2021 09:59:27 -0400
Subject: [PATCH v3 1/2] Document atthasmissing default avoids verification
 table scan

When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant
default value without rewriting the table the doc changes did not note
how the new feature interplayed with ADD COLUMN DEFAULT NOT NULL.
Since adding a NOT NULL constraint requires a verification table scan to
ensure no values are null, users want to know that the combined
operation also avoids the table scan.
---
 doc/src/sgml/ref/alter_table.sgml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index a76e2e7322..1dde16fa39 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1355,7 +1355,9 @@ WITH ( MODULUS numeric_literal, REM
 evaluated at the time of the statement and the result stored in the
 table's metadata.  That value will be used for the column for all existing
 rows.  If no DEFAULT is specified, NULL is used.  In
-neither case is a rewrite of the table required.
+neither case is a rewrite of the table required.  A NOT NULL
+constraint may be added to the new column in the same statement without
+requiring scanning the table to verify the constraint.

 

-- 
2.17.1

From 3f119b3f67f6452ff7594d4f19d60ca17a09e19f Mon Sep 17 00:00:00 2001
From: jcoleman 
Date: Fri, 21 Jan 2022 18:50:39 +
Subject: [PATCH v3 2/2] Don't double document ADD COLUMN optimization details

Instead point people to the source of truth. This also avoids using
different language ("constant" versus "non-volatile").
---
 doc/src/sgml/ddl.sgml | 24 
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 22f6c5c7ab..efd9542252 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1406,25 +1406,17 @@ ALTER TABLE products ADD COLUMN description text;
 

 
- From PostgreSQL 11, adding a column with
- a constant default value no longer means that each row of the table
- needs to be updated when the ALTER TABLE statement
- is executed. Instead, the default value will be returned the next time
- the row is accessed, and applied when the table is rewritten, making
- the ALTER TABLE very fast even on large tables.
-
+ Adding a new column can require rewriting the whole table,
+ making it slow for large tables.  However, the rewrite can be optimized
+ away in some cases, depending on what default value is given to the
+ column.  See  for details.
 
-
- However, if the default value is volatile (e.g.,
- clock_timestamp())
- each row will need to be updated with the value calculated at the time
- ALTER TABLE is executed. To avoid a potentially
- lengthy update operation, particularly if you intend to fill the column
- with mostly nondefault values anyway, it may be preferable to add the
+ When a rewrite is required (e.g., a volatile default value like
+ clock_timestamp()) it may be preferable to add the
  column with no default, insert the correct values using
  UPDATE, and then add any desired default as described
- below.
-
+ below.  This is particularly true if you intend to fill the column
+ with mostly nondefault values anyway.

 

-- 
2.17.1



Re: do only critical work during single-user vacuum?

2022-01-21 Thread Bossart, Nathan
On 1/21/22, 2:43 PM, "John Naylor"  wrote:
> - to have a simple, easy to type, command

AFAICT the disagreement is really just about the grammar.
Sawada-san's idea would look something like

VACUUM (FREEZE, INDEX_CLEANUP OFF, MIN_XID_AGE 16, MIN_MXID_AGE 
16);

while your proposal looks more like

VACUUM (WRAPAROUND);

The former is highly configurable, but it is probably annoying to type
at 3 AM, and the interaction between the two *_AGE options is not
exactly intuitive (although I expect MIN_XID_AGE to be sufficient in
most cases).  The latter is not as configurable, but it is much easier
to type at 3 AM.

I think simplicity is a good goal, but I don't know if the difference
between the two approaches outweighs the benefits of configurability.
If you are in an emergency situation, you already will have to take
down the server, connect in single-user mode to the database(s) that
need vacuuming, and actually do the vacuuming.  The wraparound
WARNING/ERROR already has a HINT that describes the next steps
required.  Perhaps it would be enough to also emit an example VACUUM
command to use.

I think folks will find the configurability useful, too.  With
MIN_XID_AGE, it's super easy to have pg_cron vacuum everything over
500M on the weekend (and also do index cleanup), which may allow you
to use more relaxed autovacuum settings during the week.  The docs
already have suggestions for manually vacuuming when the load is low
[0], so I think it is reasonable to build additional support for this
use-case.
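
Concretely, the weekend job I have in mind would be something like the
following (MIN_XID_AGE being the option proposed upthread, not an existing
parameter; schedule syntax per pg_cron):

-- hypothetical: every Saturday at 03:00, freeze tables whose
-- relfrozenxid age exceeds 500M, with normal index cleanup
SELECT cron.schedule('0 3 * * 6', 'VACUUM (FREEZE, MIN_XID_AGE 500000000)');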

Nathan

[0] 
https://www.postgresql.org/docs/devel/routine-vacuuming.html#VACUUM-FOR-SPACE-RECOVERY



Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread James Coleman
On Fri, Jan 21, 2022 at 4:08 PM Andrew Dunstan  wrote:
>
>
> On 1/21/22 13:55, James Coleman wrote:
> > On Thu, Jan 20, 2022 at 3:43 PM James Coleman  wrote:
> >> As noted earlier I expect to be posting an updated patch soon.
> > Here's the updated series. In 0001 I've moved the documentation tweak
> > into the ALTER TABLE notes section. In 0002 I've taken David J's
> > suggestion of shortening the "Tip" on the DDL page and mostly using it
> > to point people to the Notes section on the ALTER TABLE page.
>
>
> I don't really like the first part of patch 1, but as it gets removed by
> patch 2 we can move past that.

At first I was very confused by this feedback, but after looking at
the patch files I sent, that's my fault: I meant to remove the
modification of the "Tip" section but somehow missed that in what I
sent. I'll correct that in the next patch series.

James Coleman




Re: fairywren is generating bogus BASE_BACKUP commands

2022-01-21 Thread Andres Freund
Hi,

On 2022-01-21 17:26:32 -0500, Robert Haas wrote:
> I think the syntax has been accepted since pg_basebackup was added in 2011,
> and Andres added it to this test case earlier this week (with -cfast in the
> subject line of the commit message).

The reason I used -cfast instead of -c fast or --checkpoint=fast is that the
way perltidy formats leads to very wide lines already, and making them even
longer seemed unattractive...

Given the -cfast syntax successfully passed tests on at least AIX, freebsd,
linux, macos, netbsd, openbsd, windows msvc, windows msys, I'm not too worried
about portability either.

Greetings,

Andres Freund




Re: fairywren is generating bogus BASE_BACKUP commands

2022-01-21 Thread Robert Haas
On Fri, Jan 21, 2022 at 5:42 PM Tom Lane  wrote:
> The point I was trying to make is that if we have to jump through
> that sort of hoop in the test scripts, then real users are going
> to have to jump through it as well, and they won't like that
> (and we will get bug reports about it).  It'd be better to design
> the option syntax to avoid such requirements.

Well, as Andrew points out, pg_basebackup's -T foo=bar syntax causes
the same issue, so you'll need to redesign that, too. But even that is
not really a proper fix. The real root of this problem is that the
operating system's notion of a valid path differs from PostgreSQL's
notion of a valid path on this platform, and I imagine that fixing
that is a rather large project.

ISTM that you're basically just complaining about options syntax that
you don't like, but I think there's nothing particularly worse about
this syntax than lots of other things we type all the time. psql -v
VAR=VALUE -P OTHERKINDOFVAR=OTHERVALUE? curl -u USER:PASSWORD?
pg_basebackup -T OLD=NEW? perl -d[t]:MODULE=OPT,OPT? I mean, that last
one actually seems kinda horrible and if this were as bad as that I'd
say yeah, it should be redesigned. But I don't think it is. There's
plenty of precedent for bundling closely-related values into a single
command-line option, which is all I've done here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Proposal: allow database-specific role memberships

2022-01-21 Thread David G. Johnston
On Fri, Jan 21, 2022 at 3:12 PM Kenaniah Cerny  wrote:

> The latest rebased version of the patch is attached.
>

As I was just reminded, we tend to avoid specifying specific PostgreSQL
versions in our documentation.  We just say what the current version does.
Here, the note sentences at lines 62 and 63 don't follow documentation
norms on that score and should just be removed.  The last two sentences
belong in the main description body, not a note.  Thus the whole note goes
away.

I don't think I really appreciated the value this feature would have when
combined with the predefined roles like pg_read_all_data and
pg_write_all_data.

I suppose I don't really appreciate the warning about SUPERUSER, etc...or
at least why this warning is somehow specific to the per-database version
of role membership.  If this warning is desirable it should be worded to
apply to role membership in general - and possibly proposed as a separate
patch for consideration.

I didn't dive deeply, but I think we now have at least three places in the
acl.c code where, after setting memlist from the system cache, we perform
nearly identical for loops to generate the final roles_list.  Possibly this
needs a refactor first so that you can introduce the per-database stuff more
succinctly.  Basically, the vast majority of this commit is just adding
InvalidOid and databaseOid all over the place - with a few minor code
changes to accommodate the new arguments.  The acl.c code should be made to
follow the same pattern after that refactor.

David J.


Re: fairywren is generating bogus BASE_BACKUP commands

2022-01-21 Thread Andres Freund
Hi,

On 2022-01-21 17:42:45 -0500, Tom Lane wrote:
> Andrew Dunstan  writes:
> > c.f. src/bin/pg_verifybackup/t/003_corruption.pl which says:
> >     my $source_ts_prefix = $source_ts_path;
> >     $source_ts_prefix =~ s!(^[A-Z]:/[^/]*)/.*!$1!;
> >     ...
>
> >     # See https://www.msys2.org/wiki/Porting/#filesystem-namespaces
> >     local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;
>
> > Probably in this case just setting it to 'server:' would do the trick.
>
> The point I was trying to make is that if we have to jump through
> that sort of hoop in the test scripts, then real users are going
> to have to jump through it as well, and they won't like that
> (and we will get bug reports about it).  It'd be better to design
> the option syntax to avoid such requirements.

Normal users aren't going to invoke a "native" basebackup from inside msys. I
assume the translation happens because an "msys world" perl invokes
a "native" pg_basebackup via msys system(), right?  If pg_basebackup instead is
"normally" invoked from a windows terminal, or anything else "native" windows,
the problem won't exist, no?

As we're building a "native" postgres in this case, none of our tools should
internally have such translations happening. So I don't think it'll be a huge
issue for users themselves?

Not that I think that there are all that many users of mingw built postgres on
windows... I think it's mostly msvc built postgres in that world?

Greetings,

Andres Freund




Re: Synchronizing slots from primary to standby

2022-01-21 Thread Hsu, John
   > I might be missing something but isn’t it okay even if the new primary
   > server is behind the subscribers? IOW, even if two slot's LSNs (i.e.,
   > restart_lsn and confirm_flush_lsn) are behind the subscriber's remote
   > LSN (i.e., pg_replication_origin.remote_lsn), the primary sends only
   > transactions that were committed after the remote_lsn. So the
   > subscriber can resume logical replication with the new primary without
   > any data loss.

Maybe I'm misreading, but I thought the purpose of this is to make
sure that the logical subscriber does not have data that has not been
replicated to the new primary. The use-case I can think of would be
if synchronous_commit were enabled and fail-over occurs. If
we didn't have this set, isn't it possible that this logical subscriber
has extra commits that aren't present on the newly promoted primary?

And sorry I accidentally started a new thread in my last reply. 
Re-pasting some of my previous questions/comments:

wait_for_standby_confirmation does not update standby_slot_names once it's
in a loop and can't be fixed with SIGHUP. Similarly, synchronize_slot_names 
isn't updated once the worker is launched.

If a logical slot was dropped on the writer, should the worker drop logical 
slots that it was previously synchronizing but are no longer present? Or 
should we leave that to the user to manage? I'm trying to think why users 
would want to sync logical slots to a reader but not have that be dropped 
as well if it's no longer present.

Is there a reason we're deciding to use one-worker syncing per database 
instead of one general worker that syncs across all the databases? 
I imagine I'm missing something obvious here. 

As for how standby_slot_names should be configured, I'd prefer flexibility
similar to what we have for synchronous_standby_names, since that seems the
most analogous. It'd provide flexibility for failovers, which I imagine is
the most common use-case.
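
To make that concrete, with the GUC names used in the patch, the kind of setup
I have in mind would be roughly this (slot names invented; both GUCs exist only
in the proposed patch):

-- on the primary: hold back logical decoding until these physical
-- standbys (identified by their slots) have confirmed the WAL
ALTER SYSTEM SET standby_slot_names = 'standby1_slot, standby2_slot';

-- on each standby: which logical slots the sync worker should mirror
ALTER SYSTEM SET synchronize_slot_names = 'mysub_slot';
SELECT pg_reload_conf();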

On 1/20/22, 9:34 PM, "Masahiko Sawada"  wrote:


On Wed, Dec 15, 2021 at 7:13 AM Peter Eisentraut
 wrote:
>
> On 31.10.21 11:08, Peter Eisentraut wrote:
> > I want to reactivate $subject.  I took Petr Jelinek's patch from [0],
> > rebased it, added a bit of testing.  It basically works, but as
> > mentioned in [0], there are various issues to work out.
> >
> > The idea is that the standby runs a background worker to periodically
> > fetch replication slot information from the primary.  On failover, a
> > logical subscriber would then ideally find up-to-date replication slots
> > on the new publisher and can just continue normally.
>
> > So, again, this isn't anywhere near ready, but there is already a lot
> > here to gather feedback about how it works, how it should work, how to
> > configure it, and how it fits into an overall replication and HA
> > architecture.
>
> The second,
> standby_slot_names, is set on the primary.  It holds back logical
> replication until the listed physical standbys have caught up.  That
> way, when failover is necessary, the promoted standby is not behind the
> logical replication consumers.

I might be missing something but isn’t it okay even if the new primary
server is behind the subscribers? IOW, even if two slot's LSNs (i.e.,
restart_lsn and confirm_flush_lsn) are behind the subscriber's remote
LSN (i.e., pg_replication_origin.remote_lsn), the primary sends only
transactions that were committed after the remote_lsn. So the
subscriber can resume logical replication with the new primary without
any data loss.

The new primary should not be ahead of the subscribers because it
forwards the logical replication start LSN to the slot’s
confirm_flush_lsn in this case. But it cannot happen since the remote
LSN of the subscriber’s origin is always updated first, then the
confirm_flush_lsn of the slot on the primary is updated, and then the
confirm_flush_lsn of the slot on the standby is synchronized.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/





Re: New developer papercut - Makefile references INSTALL

2022-01-21 Thread Andres Freund
Hi,

On 2022-01-21 17:25:08 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > It seems quite workable to continue for INSTALL to be generated, but have 
> > the
> > result checked in. The rate of changes to 
> > {installation,install-windows}.sgml
> > isn't that high, and when things change, it's actually useful to be able to
> > see the current instructions from a console.
> > Might even be good to be forced to see the text version of INSTALL when
> > changing the sgml docs...
>
> Not sure about that, because
>
> (1) if done wrong, it'd make it impossible to commit into the
> docs unless you have a working docs toolchain on your machine,
> whether you wanted to touch installation.sgml or not;

Hm, do we really want committers to do that at any frequency? I think that
concern makes sense for contributors, but I think it's reasonable to expect
the docs to be built before committing changes.

It might be relevant that the dependencies for INSTALL generation are
considerably smaller than for a full docs build. It needs xsltproc, xmllint
and pandoc. Not tiny, but still a lot less than the full docbook toolchain.

On a debian container with just enough stuff installed to get through
./configure --without-readline --without-zlib (to minimize things installed
from another source):

apt-get install -y xsltproc libxml2-utils pandoc
...
The following NEW packages will be installed:
  libcmark-gfm-extensions0 libcmark-gfm0 libicu67 libxml2 libxml2-utils 
libxslt1.1 pandoc pandoc-data xsltproc
0 upgraded, 9 newly installed, 0 to remove and 0 not upgraded.
Need to get 28.8 MB of archives.
After this operation, 193 MB of additional disk space will be used.




Re committers, the biggest issue would presumably be working on windows :(. I
don't think the docs build is integrated into the msvc tooling right now.



> (2) we'd inevitably get a lot of diff noise because of different
> committers having different versions of the docs toolchain.
> (Unlike configure, trying to require uniformity of those tools
> seems quite impractical.)

Fair point, no idea how big that aspect is. I'd expect xsltproc to be halfway
OK in that regard, and it's certainly not changing much anymore. Pandoc, I
have no idea.


> Perhaps this could be finessed by making updating of INSTALL
> the responsibility of some post-commit hook on the git server.
> Not sure that we want to go there, though.  In any case, that
> approach would negate your point about seeing the results.

It would. I guess it'd still be better than the situation today, but...

Greetings,

Andres Freund




Re: fairywren is generating bogus BASE_BACKUP commands

2022-01-21 Thread Tom Lane
Andrew Dunstan  writes:
> c.f. src/bin/pg_verifybackup/t/003_corruption.pl which says:
>     my $source_ts_prefix = $source_ts_path;
>     $source_ts_prefix =~ s!(^[A-Z]:/[^/]*)/.*!$1!;
>     ...

>     # See https://www.msys2.org/wiki/Porting/#filesystem-namespaces
>     local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;

> Probably in this case just setting it to 'server:' would do the trick.

The point I was trying to make is that if we have to jump through
that sort of hoop in the test scripts, then real users are going
to have to jump through it as well, and they won't like that
(and we will get bug reports about it).  It'd be better to design
the option syntax to avoid such requirements.

regards, tom lane




Re: do only critical work during single-user vacuum?

2022-01-21 Thread John Naylor
On Wed, Jan 19, 2022 at 5:26 PM Michael Paquier  wrote:
>
> Could you avoid introducing a new grammar pattern in VACUUM?  Any new
> option had better be within the parenthesized part as it is extensible
> at will with its set of DefElems.

This new behavior is not an option that one can sensibly mix with
other options as the user sees fit, but rather hard-codes the
parameters for its single purpose. That said, I do understand your
objection.

[*thinks*]

How about the attached patch (and test script)? It still needs polish,
but it could work. It allows "verbose" to coexist, although that's
really only for testing normal mode. While testing in single-user
mode, I was sad to find out that it not only doesn't emit messages
(not a client), but also doesn't log. That would have been a decent
way to monitor progress...

In this form, I'm no longer a fan of calling the option "wraparound",
because it's too close to the "is_wraparound" param member.
Internally, at least, we can use "emergency" or "minimal". (In fact
the bit symbol is VACOPT_MINIMAL for this draft). That can be worked
out later.

On Fri, Jan 21, 2022 at 12:59 AM Masahiko Sawada  wrote:
>
> The purpose of this thread is to provide a way for users to run vacuum
> only very old tables (while skipping index cleanup, etc.),

Ah, thank you Sawada-san, now I understand why we have been talking
past each other. The purpose is actually:

- to have a simple, easy to type, command
- intended for single-user mode, but not limited to it (so it's easy to test)
- to get out of single user mode as quickly as possible

--
John Naylor
EDB: http://www.enterprisedb.com
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 283ffaea77..6183f412d3 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -52,6 +52,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
+#include "utils/builtins.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -114,6 +115,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 	bool		full = false;
 	bool		disable_page_skipping = false;
 	bool		process_toast = true;
+	bool		wraparound = false;
 	ListCell   *lc;
 
 	/* index_cleanup and truncate values unspecified for now */
@@ -200,6 +202,8 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 	params.nworkers = nworkers;
 			}
 		}
+		else if (strcmp(opt->defname, "wraparound") == 0)
+			wraparound = defGetBoolean(opt);
 		else
 			ereport(ERROR,
 	(errcode(ERRCODE_SYNTAX_ERROR),
@@ -246,17 +250,51 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		}
 	}
 
+	if (wraparound)
+	{
+		/* exclude incompatible options */
+		foreach(lc, vacstmt->options)
+		{
+			DefElem*opt = (DefElem *) lfirst(lc);
+
+			// WIP is there a better way?
+			if (strcmp(opt->defname, "wraparound") != 0 &&
+strcmp(opt->defname, "verbose") != 0 &&
+defGetBoolean(opt))
+
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("option \"%s\" is incompatible with WRAPAROUND", opt->defname),
+parser_errposition(pstate, opt->location)));
+		}
+
+		/* skip unnecessary work, as in failsafe mode */
+		params.index_cleanup = VACOPTVALUE_DISABLED;
+		params.truncate = VACOPTVALUE_DISABLED;
+	}
+
 	/*
-	 * All freeze ages are zero if the FREEZE option is given; otherwise pass
-	 * them as -1 which means to use the default values.
+	 * Set freeze ages to zero where appropriate; otherwise pass
+	 * them as -1 which means to use the configured values.
 	 */
 	if (params.options & VACOPT_FREEZE)
 	{
+		/* All freeze ages are zero if the FREEZE option is given */
 		params.freeze_min_age = 0;
 		params.freeze_table_age = 0;
 		params.multixact_freeze_min_age = 0;
 		params.multixact_freeze_table_age = 0;
 	}
+	else if (params.options & VACOPT_MINIMAL)
+	{
+		/* it's unlikely any selected table will not be eligible for aggressive vacuum, but make sure */
+		params.freeze_table_age = 0;
+		params.multixact_freeze_table_age = 0;
+
+		// WIP: It might be worth trying to do less work here, or at least hard-coding the default values
+		params.freeze_min_age = -1;
+		params.multixact_freeze_min_age = -1;
+	}
 	else
 	{
 		params.freeze_min_age = -1;
@@ -894,6 +932,8 @@ get_all_vacuum_rels(int options)
 	Relation	pgclass;
 	TableScanDesc scan;
 	HeapTuple	tuple;
+	int32 		table_xid_age,
+table_mxid_age;
 
 	pgclass = table_open(RelationRelationId, AccessShareLock);
 
@@ -909,12 +949,42 @@ get_all_vacuum_rels(int options)
 		if (!vacuum_is_relation_owner(relid, classForm, options))
 			continue;
 
+		if (options & VACOPT_MINIMAL)
+		{
+			/*
+			* Only consider relations able to hold unfrozen XIDs (anything else
+			* should have InvalidTransactionId in relfrozenxid anyway).
+			*/
+			if (classForm->relkind != RELKIND_RELATION &&
+classForm->relkind != RELKIND_MATVIEW &&
+classForm->relkind != 

Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread Tom Lane
"David G. Johnston"  writes:
> You've removed the "constraint verification scan" portion of this.

Indeed, because that's got nothing to do with adding a new column
(per se; adding a constraint along with the column is a different
can of worms).

> Re-reading this, the recommendation:

> - However, if the default value is volatile (e.g.,
> - clock_timestamp())
> - each row will need to be updated with the value calculated at the time
> - ALTER TABLE is executed. To avoid a potentially
> - lengthy update operation, particularly if you intend to fill the
> column
> - with mostly nondefault values anyway, it may be preferable to add the
> - column with no default, insert the correct values using
> - UPDATE, and then add any desired default as
> described
> - below.

> has now been completely removed from the documentation.

Really?  That's horrid, because that's directly useful advice.

regards, tom lane




Re: fairywren is generating bogus BASE_BACKUP commands

2022-01-21 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 21, 2022 at 5:09 PM Tom Lane  wrote:
>> While we're on the subject of ill-chosen option syntax: "-cfast"
>> with non-double dashes?  Really?  That's horribly ambiguous.

> I'm not sure whether you're complaining that we accept that syntax or
> using it, but AFAIK I'm responsible for neither. I think the syntax
> has been accepted since pg_basebackup was added in 2011, and Andres
> added it to this test case earlier this week (with -cfast in the
> subject line of the commit message).

pg_basebackup's help defines the syntax as

  -c, --checkpoint=fast|spread
 set fast or spread checkpointing

which I'd read as requiring a space (or possibly equal sign)
between "-c" and "fast".  If it works as written in this test,
that's an accident of the particular getopt implementation,
and I'll bet it won't be too long before we come across
a getopt that doesn't like it.

regards, tom lane




Re: fairywren is generating bogus BASE_BACKUP commands

2022-01-21 Thread Andrew Dunstan


On 1/21/22 17:10, Thomas Munro wrote:
> On Sat, Jan 22, 2022 at 10:42 AM Robert Haas  wrote:
>> # Running: pg_basebackup --no-sync -cfast --target
>> server:/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/tmp_test_Ag8r/backuponserver
>> -X none
>> pg_basebackup: error: could not initiate base backup: ERROR:
>> unrecognized target: "server;C"
>>
>> "server" is a valid backup target, but "server;C" is not. And I think
>> this must be a bug on the client side, because the server logs the
>> generated query:
> It looks a bit like msys perl could be recognising
> "server:/home/pgrunner/..." and converting it to
> "server;C:\tools\msys64\home\pgrunner\...".  From a bit of light
> googling I see that such conversions happen in msys perl's system()
> unless you turn them off with MSYS_NO_PATHCONV, and then we'd have to
> do it ourselves in the right places.



c.f. src/bin/pg_verifybackup/t/003_corruption.pl which says:


    my $source_ts_prefix = $source_ts_path;
    $source_ts_prefix =~ s!(^[A-Z]:/[^/]*)/.*!$1!;
    ...

    # See https://www.msys2.org/wiki/Porting/#filesystem-namespaces
    local $ENV{MSYS2_ARG_CONV_EXCL} = $source_ts_prefix;


Probably in this case just setting it to 'server:' would do the trick.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread David G. Johnston
On Fri, Jan 21, 2022 at 2:50 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Fri, Jan 21, 2022 at 2:08 PM Andrew Dunstan 
> wrote:
> >> I know what it's replacing refers to release 11, but let's stop doing
> >> that. How about something like this?
> >>
> >> Adding a new column can sometimes require rewriting the table,
> >> making it a very slow operation. However in many cases this rewrite
> >> and related verification scans can be optimized away by using an
> >> appropriate default value. See the notes in ALTER
> >> TABLE for details.
>
> > I think it is a virtue, and am supported in that feeling by the existing
> > wording, to be explicit about the release before which these
> optimizations
> > can not happen.  The docs generally use this to good effect without
> > overdoing it.  This is a prime example.
>
> The fact of the matter is that optimizations of this sort have existed
> for years.  (For example, I think we've optimized away the rewrite
> when the new column is DEFAULT NULL since the very beginning.)  So it
> does not help to write the text as if there were no such optimizations
> before version N and they were all there in N.
>

Fair point, and indeed the v10 docs do mention the NULL (or no default)
optimization.


> I agree that Andrew's text could stand a pass of "omit needless words".
> But I also think that we could be a bit more explicit about what "slow"
> means.  Maybe like
>
> Adding a new column can require rewriting the whole table,
> making it slow for large tables.  However the rewrite can be optimized
> away in some cases, depending on what default value is given to the
> column.  See ALTER TABLE for details.
>
>
Comma needed after however.
You've removed the "constraint verification scan" portion of this. Maybe:
"""
...
column.  The same applies for the NOT NULL constraint verification scan.
See ALTER TABLE for details.
"""


Re-reading this, the recommendation:

- However, if the default value is volatile (e.g.,
- clock_timestamp())
- each row will need to be updated with the value calculated at the time
- ALTER TABLE is executed. To avoid a potentially
- lengthy update operation, particularly if you intend to fill the
column
- with mostly nondefault values anyway, it may be preferable to add the
- column with no default, insert the correct values using
- UPDATE, and then add any desired default as
described
- below.

has now been completely removed from the documentation.  I suggest having
this remain as the Tip and turning the optimization stuff into a Note.


> (the ALTER TABLE reference should be a link, too)
>

Yeah, the page does have a link already (fairly close by...) but with these
changes putting one here seems to make sense.

David J.


Re: fairywren is generating bogus BASE_BACKUP commands

2022-01-21 Thread Robert Haas
On Fri, Jan 21, 2022 at 5:09 PM Tom Lane  wrote:
> I think the backup_target string was already corrupted that way when
> pg_basebackup absorbed it from optarg.  It's pretty hard to believe that
> the strchr/pnstrdup stanza got it wrong.  However, comparing the
> TARGET_DETAIL to what the TAP test says it issued:
>
> # Running: pg_basebackup --no-sync -cfast --target 
> server:/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/tmp_test_Ag8r/backuponserver
>  -X none
> pg_basebackup: error: could not initiate base backup: ERROR:  unrecognized 
> target: "server;C"
>
> it's absolutely clear that something decided to munge the target string.
> Given that colon is reserved in Windows filename syntax, it's not
> so surprising if it munged it wrong, or at least more aggressively
> than you expected.

Nothing in the Perl code tells it that the particular argument in
question is a path rather than anything else, so it must be applying
some heuristic to decide whether to munge it. That's horrible.

> I kinda wonder if this notation for the target was well-chosen.
> Keeping the file name strictly separate from the "type" keyword
> might be a wiser move.  Quite aside from Windows-isms, there
> are going to be usages where this is hard to tell from a URL.
> (If memory serves, double leading slash is significant to some
> networked file systems.)

Maybe. I think it's important that the notation is not ridiculously
verbose, and -t server --target-detail $PATH is a LOT more typing.

> While we're on the subject of ill-chosen option syntax: "-cfast"
> with non-double dashes?  Really?  That's horribly ambiguous.
> Most programs would parse something like that as five single-letter
> options, and most users who know Unix would expect it to mean that.

I'm not sure whether you're complaining that we accept that syntax or
using it, but AFAIK I'm responsible for neither. I think the syntax
has been accepted since pg_basebackup was added in 2011, and Andres
added it to this test case earlier this week (with -cfast in the
subject line of the commit message). FWIW, though, I've been aware of
that syntax for a long time and never thought it was a problem. I
usually spell the option in exactly that way when I use it, and I'm
relatively sure that things I've given to customers would break if we
removed support for it. I don't know how we'd do that anyway, since
all that's happening here is a call to getopt_long().

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: New developer papercut - Makefile references INSTALL

2022-01-21 Thread Tom Lane
Andres Freund  writes:
> It seems quite workable to continue for INSTALL to be generated, but have the
> result checked in. The rate of changes to {installation,install-windows}.sgml
> isn't that high, and when things change, it's actually useful to be able to
> see the current instructions from a console.
> Might even be good to be forced to see the text version of INSTALL when
> changing the sgml docs...

Not sure about that, because

(1) if done wrong, it'd make it impossible to commit into the
docs unless you have a working docs toolchain on your machine,
whether you wanted to touch installation.sgml or not;

(2) we'd inevitably get a lot of diff noise because of different
committers having different versions of the docs toolchain.
(Unlike configure, trying to require uniformity of those tools
seems quite impractical.)

Perhaps this could be finessed by making updating of INSTALL
the responsibility of some post-commit hook on the git server.
Not sure that we want to go there, though.  In any case, that
approach would negate your point about seeing the results.

regards, tom lane




Re: Proposal: allow database-specific role memberships

2022-01-21 Thread Kenaniah Cerny
The latest rebased version of the patch is attached.

On Tue, Jan 11, 2022 at 11:01 PM Julien Rouhaud  wrote:

> Hi,
>
> On Thu, Dec 2, 2021 at 2:26 AM Kenaniah Cerny  wrote:
> >
> > Attached is a rebased version of the patch that omits catversion.h in
> order to avoid conflicts.
>
> Unfortunately even without that the patch doesn't apply anymore
> according to the cfbot: http://cfbot.cputube.org/patch_36_3374.log
>
> 1 out of 3 hunks FAILED -- saving rejects to file
> src/backend/parser/gram.y.rej
> [...]
> 2 out of 8 hunks FAILED -- saving rejects to file
> src/bin/pg_dump/pg_dumpall.c.rej
>
> Could you send a rebased version?
>
> In the meantime I'm switching the patch to Waiting on Author.
>


database-role-memberships-v5.patch
Description: Binary data


Re: New developer papercut - Makefile references INSTALL

2022-01-21 Thread Andres Freund
Hi,

On 2022-01-21 11:49:12 -0500, Robert Haas wrote:
> On Fri, Jan 21, 2022 at 11:39 AM Tom Lane  wrote:
> > Josef Šimánek  writes:
> > > Another solution would be to merge both README files together and make
> > > separate section for development/git based codebase.
> >
> > There's a lot to be said for that approach: make it simpler, not
> > more complicated.

I agree, that's the right direction.


> Yeah. And what about just getting rid of the INSTALL file altogether?

Yea, I think that might be worth doing too, at least in some form. It's
certainly not helpful to have it in the tarball but not the git tree.

I tried to find the discussion around removing INSTALL from the source tree,
but it seems to actually have centered much more around HISTORY
https://www.postgresql.org/message-id/200403091751.i29HpiV24304%40candle.pha.pa.us

It seems quite workable to continue for INSTALL to be generated, but have the
result checked in. The rate of changes to {installation,install-windows}.sgml
isn't that high, and when things change, it's actually useful to be able to
see the current instructions from a console.

Might even be good to be forced to see the text version of INSTALL when
changing the sgml docs...


> I think that, in 2022, a lot of people are likely to use git to obtain
> the source code rather than obtain a tarball.

Indeed.


> And regardless of what method they use to get the source code, they don't
> really need there to be a text file in the directory with installation
> instructions; a URL is just fine.

Even working with git trees, I do quite prefer having the instructions
available in a terminal compatible way, TBH. The building happens in a
terminal, after all.  In our case it's made worse by the browser version being
split across ~10 pages and multiple chapters.


> There was a time when you couldn't count on people to have a web browser
> conveniently available, either because that whole world wide web thing
> hadn't really caught on yet, or because they didn't even have an always-on
> Internet connection. In that world, an INSTALL file in the tarball makes a
> lot of sense. But these delays, the number of people who are still obtaining
> PostgreSQL via UUCP-over-modem-relay has got to be ... relatively limited.

There's still people having to build postgres on systems without internet
access - but typically they'll have access to the instructions when developing
the scripts for that...

Greetings,

Andres Freund




Re: fairywren is generating bogus BASE_BACKUP commands

2022-01-21 Thread Thomas Munro
On Sat, Jan 22, 2022 at 10:42 AM Robert Haas  wrote:
> # Running: pg_basebackup --no-sync -cfast --target
> server:/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/tmp_test_Ag8r/backuponserver
> -X none
> pg_basebackup: error: could not initiate base backup: ERROR:
> unrecognized target: "server;C"
>
> "server" is a valid backup target, but "server;C" is not. And I think
> this must be a bug on the client side, because the server logs the
> generated query:

It looks a bit like msys perl could be recognising
"server:/home/pgrunner/..." and converting it to
"server;C:\tools\msys64\home\pgrunner\...".  From a bit of light
googling I see that such conversions happen in msys perl's system()
unless you turn them off with MSYS_NO_PATHCONV, and then we'd have to
do it ourselves in the right places.




Re: fairywren is generating bogus BASE_BACKUP commands

2022-01-21 Thread Tom Lane
Robert Haas  writes:
> "server" is a valid backup target, but "server;C" is not. And I think
> this must be a bug on the client side, because the server logs the
> generated query:

> 2022-01-21 20:53:11.618 UTC [8404:10] 010_pg_basebackup.pl LOG:
> received replication command: BASE_BACKUP ( LABEL 'pg_basebackup base
> backup',  PROGRESS,  CHECKPOINT 'fast',  MANIFEST 'yes',
> TABLESPACE_MAP,  TARGET 'server;C',  TARGET_DETAIL
> '\\tools\\msys64\\home\\pgrunner\\bf\\root\\HEAD\\pgsql.build\\src\\bin\\pg_basebackup\\tmp_check\\tmp_test_Ag8r\\backuponserver')

> So it's not that the server parses the query and introduces gibberish
> -- the client has already introduced gibberish when constructing the
> query. But the code to do that is pretty straightforward -- we just
> call strchr to find the colon in the backup target, and then
> pnstrdup() the part before the colon and use the latter part as-is. If
> pnstrdup were failing to add a terminating \0 then this would be quite
> plausible, but I think it shouldn't. Unless the operating system's
> strnlen() is buggy? That seems like a stretch, so feel free to tell me
> what obvious stupid thing I did here and am not seeing...

I think the backup_target string was already corrupted that way when
pg_basebackup absorbed it from optarg.  It's pretty hard to believe that
the strchr/pnstrdup stanza got it wrong.  However, comparing the
TARGET_DETAIL to what the TAP test says it issued:

# Running: pg_basebackup --no-sync -cfast --target 
server:/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/tmp_test_Ag8r/backuponserver
 -X none
pg_basebackup: error: could not initiate base backup: ERROR:  unrecognized 
target: "server;C"

it's absolutely clear that something decided to munge the target string.
Given that colon is reserved in Windows filename syntax, it's not
so surprising if it munged it wrong, or at least more aggressively
than you expected.

I kinda wonder if this notation for the target was well-chosen.
Keeping the file name strictly separate from the "type" keyword
might be a wiser move.  Quite aside from Windows-isms, there
are going to be usages where this is hard to tell from a URL.
(If memory serves, double leading slash is significant to some
networked file systems.)

While we're on the subject of ill-chosen option syntax: "-cfast"
with non-double dashes?  Really?  That's horribly ambiguous.
Most programs would parse something like that as five single-letter
options, and most users who know Unix would expect it to mean that.

regards, tom lane




Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread Tom Lane
"David G. Johnston"  writes:
> On Fri, Jan 21, 2022 at 2:08 PM Andrew Dunstan  wrote:
>> I know what it's replacing refers to release 11, but let's stop doing
>> that. How about something like this?
>> 
>> Adding a new column can sometimes require rewriting the table,
>> making it a very slow operation. However in many cases this rewrite
>> and related verification scans can be optimized away by using an
>> appropriate default value. See the notes in ALTER
>> TABLE for details.

> I think it is a virtue, and am supported in that feeling by the existing
> wording, to be explicit about the release before which these optimizations
> can not happen.  The docs generally use this to good effect without
> overdoing it.  This is a prime example.

The fact of the matter is that optimizations of this sort have existed
for years.  (For example, I think we've optimized away the rewrite
when the new column is DEFAULT NULL since the very beginning.)  So it
does not help to write the text as if there were no such optimizations
before version N and they were all there in N.

I agree that Andrew's text could stand a pass of "omit needless words".
But I also think that we could be a bit more explicit about what "slow"
means.  Maybe like

Adding a new column can require rewriting the whole table,
making it slow for large tables.  However the rewrite can be optimized
away in some cases, depending on what default value is given to the
column.  See ALTER TABLE for details.

(the ALTER TABLE reference should be a link, too)

regards, tom lane




fairywren is generating bogus BASE_BACKUP commands

2022-01-21 Thread Robert Haas
Thomas Munro pointed out this failure to me on fairywren:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2022-01-21%2020%3A10%3A22

He theorizes that I need some perl2host magic in there, which may well
be true. But I also noticed this:

# Running: pg_basebackup --no-sync -cfast --target
server:/home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/tmp_test_Ag8r/backuponserver
-X none
pg_basebackup: error: could not initiate base backup: ERROR:
unrecognized target: "server;C"

"server" is a valid backup target, but "server;C" is not. And I think
this must be a bug on the client side, because the server logs the
generated query:

2022-01-21 20:53:11.618 UTC [8404:10] 010_pg_basebackup.pl LOG:
received replication command: BASE_BACKUP ( LABEL 'pg_basebackup base
backup',  PROGRESS,  CHECKPOINT 'fast',  MANIFEST 'yes',
TABLESPACE_MAP,  TARGET 'server;C',  TARGET_DETAIL
'\\tools\\msys64\\home\\pgrunner\\bf\\root\\HEAD\\pgsql.build\\src\\bin\\pg_basebackup\\tmp_check\\tmp_test_Ag8r\\backuponserver')

So it's not that the server parses the query and introduces gibberish
-- the client has already introduced gibberish when constructing the
query. But the code to do that is pretty straightforward -- we just
call strchr to find the colon in the backup target, and then
pnstrdup() the part before the colon and use the latter part as-is. If
pnstrdup were failing to add a terminating \0 then this would be quite
plausible, but I think it shouldn't. Unless the operating system's
strnlen() is buggy? That seems like a stretch, so feel free to tell me
what obvious stupid thing I did here and am not seeing...
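
For reference, the split being described amounts to roughly this (a standalone
sketch, with strndup standing in for pnstrdup; not the actual pg_basebackup
code), and it does produce the expected pieces for a well-formed argument:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
	/* hypothetical command-line value, like the one the TAP test passes */
	const char *arg = "server:/home/pgrunner/backuponserver";
	const char *colon = strchr(arg, ':');
	char	   *target = colon ? strndup(arg, colon - arg) : strdup(arg);
	const char *detail = colon ? colon + 1 : NULL;

	/* prints: TARGET 'server', TARGET_DETAIL '/home/pgrunner/backuponserver' */
	printf("TARGET '%s', TARGET_DETAIL '%s'\n",
		   target, detail ? detail : "<none>");
	free(target);
	return 0;
}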

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread David G. Johnston
On Fri, Jan 21, 2022 at 2:08 PM Andrew Dunstan  wrote:

> On 1/21/22 13:55, James Coleman wrote:
>
> + Before PostgreSQL 11, adding a new
> column to a
> + table required rewriting that table, making it a very slow operation.
> + More recent versions can sometimes optimize away this rewrite and
> related
> + validation scans.  See the notes in ALTER TABLE
> for details.
>
>
> I know what it's replacing refers to release 11, but let's stop doing
> that. How about something like this?
>
> Adding a new column can sometimes require rewriting the table,
> making it a very slow operation. However in many cases this rewrite
> and related verification scans can be optimized away by using an
> appropriate default value. See the notes in ALTER
> TABLE for details.
>

I think it is a virtue, and am supported in that feeling by the existing
wording, to be explicit about the release before which these optimizations
can not happen.  The docs generally use this to good effect without
overdoing it.  This is a prime example.

The combined effect of "sometimes", "in many", "can be", and "an
appropriate" make this version harder to read than it probably needs to
be.  I like the patch as-is over this; but I would want to give an
alternative wording more thought if it is insisted upon that mention of
PostgreSQL 11 goes away.

David J.


Re: A test for replay of regression tests

2022-01-21 Thread Andrew Dunstan


On 1/21/22 13:58, Thomas Munro wrote:
> On Fri, Jan 21, 2022 at 3:42 PM Thomas Munro  wrote:
>> Thanks.  I added that and pushed.  Let's see if fairywren likes it
>> when it comes back online.
> A watched pot never boils, but I wonder why Andrew's 4 Windows
> configurations jacana, bowerbird, fairywren and drongo have stopped
> returning results.



I think I have unstuck both machines. I will keep an eye on them and
make sure they don't get stuck again.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread Andrew Dunstan


On 1/21/22 13:55, James Coleman wrote:
> On Thu, Jan 20, 2022 at 3:43 PM James Coleman  wrote:
>> As noted earlier I expect to be posting an updated patch soon.
> Here's the updated series. In 0001 I've moved the documentation tweak
> into the ALTER TABLE notes section. In 0002 I've taken David J's
> suggestion of shortening the "Tip" on the DDL page and mostly using it
> to point people to the Notes section on the ALTER TABLE page.


I don't really like the first part of patch 1, but as it gets removed by
patch 2 we can move past that.


+ Before PostgreSQL 11, adding a new
column to a
+ table required rewriting that table, making it a very slow operation.
+ More recent versions can sometimes optimize away this rewrite and
related
+ validation scans.  See the notes in ALTER TABLE
for details.


I know what it's replacing refers to release 11, but let's stop doing
that. How about something like this?

Adding a new column can sometimes require rewriting the table,
making it a very slow operation. However in many cases this rewrite
and related verification scans can be optimized away by using an
appropriate default value. See the notes in ALTER
TABLE for details.

cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Parallelize correlated subqueries that execute within each worker

2022-01-21 Thread Tom Lane
Robert Haas  writes:
> I don't think there's an intrinsic problem with the idea of making a
> tentative determination about parallel safety and then refining it
> later, but I'm not sure why you think it would be a lot of work to
> figure this out at the point where we generate gather paths. I think
> it's just a matter of testing whether the set of parameters that the
> path needs as input is the empty set. It may be that neither extParam
> nor allParam are precisely that thing, but I think both are very
> close, and it seems to me that there's no theoretical reason why we
> can't know for every path the set of inputs that it requires "from the
> outside."

I'd be very happy if someone redesigned the extParam/allParam mechanism,
or at least documented it better.  It's confusing and I've never been
able to escape the feeling that it's somewhat redundant.

The real problem with it though is that we don't compute those values
until much too late to be useful in path construction; see comments
for SS_identify_outer_params.  To be helpful to the planner, we'd have
to rejigger things at least enough to calculate them earlier -- or
maybe better, calculate what the planner wants earlier, and then transform
to what the executor wants later.

regards, tom lane




Re: How to get started with contribution

2022-01-21 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@enterprisedb.com) wrote:
> On 1/21/22 21:28, Stephen Frost wrote:
> >* vrund v shah (vrund3...@gmail.com) wrote:
> >>I am Vrund V Shah, a computer science undergrad. I have just completed 
> >> my
> >>3^rd semester at G H Patel College of Engineering & Technology. I am new
> >>to open source contribution but I am well aware of C/C++, SQL and I will
> >>learn Python before the end of the first week of February. I would love 
> >> to
> >>contribute to your organization but don’t know how!!
> >>
> >>Could you please guide me on how and from where to start?
> >
> >I'd suggest you start with patch reviews if you're interested in working
> >on the core PostgreSQL server code.  Information on that is available
> >here:
> >
> >https://wiki.postgresql.org/wiki/Reviewing_a_Patch
> >
> 
> Yeah, that's what I recommend people who ask me this question.
> 
> However, that wiki page is more about the process than about "what" to do,
> so my advice to the OP would be to first go to the current CF [1] and look
> for patches that would be genuinely useful for him/her (e.g. because of
> work). And do the review by following the wiki page.

Yeah.  There's also this:

https://wiki.postgresql.org/wiki/Developer_FAQ

where the first topic is about getting involved in PG development, and
there's:

https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F

which covers a bit more about mailing lists and such.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-01-21 Thread Peter Geoghegan
On Fri, Jan 21, 2022 at 12:07 PM Greg Stark  wrote:
> This confuses me. "Transactions per second" is a headline database
> metric that lots of users actually focus on quite heavily -- rather
> too heavily imho.

But transactions per second is for the whole database, not for
individual tables. It's also really a benchmarking thing, where the
size and variety of transactions is fixed. With something like pgbench
it actually is exactly the same thing, but such a workload is not at
all realistic.  Even BenchmarkSQL/TPC-C isn't like that, despite the
fact that it is a fairly synthetic workload (it's just not super
synthetic).

> Ok, XID consumption is only a subset of transactions
> that are not read-only but that's a detail that's pretty easy to
> explain and users get pretty quickly.

My point was mostly this: the number of distinct extant unfrozen tuple
headers (and the range of the relevant XIDs) is generally highly
unpredictable today. And the number of tuples we'll have to freeze to
be able to advance relfrozenxid by a good amount is quite variable, in
general.

For example, if we bulk extend a relation as part of an ETL process,
then the number of distinct XIDs could be as low as 1, even though we
can expect a great deal of "freeze debt" that will have to be paid off
at some point (with the current design, in the common case where the
user doesn't account for this effect because they're not already an
expert). There are other common cases that are not quite as extreme as
that, that still have the same effect -- even an expert will find it
hard or impossible to tune autovacuum_freeze_min_age for that.

Another case of interest (that illustrates the general principle) is
something like pgbench_tellers. We'll never have an aggressive VACUUM
of the table with the patch, and we shouldn't ever need to freeze any
tuples. But, owing to workload characteristics, we'll constantly be
able to keep its relfrozenxid very current, because (even if we
introduce skew) each individual row cannot go very long without being
updated, allowing old XIDs to age out that way.

There is also an interesting middle ground, where you get a mixture of
both tendencies due to skew. The tuple that's most likely to get
updated was the one that was just updated. How are you as a DBA ever
supposed to tune autovacuum_freeze_min_age if tuples happen to be
qualitatively different in this way?

> What I find confuses people much more is the concept of the
> oldestxmin. I think most of the autovacuum problems I've seen come
> from cases where autovacuum is happily kicking off useless vacuums
> because the oldestxmin hasn't actually advanced enough for them to do
> any useful work.

As it happens, the proposed log output won't use the term oldestxmin
anymore -- I think that it makes sense to rename it to "removable
cutoff". Here's an example:

LOG:  automatic vacuum of table "regression.public.bmsql_oorder": index scans: 1
pages: 0 removed, 317308 remain, 250258 skipped using visibility map
(78.87% of total)
tuples: 70 removed, 34105925 remain (6830471 newly frozen), 2528 are
dead but not yet removable
removable cutoff: 37574752, which is 230115 xids behind next
new relfrozenxid: 35221275, which is 5219310 xids ahead of previous value
index scan needed: 55540 pages from table (17.50% of total) had
3339809 dead item identifiers removed
index "bmsql_oorder_pkey": pages: 144257 in total, 0 newly deleted, 0
currently deleted, 0 reusable
index "bmsql_oorder_idx2": pages: 330083 in total, 0 newly deleted, 0
currently deleted, 0 reusable
I/O timings: read: 7928.207 ms, write: 1386.662 ms
avg read rate: 33.107 MB/s, avg write rate: 26.218 MB/s
buffer usage: 220825 hits, 443331 misses, 351084 dirtied
WAL usage: 576110 records, 364797 full page images, 2046767817 bytes
system usage: CPU: user: 10.62 s, system: 7.56 s, elapsed: 104.61 s

Note also that I deliberately made the "new relfrozenxid" line that
immediately follows (information that we haven't shown before now)
similar, to highlight that they're now closely related concepts. Now
if you VACUUM a table that is either empty or has only frozen tuples,
VACUUM will set relfrozenxid to oldestxmin/removable cutoff.
Internally, oldestxmin is the "starting point" for our final/target
relfrozenxid for the table. We ratchet it back dynamically, whenever
we see an older-than-current-target XID that cannot be immediately
frozen (e.g., when we can't easily get a cleanup lock on the page).
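
To make that last part concrete, here is a toy standalone illustration of the
ratcheting (made-up XIDs, plain integer comparison rather than the real
wraparound-aware test):

#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

int
main(void)
{
	TransactionId oldestxmin = 37574752;	/* starting target */
	TransactionId target_relfrozenxid = oldestxmin;
	/* hypothetical XIDs seen in tuples that could not be frozen right away */
	TransactionId unfreezable[] = {36000000, 35221275};

	for (int i = 0; i < 2; i++)
	{
		if (unfreezable[i] < target_relfrozenxid)
			target_relfrozenxid = unfreezable[i];	/* ratchet back */
	}

	/* prints: final relfrozenxid target: 35221275 */
	printf("final relfrozenxid target: %u\n", target_relfrozenxid);
	return 0;
}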

-- 
Peter Geoghegan




Re: How to get started with contribution

2022-01-21 Thread Tomas Vondra

On 1/21/22 21:28, Stephen Frost wrote:

Greetings,

* vrund v shah (vrund3...@gmail.com) wrote:

I am Vrund V Shah, a computer science undergrad. I have just completed my
3^rd semester at G H Patel College of Engineering & Technology. I am new
to open source contribution but I am well aware of C/C++, SQL and I will
learn Python before the end of the first week of February. I would love to
contribute to your organization but don’t know how!!

Could you please guide me on how and from where to start?


I'd suggest you start with patch reviews if you're interested in working
on the core PostgreSQL server code.  Information on that is available
here:

https://wiki.postgresql.org/wiki/Reviewing_a_Patch



Yeah, that's what I recommend people who ask me this question.

However, that wiki page is more about the process than about "what" to 
do, so my advice to the OP would be to first go to the current CF [1] 
and look for patches that would be genuinely useful for him/her (e.g. 
because of work). And do the review by following the wiki page.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: GROUP BY optimization

2022-01-21 Thread Tomas Vondra



On 1/21/22 12:09, Andrey Lepikhov wrote:

On 7/22/21 3:58 AM, Tomas Vondra wrote:

4) I'm not sure it's actually a good idea to pass tuplesPerPrevGroup to
estimate_num_groups_incremental. In principle yes, if we use "group
size" from the previous step, then the returned value is the number of
new groups after adding the "new" pathkey.
But even if we ignore the issues with amplification mentioned in (3),
there's an issue with non-linear behavior in estimate_num_groups,
because at some point it's calculating

    D(N,n,p) = n * (1 - ((N-p)/N)^(N/n))

where N - total rows, p - sample size, n - number of distinct values.
And if we have (N1,n1) and (N2,n2) then the ratio of calculated
estimated (which is pretty much what calculating group size does)

    D(N2,n2,p2) / D(N1,n1,p1)

which will differ depending on p1 and p2. And if we're tweaking the
tuplesPerPrevGroup all the time, that's really annoying, as it may make
the groups smaller or larger, which is unpredictable and annoying, and I
wonder if it might go against the idea of penalizing tuplesPerPrevGroup
to some extent.
We could simply use the input "tuples" value here, and then divide the
current and previous estimate to calculate the number of new groups.
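
For reference, the D(N,n,p) estimator above, written out directly (a
standalone sketch only, not the actual selfuncs.c code):

#include <math.h>
#include <stdio.h>

/*
 * D(N,n,p) = n * (1 - ((N-p)/N)^(N/n))
 * N = total rows, p = sample size, n = number of distinct values.
 */
static double
estimate_distinct(double N, double n, double p)
{
	return n * (1.0 - pow((N - p) / N, N / n));
}

int
main(void)
{
	/* made-up inputs, just to show the shape of the estimate */
	printf("%.1f\n", estimate_distinct(1000000, 5000, 30000));
	return 0;
}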

>
tuplesPerPrevGroup is only an upper bound for the estimation. I think
we should use the result of the previous estimation as a limit for the
next one during incremental estimation. Maybe we could simply cap the
tuplesPerPrevGroup value to ensure monotonic non-growth of this
value? - see the attached patch on top of the previous fixes.




Yes, I think that's a reasonable defense - it makes no sense to exceed 
the group size in the preceding step, which could happen with small 
number of groups or some unexpected ndistinct estimate.


The other thing we could do is reduce the coefficient gradually - so 
it'd be 1.5 for the first pathkey, then 1.25 for the next one, and so 
on. But it seems somewhat arbitrary (I certainly don't have some sound 
theoretical justification ...).


I've merged most of the fixes you reported. I've skipped the path_save 
removal in planner.c, because that seems incorrect - if there are 
multiple pathkeys, we must start with the original path, not the 
modified one we built in the last iteration. Or am I missing something?


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL CompanyFrom 590589f4388499d41d2c1bb42678b794c044d045 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 21 Jan 2022 20:22:14 +0100
Subject: [PATCH v20220121] GROUP BY reordering

---
 .../postgres_fdw/expected/postgres_fdw.out|  15 +-
 src/backend/optimizer/path/costsize.c | 369 +-
 src/backend/optimizer/path/equivclass.c   |  13 +-
 src/backend/optimizer/path/pathkeys.c | 580 
 src/backend/optimizer/plan/planner.c  | 653 ++
 src/backend/optimizer/util/pathnode.c |   2 +-
 src/backend/utils/adt/selfuncs.c  |  37 +-
 src/backend/utils/misc/guc.c  |  29 +
 src/include/nodes/nodes.h |   1 +
 src/include/nodes/pathnodes.h |  10 +
 src/include/optimizer/cost.h  |   4 +-
 src/include/optimizer/paths.h |  11 +
 src/include/utils/selfuncs.h  |   5 +
 src/test/regress/expected/aggregates.out  | 244 ++-
 .../regress/expected/incremental_sort.out |   2 +-
 src/test/regress/expected/join.out|  51 +-
 .../regress/expected/partition_aggregate.out  | 136 ++--
 src/test/regress/expected/partition_join.out  |  75 +-
 src/test/regress/expected/union.out   |  60 +-
 src/test/regress/sql/aggregates.sql   |  99 +++
 src/test/regress/sql/incremental_sort.sql |   2 +-
 21 files changed, 1904 insertions(+), 494 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 7d6f7d9e3df..9926b6bba07 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2741,16 +2741,13 @@ select c2 * (random() <= 1)::int as c2 from ft2 group by c2 * (random() <= 1)::i
 -- GROUP BY clause in various forms, cardinal, alias and constant expression
 explain (verbose, costs off)
 select count(c2) w, c2 x, 5 y, 7.0 z from ft1 group by 2, y, 9.0::int order by 2;
-  QUERY PLAN   

- Sort
+ QUERY PLAN 
+
+ Foreign Scan
Output: (count(c2)), c2, 5, 7.0, 9
-   Sort Key: ft1.c2
-   ->  Foreign Scan
- Output: (count(c2)), c2, 5, 7.0, 9
- Relations: Aggregate on (public.ft1)
- Remote 

Re: How to get started with contribution

2022-01-21 Thread Stephen Frost
Greetings,

* vrund v shah (vrund3...@gmail.com) wrote:
>I am Vrund V Shah, a computer science undergrad. I have just completed my
>3^rd semester at G H Patel College of Engineering & Technology. I am new
>to open source contribution but I am well aware of C/C++, SQL and I will
>learn Python before the end of the first week of February. I would love to
>contribute to your organization but don’t know how!!
> 
>Could you please guide me on how and from where to start?

I'd suggest you start with patch reviews if you're interested in working
on the core PostgreSQL server code.  Information on that is available
here:

https://wiki.postgresql.org/wiki/Reviewing_a_Patch

Thanks,

Stephen


signature.asc
Description: PGP signature


How to get started with contribution

2022-01-21 Thread vrund v shah
Respected sir/ma'am,

I am Vrund V Shah, a computer science undergrad. I have just completed my
3rd semester at G H Patel College of Engineering & Technology. I am new to
open source contribution but I am well aware of C/C++, SQL and I will learn
Python before the end of the first week of February. I would love to
contribute to your organization but don’t know how!!

Could you please guide me on how and from where to start?

Hope to hear from you soon.

Regards,
Vrund V Shah

Sent from Mail for Windows




Re: Parallelize correlated subqueries that execute within each worker

2022-01-21 Thread Robert Haas
On Fri, Jan 14, 2022 at 2:25 PM James Coleman  wrote:
> I've been chewing on this a bit, and I was about to go re-read the
> code and see how easy it'd be to do exactly what you're suggesting in
> generate_gather_paths() (and verifying it doesn't need to happen in
> other places). However there's one (I think large) gotcha with that
> approach (assuming it otherwise makes sense): it means we do
> unnecessary work. In the current patch series we only need to recheck
> parallel safety if we're in a situation where we might actually
> benefit from doing that work (namely when we have a correlated
> subquery we might otherwise be able to execute in a parallel plan). If
> we don't track that status we'd have to recheck the full parallel
> safety of the path for all paths -- even without correlated
> subqueries.

I don't think there's an intrinsic problem with the idea of making a
tentative determination about parallel safety and then refining it
later, but I'm not sure why you think it would be a lot of work to
figure this out at the point where we generate gather paths. I think
it's just a matter of testing whether the set of parameters that the
path needs as input is the empty set. It may be that neither extParam
nor allParam are precisely that thing, but I think both are very
close, and it seems to me that there's no theoretical reason why we
can't know for every path the set of inputs that it requires "from the
outside."

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: A test for replay of regression tests

2022-01-21 Thread Thomas Munro
On Sat, Jan 22, 2022 at 8:48 AM Andres Freund  wrote:
> Unfortunately we don't quite seem there yet:
>
> I saw a couple failures like:
> https://api.cirrus-ci.com/v1/artifact/task/5394938773897216/regress_diffs/build/testrun/recovery/t/027_stream_regress/regression.diffs
> (from https://cirrus-ci.com/task/5394938773897216?logs=check_world#L183 )
>
>  -- Should succeed
>  DROP TABLESPACE regress_tblspace_renamed;
> +ERROR:  tablespace "regress_tblspace_renamed" is not empty
>
>
> I assume the reason we see this semi-regularly when the regression tests run
> as part of 027_stream_regress, but not in the main regression test run, is
> similar to the reloptions problem, namely that we run with a much smaller
> shared buffers.
>
> I assume what happens is that this just makes the known problem of bgwriter or
> some other process keeping open a filehandle to an already deleted relation,
> preventing the deletion to "fully" take effect, worse.

Right, I assume this would be fixed by [1].  I need to re-convince
myself of that patch's correctness and make some changes after
Robert's feedback; I'll look into committing it next week.  From a
certain point of view it's now quite good that we hit this case
occasionally in CI.

[1] https://commitfest.postgresql.org/36/2962/




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-01-21 Thread Greg Stark
On Thu, 20 Jan 2022 at 17:01, Peter Geoghegan  wrote:
>
> Then there's the fact that you
> really cannot think about the rate of XID consumption intuitively --
> it has at best a weak, unpredictable relationship with anything that
> users can understand, such as data stored or wall clock time.

This confuses me. "Transactions per second" is a headline database
metric that lots of users actually focus on quite heavily -- rather
too heavily imho. Ok, XID consumption is only a subset of transactions
that are not read-only but that's a detail that's pretty easy to
explain and users get pretty quickly.

There are corner cases like transactions that look read-only but are
actually read-write or transactions that consume multiple xids but
complex systems are full of corner cases and people don't seem too
surprised about these things.

What I find confuses people much more is the concept of the
oldestxmin. I think most of the autovacuum problems I've seen come
from cases where autovacuum is happily kicking off useless vacuums
because the oldestxmin hasn't actually advanced enough for them to do
any useful work.

-- 
greg




pg_basebackup fsyncs some files despite --no-sync (was: Adding CI to our tree)

2022-01-21 Thread Andres Freund
Hi,

On 2022-01-18 20:16:46 -0800, Andres Freund wrote:
> I noticed a few other sources of "unnecessary" fsyncs.  The most frequent
> being the durable_rename() of backup_manifest in pg_basebackup.c. Manifests 
> are
> surprisingly large, 135k for a freshly initdb'd cluster.

Robert, I assume the fsync for manifests isn't ignoring --no-sync for a
particular reason?

The attached patch adds no-sync handling to the manifest rename, as well as
one case in the directory wal method.


It's a bit painful that we have to have code like

if (dir_data->sync)
	r = durable_rename(tmppath, tmppath2);
else
{
	if (rename(tmppath, tmppath2) != 0)
	{
		pg_log_error("could not rename file \"%s\" to \"%s\": %m",
					 tmppath, tmppath2);
		r = -1;
	}
}

It seems like it'd be better to set it up so that durable_rename() could
decide internally whether to fsync, or have a wrapper around durable_rename()?
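
For illustration, such a wrapper could look roughly like this (hypothetical
name, reusing only calls already present in walmethods.c; a sketch, not code
from the patch):

/*
 * Rename durably only when syncing was requested, otherwise fall back to a
 * plain rename().
 */
static int
rename_opt_sync(const char *oldfile, const char *newfile, bool do_sync)
{
	if (do_sync)
		return durable_rename(oldfile, newfile);	/* logs its own error */

	if (rename(oldfile, newfile) != 0)
	{
		pg_log_error("could not rename file \"%s\" to \"%s\": %m",
					 oldfile, newfile);
		return -1;
	}
	return 0;
}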


> There's an fsync in walmethods.c:tar_close() that sounds intentional, but I
> don't really understand what the comment:
> 
>   /* Always fsync on close, so the padding gets fsynced */
>   if (tar_sync(f) < 0)

tar_sync() actually checks for tar_data->sync, so it doesn't do an
fsync. Arguably the comment is a bit confusing, but ...

Greetings,

Andres Freund
>From 36772524734f9f613edd6622d840e48f045f04d8 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 21 Jan 2022 10:30:37 -0800
Subject: [PATCH v1] pg_basebackup: Skip a few more fsyncs if --no-sync is
 specified.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/20220119041646.rhuo3youiqxqj...@alap3.anarazel.de
Backpatch:
---
 src/bin/pg_basebackup/pg_basebackup.c | 18 +++---
 src/bin/pg_basebackup/walmethods.c| 12 +++-
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index d5b0ade10d5..a5cca388845 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -2196,9 +2196,21 @@ BaseBackup(void)
 		snprintf(tmp_filename, MAXPGPATH, "%s/backup_manifest.tmp", basedir);
 		snprintf(filename, MAXPGPATH, "%s/backup_manifest", basedir);
 
-		/* durable_rename emits its own log message in case of failure */
-		if (durable_rename(tmp_filename, filename) != 0)
-			exit(1);
+		if (do_sync)
+		{
+			/* durable_rename emits its own log message in case of failure */
+			if (durable_rename(tmp_filename, filename) != 0)
+exit(1);
+		}
+		else
+		{
+			if (rename(tmp_filename, filename) != 0)
+			{
+pg_log_error("could not rename file \"%s\" to \"%s\": %m",
+			 tmp_filename, filename);
+exit(1);
+			}
+		}
 	}
 
 	if (verbose)
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index f74bd13315c..a6d08c1270a 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -445,7 +445,17 @@ dir_close(Walfile f, WalCloseMethod method)
 			snprintf(tmppath2, sizeof(tmppath2), "%s/%s",
 	 dir_data->basedir, filename2);
 			pg_free(filename2);
-			r = durable_rename(tmppath, tmppath2);
+			if (dir_data->sync)
+r = durable_rename(tmppath, tmppath2);
+			else
+			{
+if (rename(tmppath, tmppath2) != 0)
+{
+	pg_log_error("could not rename file \"%s\" to \"%s\": %m",
+ tmppath, tmppath2);
+	r = -1;
+}
+			}
 		}
 		else if (method == CLOSE_UNLINK)
 		{
-- 
2.34.0



Re: Push down time-related SQLValue functions to foreign server

2022-01-21 Thread Tom Lane
Alexander Pyhalov  writes:
> So far I have the following prototype. It seems to be working, but I 
> think it can be enhanced.
> At least, some sort of caching seems to be necessary for 
> is_stable_expr().

Yeah, from a performance standpoint this seems pretty horrid ---
it's probably exponential in the size of the expressions considered
because of the repeated recursions.

The approach I had in mind was to extend the responsibility of
foreign_expr_walker to make it deduce the classification of
an expression in a bottom-up fashion.  Also, as I noted before,
we don't want to re-deduce that in a later deparse pass, so
maybe we should just go ahead and deparse during foreign_expr_walker.
Not sure about that part.  It sounds expensive; but recording the
classification results in a way we could re-use later doesn't seem
too cheap either.

BTW, the patch is just about unreadable as-is.  It would have been
better to not have re-indented the bulk of foreign_expr_walker,
leaving that for some later pgindent pass.  But that may be moot,
because I don't think we can tolerate the double-recursion approach
you took here.

regards, tom lane




Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-01-21 Thread Andres Freund
On 2022-01-20 20:41:16 +, Bossart, Nathan wrote:
> Here's this part.

And pushed to all branches. Thanks.




Re: A test for replay of regression tests

2022-01-21 Thread Andres Freund
Hi,

On 2022-01-17 17:25:19 +1300, Thomas Munro wrote:
> I reordered the arguments, tested locally under the buildfarm client script,
> and pushed.  I'll keep an eye on the build farm.

After the reloptions fix the tests seem much more likely to succeed than
before. Progress!

Unfortunately we don't quite seem there yet:

I saw a couple failures like:
https://api.cirrus-ci.com/v1/artifact/task/5394938773897216/regress_diffs/build/testrun/recovery/t/027_stream_regress/regression.diffs
(from https://cirrus-ci.com/task/5394938773897216?logs=check_world#L183 )

 -- Should succeed
 DROP TABLESPACE regress_tblspace_renamed;
+ERROR:  tablespace "regress_tblspace_renamed" is not empty


I assume the reason we see this semi-regularly when the regression tests run
as part of 027_stream_regress, but not in the main regression test run, is
similar to the reloptions problem, namely that we run with a much smaller
shared buffers.

I assume what happens is that this just makes the known problem of bgwriter or
some other process keeping open a filehandle to an already deleted relation,
preventing the deletion to "fully" take effect, worse.

Greetings,

Andres Freund




Re: refactoring basebackup.c

2022-01-21 Thread Robert Haas
On Thu, Jan 20, 2022 at 11:10 AM Robert Haas  wrote:
> On Thu, Jan 20, 2022 at 8:00 AM Dipesh Pandit  wrote:
> > Thanks for the feedback, I have incorporated the suggestions and
> > updated a new patch v2.
>
> Cool. I'll do a detailed review later, but I think this is going in a
> good direction.

Here is a more detailed review.

+	if (inflateInit2(zs, 15 + 16) != Z_OK)
+	{
+		pg_log_error("could not initialize compression library");
+		exit(1);
+
+	}

Extra blank line.

+/* At present, we only know how to parse tar and gzip archives. */

gzip -> tar.gz. You can gzip something that is not a tar.

+ * Extract the gzip compressed archive using a gzip extractor and then
+ * forward it to next streamer.

This comment is not good. First, we're not necessarily doing it.
Second, it just describes what the code does, not why it does it.
Maybe something like "If the user requested both that the server
compress the backup and also that we extract the backup, we need to
decompress it."

+	if (server_compression != NULL)
+	{
+		if (strcmp(server_compression, "gzip") == 0)
+			server_compression_type = BACKUP_COMPRESSION_GZIP;
+		else if (strlen(server_compression) == 5 &&
+				 strncmp(server_compression, "gzip", 4) == 0 &&
+				 server_compression[4] >= '1' && server_compression[4] <= '9')
+		{
+			server_compression_type = BACKUP_COMPRESSION_GZIP;
+			server_compression_level = server_compression[4] - '0';
+		}
+	}
+	else
+		server_compression_type = BACKUP_COMPRESSION_NONE;

I think this is not required any more. I think probably some other
things need to be adjusted as well, based on Michael's changes and the
updates in my patch to match.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Replace uses of deprecated Python module distutils.sysconfig

2022-01-21 Thread Tom Lane
I wrote:
> The early returns are not great: we have about half a dozen machines
> so far that are finding python3, and reporting sane-looking Python
> include paths, but not finding Python.h.  They're all Linux-oid
> machines, so I suppose what is going on is that they have the base
> python3 package installed but not python3-dev or local equivalent.

> I want to leave that patch in place long enough so we can get a
> fairly full survey of which machines are OK and which are not,
> but I suppose I'll have to revert it tomorrow or so.  We did
> promise the owners a month to adjust their configurations.

I have now reverted that patch, but I think this was a highly
worthwhile bit of reconnaissance.  It identified 18 animals
that had incomplete python3 installations (versus only 13
that definitely or possibly lack python3 altogether).  Their
owners most likely thought they were already good to go for the
changeover, so without this experiment we'd have had a whole lot
of buildfarm red when the real change is made.

I've notified the owners of these results.

regards, tom lane




Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread David G. Johnston
On Fri, Jan 21, 2022 at 11:55 AM James Coleman  wrote:

> On Thu, Jan 20, 2022 at 3:43 PM James Coleman  wrote:
> >
> > As noted earlier I expect to be posting an updated patch soon.
>
> Here's the updated series. In 0001 I've moved the documentation tweak
> into the ALTER TABLE notes section. In 0002 I've taken David J's
> suggestion of shortening the "Tip" on the DDL page and mostly using it
> to point people to the Notes section on the ALTER TABLE page.
>
>
WFM

David J.


Re: A test for replay of regression tests

2022-01-21 Thread Thomas Munro
On Fri, Jan 21, 2022 at 3:42 PM Thomas Munro  wrote:
> Thanks.  I added that and pushed.  Let's see if fairywren likes it
> when it comes back online.

A watched pot never boils, but I wonder why Andrew's 4 Windows
configurations jacana, bowerbird, fairywren and drongo have stopped
returning results.




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-21 Thread Tom Lane
Robert Haas  writes:
> It seems to me that what this comment is saying is that OIDs in the
> second and third categories are doled out by counters. Therefore, we
> can't know which of those OIDs will get used, or how many of them will
> get used, or which objects will get which OIDs. Therefore, I think we
> should go back to the approach that you were using for template0 and
> handle both that database and postgres using that method. That is,
> assign an OID manually, and make sure unused_oids knows that it should
> be counted as already used.

Indeed.  If you're going to manually assign OIDs to these databases,
do it honestly, and put them into the range intended for that purpose.
Trying to take short-cuts is just going to cause trouble down the road.

regards, tom lane




Re: Document atthasmissing default optimization avoids verification table scan

2022-01-21 Thread James Coleman
On Thu, Jan 20, 2022 at 3:43 PM James Coleman  wrote:
>
> As noted earlier I expect to be posting an updated patch soon.

Here's the updated series. In 0001 I've moved the documentation tweak
into the ALTER TABLE notes section. In 0002 I've taken David J's
suggestion of shortening the "Tip" on the DDL page and mostly using it
to point people to the Notes section on the ALTER TABLE page.

Thanks,
James Coleman


v2-0001-Document-atthasmissing-default-avoids-verificatio.patch
Description: Binary data


v2-0002-Don-t-double-document-ADD-COLUMN-optimization-det.patch
Description: Binary data


Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-21 Thread Robert Haas
On Fri, Jan 21, 2022 at 8:40 AM Shruthi Gowda  wrote:
> From what I see in the code, template0 and postgres are the last
> things that get created in initdb phase. The system OIDs that get
> assigned to these DBs vary from release to release. At present, the
> system assigned OIDs of template0 and postgres are 13679 and 13680
> respectively. I feel it would be safe to assign 16000 and 16001 for
> template0 and postgres respectively from the unpinned object OID range
> 12000 - 16383. In the future, even if the initdb unpinned objects
> reach the range of 16000 issues can only arise if initdb() creates
> another system-created database for which the system assigns these
> reserved OIDs (16000, 16001).

It doesn't seem safe to me to rely on that. We don't know what could
happen in the future if the number of built-in objects increases.
Looking at the lengthy comment on this topic in transam.h, I see that
there are three ranges:

1-9999 manually assigned OIDs
10000-11999 OIDs assigned by genbki.pl
12000-16384 OIDs assigned to unpinned objects post-bootstrap

It seems to me that what this comment is saying is that OIDs in the
second and third categories are doled out by counters. Therefore, we
can't know which of those OIDs will get used, or how many of them will
get used, or which objects will get which OIDs. Therefore, I think we
should go back to the approach that you were using for template0 and
handle both that database and postgres using that method. That is,
assign an OID manually, and make sure unused_oids knows that it should
be counted as already used.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: New developer papercut - Makefile references INSTALL

2022-01-21 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 21, 2022 at 12:19 PM Tom Lane  wrote:
>> I'm not convinced by this argument.  In the first place, the INSTALL
>> file isn't doing any harm.  I don't know that I'd bother to build the
>> infrastructure for it today, but we already have that infrastructure
>> and it's not causing us any particular maintenance burden.

> I think it *is* doing harm. It confuses people. We get semi-regular
> threads on the list like this one where people are confused by the
> file not being there,

That's not the fault of INSTALL, that's the fault of the README files,
which I think we're agreed we can fix.  (Or, if you suppose that they
came to the code with some previous expectation that there'd be an
INSTALL file, taking it away is certainly not going to improve matters.)

> but I can't remember ever seeing a thread where
> someone said that it was great, or said that they thought it needed
> improvement, or said that they used it and then something interesting
> happened afterward, or anything like that.

It's just another copy of the same documentation, so I can't really
imagine a situation where someone would feel a need to mention it
specifically.

regards, tom lane




Re: refactoring basebackup.c

2022-01-21 Thread Robert Haas
On Wed, Jan 19, 2022 at 4:26 PM Robert Haas  wrote:
> I spent some time thinking about test coverage for the server-side
> backup code today and came up with the attached (v12-0003).

I committed the base backup target patch yesterday, and today I
updated the remaining code in light of Michael Paquier's commit
5c649fe153367cdab278738ee4aebbfd158e0546. Here is the resulting patch.

Michael, I am proposing that we remove this message as part of this commit:

-pg_log_info("no value specified for compression
level, switching to default");

I think most people won't want to specify a compression level, so
emitting a message when they don't seems too verbose.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v13-0001-Server-side-gzip-compression.patch
Description: Binary data


Re: document the need to analyze partitioned tables

2022-01-21 Thread Tomas Vondra

On 1/21/22 19:02, Justin Pryzby wrote:

Thanks for looking at this

On Fri, Jan 21, 2022 at 06:21:57PM +0100, Tomas Vondra wrote:

Hi,

On 10/8/21 14:58, Justin Pryzby wrote:

Cleaned up and attached as a .patch.

The patch implementing autoanalyze on partitioned tables should
revert relevant portions of this patch.


I went through this patch and I'd like to propose a couple changes, per the
0002 patch:

1) I've reworded the changes in maintenance.sgml a bit. It sounded a bit
strange before, but I'm not a native speaker so maybe it's worse ...


+ autoanalyze on the parent table.  If your queries require statistics on
+ parent relations for proper planning, it's necessary to periodically run

You added two references to "relations", but everything else talks about
"tables", which is all that analyze processes.



Good point, that should use "tables" too.


2) Remove unnecessary whitespace changes in perform.sgml.


Those were a note to myself and to any reviewer - should that be updated too ?



Ah, I see. I don't think that part needs updating - it talks about 
having to analyze after a bulk load, and that applies to all tables 
anyway. I don't think it needs to mention partitioned tables need an 
analyze too.



3) Simplify the analyze.sgml changes a bit - it was trying to cram too much
stuff into a single paragraph, so I split that.

Does that seem OK, or did I omit something important?


+If the table being analyzed has one or more children,

I think you're referring to both legacy inheritance and partitioning.  That
should be more clear.



I think it applies to both types of partitioning - it's just that in the 
declarative partitioning case the table is always empty so no stats with 
inherit=false are built.
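
To make that concrete, a minimal sketch (hypothetical table names, not part
of the patch) of the behavior the docs describe - the parent only ever gets
"inherited" statistics, and only when someone runs ANALYZE on it by hand:

    CREATE TABLE meas (ts timestamptz, val int) PARTITION BY RANGE (ts);
    CREATE TABLE meas_2022 PARTITION OF meas
        FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
    INSERT INTO meas
        SELECT g, (random() * 100)::int
        FROM generate_series(timestamptz '2022-01-01',
                             timestamptz '2022-06-30',
                             interval '1 hour') AS g;
    ANALYZE meas;                  -- autovacuum never does this for the parent
    SELECT tablename, attname, inherited
      FROM pg_stats
     WHERE tablename = 'meas';     -- only rows with inherited = t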



+ANALYZE gathers two sets of statistics: once on the rows
+of the parent table only, and a second one including rows of both the parent
+table and all child relations.  This second set of statistics is needed when

I think should say ".. and all of its children".



OK


FWIW I think it's really confusing we have inheritance and partitioning, and
partitions and child tables. And sometimes we use partitioning in the
generic sense (i.e. including the inheritance approach), and sometimes only
the declarative variant. Same for partitions vs child tables. I can't even
imagine how confusing this has to be for people just learning this stuff.
They must be in permanent WTF?! state ...


The docs were cleaned up some in 0c06534bd.  At least the word "partitioned"
should never be used for legacy inheritance - but "partitioning" is.



OK


--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From c5105f2f067dff553045792d4a171fc835fe2109 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 22 Jul 2021 16:06:18 -0500
Subject: [PATCH v3 1/2] documentation deficiencies for ANALYZE of partitioned
 tables

This is partially extracted from 1b5617eb844cd2470a334c1d2eec66cf9b39c41a,
which was reverted.
---
 doc/src/sgml/maintenance.sgml | 27 ++
 doc/src/sgml/perform.sgml |  2 ++
 doc/src/sgml/ref/analyze.sgml | 36 ++-
 3 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 36f975b1e5b..b7c806cc906 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -290,6 +290,14 @@
 to meaningful statistical changes.

 
+   
+Tuples changed in partitions and inheritance children do not count towards
+analyze on the parent table.  If the parent table is empty or rarely
+changed, it may never be processed by autovacuum.  It is necessary to
+periodically manual run ANALYZE on the parent table to
+keep the statistics of its table hierarchy up to date.
+   
+

 As with vacuuming for space recovery, frequent updates of statistics
 are more useful for heavily-updated tables than for seldom-updated
@@ -347,6 +355,18 @@
  ANALYZE commands on those tables on a suitable schedule.
 

+
+   
+
+ The autovacuum daemon does not issue ANALYZE commands for
+ partitioned tables.  Inheritance parents will only be analyzed if the
+ parent itself is changed - changes to child tables do not trigger
+ autoanalyze on the parent table.  It is necessary to periodically run a
+ manual ANALYZE to keep the statistics of the table
+ hierarchy up to date.
+
+   
+
   
 
   
@@ -819,6 +839,13 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
 since the last ANALYZE.

 
+   
+Partitioned tables are not processed by autovacuum.  Statistics
+should be collected by running a manual ANALYZE when it is
+first populated, and updated whenever the distribution of data in its
+partitions changes significantly.
+   
+

 Temporary tables cannot be accessed by autovacuum.  Therefore,

Re: document the need to analyze partitioned tables

2022-01-21 Thread Justin Pryzby
Thanks for looking at this

On Fri, Jan 21, 2022 at 06:21:57PM +0100, Tomas Vondra wrote:
> Hi,
> 
> On 10/8/21 14:58, Justin Pryzby wrote:
> > Cleaned up and attached as a .patch.
> > 
> > The patch implementing autoanalyze on partitioned tables should
> > revert relevant portions of this patch.
> 
> I went through this patch and I'd like to propose a couple changes, per the
> 0002 patch:
> 
> 1) I've reworded the changes in maintenance.sgml a bit. It sounded a bit
> strange before, but I'm not a native speaker so maybe it's worse ...

+ autoanalyze on the parent table.  If your queries require statistics on
+ parent relations for proper planning, it's necessary to periodically run

You added two references to "relations", but everything else talks about
"tables", which is all that analyze processes.

> 2) Remove unnecessary whitespace changes in perform.sgml.

Those were a note to myself and to any reviewer - should that be updated too ?

> 3) Simplify the analyze.sgml changes a bit - it was trying to cram too much
> stuff into a single paragraph, so I split that.
> 
> Does that seem OK, or did I omit something important?

+If the table being analyzed has one or more children,

I think you're referring to both legacy inheritance and partitioning.  That
should be more clear.

+ANALYZE gathers two sets of statistics: once on the rows
+of the parent table only, and a second one including rows of both the parent
+table and all child relations.  This second set of statistics is needed when

I think it should say ".. and all of its children".

> FWIW I think it's really confusing we have inheritance and partitioning, and
> partitions and child tables. And sometimes we use partitioning in the
> generic sense (i.e. including the inheritance approach), and sometimes only
> the declarative variant. Same for partitions vs child tables. I can't even
> imagine how confusing this has to be for people just learning this stuff.
> They must be in permanent WTF?! state ...

The docs were cleaned up some in 0c06534bd.  At least the word "partitioned"
should never be used for legacy inheritance - but "partitioning" is.

-- 
Justin




Re: New developer papercut - Makefile references INSTALL

2022-01-21 Thread Robert Haas
On Fri, Jan 21, 2022 at 12:19 PM Tom Lane  wrote:
> I'm not convinced by this argument.  In the first place, the INSTALL
> file isn't doing any harm.  I don't know that I'd bother to build the
> infrastructure for it today, but we already have that infrastructure
> and it's not causing us any particular maintenance burden.

I think it *is* doing harm. It confuses people. We get semi-regular
threads on the list like this one where people are confused by the
file not being there, but I can't remember ever seeing a thread where
someone said that it was great, or said that they thought it needed
improvement, or said that they used it and then something interesting
happened afterward, or anything like that. AFAICR, the only threads on
the mailing list that mention the file at all are started by people
who were told to look there and couldn't find the file. Now we can
speculate that there is a far larger number of people who find the
file, love it, and have no problems with it or suggestions for
improvement or need to comment upon it in any way, and that's hard to
disprove. But I doubt it.

> In the
> second place, I think your argument is a bit backwards.  Sure, people
> who are relying on a git pull can be expected to have easy access to
> on-line docs; that's exactly why we aren't troubled by not providing
> ready-to-go INSTALL docs in that case.  But that doesn't follow for
> people who are using a tarball.  In particular, it might not be that
> easy to find on-line docs matching the specific tarball version they
> are working with.  (With the planned meson conversion, that's about to
> become a bigger deal than it's been in the recent past.)

I would guess that these days if you're brave enough to compile from
source, you are very, very likely to get that source from git rather
than a tarball. These days if you Google "[name of any piece of
software] source code" the first hit is the git repository. I grant
that the second hit, in the case of PostgreSQL, is a link to download
page for tarballs, but when I try plugging other things in there
instead of "postgresql" the git repository is always the first hit,
and sometimes there's a download page after that. Again, this doesn't
prove anything, but I do think it's suggestive.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: document the need to analyze partitioned tables

2022-01-21 Thread Tomas Vondra

Hi,

On 10/8/21 14:58, Justin Pryzby wrote:

Cleaned up and attached as a .patch.

The patch implementing autoanalyze on partitioned tables should
revert relevant portions of this patch.


I went through this patch and I'd like to propose a couple changes, per 
the 0002 patch:


1) I've reworded the changes in maintenance.sgml a bit. It sounded a bit 
strange before, but I'm not a native speaker so maybe it's worse ...


2) Remove unnecessary whitespace changes in perform.sgml.

3) Simplify the analyze.sgml changes a bit - it was trying to cram too 
much stuff into a single paragraph, so I split that.


Does that seem OK, or did I omit something important?

FWIW I think it's really confusing we have inheritance and partitioning, 
and partitions and child tables. And sometimes we use partitioning in 
the generic sense (i.e. including the inheritance approach), and 
sometimes only the declarative variant. Same for partitions vs child 
tables. I can't even imagine how confusing this has to be for people 
just learning this stuff. They must be in permanent WTF?! state ...


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From c5105f2f067dff553045792d4a171fc835fe2109 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 22 Jul 2021 16:06:18 -0500
Subject: [PATCH v2 1/2] documentation deficiencies for ANALYZE of partitioned
 tables

This is partially extracted from 1b5617eb844cd2470a334c1d2eec66cf9b39c41a,
which was reverted.
---
 doc/src/sgml/maintenance.sgml | 27 ++
 doc/src/sgml/perform.sgml |  2 ++
 doc/src/sgml/ref/analyze.sgml | 36 ++-
 3 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 36f975b1e5b..b7c806cc906 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -290,6 +290,14 @@
 to meaningful statistical changes.

 
+   
+Tuples changed in partitions and inheritance children do not count towards
+analyze on the parent table.  If the parent table is empty or rarely
+changed, it may never be processed by autovacuum.  It is necessary to
+periodically manual run ANALYZE on the parent table to
+keep the statistics of its table hierarchy up to date.
+   
+

 As with vacuuming for space recovery, frequent updates of statistics
 are more useful for heavily-updated tables than for seldom-updated
@@ -347,6 +355,18 @@
  ANALYZE commands on those tables on a suitable schedule.
 

+
+   
+
+ The autovacuum daemon does not issue ANALYZE commands for
+ partitioned tables.  Inheritance parents will only be analyzed if the
+ parent itself is changed - changes to child tables do not trigger
+ autoanalyze on the parent table.  It is necessary to periodically run a
+ manual ANALYZE to keep the statistics of the table
+ hierarchy up to date.
+
+   
+
   
 
   
@@ -819,6 +839,13 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
 since the last ANALYZE.

 
+   
+Partitioned tables are not processed by autovacuum.  Statistics
+should be collected by running a manual ANALYZE when it is
+first populated, and updated whenever the distribution of data in its
+partitions changes significantly.
+   
+

 Temporary tables cannot be accessed by autovacuum.  Therefore,
 appropriate vacuum and analyze operations should be performed via
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 89ff58338e5..b84853fd6ff 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1765,9 +1765,11 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
Run ANALYZE Afterwards
 

+
 Whenever you have significantly altered the distribution of data
 within a table, running ANALYZE is strongly recommended. This
 includes bulk loading large amounts of data into the table.  Running
+
 ANALYZE (or VACUUM ANALYZE)
 ensures that the planner has up-to-date statistics about the
 table.  With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c8fcebc1612..ea78f0d0387 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -250,22 +250,32 @@ ANALYZE [ VERBOSE ] [ table_and_columns
 
   
-If the table being analyzed has one or more children,
-ANALYZE will gather statistics twice: once on the
-rows of the parent table only, and a second time on the rows of the
-parent table with all of its children.  This second set of statistics
-is needed when planning queries that traverse the entire inheritance
-tree.  The autovacuum daemon, however, will only consider inserts or
-updates on the parent table itself when deciding whether to trigger an
-automatic analyze for that table.  

Re: New developer papercut - Makefile references INSTALL

2022-01-21 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 21, 2022 at 11:39 AM Tom Lane  wrote:
>> =?UTF-8?B?Sm9zZWYgxaBpbcOhbmVr?=  writes:
>>> Another solution would be to merge both README files together and make
>>> separate section for development/git based codebase.

>> There's a lot to be said for that approach: make it simpler, not
>> more complicated.

> Yeah.

Josef, you want to draft a patch?

> And what about just getting rid of the INSTALL file altogether?
> I think that, in 2022, a lot of people are likely to use git to obtain
> the source code rather than obtain a tarball. And regardless of what
> method they use to get the source code, they don't really need there
> to be a text file in the directory with installation instructions; a
> URL is just fine. There was a time when you couldn't count on people
> to have a web browser conveniently available, either because that
> whole world wide web thing hadn't really caught on yet, or because
> they didn't even have an always-on Internet connection. In that world,
> an INSTALL file in the tarball makes a lot of sense. But these days,
> the number of people who are still obtaining PostgreSQL via
> UUCP-over-modem-relay has got to be ... relatively limited.

I'm not convinced by this argument.  In the first place, the INSTALL
file isn't doing any harm.  I don't know that I'd bother to build the
infrastructure for it today, but we already have that infrastructure
and it's not causing us any particular maintenance burden.  In the
second place, I think your argument is a bit backwards.  Sure, people
who are relying on a git pull can be expected to have easy access to
on-line docs; that's exactly why we aren't troubled by not providing
ready-to-go INSTALL docs in that case.  But that doesn't follow for
people who are using a tarball.  In particular, it might not be that
easy to find on-line docs matching the specific tarball version they
are working with.  (With the planned meson conversion, that's about to
become a bigger deal than it's been in the recent past.)

regards, tom lane




Re: Logical replication timeout problem

2022-01-21 Thread Fabrice Chapuis
I kept your patch 0001 and added these two calls in function
WalSndUpdateProgress without modifying WalSndKeepaliveIfNecessary, and it works
too.
What do you think of this patch?

static void
WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn,
TransactionId xid)
{
static TimestampTz sendTime = 0;
TimestampTz now = GetCurrentTimestamp();

ProcessRepliesIfAny();
WalSndKeepaliveIfNecessary();



/*
 * Track lag no more than once per
WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS to
 * avoid flooding the lag tracker when we commit frequently.
 */
...
Regards

Fabrice

On Fri, Jan 21, 2022 at 2:17 PM Fabrice Chapuis 
wrote:

> Thanks for your patch; it also works well when executing our use case, and the
> timeout no longer appears in the logs. Is it necessary now to refine this
> patch and make as few changes as possible in order for it to be released?
>
> On Fri, Jan 21, 2022 at 10:51 AM wangw.f...@fujitsu.com <
> wangw.f...@fujitsu.com> wrote:
>
>> On Thu, Jan 20, 2022 at 9:18 PM Amit Kapila 
>> wrote:
>> > It might be not reaching the actual send_keep_alive logic in
>> > WalSndKeepaliveIfNecessary because of below code:
>> > {
>> > ...
>> > /*
>> > * Don't send keepalive messages if timeouts are globally disabled or
>> > * we're doing something not partaking in timeouts.
>> > */
>> > if (wal_sender_timeout <= 0 || last_reply_timestamp <= 0) return; ..
>> > }
>> >
>> > I think you can add elog before the above return and before updating
>> progress
>> > in the below code:
>> > case REORDER_BUFFER_CHANGE_INSERT:
>> >   if (!relentry->pubactions.pubinsert)
>> > + {
>> > + OutputPluginUpdateProgress(ctx);
>> >   return;
>> >
>> > This will help us to rule out one possibility.
>>
>> Thanks for your advice!
>>
>> Following your advice, I applied 0001, 0002 and 0003 to run the test
>> script.
>> When the subscriber timed out, I filtered the publisher-side log:
>> $ grep "before invoking update progress" pub.log | wc -l
>> 60373557
>> $ grep "return because wal_sender_timeout or last_reply_timestamp"
>> pub.log | wc -l
>> 0
>> $ grep "return because waiting_for_ping_response" pub.log | wc -l
>> 0
>>
>> Based on this result, I think function WalSndKeepaliveIfNecessary was
>> invoked,
>> but function WalSndKeepalive was not invoked because (last_processing >=
>> ping_time) is false.
>> So I tried to see changes about last_processing and last_reply_timestamp
>> (because ping_time is based on last_reply_timestamp).
>>
>> I found last_processing and last_reply_timestamp is set in function
>> ProcessRepliesIfAny.
>> last_processing is set to the time when function ProcessRepliesIfAny is
>> invoked.
>> Only when publisher receive a response from subscriber,
>> last_reply_timestamp is
>> set to last_processing and the flag waiting_for_ping_response is reset to
>> false.
>>
>> While we are in the loop that skips all the changes of the transaction, IIUC,
>> we do
>> not invoke function ProcessRepliesIfAny. So I think last_processing and
>> last_reply_timestamp will not be changed in this loop.
>> Therefore I think about our use case, we should modify the condition of
>> invoking WalSndKeepalive.(please refer to
>> 0004-Simple-modification-of-timing.patch, and note that this is only a
>> patch
>> for testing).
>> At the same time I modify the input of WalSndKeepalive from true to
>> false. This
>> is because when input is true, waiting_for_ping_response is set to true in
>> WalSndKeepalive. As mentioned above, ProcessRepliesIfAny is not invoked
>> in the
>> loop, so I think waiting_for_ping_response will not be reset to false and
>> keepalive messages will not be sent.
>>
>> I tested after applying patches(0001 and 0004), I found the timeout was
>> not
>> printed in subscriber-side log. And the added messages "begin load
>> changes" and
>> "commit the log" were printed in publisher-side log:
>> $ grep -ir "begin load changes" pub.log
>> 2022-01-21 11:17:06.934 CST [2577699] LOG:  begin load changes
>> $ grep -ir "commit the log" pub.log
>> 2022-01-21 11:21:15.564 CST [2577699] LOG:  commit the log
>>
>> Attach the patches and test script mentioned above, in case someone wants
>> to
>> try.
>>
>> Regards,
>> Wang wei
>>
>


Re: New developer papercut - Makefile references INSTALL

2022-01-21 Thread Robert Haas
On Fri, Jan 21, 2022 at 11:39 AM Tom Lane  wrote:
> =?UTF-8?B?Sm9zZWYgxaBpbcOhbmVr?=  writes:
> > Another solution would be to merge both README files together and make
> > separate section for development/git based codebase.
>
> There's a lot to be said for that approach: make it simpler, not
> more complicated.

Yeah. And what about just getting rid of the INSTALL file altogether?
I think that, in 2022, a lot of people are likely to use git to obtain
the source code rather than obtain a tarball. And regardless of what
method they use to get the source code, they don't really need there
to be a text file in the directory with installation instructions; a
URL is just fine. There was a time when you couldn't count on people
to have a web browser conveniently available, either because that
whole world wide web thing hadn't really caught on yet, or because
they didn't even have an always-on Internet connection. In that world,
an INSTALL file in the tarball makes a lot of sense. But these days,
the number of people who are still obtaining PostgreSQL via
UUCP-over-modem-relay has got to be ... relatively limited.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: New developer papercut - Makefile references INSTALL

2022-01-21 Thread Tom Lane
=?UTF-8?B?Sm9zZWYgxaBpbcOhbmVr?=  writes:
> Another solution would be to merge both README files together and make
> separate section for development/git based codebase.

There's a lot to be said for that approach: make it simpler, not
more complicated.

regards, tom lane




Re: Skipping logical replication transactions on subscriber side

2022-01-21 Thread David G. Johnston
On Fri, Jan 21, 2022 at 4:55 AM Amit Kapila  wrote:

> Apart from this, I have changed a few comments and ran pgindent. Do
> let me know what you think of the changes?
>

The paragraph describing ALTER SUBSCRIPTION SKIP seems unnecessarily
repetitive.  Consider:
"""
Skips applying all changes of the specified remote transaction, whose value
should be obtained from pg_stat_subscription_workers.last_error_xid.  While
this will result in avoiding the last error on the subscription, thus
allowing it to resume working.  See "link to a more holistic description in
the Logical Replication chapter" for alternative means of resolving
subscription errors.  Removing an entire transaction from the history of a
table should be considered a last resort as it can leave the system in a
very inconsistent state.

Note, this feature will not accept transactions prepared under two-phase
commit.

This command sets pg_subscription.subskipxid field upon issuance and the
system clears the same field upon seeing and successfully skipping the
identified transaction.  Issuing this command again while a skipped
transaction is pending replaces the existing transaction with the new one.
"""

Then change the subskipxid column description to be:
"""
ID of the transaction whose changes are to be skipped.  It is 0 when there
are no pending skips.  This is set by issuing ALTER SUBSCRIPTION SKIP and
resets back to 0 when the identified transaction passes through the
subscription stream and is successfully ignored.
"""

I don't understand why/how ", if a valid transaction ID;" comes into play
(how would we know whether it is valid, or if we do ALTER SUBSCRIPTION SKIP
should prohibit the invalid value from being chosen).

I'm against mentioning subtransactions in the skip_option description.

The Logical Replication page changes provide good content overall but I
dislike going into detail about how to perform conflict resolution in the
third paragraph and then summarize the various forms of conflict resolution
in the newly added fourth.  Maybe re-work things like:

1. Logical replication behaves...
2. A conflict will produce...details can be found in places...
3. Resolving conflicts can be done by...
4. (split and reworded) If choosing to simply skip the offending
transaction you take the pg_stat_subscription_worker.last_error_xid value
(716 in the example above) and provide it while executing ALTER
SUBSCRIPTION SKIP...
5. (split and reworded) Prior to v15 ALTER SUBSCRIPTION SKIP was not
available and instead you had to use the pg_replication_origin_advance()
function...

Don't just list out two options for the user to perform the same action.
Tell a story about why we felt compelled to add ALTER SUBSCRIPTION SKIP and why
either the function is now deprecated or is useful given different
circumstances (the former seems likely).
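
For illustration, the workflow that story would walk the user through, with a
hypothetical subscription name (a sketch only, not proposed wording):

    SELECT last_error_xid FROM pg_stat_subscription_workers;
        -- say this returns 716

    ALTER SUBSCRIPTION mysub SKIP (xid = 716);

    SELECT subname, subskipxid FROM pg_subscription;
        -- shows 716 until the transaction has been skipped, then 0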

David J.


Re: ICU for global collation

2022-01-21 Thread Julien Rouhaud
On Fri, Jan 21, 2022 at 03:24:02PM +0100, Peter Eisentraut wrote:
> On 21.01.22 14:51, Julien Rouhaud wrote:
> > Is that change intended?  There isn't any usage of the collversionstr before
> > the possible error when actual_versionstr is missing.
> 
> I wanted to move it closer to the SysCacheGetAttr() where the "datum" value
> is obtained.  It seemed weird to get the datum, then do other things, then
> decode the datum.

Oh ok.  It won't make much difference performance-wise, so no objection.




Re: New developer papercut - Makefile references INSTALL

2022-01-21 Thread Josef Šimánek
pá 21. 1. 2022 v 16:31 odesílatel Tom Lane  napsal:
>
> =?UTF-8?B?Sm9zZWYgxaBpbcOhbmVr?=  writes:
> > There is README.git explaining this. README itself is meant to be used
> > for distributed source code. You can generate INSTALL locally for
> > example by running make dist (INSTALL will be present in
> > postgresql-15devel directory).
>
> > Anyway I do agree this is confusing. Maybe we can actually rename
> > README.git to README and current README to README.dist or similar.
> > README.dist can be copied to distribution package as README during
> > Makefile magic.
>
> IIRC, we discussed that when README.git was invented, and concluded
> that it would just create different sorts of confusion.  I might
> be biased, as the person who is generally checking created tarballs
> for sanity, but I really do not want any situation where a file
> appearing in the tarball is different from the same-named file in
> the git tree.
>
> Perhaps it could be sane to not have *any* file named README in
> the git tree, only README.git and README.dist, with the tarball
> preparation process copying README.dist to README.  However,
> if I'm understanding what github does, that would leave us with
> no automatically-displayed documentation for the github repo.
> So I'm not sure that helps with samay's concern.

Especially for GitHub use-case it is possible to add separate readme
into .github directory. But the problem with local clone will not be
solved anyway.

From GitHub docs:

"If you put your README file in your repository's root, docs, or
hidden .github directory, GitHub will recognize and automatically
surface your README to repository visitors."

Another solution would be to merge both README files together and make
separate section for development/git based codebase.

> regards, tom lane




Re: support for MERGE

2022-01-21 Thread Alvaro Herrera
On 2022-Jan-21, Japin Li wrote:

> +   /*
> +* NOT MATCHED actions can't see target relation, but they 
> can see
> +* source relation.
> +*/
> +   Assert(mergeWhenClause->commandType == CMD_INSERT ||
> +  mergeWhenClause->commandType == CMD_DELETE ||
> +  mergeWhenClause->commandType == CMD_NOTHING);
> +   setNamespaceVisibilityForRTE(pstate->p_namespace,
> +
> targetRelRTE, false, false);
> +   setNamespaceVisibilityForRTE(pstate->p_namespace,
> +
> sourceRelRTE, true, true);
> 
> Should we remove the CMD_DELETE from Assert(), since it will not happen
> according to MERGE syntax?

Absolutely --- silly copy mistake.  Pushed fix.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Ni aún el genio muy grande llegaría muy lejos
si tuviera que sacarlo todo de su propio interior" (Goethe)




Re: New developer papercut - Makefile references INSTALL

2022-01-21 Thread Tom Lane
=?UTF-8?B?Sm9zZWYgxaBpbcOhbmVr?=  writes:
> There is README.git explaining this. README itself is meant to be used
> for distributed source code. You can generate INSTALL locally for
> example by running make dist (INSTALL will be present in
> postgresql-15devel directory).

> Anyway I do agree this is confusing. Maybe we can actually rename
> README.git to README and current README to README.dist or similar.
> README.dist can be copied to distribution package as README during
> Makefile magic.

IIRC, we discussed that when README.git was invented, and concluded
that it would just create different sorts of confusion.  I might
be biased, as the person who is generally checking created tarballs
for sanity, but I really do not want any situation where a file
appearing in the tarball is different from the same-named file in
the git tree.

Perhaps it could be sane to not have *any* file named README in
the git tree, only README.git and README.dist, with the tarball
preparation process copying README.dist to README.  However,
if I'm understanding what github does, that would leave us with
no automatically-displayed documentation for the github repo.
So I'm not sure that helps with samay's concern.

regards, tom lane




Re: Extend compatibility of PostgreSQL::Test::Cluster

2022-01-21 Thread Andrew Dunstan


On 1/21/22 02:47, Michael Paquier wrote:
> On Tue, Jan 18, 2022 at 06:35:39PM -0500, Andrew Dunstan wrote:
>> Here's a version that does that and removes some recent bitrot.
> I have been looking at the full set of features of Cluster.pm and the
> requirements behind v10 as minimal version supported, and nothing
> really stands out.
>
> +   # old versions of walreceiver just set the application name to
> +   # `walreceiver'
>
> Perhaps this should mention to which older versions this sentence
> applies?



Will do in the next version. FTR it's versions older than 12.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Refactoring of compression options in pg_basebackup

2022-01-21 Thread Robert Haas
On Thu, Jan 20, 2022 at 9:18 PM Michael Paquier  wrote:
> On Thu, Jan 20, 2022 at 10:25:43AM -0500, Robert Haas wrote:
> > You don't need to test for gzip and none in two places each. Just make
> > the block with the "It does not match ..." comment the "else" clause
> > for this last part.
>
> Indeed, that looks better.  I have done an extra pass on this stuff
> this morning, and applied it, so we should be done here.

Thanks. One thing I just noticed is that the enum we're using here is
called WalCompressionMethod. But we're not compressing WAL. We're
compressing tarfiles of the data directory.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: row filtering for logical replication

2022-01-21 Thread Alvaro Herrera
On 2022-Jan-21, houzj.f...@fujitsu.com wrote:

> Personally, I'm a little hesitant to put the check at DDL level, because
> adding check at DDLs like ATTACH PARTITION/CREATE PARTITION OF ( [1]
> explained why we need to check these DDLs) looks a bit restrictive and
> user might also complain about that. Put the check in
> CheckCmdReplicaIdentity seems more acceptable because it is consistent
> with the existing behavior which has few complaints from users AFAIK.

I think logical replication is currently so limited that there's very
few people that can put it to real use.  So I suggest we should not take
the small number of complaints about the current behavior as very
valuable, because it just means that not a lot of people are using
logical replication in the first place.  But once these new
functionalities are introduced, it will start to become actually useful
and it will be then when users will exercise and notice weird behavior.

If ATTACH PARTITION or CREATE TABLE .. PARTITION OF don't let you
specify replica identity, I suspect it's because both partitioning and
logical replication were developed in parallel, and neither gave too
much thought to the other.  So these syntax corners went unnoticed.

I suspect that a better way to attack this problem is to let ALTER TABLE
... ATTACH PARTITION and CREATE TABLE .. PARTITION OF specify a replica
identity as necessary.
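
Today that always means a two-step dance along these lines (hypothetical
names, just to illustrate what the above would let the partition DDL express
directly):

    CREATE TABLE t (id int NOT NULL, payload text) PARTITION BY RANGE (id);
    CREATE TABLE t_1 PARTITION OF t FOR VALUES FROM (1) TO (1000);
    ALTER TABLE t_1 REPLICA IDENTITY FULL;   -- separate step, easy to forget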

My suggestion is to avoid painting us into a corner from which it will
be impossible to get out later.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"La espina, desde que nace, ya pincha" (Proverbio africano)




Re: ICU for global collation

2022-01-21 Thread Peter Eisentraut

On 21.01.22 14:51, Julien Rouhaud wrote:

 From 1c46bf3138ad42074971aa3130142236de7e63f7 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 21 Jan 2022 10:01:25 +0100
Subject: [PATCH] Change collate and ctype fields to type text


+   collversionstr = TextDatumGetCString(datum);
+
actual_versionstr = 
get_collation_actual_version(collform->collprovider, collcollate);
if (!actual_versionstr)
{
@@ -1606,7 +1616,6 @@ pg_newlocale_from_collation(Oid collid)
(errmsg("collation \"%s\" has no 
actual version, but a version was specified",

NameStr(collform->collname;
}
-   collversionstr = TextDatumGetCString(collversion);


Is that change intended?  There isn't any usage of the collversionstr before
the possible error when actual_versionstr is missing.


I wanted to move it closer to the SysCacheGetAttr() where the "datum" 
value is obtained.  It seemed weird to get the datum, then do other 
things, then decode the datum.





Re: Skipping logical replication transactions on subscriber side

2022-01-21 Thread Peter Eisentraut

On 21.01.22 04:08, Masahiko Sawada wrote:

I think the superuser check in AlterSubscription() might no longer be
appropriate.  Subscriptions can now be owned by non-superusers.  Please
check that.


IIUC we don't allow non-superuser to own the subscription yet. We
still have the following superuser checks:

In CreateSubscription():

 if (!superuser())
 ereport(ERROR,
 (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  errmsg("must be superuser to create subscriptions")));

and in AlterSubscriptionOwner_internal();

 /* New owner must be a superuser */
 if (!superuser_arg(newOwnerId))
 ereport(ERROR,
 (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  errmsg("permission denied to change owner of
subscription \"%s\"",
 NameStr(form->subname)),
  errhint("The owner of a subscription must be a superuser.")));

Also, doing superuser check here seems to be consistent with
pg_replication_origin_advance() which is another way to skip
transactions and also requires superuser permission.


I'm referring to commit a2ab9c06ea15fbcb2bfde570986a06b37f52bcca.  You 
still have to be superuser to create a subscription, but you can change 
the owner to a nonprivileged user and it will observe table permissions 
on the subscriber.


Assuming my understanding of that commit is correct, I think it would be 
sufficient in your patch to check that the current user is the owner of 
the subscription.





Re: ICU for global collation

2022-01-21 Thread Julien Rouhaud
Hi,

On Fri, Jan 21, 2022 at 10:44:24AM +0100, Peter Eisentraut wrote:
> 
> Here is a second preparation patch: Change collate and ctype fields to type
> text.
> 
> I think this is useful by itself because it allows longer locale names. ICU
> locale names with several options appended can be longer than 63 bytes.
> This case is probably broken right now.  Also, it saves space in the typical
> case, since most locale names are not anywhere near 63 bytes.

I totally agree.

> From 1c46bf3138ad42074971aa3130142236de7e63f7 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut 
> Date: Fri, 21 Jan 2022 10:01:25 +0100
> Subject: [PATCH] Change collate and ctype fields to type text

+   collversionstr = TextDatumGetCString(datum);
+
actual_versionstr = 
get_collation_actual_version(collform->collprovider, collcollate);
if (!actual_versionstr)
{
@@ -1606,7 +1616,6 @@ pg_newlocale_from_collation(Oid collid)
(errmsg("collation \"%s\" has 
no actual version, but a version was specified",

NameStr(collform->collname;
}
-   collversionstr = TextDatumGetCString(collversion);


Is that change intended?  There isn't any usage of the collversionstr before
the possible error when actual_versionstr is missing.

Apart from that the patch looks good to me, all tests pass and no issue with
building the doc either.




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2022-01-21 Thread Shruthi Gowda
On Fri, Jan 21, 2022 at 1:08 AM Robert Haas  wrote:
>
> On Thu, Jan 20, 2022 at 11:03 AM Shruthi Gowda  wrote:
> > It is not required for PostgresObjectId. The unused_oids script
> > provides a list of unused oids in the manually-assignable OIDs range
> > (1-9999).
>
> Well, so ... why are we not treating the OIDs for these two databases
> the same? If there's a range from which we can assign OIDs without
> risk of duplication and without needing to update this script, perhaps
> we ought to assign both of them from that range and leave the script
> alone.

From what I see in the code, template0 and postgres are the last
things that get created in initdb phase. The system OIDs that get
assigned to these DBs vary from release to release. At present, the
system assigned OIDs of template0 and postgres are 13679 and 13680
respectively. I feel it would be safe to assign 16000 and 16001 for
template0 and postgres respectively from the unpinned object OID range
12000 - 16383. In the future, even if the initdb unpinned objects
reach the range of 16000, issues can only arise if initdb() creates
another system-created database for which the system assigns these
reserved OIDs (16000, 16001).

> + * that is in use in the old cluster is also used in th new cluster - and
>
> th -> the.

Fixed

> +preserves the DB, tablespace, relfilenode OIDs so TOAST and other references
>
> Insert "and" before "relfilenode".

Fixed

Attached is the latest patch for review.

Shruthi KC
EnterpriseDB: http://www.enterprisedb.com


v12-0001-pg_upgrade-perserve-database-OID-patch.patch
Description: Binary data


Re: Logical replication timeout problem

2022-01-21 Thread Fabrice Chapuis
Thanks for your patch; it also works well when executing our use case, and the
timeout no longer appears in the logs. Is it necessary now to refine this
patch and make as few changes as possible in order for it to be released?

On Fri, Jan 21, 2022 at 10:51 AM wangw.f...@fujitsu.com <
wangw.f...@fujitsu.com> wrote:

> On Thu, Jan 20, 2022 at 9:18 PM Amit Kapila 
> wrote:
> > It might be not reaching the actual send_keep_alive logic in
> > WalSndKeepaliveIfNecessary because of below code:
> > {
> > ...
> > /*
> > * Don't send keepalive messages if timeouts are globally disabled or
> > * we're doing something not partaking in timeouts.
> > */
> > if (wal_sender_timeout <= 0 || last_reply_timestamp <= 0) return; ..
> > }
> >
> > I think you can add elog before the above return and before updating
> progress
> > in the below code:
> > case REORDER_BUFFER_CHANGE_INSERT:
> >   if (!relentry->pubactions.pubinsert)
> > + {
> > + OutputPluginUpdateProgress(ctx);
> >   return;
> >
> > This will help us to rule out one possibility.
>
> Thanks for your advice!
>
> Following your advice, I applied 0001, 0002 and 0003 to run the test
> script.
> When the subscriber timed out, I filtered the publisher-side log:
> $ grep "before invoking update progress" pub.log | wc -l
> 60373557
> $ grep "return because wal_sender_timeout or last_reply_timestamp" pub.log
> | wc -l
> 0
> $ grep "return because waiting_for_ping_response" pub.log | wc -l
> 0
>
> Based on this result, I think function WalSndKeepaliveIfNecessary was
> invoked,
> but function WalSndKeepalive was not invoked because (last_processing >=
> ping_time) is false.
> So I tried to see changes about last_processing and last_reply_timestamp
> (because ping_time is based on last_reply_timestamp).
>
> I found last_processing and last_reply_timestamp is set in function
> ProcessRepliesIfAny.
> last_processing is set to the time when function ProcessRepliesIfAny is
> invoked.
> Only when publisher receive a response from subscriber,
> last_reply_timestamp is
> set to last_processing and the flag waiting_for_ping_response is reset to
> false.
>
> While we are in the loop that skips all the changes of the transaction, IIUC,
> we do
> not invoke function ProcessRepliesIfAny. So I think last_processing and
> last_reply_timestamp will not be changed in this loop.
> Therefore I think about our use case, we should modify the condition of
> invoking WalSndKeepalive.(please refer to
> 0004-Simple-modification-of-timing.patch, and note that this is only a
> patch
> for testing).
> At the same time I modify the input of WalSndKeepalive from true to false.
> This
> is because when input is true, waiting_for_ping_response is set to true in
> WalSndKeepalive. As mentioned above, ProcessRepliesIfAny is not invoked in
> the
> loop, so I think waiting_for_ping_response will not be reset to false and
> keepalive messages will not be sent.
>
> I tested after applying patches(0001 and 0004), I found the timeout was not
> printed in subscriber-side log. And the added messages "begin load
> changes" and
> "commit the log" were printed in publisher-side log:
> $ grep -ir "begin load changes" pub.log
> 2022-01-21 11:17:06.934 CST [2577699] LOG:  begin load changes
> $ grep -ir "commit the log" pub.log
> 2022-01-21 11:21:15.564 CST [2577699] LOG:  commit the log
>
> Attach the patches and test script mentioned above, in case someone wants
> to
> try.
>
> Regards,
> Wang wei
>


Re: Skipping logical replication transactions on subscriber side

2022-01-21 Thread Amit Kapila
On Fri, Jan 21, 2022 at 5:25 PM Amit Kapila  wrote:
>
> On Fri, Jan 21, 2022 at 10:10 AM Masahiko Sawada  
> wrote:
> >
>
> Few things that I think we can improve in 028_skip_xact.pl are as follows:
>
> After CREATE SUBSCRIPTION, wait for initial sync to be over and
> two_phase state to be enabled. Please see 021_twophase. For the
> streaming case, we might be able to ensure streaming even with lesser
> data. Can you please try that?
>

I noticed that the time taken by the newly added test in this patch is on the
higher side. See the comparison with the subscription test that takes the most
time:
[17:38:49] t/028_skip_xact.pl . ok 9298 ms
[17:38:59] t/100_bugs.pl .. ok11349 ms

I think we can reduce time by removing some stream tests without much
impacting on coverage, possibly related to 2PC and streaming together,
and if you do that we probably don't need a subscription with both 2PC
and streaming enabled.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-01-21 Thread Amit Kapila
On Fri, Jan 21, 2022 at 10:10 AM Masahiko Sawada  wrote:
>
> On Fri, Jan 21, 2022 at 1:20 PM Amit Kapila  wrote:
> >
> > What do we want to indicate by [, ... ]? To me, it appears like
> > multiple options but that is not what we support currently.
>
> You're right. It's an oversight.
>

I have fixed this and a few other things in the attached patch.
1.
The newly added column needs to be updated in the following statement:
-- All columns of pg_subscription except subconninfo are publicly readable.
REVOKE ALL ON pg_subscription FROM public;
GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
  substream, subtwophasestate, subslotname, subsynccommit,
subpublications)
ON pg_subscription TO public;
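
i.e., presumably the same statement with the new column added, something like
(a sketch; adjust to the patch's actual column placement):

GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
  substream, subtwophasestate, subskipxid, subslotname, subsynccommit,
subpublications)
ON pg_subscription TO public;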

2.
+stop_skipping_changes(bool clear_subskipxid, XLogRecPtr origin_lsn,
+   TimestampTz origin_timestamp)
+{
+ Assert(is_skipping_changes());
+
+ ereport(LOG,
+ (errmsg("done skipping logical replication transaction %u",
+ skip_xid)));

Isn't it better to move this LOG at the end of this function? Because
clear* functions can give an error, so it is better to move it after
that. I have done that in the attached.

3.
+-- fail - must be superuser
+SET SESSION AUTHORIZATION 'regress_subscription_user2';
+ALTER SUBSCRIPTION regress_testsub SKIP (xid = 100);
+ERROR:  must be owner of subscription regress_testsub

This test doesn't seem to be right. You want to get the error for the
superuser but the error is for the owner. I have changed this test to
do what it intends to do.

Apart from this, I have changed a few comments and ran pgindent. Do
let me know what you think of the changes?

Few things that I think we can improve in 028_skip_xact.pl are as follows:

After CREATE SUBSCRIPTION, wait for initial sync to be over and
two_phase state to be enabled. Please see 021_twophase. For the
streaming case, we might be able to ensure streaming even with lesser
data. Can you please try that?

-- 
With Regards,
Amit Kapila.


v10-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch
Description: Binary data


Re: POC: GROUP BY optimization

2022-01-21 Thread Andrey Lepikhov

On 7/22/21 3:58 AM, Tomas Vondra wrote:

4) I'm not sure it's actually a good idea to pass tuplesPerPrevGroup to
estimate_num_groups_incremental. In principle yes, if we use "group
size" from the previous step, then the returned value is the number of
new groups after adding the "new" pathkey.
But even if we ignore the issues with amplification mentioned in (3),
there's an issue with non-linear behavior in estimate_num_groups,
because at some point it's calculating

D(N,n,p) = n * (1 - ((N-p)/N)^(N/n))

where N - total rows, p - sample size, n - number of distinct values.
And if we have (N1,n1) and (N2,n2) then the ratio of calculated
estimated (which is pretty much what calculating group size does)

D(N2,n2,p2) / D(N1,n1,p1)

which will differ depending on p1 and p2. And if we're tweaking the
tuplesPerPrevGroup all the time, that's really annoying, as it may make
the groups smaller or larger, which is unpredictable and annoying, and I
wonder if it might go against the idea of penalizing tuplesPerPrevGroup
to some extent.
We could simply use the input "tuples" value here, and then divide the
current and previous estimate to calculate the number of new groups.

tuplesPerPrevGroup is only an upper boundary for the estimation. I think
we should use the result of the previous estimation as a limit for the next
one during incremental estimation. Maybe we should simply limit the
tuplesPerPrevGroup value to ensure monotonic non-growth of this
value? - see the attached patch on top of the previous fixes.
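
Just to illustrate how non-linear that is, plugging numbers into the
D(N,n,p) formula quoted above (an illustrative calculation, not from the
patch): with N = 1,000,000 and n = 1,000,

    p = 1,000,000:  D = 1000 * (1 - 0^1000)        ~ 1000
    p =    10,000:  D = 1000 * (1 - 0.99^1000)     ~ 1000 (999.96)
    p =     1,000:  D = 1000 * (1 - 0.999^1000)    ~  632

so a 1,000-row "sample" sees only about 632 of the 1,000 groups, while a
10,000-row one already sees essentially all of them.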


--
regards,
Andrey Lepikhov
Postgres Professionaldiff --git a/src/backend/optimizer/path/costsize.c 
b/src/backend/optimizer/path/costsize.c
index 70af9c91d5..4e26cd275d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2025,8 +2025,10 @@ compute_cpu_sort_cost(PlannerInfo *root, List *pathkeys, 
int nPresortedKeys,
/*
 * Real-world distribution isn't uniform but now we don't have 
a way to
 * determine that, so, add multiplier to get closer to worst 
case.
+* Guard this value against irrational decisions.
 */
-   tuplesPerPrevGroup = ceil(1.5 * tuplesPerPrevGroup / nGroups);
+   tuplesPerPrevGroup = Min(tuplesPerPrevGroup,
+ceil(1.5 * 
tuplesPerPrevGroup / nGroups));
 
/*
 * We could skip all following columns for cost estimation, 
because we


Re: row filtering for logical replication

2022-01-21 Thread Dilip Kumar
On Thu, Jan 20, 2022 at 4:54 PM Amit Kapila  wrote:

> + /*
> + * Unchanged toasted replica identity columns are only detoasted in the
> + * old tuple, copy this over to the new tuple.
> + */
> + if (att->attlen == -1 &&
> + VARATT_IS_EXTERNAL_ONDISK(tmp_new_slot->tts_values[i]) &&
> + !VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))
> + {
> + if (tmp_new_slot == new_slot)
> + {
> + tmp_new_slot = MakeSingleTupleTableSlot(desc, );
> + ExecClearTuple(tmp_new_slot);
> + ExecCopySlot(tmp_new_slot, new_slot);
> + }
> +
> + tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
> + tmp_new_slot->tts_isnull[i] = old_slot->tts_isnull[i];
> + }
> + }
>
>
> What is the need to assign new_slot to tmp_new_slot at the beginning
> of this part of the code? Can't we do this when we found some
> attribute that needs to be copied from the old tuple?
>
> The other part which is not clear to me by looking at this code and
> comments is how do we ensure that we cover all cases where the new
> tuple doesn't have values?
>

IMHO, the only part we are trying to handle is when the toasted attribute
is not modified in the new tuple.  And if we notice the update WAL the new
tuple is written as it is in the WAL which is getting inserted into the
heap page.  That means if it is external it can only be in
VARATT_IS_EXTERNAL_ONDISK format.  So I don't think we need to worry about
any intermediate format which we use for the in-memory tuples.  Sometimes
in reorder buffer we do use the INDIRECT format as well which internally
can store ON DISK format but we don't need to worry about that case either
because that is only true when we have the complete toast tuple as part of
the WAL and we recreate the tuple in memory in reorder buffer, so even if
it is in ON DISK format inside the INDIRECT format, we have the complete tuple.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


RE: Logical replication timeout problem

2022-01-21 Thread wangw.f...@fujitsu.com
On Thu, Jan 20, 2022 at 9:18 PM Amit Kapila  wrote:
> It might be not reaching the actual send_keep_alive logic in
> WalSndKeepaliveIfNecessary because of below code:
> {
> ...
> /*
> * Don't send keepalive messages if timeouts are globally disabled or
> * we're doing something not partaking in timeouts.
> */
> if (wal_sender_timeout <= 0 || last_reply_timestamp <= 0) return; ..
> }
> 
> I think you can add elog before the above return and before updating progress
> in the below code:
> case REORDER_BUFFER_CHANGE_INSERT:
>   if (!relentry->pubactions.pubinsert)
> + {
> + OutputPluginUpdateProgress(ctx);
>   return;
> 
> This will help us to rule out one possibility.

Thanks for your advice!

Following your advice, I applied 0001, 0002 and 0003 to run the test script.
When the subscriber timed out, I filtered the publisher-side log:
$ grep "before invoking update progress" pub.log | wc -l
60373557
$ grep "return because wal_sender_timeout or last_reply_timestamp" pub.log | wc 
-l
0
$ grep "return because waiting_for_ping_response" pub.log | wc -l
0

Based on this result, I think WalSndKeepaliveIfNecessary was invoked, but
WalSndKeepalive was not invoked, because (last_processing >= ping_time) is
false.
So I looked at how last_processing and last_reply_timestamp change
(ping_time is derived from last_reply_timestamp).

I found that last_processing and last_reply_timestamp are set in
ProcessRepliesIfAny.
last_processing is set to the time at which ProcessRepliesIfAny is invoked.
Only when the publisher receives a response from the subscriber is
last_reply_timestamp set to last_processing and the flag
waiting_for_ping_response reset to false.

While we are in the loop that skips all the changes of the transaction, IIUC,
we never invoke ProcessRepliesIfAny, so last_processing and
last_reply_timestamp will not change during this loop.
Therefore, for our use case, I think we should modify the condition under
which WalSndKeepalive is invoked (please refer to
0004-Simple-modification-of-timing.patch, and note that this is only a patch
for testing).
At the same time I changed the argument of WalSndKeepalive from true to false.
This is because when the argument is true, waiting_for_ping_response is set to
true in WalSndKeepalive; as mentioned above, ProcessRepliesIfAny is not
invoked in the loop, so waiting_for_ping_response would never be reset to
false and no further keepalive messages would be sent.
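
To make that easier to follow, here is a sketch of the tail of
WalSndKeepaliveIfNecessary() with the two points above marked in comments.
This is only my reading of the description, not the attached 0004 patch
itself:

	ping_time = TimestampTzPlusMilliseconds(last_reply_timestamp,
											wal_sender_timeout / 2);

	/*
	 * (1) While the walsender is skipping a large transaction it never
	 * calls ProcessRepliesIfAny(), so last_processing and
	 * last_reply_timestamp stand still and this test never fires; this is
	 * the condition the test patch changes.
	 */
	if (last_processing >= ping_time)
	{
		/*
		 * (2) The test patch passes false here: with true,
		 * waiting_for_ping_response is set, and since ProcessRepliesIfAny()
		 * does not run in this loop it would never be cleared, so no
		 * further keepalives would be sent.
		 */
		WalSndKeepalive(true);

		/* Try to flush pending output to the client */
		if (pq_flush_if_writable() != 0)
			WalSndShutdown();
	}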

I tested after applying patches 0001 and 0004: the timeout was no longer
reported in the subscriber-side log, and the added messages "begin load
changes" and "commit the log" were printed in the publisher-side log:
$ grep -ir "begin load changes" pub.log
2022-01-21 11:17:06.934 CST [2577699] LOG:  begin load changes
$ grep -ir "commit the log" pub.log
2022-01-21 11:21:15.564 CST [2577699] LOG:  commit the log

Attach the patches and test script mentioned above, in case someone wants to
try.

Regards,
Wang wei


0001-Send-keepalive.patch
Description: 0001-Send-keepalive.patch


0002-Add-some-logs-to-debug.patch
Description: 0002-Add-some-logs-to-debug.patch


0003-Add-more-logs.patch
Description: 0003-Add-more-logs.patch


0004-Simple-modification-of-timing.patch
Description: 0004-Simple-modification-of-timing.patch


test.sh
Description: test.sh


Re: ICU for global collation

2022-01-21 Thread Peter Eisentraut

On 18.01.22 13:54, Peter Eisentraut wrote:
> On 18.01.22 05:02, Julien Rouhaud wrote:
>>> If this is applied, then in my estimation all these hunks will completely
>>> disappear from the global ICU patch.  So this would be a significant win.
>> Agreed, the patch will be quite smaller and also easier to review.  Are you
>> planning to apply it on its own or add it to the default ICU patchset?
>
> My plan is to apply this patch in the next few days.

This patch has been committed.

Here is a second preparation patch: Change collate and ctype fields to 
type text.


I think this is useful by itself because it allows longer locale names. 
ICU locale names with several options appended can be longer than 63 
bytes.  This case is probably broken right now.  Also, it saves space in 
the typical case, since most locale names are not anywhere near 63 bytes.

From 1c46bf3138ad42074971aa3130142236de7e63f7 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 21 Jan 2022 10:01:25 +0100
Subject: [PATCH] Change collate and ctype fields to type text

This changes the data type of the catalog fields datcollate, datctype,
collcollate, and collctype from name to text.  There wasn't ever a
really good reason for them to be of type name; presumably this was
just carried over from when they were fixed-size fields in pg_control,
first into the corresponding pg_database fields, and then to
pg_collation.  The values are not identifiers or object names, and we
don't ever look them up that way.

Changing to type text saves space in the typical case, since locale
names are typically less than 10 bytes long.  But it is also possible
that an ICU locale name with several customization options appended
could be longer than 63 bytes, so this also enables that case, which
was previously probably broken.

Discussion: 
https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c1...@2ndquadrant.com
---
 doc/src/sgml/catalogs.sgml   | 40 +--
 src/backend/catalog/pg_collation.c   | 10 ++-
 src/backend/commands/collationcmds.c | 41 +++-
 src/backend/commands/dbcommands.c| 21 ++
 src/backend/utils/adt/pg_locale.c| 29 +---
 src/backend/utils/init/postinit.c| 11 ++--
 src/include/catalog/pg_collation.h   |  4 +--
 src/include/catalog/pg_database.h| 12 
 src/include/utils/pg_locale.h|  2 ++
 9 files changed, 104 insertions(+), 66 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 1e65c426b2..7d5b0b1656 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2368,7 +2368,7 @@ pg_collation 
Columns
 
  
   
-   collcollate name
+   collcollate text
   
   
LC_COLLATE for this collation object
@@ -2377,7 +2377,7 @@ pg_collation 
Columns
 
  
   
-   collctype name
+   collctype text
   
   
LC_CTYPE for this collation object
@@ -2951,24 +2951,6 @@ pg_database 
Columns
   
  
 
- 
-  
-   datcollate name
-  
-  
-   LC_COLLATE for this database
-  
- 
-
- 
-  
-   datctype name
-  
-  
-   LC_CTYPE for this database
-  
- 
-
  
   
datistemplate bool
@@ -3043,6 +3025,24 @@ pg_database 
Columns
   
  
 
+ 
+  
+   datcollate text
+  
+  
+   LC_COLLATE for this database
+  
+ 
+
+ 
+  
+   datctype text
+  
+  
+   LC_CTYPE for this database
+  
+ 
+
  
   
datacl aclitem[]
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index 5be6600652..bfc02d3038 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -58,9 +58,7 @@ CollationCreate(const char *collname, Oid collnamespace,
 	HeapTuple	tup;
 	Datum		values[Natts_pg_collation];
 	bool		nulls[Natts_pg_collation];
-	NameData	name_name,
-				name_collate,
-				name_ctype;
+	NameData	name_name;
 	Oid			oid;
 	ObjectAddress myself,
 				referenced;
@@ -163,10 +161,8 @@ CollationCreate(const char *collname, Oid collnamespace,
 	values[Anum_pg_collation_collprovider - 1] = CharGetDatum(collprovider);
 	values[Anum_pg_collation_collisdeterministic - 1] = BoolGetDatum(collisdeterministic);
 	values[Anum_pg_collation_collencoding - 1] = Int32GetDatum(collencoding);
-	namestrcpy(&name_collate, collcollate);
-	values[Anum_pg_collation_collcollate - 1] = NameGetDatum(&name_collate);
-	namestrcpy(&name_ctype, collctype);
-	values[Anum_pg_collation_collctype - 1] = NameGetDatum(&name_ctype);
+	values[Anum_pg_collation_collcollate - 1] = CStringGetTextDatum(collcollate);
+   values[Anum_pg_collation_collctype - 

RE: row filtering for logical replication

2022-01-21 Thread houzj.f...@fujitsu.com
On Thu, Jan 20, 2022 at 7:25 PM Amit Kapila  wrote:
> 
> On Thu, Jan 20, 2022 at 6:42 AM houzj.f...@fujitsu.com
>  wrote:
> >
> > Attach the V68 patch set which addressed the above comments and changes.
> >
> 
> Few comments and suggestions:
> ==
> 1.
> /*
> + * For updates, if both the new tuple and old tuple are not null, then both
> + * of them need to be checked against the row filter.
> + */
> + tmp_new_slot = new_slot;
> + slot_getallattrs(new_slot);
> + slot_getallattrs(old_slot);
> +
> 
> Isn't it better to add assert like
> Assert(map_changetype_pubaction[*action] == PUBACTION_UPDATE); before
> the above code? I have tried to change this part of the code in the
> attached top-up patch.
> 
> 2.
> + /*
> + * For updates, if both the new tuple and old tuple are not null, then both
> + * of them need to be checked against the row filter.
> + */
> + tmp_new_slot = new_slot;
> + slot_getallattrs(new_slot);
> + slot_getallattrs(old_slot);
> +
> + /*
> + * The new tuple might not have all the replica identity columns, in which
> + * case it needs to be copied over from the old tuple.
> + */
> + for (i = 0; i < desc->natts; i++)
> + {
> + Form_pg_attribute att = TupleDescAttr(desc, i);
> +
> + /*
> + * if the column in the new tuple or old tuple is null, nothing to do
> + */
> + if (tmp_new_slot->tts_isnull[i] || old_slot->tts_isnull[i])
> + continue;
> +
> + /*
> + * Unchanged toasted replica identity columns are only detoasted in the
> + * old tuple, copy this over to the new tuple.
> + */
> + if (att->attlen == -1 &&
> + VARATT_IS_EXTERNAL_ONDISK(tmp_new_slot->tts_values[i]) &&
> + !VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))
> + {
> + if (tmp_new_slot == new_slot)
> + {
> + tmp_new_slot = MakeSingleTupleTableSlot(desc, &TTSOpsVirtual);
> + ExecClearTuple(tmp_new_slot);
> + ExecCopySlot(tmp_new_slot, new_slot);
> + }
> +
> + tmp_new_slot->tts_values[i] = old_slot->tts_values[i];
> + tmp_new_slot->tts_isnull[i] = old_slot->tts_isnull[i];
> + }
> + }
> 
> 
> What is the need to assign new_slot to tmp_new_slot at the beginning
> of this part of the code? Can't we do this when we found some
> attribute that needs to be copied from the old tuple?

Thanks for the comments, Changed.

> The other part which is not clear to me by looking at this code and
> comments is how do we ensure that we cover all cases where the new
> tuple doesn't have values?

I will do some research about this and respond soon.

Best regards,
Hou zj


Re: row filtering for logical replication

2022-01-21 Thread Greg Nancarrow
On Thu, Jan 20, 2022 at 12:12 PM houzj.f...@fujitsu.com
 wrote:
>
> Attach the V68 patch set which addressed the above comments and changes.
> The version patch also fix the error message mentioned by Greg[1]
>

Some review comments for the v68 patch, mostly nitpicking:

(1) Commit message
Minor suggested updates:

BEFORE:
Allow specifying row filter for logical replication of tables.
AFTER:
Allow specifying row filters for logical replication of tables.

BEFORE:
If you choose to do the initial table synchronization, only data that
satisfies the row filters is pulled by the subscriber.
AFTER:
If you choose to do the initial table synchronization, only data that
satisfies the row filters is copied to the subscriber.


src/backend/executor/execReplication.c

(2)

BEFORE:
+ * table does not publish UPDATES or DELETES.
AFTER:
+ * table does not publish UPDATEs or DELETEs.


src/backend/replication/pgoutput/pgoutput.c

(3) pgoutput_row_filter_exec_expr
pgoutput_row_filter_exec_expr() returns false if "isnull" is true,
otherwise (if "isnull" is false) returns the value of "ret"
(true/false).
So the following elog needs to be changed (Peter Smith previously
pointed this out, but it didn't get completely changed):

BEFORE:
+ elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
+ DatumGetBool(ret) ? "true" : "false",
+ isnull ? "true" : "false");
AFTER:
+ elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
+ isnull ? "false" : DatumGetBool(ret) ? "true" : "false",
+ isnull ? "true" : "false");
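
(For context, a minimal sketch of the return convention described here, based
on the description above rather than the exact patch code; it shows why
printing DatumGetBool(ret) alone is misleading when isnull is true:)

	ret = ExecEvalExprSwitchContext(state, econtext, &isnull);

	/* a NULL result filters the row out, whatever "ret" happens to hold */
	if (isnull)
		return false;

	return DatumGetBool(ret);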


(4) pgoutput_row_filter_init

BEFORE:
+  * we don't know yet if there is/isn't any row filters for this relation.
AFTER:
+  * we don't know yet if there are/aren't any row filters for this relation.

BEFORE:
+  * necessary at all. This avoids us to consume memory and spend CPU cycles
+  * when we don't need to.
AFTER:
+  * necessary at all. So this allows us to avoid unnecessary memory
+  * consumption and CPU cycles.

(5) pgoutput_row_filter

BEFORE:
+ * evaluates the row filter for that tuple and return.
AFTER:
+ * evaluate the row filter for that tuple and return.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: [PATCH] Implement INSERT SET syntax

2022-01-21 Thread wenjing zeng


Since this feature adds the INSERT OVERRIDING SET syntax, it is recommended to
add some related test cases.


Regards
Wenjing


> On 22 Sep 2021, at 07:38, Rachel Heaton  wrote:
> 
>> On 4/23/20 8:04 PM, Gareth Palmer wrote:
>>> 
>>> Thank you for the review, attached is v7 of the patch which should
>>> apply correcly to HEAD.
>>> 
> 
> Hello Gareth,
> 
> This patch no longer applies to HEAD, can you please submit a rebased version?
> 
> Thanks,
> Rachel
> 
> 


0001-insert-overriding-set-case.patch
Description: Binary data


Re: postgres_fdw: incomplete subabort cleanup of connections used in async execution

2022-01-21 Thread Etsuro Fujita
On Wed, Dec 22, 2021 at 2:40 PM Etsuro Fujita  wrote:
> On Wed, Dec 22, 2021 at 1:08 AM Alexander Pyhalov
>  wrote:
> > Looks good to me.
>
> Great!  Thanks for reviewing!

I've committed the patch.

Best regards,
Etsuro Fujita




Re: New developer papercut - Makefile references INSTALL

2022-01-21 Thread Josef Šimánek
On Fri, Jan 21, 2022 at 1:29 AM samay sharma  wrote:
>
>
>
> On Wed, Jan 19, 2022 at 4:58 PM Tim McNamara  wrote:
>>
>> Hello,
>>
>> I encountered a minor road bump when checking out the pg source today. The 
>> Makefile's all target includes the following help message if GNUmakefile 
>> isn't available:
>>
>>   echo "You need to run the 'configure' program first. See the file"; \
>>   echo "'INSTALL' for installation instructions." ; \
>>
>> After consulting README.git, it looks as though INSTALL isn't created unless 
>> the source is bundled into a release or snapshot tarball. I'm happy to 
>> submit a patch to update the wording, but wanted to check on the preferred 
>> approach.
>>
>> Perhaps this would be sufficient?
>>
>>   echo "You need to run the 'configure' program first. See the file"; \
>>   echo "'INSTALL' for installation instructions, or visit" ; \
>>   echo "" ; \
>>
>> -Tim
>
>
> I noticed a similar thing in the README of the github repository. It asks to 
> see the INSTALL file for build and installation instructions but I couldn't 
> find that file and that confused me. This might confuse other new developers 
> as well. So, maybe we should update the text in the README too?

There is README.git explaining this. README itself is meant to be used
with the distributed source code. You can generate INSTALL locally, for
example by running make dist (INSTALL will then be present in the
postgresql-15devel directory).

Anyway, I do agree this is confusing. Maybe we could actually rename
README.git to README and the current README to README.dist or similar.
README.dist could then be copied into the distribution package as README
during the Makefile magic.

I can try to provide a patch if that is welcome.

> Regards,
> Samay