Re: Identify missing publications from publisher while create/alter subscription.

2021-11-12 Thread vignesh C
On Wed, Nov 10, 2021 at 11:16 AM Bharath Rupireddy
 wrote:
>
> On Tue, Nov 9, 2021 at 9:27 PM vignesh C  wrote:
> > Attached v12 version is rebased on top of Head.
>
> Thanks for the patch. Here are some comments on v12:
>
> 1) I think ERRCODE_TOO_MANY_ARGUMENTS isn't the right error code;
> ERRCODE_UNDEFINED_OBJECT is more meaningful. Please change it.
> + ereport(ERROR,
> + (errcode(ERRCODE_TOO_MANY_ARGUMENTS),
> + errmsg_plural("publication %s does not exist in the publisher",
> +"publications %s do not exist in the publisher",
>
> The existing code using
> ereport(ERROR,
> (errcode(ERRCODE_UNDEFINED_OBJECT),
>  errmsg("subscription \"%s\" does not exist", subname)));

Modified

> 2) Typo: It is "One of the specified publications exists."
> +# One of the specified publication exist.

Modified

> 3) I think we can remove the test case "+# Specified publication does
> not exist." because the "+# One of the specified publication exist."
> covers the code.

Modified

> 4) Do we need the below test case? Even with refresh = false, it does
> call connect_and_check_pubs() right? Please remove it.
> +# Specified publication does not exist with refresh = false.
> +($ret, $stdout, $stderr) = $node_subscriber->psql('postgres',
> +   "ALTER SUBSCRIPTION mysub1 SET PUBLICATION non_existent_pub
> WITH (REFRESH = FALSE, VALIDATE_PUBLICATION = TRUE)"
> +);
> +ok( $stderr =~
> + m/ERROR:  publication "non_existent_pub" does not exist in
> the publisher/,
> +   "Alter subscription for non existent publication fails");
> +

Modified

> 5) Change the test case names to different ones instead of the same.
> Have something like:
> "Create subscription fails with single non-existent publication");
> "Create subscription fails with multiple non-existent publications");
> "Create subscription fails with mutually exclusive options");
> "Alter subscription add publication fails with non-existent publication");
> "Alter subscription set publication fails with non-existent publication");
> "Alter subscription set publication fails with connection to a
> non-existent database");

Modified

Thanks for the comments, the attached v13 patch has the fixes for the same.

Regards,
Vignesh
From f85203bf95359dac6fb2eea3b973be7ceaadf4e1 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Tue, 9 Nov 2021 20:30:47 +0530
Subject: [PATCH v13] Identify missing publications from publisher while
 create/alter subscription.

Creating or altering a subscription currently succeeds even when a specified
publication does not exist on the publisher. This patch checks whether the
specified publications are present on the publisher and throws an error if
any of them is missing.
---
 doc/src/sgml/ref/alter_subscription.sgml  |  13 ++
 doc/src/sgml/ref/create_subscription.sgml |  20 ++-
 src/backend/commands/subscriptioncmds.c   | 207 +++---
 src/bin/psql/tab-complete.c   |  14 +-
 src/test/subscription/t/007_ddl.pl|  56 +-
 5 files changed, 276 insertions(+), 34 deletions(-)

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 0b027cc346..95b6b6a0e5 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -168,6 +168,19 @@ ALTER SUBSCRIPTION name RENAME TO <
  
 

+
+   
+validate_publication (boolean)
+
+ 
+  When true, the command verifies that all the specified publications
+  being subscribed to are present on the publisher, and throws
+  an error if any of them does not exist. The default is
+  false.
+ 
+
+   
+
   
 

diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 990a41f1a1..8ec028ad7d 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -110,12 +110,14 @@ CREATE SUBSCRIPTION subscription_nametrue.  Setting this to
   false will force the values of
-  create_slot, enabled and
-  copy_data to false.
+  create_slot, enabled,
+  copy_data and
+  validate_publication to false.
   (You cannot combine setting connect
   to false with
   setting create_slot, enabled,
-  or copy_data to true.)
+  copy_data or
+  validate_publication to true.)
  
 
  
@@ -170,6 +172,18 @@ CREATE SUBSCRIPTION subscription_name
 

+
+   
+validate_publication (boolean)
+
+ 
+  When true, the command verifies that all the specified publications
+  being subscribed to are present on the publisher, and throws
+  an error if any of them does not exist. The default is
+  false.
+ 
+
+   
   
  
 
diff --git 

Re: Invalid Unicode escape value at or near "\u0000"

2021-11-12 Thread Thomas Munro
On Sat, Nov 13, 2021 at 4:32 PM Japin Li  wrote:
> When I try to insert a Unicode "\u0000", there is an error $subject.
>
> postgres=# CREATE TABLE tbl (s varchar(10));
> CREATE TABLE
> postgres=# INSERT INTO tbl VALUES (E'\u0000');
> ERROR:  invalid Unicode escape value at or near "\u0000"
> LINE 1: INSERT INTO tbl VALUES (E'\u0000');
>   ^
>
> "\u0000" is a valid code point [1], so why can we not insert it?

Yes, it is a valid code point, but unfortunately PostgreSQL can't
support it because it sometimes deals in null-terminated strings, even
though internally it does track string data and length separately.  We
have to do that to use libc facilities like strcoll_r(), and probably
many other things.
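The null-termination problem described above can be sketched outside PostgreSQL. This is a hedged illustration in Python (not PostgreSQL code): a length-tracked representation keeps an embedded NUL byte, while any C-style consumer silently stops at it.

```python
# Sketch of why U+0000 is unsafe once a string reaches a C library that
# expects NUL-terminated input (illustrative only; not PostgreSQL code).
text = "ab\u0000cd"              # five code points, one of them U+0000
data = text.encode("utf-8")      # b'ab\x00cd'

# Length-tracked representation (as PostgreSQL tracks data and length
# separately): all five bytes survive.
assert len(data) == 5

# C-string view (what a libc call expecting a NUL-terminated buffer would
# effectively see): everything after the first NUL byte is lost.
c_view = data.split(b"\x00", 1)[0]
assert c_view == b"ab"
```

The same mismatch arises for any engine that stores counted strings internally but hands buffers to libc routines.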




Re: Parallel vacuum workers prevent the oldest xmin from advancing

2021-11-12 Thread Amit Kapila
On Fri, Nov 12, 2021 at 6:44 PM Alvaro Herrera  wrote:
>
> On 2021-Nov-11, Masahiko Sawada wrote:
>
> > On Thu, Nov 11, 2021 at 12:53 PM Amit Kapila  
> > wrote:
> > >
> > > On Thu, Nov 11, 2021 at 9:11 AM Andres Freund  wrote:
>
> > > > This seems like an unnecessary optimization. 
> > > > ProcArrayInstallRestoredXmin()
> > > > only happens in the context of much more expensive operations.
> > >
> > > Fair point. I think that will also make the change in
> > > ProcArrayInstallRestoredXmin() appear neat.
> >
> > This makes me think that it'd be better to copy status flags in a
> > separate function rather than ProcArrayInstallRestoredXmin().
>
> To me, and this is perhaps just personal opinion, it seems conceptually
> simpler to have ProcArrayInstallRestoredXmin acquire exclusive and do
> both things.  Why?  Because if you have two functions, you have to be
> careful not to call the new function after ProcArrayInstallRestoredXmin;
> otherwise you would create an instant during which you make an
> Xmin-without-flag visible to other procs; this causes the computed xmin
> to go backwards, which is verboten.
>
> If I understand Amit correctly, his point is about the callers of
> RestoreTransactionSnapshot, which are two: CreateReplicationSlot and
> ParallelWorkerMain.  He wants your hypothetical new function called from
> the latter but not the former.  Looking at both, it seems a bit strange
> to make them responsible for a detail such as "copy ->statusFlags from
> source proc to mine".  It seems more reasonable to add a third flag to
>   RestoreTransactionSnapshot(Snapshot snapshot, void *source_proc, bool 
> is_vacuum)
> and if that is true, tell SetTransactionSnapshot to copy flags,
> otherwise not.
>

If we decide to go this way then I suggest adding a comment to convey
why we choose to copy status flags in ProcArrayInstallRestoredXmin()
as the function name doesn't indicate it.

>
> ... unrelated to this, and looking at CreateReplicationSlot, I wonder
> why does it pass MyProc as the source_pgproc parameter.  What good is
> that doing?  I mean, if the only thing we do with source_pgproc is to
> copy stuff from source_pgproc to MyProc, then if source_pgproc is
> MyProc, we're effectively doing nothing at all.  (You can't "fix" this
> by merely passing NULL, because what that would do is change the calling
> of ProcArrayInstallRestoredXmin into a call of
> ProcArrayInstallImportedXmin and that would presumably have different
> behavior.)  I may be misreading the code of course, but it sounds like
> the intention of CreateReplicationSlot is to "do nothing" with the
> transaction snapshot in a complicated way.
>

It ensures that the source transaction is still running; otherwise, it
won't allow the import to succeed. It also seems to help by
updating the state for the GlobalVis* machinery. In its current form
it appears to help by not moving MyProc->xmin and TransactionXmin
backward, due to the checks in ProcArrayInstallRestoredXmin(), while
still setting them to the values in the source snapshot.

-- 
With Regards,
Amit Kapila.




Re: BUFFERS enabled by default in EXPLAIN (ANALYZE)

2021-11-12 Thread Julien Rouhaud
On Sat, Nov 13, 2021 at 8:18 AM Tomas Vondra
 wrote:
>
> On 11/12/21 23:58, Nikolay Samokhvalov wrote:
> > Re-reading the thread [1] (cannot answer there – don't have those emails
> > in my box anymore),
>
> You can download the message as mbox and import it into your client
> (pretty much any client supports that, I think).

There is also a "resend mail" link available for each mail in the
archive which should be even more straightforward.




Re: Allow users to choose what happens when recovery target is not reached

2021-11-12 Thread Julien Rouhaud
On Sat, Nov 13, 2021 at 11:00 AM Bharath Rupireddy
 wrote:
>
> Users will always be optimistic: they set a recovery target and try to
> reach it, but somehow a few of the WAL files have not arrived (for
> whatever reason) at the PITR target server. Imagine that their primary
> isn't available either; then, with the proposal I made, they can choose to
> have at least an available target server rather than a FATALly failed
> one.

If your primary server isn't available, why would you want a recovery
target in the first place?  I just don't understand in which case
someone would want to setup a recovery target and wouldn't care if the
recovery wasn't reached, especially if it can be off by GB / days of
data.

It seems like it could have the opposite effect of what you want most
of the time.  What if for some reason the restore_command is flawed,
and you end up starting your server because it couldn't restore WAL
that are actually available?  You would have to restart from scratch
and waste more time than if you didn't use this.

It looks like what you actually want is some kind of target window,
but the window you currently propose is hardcoded to (consistency,
given target], and that seems too dangerous to be useful.




Invalid Unicode escape value at or near "\u0000"

2021-11-12 Thread Japin Li


Hi, hackers

When I try to insert a Unicode "\u0000", there is an error $subject.

postgres=# CREATE TABLE tbl (s varchar(10));
CREATE TABLE
postgres=# INSERT INTO tbl VALUES (E'\u0000');
ERROR:  invalid Unicode escape value at or near "\u0000"
LINE 1: INSERT INTO tbl VALUES (E'\u0000');
  ^

"\u0000" is a valid code point [1], so why can we not insert it?

[1] https://www.unicode.org/charts/PDF/U.pdf

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Inconsistent error message for varchar(n)

2021-11-12 Thread Japin Li


Hi, hackers

When I try to create a table that has a varchar(n) column, I find
inconsistent error messages.

postgres=# CREATE TABLE tbl (s varchar(2147483647));
ERROR:  length for type varchar cannot exceed 10485760
LINE 1: CREATE TABLE tbl (s varchar(2147483647));
^

postgres=# CREATE TABLE tbl (s varchar(2147483648));
ERROR:  syntax error at or near "2147483648"
LINE 1: CREATE TABLE tbl (s varchar(2147483648));
^

I find that in gram.y the varchar length is an integer literal, which
means its value cannot exceed 2147483647.

The first error message is reported by anychar_typmodin(), and the latter
is reported by gram.y.  IMO, the syntax error for varchar(n) is more
confusing.

Any thoughts?
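For what it's worth, the two validation layers can be mimicked in a short sketch (hypothetical Python, not the server code; the constants come from the error messages above: 2147483647 is INT_MAX, the largest integer literal the grammar accepts, and 10485760 is the cap anychar_typmodin() enforces):

```python
# Illustrative model of the two layers; the function name is made up.
INT_MAX = 2**31 - 1                  # 2147483647, enforced while parsing
MAX_VARCHAR_LEN = 10 * 1024 * 1024   # 10485760, enforced by anychar_typmodin()

def declare_varchar(n_literal: str) -> str:
    n = int(n_literal)
    if n > INT_MAX:
        # The grammar never produces an integer constant this large,
        # hence the generic "syntax error".
        return f'syntax error at or near "{n_literal}"'
    if n > MAX_VARCHAR_LEN:
        # The typmod input function gives the clearer, targeted message.
        return "length for type varchar cannot exceed 10485760"
    return "ok"

assert declare_varchar("2147483647").startswith("length for type varchar")
assert declare_varchar("2147483648").startswith("syntax error")
assert declare_varchar("10485760") == "ok"
```

So 2147483648 fails one layer earlier than 2147483647 does, which is why the two messages differ.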

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Logical Replication - improve error message while adding tables to the publication in check_publication_add_relation

2021-11-12 Thread Bharath Rupireddy
On Sat, Nov 13, 2021 at 12:06 AM Euler Taveira  wrote:
> > Here's a rebased v8 patch. Please review it.
>
> This improves the user experience by increasing the granularity of error
> reporting, and maps well with the precedent set in 81d5995b4.  I'm marking 
> this
> Ready for Committer and will go ahead and apply this unless there are
> objections.
>
> Shouldn't we modify errdetail_relkind_not_supported() to include
> relpersistence as a 2nd parameter and move those messages to it? I
> experimented with this idea in the attached patch. The idea is to provide a
> single function that reports accurate detail messages.

Thanks. It is a good idea to use errdetail_relkind_not_supported. I
slightly modified the API to "int errdetail_relkind_not_supported(Oid
relid, Form_pg_class rd_rel);" to simplify things and increase the
usability of the function further. For instance, it can report the
specific error for catalog tables as well. I also added "int
errdetail_relkind_not_supported_v2(Oid relid, char relkind, char
relpersistence);" so that callers that do not have a Form_pg_class
(there are 3 such callers) can pass the parameters directly.

PSA v10.

Regards,
Bharath Rupireddy.
From 50e0570fe2c83dfddc85f3e7df0ffebb12c5ef3e Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Sat, 13 Nov 2021 02:33:44 +
Subject: [PATCH v10] Improve publication error messages

Provide accurate messages for CREATE PUBLICATION commands. Also add
tests to cover temporary, unlogged, system and foreign tables.

While at it, modify the errdetail_relkind_not_supported API to distinguish
tables based on the persistence property and to report on system tables.
Also, introduce errdetail_relkind_not_supported_v2 so that callers
without a Form_pg_class object can pass relid, relkind and relpersistence
as direct parameters. This way, the API is more flexible and easier
to use. It also provides accurate detail messages for future work.
---
 contrib/amcheck/verify_heapam.c   |  2 +-
 contrib/pageinspect/rawpage.c |  2 +-
 contrib/pg_surgery/heap_surgery.c |  2 +-
 contrib/pg_visibility/pg_visibility.c |  2 +-
 contrib/pgstattuple/pgstatapprox.c|  2 +-
 contrib/pgstattuple/pgstatindex.c |  2 +-
 contrib/pgstattuple/pgstattuple.c |  2 +-
 .../postgres_fdw/expected/postgres_fdw.out|  6 
 contrib/postgres_fdw/sql/postgres_fdw.sql |  5 
 src/backend/catalog/pg_class.c| 24 ++--
 src/backend/catalog/pg_publication.c  | 28 ++-
 src/backend/commands/comment.c|  2 +-
 src/backend/commands/indexcmds.c  |  2 +-
 src/backend/commands/lockcmds.c   |  5 ++--
 src/backend/commands/seclabel.c   |  2 +-
 src/backend/commands/sequence.c   |  2 +-
 src/backend/commands/statscmds.c  |  2 +-
 src/backend/commands/subscriptioncmds.c   |  4 +--
 src/backend/commands/tablecmds.c  |  9 +++---
 src/backend/commands/trigger.c|  6 ++--
 src/backend/executor/execReplication.c|  5 ++--
 src/backend/parser/parse_utilcmd.c|  2 +-
 src/backend/replication/logical/relation.c|  2 +-
 src/backend/rewrite/rewriteDefine.c   |  4 +--
 src/include/catalog/pg_class.h|  4 ++-
 src/include/executor/executor.h   |  2 +-
 src/test/regress/expected/publication.out | 16 +++
 src/test/regress/sql/publication.sql  | 14 ++
 28 files changed, 107 insertions(+), 53 deletions(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index bae5340111..f6b16c8fb0 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -314,7 +314,7 @@ verify_heapam(PG_FUNCTION_ARGS)
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("cannot check relation \"%s\"",
 		RelationGetRelationName(ctx.rel)),
- errdetail_relkind_not_supported(ctx.rel->rd_rel->relkind)));
+ errdetail_relkind_not_supported(RelationGetRelid(ctx.rel), ctx.rel->rd_rel)));
 
 	/*
 	 * Sequences always use heap AM, but they don't show that in the catalogs.
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index 4bfa346c24..3173fb83d3 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -160,7 +160,7 @@ get_raw_page_internal(text *relname, ForkNumber forknum, BlockNumber blkno)
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("cannot get raw page from relation \"%s\"",
 		RelationGetRelationName(rel)),
- errdetail_relkind_not_supported(rel->rd_rel->relkind)));
+ errdetail_relkind_not_supported(RelationGetRelid(rel), rel->rd_rel)));
 
 	/*
 	 * Reject attempts to read non-local temporary relations; we would be
diff --git a/contrib/pg_surgery/heap_surgery.c b/contrib/pg_surgery/heap_surgery.c
index 7edfe4f326..043efda2f6 100644
--- a/contrib/pg_surgery/heap_surgery.c

Re: Allow users to choose what happens when recovery target is not reached

2021-11-12 Thread Bharath Rupireddy
On Fri, Nov 12, 2021 at 4:09 PM Julien Rouhaud  wrote:
>
> On Fri, Nov 12, 2021 at 6:14 PM Bharath Rupireddy
>  wrote:
> >
> > Currently, the server shuts down with a FATAL error (added by commit
> > [1]) when the recovery target isn't reached. This can cause a server
> > availability problem, especially in case of disaster recovery (geo
> > restores) where the primary was down and the user is doing a PITR on a
> > server lying in another region where it had missed to receive few of
> > the last WAL files required to reach the recovery target. In this
> > case, users might want the server to be available rather than a no
> > server. With the commit [1], there's no way to achieve what users
> > wanted.
>
> if users don't mind if the recovery target is reached or not isn't it
> better to simply don't specify a target and let the recovery go as far
> as possible?

Users will always be optimistic: they set a recovery target and try to
reach it, but somehow a few of the WAL files have not arrived (for
whatever reason) at the PITR target server. Imagine that their primary
isn't available either; then, with the proposal I made, they can choose to
have at least an available target server rather than a FATALly failed
one.

Regards,
Bharath Rupireddy.




Re: support for MERGE

2021-11-12 Thread Zhihong Yu
On Fri, Nov 12, 2021 at 9:58 AM Alvaro Herrera 
wrote:

> Here's a new version.  Many of the old complaints have been fixed;
> particularly, the handling of partitioned tables is now much cleaner and
> straightforward.  Amit Langote helped considerably in getting this part
> to shape -- thanks for that.  Amit also helped correct the EvalPlanQual
> behavior, which wasn't quite up to snuff.
>
> There are a few things that can still be improved here.  For one, I need
> to clean up the interactions with table AM (and thus heapam.c etc).
> Secondarily, and I'm now not sure that I really want to do it, is change
> the representation for executor: instead of creating a fake join between
> target and source, perhaps we should have just source, and give
> optimizer a separate query to fetch tuples from target.
>
> What I did do is change how the target table is represented from parse
> analysis to executor.  For example, in the original code, there were two
> RTEs that represented the target table; that is gone.  Now the target
> table is always just the query's resultRelation.  This removes a good
> number of ugly hacks that had been objected to.
>
> I'll park this in the January commitfest now.
>
> --
> Álvaro Herrera  Valdivia, Chile  —
> https://www.EnterpriseDB.com/
> "Cómo ponemos nuestros dedos en la arcilla del otro. Eso es la amistad;
> jugar
> al alfarero y ver qué formas se pueden sacar del otro" (C. Halloway en
> La Feria de las Tinieblas, R. Bradbury)
>
Hi,
This is continuation of review.

+   elog(WARNING, "hoping nothing needed here");

the above warning can be dropped.

+   ExprState  *mas_whenqual;   /* WHEN [NOT] MATCHED AND conditions */

In my earlier comment I suggested changing notMatched to unmatched - I
didn't pay attention to the syntax.
That suggestion can be ignored.

Cheers


Re: support for MERGE

2021-11-12 Thread Zhihong Yu
On Fri, Nov 12, 2021 at 3:13 PM Alvaro Herrera 
wrote:

> On 2021-Nov-12, Zhihong Yu wrote:
>
> > Hi,
> >
> > +   skipped_path = total - insert_path - update_path -
> delete_path;
> >
> > Should there be an assertion that skipped_path is not negative ?
>
> Hm, yeah, added.
>
> > +* We maintain separate transaction tables for UPDATE/INSERT/DELETE
> since
> > +* MERGE can run all three actions in a single statement. Note that
> UPDATE
> > +* needs both old and new transition tables
> >
> > Should the 'transaction' in the first line be transition ?
>
> Oh, of course.
>
> Uploaded fixup commits to
> https://github.com/alvherre/postgres/commits/merge-15
>
> --
> Álvaro Herrera  Valdivia, Chile  —
> https://www.EnterpriseDB.com/

Hi,

+   resultRelInfo->ri_notMatchedMergeAction = NIL;

ri_notMatchedMergeAction -> ri_unmatchedMergeAction

+static void ExecMergeNotMatched(ModifyTableState *mtstate,

ExecMergeNotMatched -> ExecMergeUnmatched

+*In this case, we are still dealing with a WHEN MATCHED case.
+*In this case, we recheck the list of WHEN MATCHED actions from

It seems the comment can be simplified to:

+*In this case, since we are still dealing with a WHEN MATCHED case,
+*we recheck the list of WHEN MATCHED actions from

+* If we got no tuple, or the tuple we get
has

'get' appears in different tenses. Better use either 'get' or 'got' in both
places.

+lmerge_matched:
...
+   foreach(l, resultRelInfo->ri_matchedMergeAction)

I suggest expanding the foreach macro into a for loop whose
condition includes an extra boolean variable, merge_matched,
with an initial value of true.
Inside the loop, we can adjust merge_matched's value to control whether the
loop continues.
This would avoid using a goto label.
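As a rough illustration of that suggestion (hypothetical names, not the patch's code), a goto-label retry can be expressed as a flag-controlled loop. The real executor refetches the updated tuple on recheck; that is modeled here with a bounded retry and a stateful check:

```python
# Flag-controlled rewrite of a 'goto lmerge_matched'-style retry
# (illustrative sketch only; exec_merge_matched is a made-up name).
def exec_merge_matched(actions, tuple_still_matches, max_retries=8):
    retry = True
    while retry and max_retries > 0:
        max_retries -= 1
        retry = False
        for action in actions:
            if not tuple_still_matches(action):
                retry = True        # concurrent update: rescan the actions
                break               # replaces 'goto lmerge_matched'
            return f"applied {action}"  # first satisfied WHEN clause wins
    return "no action applied"

# A check that fails once (simulating a concurrent update), then succeeds.
state = {"calls": 0}
def flaky_check(action):
    state["calls"] += 1
    return state["calls"] > 1

assert exec_merge_matched(["update"], flaky_check) == "applied update"
assert exec_merge_matched(["delete"], lambda a: True) == "applied delete"
```

The boolean makes the retry condition explicit in the loop header, which is the readability gain the suggestion is after.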

+   if (commandType == CMD_UPDATE && tuple_updated)

Since commandType can only be update or delete, it seems tuple_updated
and tuple_deleted can be consolidated into one boolean variable
(tuple_modified).
The above point is personal preference.

+* We've activated one of the WHEN clauses, so we don't search
+* further. This is required behaviour, not an optimization.
+*/
+   break;

We can directly return instead of break'ing.

+* Similar logic appears in ExecInitPartitionInfo(), so if changing
+* anything here, do so there too.

The above implies code dedup via refactoring - can be done in a separate
patch.

To be continued ...


Re: BUFFERS enabled by default in EXPLAIN (ANALYZE)

2021-11-12 Thread Vik Fearing
On 11/12/21 11:58 PM, Nikolay Samokhvalov wrote:
> Re-reading the thread [1] (cannot answer there – don't have those emails in
> my box anymore), I see that there was strong support for enabling BUFFERS
> in EXPLAIN ANALYZE by default. And there were patches. Commitfest entry [2]
> was marked Rejected because there were questions about the implementation
> being based on GUCs.
> 
> Attached is a simple patch enabling BUFFERS by default, without involving
> GUCs.

I obviously agree with this (although I still wish we had gucs as we
keep adding more and more options to EXPLAIN).

The patch looks good to me, too.

+1
-- 
Vik Fearing




Re: ALTER TABLE DETACH PARTITION violates serializability

2021-11-12 Thread Robert Haas
On Fri, Nov 12, 2021 at 3:27 PM Tom Lane  wrote:
> I wasn't aware of $SUBJECT ... were you?

I wasn't aware of this particular example, but I didn't spend much
time worrying about it, because DDL interleaved with DML generally
isn't serializable anyway. If you issue "SELECT * FROM
some_previously_unaccessed_table" in a transaction, it will find
that table even if the XID that created it is not visible to
your snapshot. And conversely if that table has been dropped by some
XID that is not visible to your snapshot, you will get an error that
the table doesn't exist. There's also ATRewriteTable(), which is not
MVCC-aware, and thus if someone applies that to a table before you
first access it, you may see rows you shouldn't see and fail to see
rows which you should see. I don't see those kinds of cases as being
all that different from this one.

My judgement is that outwaiting other XIDs would make far more people
unhappy than it would make happy. If we want to have that as an
optional behavior, I guess that would be OK, but I sure wouldn't make
it the default. I feel like I went to quite a bit of effort to keep
the lock levels as low as reasonably possible here and to allow as
much concurrency as I could, and I bet there are still a LOT more
people who are unhappy about things that take more locks than they'd
like than there are people who are sad about these kinds of MVCC
issues.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: BUFFERS enabled by default in EXPLAIN (ANALYZE)

2021-11-12 Thread Tomas Vondra




On 11/12/21 23:58, Nikolay Samokhvalov wrote:
Re-reading the thread [1] (cannot answer there – don't have those emails 
in my box anymore),


You can download the message as mbox and import it into your client 
(pretty much any client supports that, I think).



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Commitfest 2021-11 Patch Triage - Part 2

2021-11-12 Thread Tomas Vondra

On 11/10/21 16:54, Andrey Borodin wrote:




Daniel Gustafsson  writes:


2773: libpq compression
===
This patch intended to provide libpq connection compression to "replace SSL
compression" which was doomed when the patch was written, and have since been
removed altogether. The initial approach didn't get much traction but there
was significant discussion and work, which has since fizzled out. The patch
has been updated but there hasn't been meaningful review in the past months; the
last comments seem to imply that a fair number of question marks remain
here. Robert, having been very involved in this, do you have any thoughts on
where we are and where to go (if at all, IYO)?


I'm not Robert, but I still have an opinion here, and that it's that this
feature would at best be an attractive nuisance. If you need compression
on a database session, it probably means that the connection is over the
open internet, which means that you need encryption even more. And we
know that compression and encryption do not play well together. The
reason compression was taken out of the latest TLS standards is not that
they wouldn't have liked to have it, nor that applying compression in a
separate code layer would be any safer. I fear offering this would
merely lead people to build CVE-worthy setups.



Compression is crucial for highly available setups. Replication traffic is
often billed, or the route has bandwidth limits.
The entropy added by WAL headers makes a CRIME attack against replication
encryption impractical.


I very much doubt WAL headers are a reliable protection against CRIME, 
because the entropy of the headers is likely fairly constant. So if you 
compress the WAL stream, the WAL headers may change but the compression 
ratio should be pretty similar. At least that's my guess.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Should AT TIME ZONE be volatile?

2021-11-12 Thread Thomas Munro
On Sat, Nov 13, 2021 at 11:47 AM Ilya Anfimov  wrote:
>  Currently for glibc the version looks like the glibc version at
> initdb, and that doesn't seem very reliable, but that could be a
> different task (to find the LC_COLLATE file and put a hash of the
> usable data into the version string, for example).
> usuable data into version string, for example).

Yeah, I had a system exactly like that working (that is, a way to run
arbitrary commands to capture version strings, that could be used to
hash your collation definition files, patches somewhere in the
archives), but then we thought it'd be better to use glibc versions,
and separately, to perhaps try to ask the glibc people to expose a
version.  FreeBSD (at my request), Windows and ICU do expose versions
in a straightforward way, and we capture those.

>  Currently, it is questionable how to work with the different
> versions of collations -- but that could be solved e.g. via
> appropriate naming.  Perhaps "collation@ver"?  But if the version
> would contain a hash, a full version could be a bit dubious.
>  And some database maintenance task could check that all the old
> collations are available, rename them as needed, and create a set
> of new ones.
>  Automatically invalidating all the indexes, unfortunately.

We built a system that at least detected the changes on a per-index
level, but failed to ship it in release 14.  See revert commit, and
links back to previous commits and discussion:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ec48314708262d8ea6cdcb83f803fc83dd89e721

It didn't invalidate indexes, but complained about each individual
index on first access in each session, until you REINDEXed it.  We
will try again :-)

In the archives you can find discussions of how to make a system that
tolerates multiple version existing at the same time as I think you're
getting at, like DB2 does.  It's tricky, because it's code and data.
DB2 ships with N copies of ICU in it.




Re: Commitfest 2021-11 Patch Triage - Part 2

2021-11-12 Thread Tomas Vondra




On 11/10/21 00:21, Bruce Momjian wrote:

On Tue, Nov  9, 2021 at 12:43:20PM -0500, Stephen Frost wrote:

* Tom Lane (t...@sss.pgh.pa.us) wrote:

Daniel Gustafsson  writes:
I'm not Robert, but I still have an opinion here, and that it's that this
feature would at best be an attractive nuisance.  If you need compression
on a database session, it probably means that the connection is over the
open internet, which means that you need encryption even more.  And we
know that compression and encryption do not play well together.  The
reason compression was taken out of the latest TLS standards is not that
they wouldn't have liked to have it, nor that applying compression in a
separate code layer would be any safer.  I fear offering this would
merely lead people to build CVE-worthy setups.


I've got an opinion on this also and I don't agree that needing
compression means you're on the open internet or that we shouldn't allow
users the choice to decide if they want to use compression and
encryption together or not.  Yes, there's potential risks there, but I
don't think those risks would lead to CVEs against PG for supporting a
mode where we allow compression and then also allow encryption- if
that's a problem then it's an issue for the encryption library/method
being used and isn't the fault of us for allowing that combination.


Yeah, I am leaning in this direction too.  On the one hand, the number of
ways to exploit compression-then-encryption seem to continue to grow,
but it is the complex behavior of HTTPS and connecting to multiple sites
which seems to always be required, and we don't do that with the
Postgres client/server protocol.



+0.5

I don't think the number of attacks grows all that much, actually. Or 
more precisely, the new vulnerabilities are just variations on the CRIME 
pattern, adapting it to different contexts.


For example Voracle [1] applies to openvpn connections, but in principle 
it's the same scenario - observing changes to the compressed size, after 
tweaking the requests.


[1] https://community.openvpn.net/openvpn/wiki/VORACLE

So it's true compression is a side-channel, but these attacks require 
other ingredients too - ability to MITM the connection, and ability to 
inject custom data.


Doing MITM is probably harder than when attacking HTTPS, because DB 
connections tend to be between app server and db server - so setting up 
a rogue AP does not really work. Sometimes people / DBAs may connect to 
the DB directly, although in a way that makes it hard to modify the
data (which is why psql/ssh are immune to this). But it's possible.


But how hard is injecting custom data in a way that'd make CRIME-like 
issue possible? Yes, you could hijack a webapp similarly to CRIME, in a
way that initiates database requests with custom data. But we don't 
generally send both the supplied and actual password to the DB to do the 
check there - we just send one of them, and check it against the value 
stored in the DB.


The one (somewhat) plausible scenario I can think of is an application doing

   UPDATE t SET field = user_data WHERE id = 1;
   SELECT field FROM t WHERE ...;

in which case we might learn if some other row in the result contains 
the same value. Not sure how practical this could be, though.


AFAIK OpenVPN still supports tunnel compression, but it's disabled by
default. That probably says something about what they think of the
feature, although it might also be a matter of having shipped compression
and not wanting to simply remove it.


One interesting idea OpenVPN allows is asymmetric compression, i.e.
allowing compression in just one direction. For example it might be
possible for the client to send compressed data while the server never
sends compressed data, or vice versa. That might still help with e.g.
large imports or exports, depending on the use case, while maybe
mitigating some issues. But it's clearly not a solid defense.
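For reference, recent OpenVPN versions expose this one-directional mode
through the allow-compression setting; the syntax below is quoted from
memory, so treat it as a sketch rather than an authoritative config:

```
# OpenVPN 2.5+ (assumed): decompress incoming packets if the peer sends
# them compressed, but never send compressed data ourselves ("asym").
allow-compression asym
```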



regards


--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ALTER TABLE DETACH PARTITION violates serializability

2021-11-12 Thread Tom Lane
Alvaro Herrera  writes:
> I understand that the behavior is not fully correct, but given the way
> most people are going to use this (which is that they're no longer
> terribly interested in the data of the partition being detached/dropped)
> and the severity of the penalties if we implement a full solution, I
> lean towards documenting it rather than fixing it.

> Another option might be to play with the trade-offs in the CONCURRENTLY
> mode vs. the regular one.  If we make the CONCURRENTLY mode wait for all
> snapshots, we would only be making worse a mode that's already
> documented to potentially take a long time.  So people who want safety
> can use that mode, and the others are aware of the risk.

Meh ... "wrong by default" doesn't seem like it fits the Postgres ethos.
How about adding an option UNSAFE (orthogonal to CONCURRENTLY) that
activates the current behavior, and without it we wait for everything?

regards, tom lane




Re: support for MERGE

2021-11-12 Thread Alvaro Herrera
On 2021-Nov-12, Zhihong Yu wrote:

> Hi,
> 
> +   skipped_path = total - insert_path - update_path - delete_path;
> 
> Should there be an assertion that skipped_path is not negative ?

Hm, yeah, added.

> +* We maintain separate transaction tables for UPDATE/INSERT/DELETE since
> +* MERGE can run all three actions in a single statement. Note that UPDATE
> +* needs both old and new transition tables
> 
> Should the 'transaction' in the first line be transition ?

Oh, of course.

Uploaded fixup commits to
https://github.com/alvherre/postgres/commits/merge-15

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: ALTER TABLE DETACH PARTITION violates serializability

2021-11-12 Thread Alvaro Herrera
On 2021-Nov-12, Tom Lane wrote:

> I wasn't aware of $SUBJECT ... were you?

Yeah, I remember pointing out that DETACH and DROP are not fully correct
for serializability, but I can't find any thread where I said it in so
many words.  At the time I had no ideas on how to fix it; the idea of
waiting for snapshots to go away didn't occur to anybody, or at least it
didn't reach me.

> AFAICS, the only real way to fix this is to acquire lock on
> the target partition and then wait out any snapshots that are
> older than the lock, just in case those transactions would look
> at the partitioned table later.  I'm not sure if we want to go
> there, but if we don't, we at least have to document this gotcha.

ALTER TABLE DETACH PARTITION CONCURRENTLY already has a
wait-for-snapshots phase.  It doesn't fix this problem, because it
doesn't wait for all snapshots, just the ones holding any lock on the
parent table.  I'm not sure how upset people would be if we made it wait
on *all* snapshots, and worse, if we did that in all cases and not just
CONCURRENTLY.

I understand that the behavior is not fully correct, but given the way
most people are going to use this (which is that they're no longer
terribly interested in the data of the partition being detached/dropped)
and the severity of the penalties if we implement a full solution, I
lean towards documenting it rather than fixing it.

Another option might be to play with the trade-offs in the CONCURRENTLY
mode vs. the regular one.  If we make the CONCURRENTLY mode wait for all
snapshots, we would only be making worse a mode that's already
documented to potentially take a long time.  So people who want safety
can use that mode, and the others are aware of the risk.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this."   (Fotis)
   (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)




BUFFERS enabled by default in EXPLAIN (ANALYZE)

2021-11-12 Thread Nikolay Samokhvalov
Re-reading the thread [1] (I cannot answer there – I don't have those emails
in my mailbox anymore), I see that there was strong support for enabling
BUFFERS in EXPLAIN ANALYZE by default. And there were patches. The commitfest
entry [2] was marked Rejected because there were questions about the
GUC-based implementation.

Attached is a simple patch enabling BUFFERS by default, without involving
GUCs.

Why is it important?

In many cases, people forget about the BUFFERS option in EXPLAIN ANALYZE
and share execution plans without it – sending them via regular communication
channels at work or publishing them to visualization systems. Meanwhile, the
option has lower overhead than TIMING (which is enabled by default for
EXPLAIN ANALYZE) and is extremely useful for query
optimization. This patch doesn't enable BUFFERS for EXPLAIN executed
without ANALYZE.
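For illustration, this is the kind of output the patch makes the default
(the plan and all numbers here are made up, just to show the Buffers lines):

```sql
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM pgbench_accounts;
--                     QUERY PLAN (abridged, illustrative)
--  Aggregate  (actual time=18.21..18.22 rows=1 loops=1)
--    Buffers: shared hit=256 read=1354
--    ->  Seq Scan on pgbench_accounts
--          (actual time=0.01..12.65 rows=100000 loops=1)
--          Buffers: shared hit=256 read=1354
```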

Open questions:

1. Should BUFFERS be enabled for regular (w/o ANALYZE) EXPLAIN? It may now
make sense because of the buffer numbers the planner uses. This patch
doesn't do that, but it could help people understand why planning time is
so large in some cases.

2. How to adjust documentation? Should EXPLAIN ANALYZE examples be adjusted
to use BUFFERS OFF (easier change) or show some example buffer numbers –
like it is done for timing and cost numbers? I tend to think that the
latter makes more sense.

3. How to adjust regression tests? Of course, many tests now fail. Same
question as for documentation. Excluding buffer numbers would be an easier
fix, of course – but at least some tests should
check the behavior of BUFFERS (such as both default options – with and
without ANALYZE).
On any given platform, the buffer numbers are pretty stable, so we could
rely on them, but I'm not sure about all possible options being tested and
would appreciate advice here (of course, if the patch makes it through the
discussion in general).


## Links
[1]
https://www.postgresql.org/message-id/flat/b3197ba8-225f-f53c-326d-5b1756c77c3e%40postgresfriends.org
[2] https://commitfest.postgresql.org/28/2567/


001-buffers-in-explain-analyze-enabled-by-default.patch
Description: Binary data


Re: Should AT TIME ZONE be volatile?

2021-11-12 Thread Ilya Anfimov
On Thu, Nov 11, 2021 at 09:52:52AM -0500, Tom Lane wrote:
> Robert Haas  writes:
> > I'm not really convinced that ICU is better, either. I think it's more
> > that it isn't used as much.
> 
> Well, at least ICU has a notion of attaching versions to collations.
> How mindful they are of bumping the version number when necessary
> remains to be seen.  But the POSIX locale APIs don't even offer the
> opportunity to get it right.
> 
> > I don't have any constructive proposal for what to do about any of
> > this. It sure is frustrating, though.
> 
> Yup.  If we had reliable ways to detect changes in this sort of
> environment-supplied data, maybe we could do something about it
> (a la the work that's been happening on attaching collation versions
> to indexes).  But personally I can't summon the motivation to work

 Theoretically there are versions attached to collations already:
the collation in an index is an oid referencing pg_collation.
 And pg_collation already has versions.

 Currently for glibc the version looks like the glibc version at
initdb, and that doesn't seem very reliable, but that could be a
different task (to find the LC_COLLATE file and put a hash of the
usable data into the version string, for example).

 Currently, it is questionable how to work with different
versions of collations -- but that could be solved e.g. via
appropriate naming.  Perhaps "collation@ver"?  But if the version
would contain a hash, a full version could be a bit dubious.
 And some database maintenance task could check that all the old
collations are available, rename them as needed, and create a set
of new ones.
 Automatically invalidating all the indexes, unfortunately.
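 For the record, the stored and current versions can already be compared
from SQL; a sketch (assuming an en_US collation exists in pg_collation):

```sql
-- collversion is recorded when the collation is created;
-- pg_collation_actual_version() asks the provider for the current
-- version, so a mismatch suggests index ordering may have changed.
SELECT collname,
       collversion,
       pg_collation_actual_version(oid) AS actual_version
FROM pg_collation
WHERE collname = 'en_US';
```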

> on that, when ICU is the *only* such infrastructure that offers
> readily program-readable versioning.
> 
>   regards, tom lane
> 




Re: Undocumented AT TIME ZONE INTERVAL syntax

2021-11-12 Thread Tom Lane
Corey Huinker  writes:
> Attached is an attempt at an explanation of the edge cases I was
> encountering, as well as some examples. If nothing else, the examples will
> draw eyes and searches to the explanations that were already there.

I looked this over and have a few thoughts:

* I don't think your explanation of the behavior of colon-separated
times is quite correct; for example, it doesn't correctly describe
this:

regression=# select INTERVAL '2:03:04' minute to second;
 interval 
----------
 02:03:04
(1 row)

I think the actual rule is that hh:mm:ss is always interpreted that
way regardless of the typmod (though we may then drop low-order
fields if the typmod says to).  Two colon-separated numbers are
interpreted as hh:mm by default, but as mm:ss if the typmod is
exactly "minute to second".  (This might work better in a separate
para; the one you've modified here is mostly about what we do with
unmarked quantities, but the use of colons makes these numbers
not unmarked.)
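Concretely, the rule plays out like this (outputs from memory, so worth
double-checking against the docs):

```sql
SELECT INTERVAL '2:03';                     -- 02:03:00  (hh:mm by default)
SELECT INTERVAL '2:03' MINUTE TO SECOND;    -- 00:02:03  (mm:ss for this typmod)
SELECT INTERVAL '2:03:04' HOUR TO MINUTE;   -- 02:03:00  (hh:mm:ss, seconds dropped)
```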

* I'm not sure I would bother with examples for half-broken formats
like "2:".  People who really want to know about that can experiment,
while for the rest of us it seems like adding confusion.

* In the same vein, I think your 0002 example adds way more confusion
than illumination.  Maybe better would be a less contrived offset,
say

# SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00' AT TIME ZONE 
INTERVAL '3:00:00';
  timezone   
---------------------
 2001-02-16 23:38:40
(1 row)

which could be put after the second example and glossed as "The third
example rotates a timestamp specified in UTC to the zone three hours
east of Greenwich, using a constant interval as the time zone
specification".

regards, tom lane




Re: Commitfest 2021-11 Patch Triage - Part 2

2021-11-12 Thread Nikolay Samokhvalov
On Tue, Nov 9, 2021 at 7:02 AM Tom Lane  wrote:

> Daniel Gustafsson  writes:
> > 2773: libpq compression
> > ===
> > This patch intended to provide libpq connection compression to "replace
> SSL
> > compression" which was doomed when the patch was written, and have since
> been
> > removed altogether.  The initial approach didn't get much traction but
> there
> > was significant discussion and work, which has since fizzled out.  The
> patch
> > has been updated but there hasn't been meaningful review the past
> months, the
> > last comments seem to imply there being a fair amount of questionmarks
> left in
> > here.  Robert, having been very involved in this do you have any
> thoughts on
> > where we are and where to go (if at all IYO)?
>
> I'm not Robert, but I still have an opinion here, and that it's that this
> feature would at best be an attractive nuisance.  If you need compression
> on a database session, it probably means that the connection is over the
> open internet, which means that you need encryption even more.
>

This assumption is very vague. I personally had multiple cases where network
bandwidth for app <--> Postgres communication was a bottleneck; those were
fixed either via upgrades (spending more money) or by making application
developers cache more data (optimization), or, usually, both. Public
internet connections were never involved.

Actually, any growing startup experiences such issues sooner or later.
Network compression would be a great tool for many setups, and some other
popular database systems have offered it for a long time.


Re: support for MERGE

2021-11-12 Thread Zhihong Yu
On Fri, Nov 12, 2021 at 9:58 AM Alvaro Herrera 
wrote:

> Here's a new version.  Many of the old complaints have been fixed;
> particularly, the handling of partitioned tables is now much cleaner and
> straightforward.  Amit Langote helped considerably in getting this part
> to shape -- thanks for that.  Amit also helped correct the EvalPlanQual
> behavior, which wasn't quite up to snuff.
>
> There are a few things that can still be improved here.  For one, I need
> to clean up the interactions with table AM (and thus heapam.c etc).
> Secondarily, and I'm now not sure that I really want to do it, is change
> the representation for executor: instead of creating a fake join between
> target and source, perhaps we should have just source, and give
> optimizer a separate query to fetch tuples from target.
>
> What I did do is change how the target table is represented from parse
> analysis to executor.  For example, in the original code, there were two
> RTEs that represented the target table; that is gone.  Now the target
> table is always just the query's resultRelation.  This removes a good
> number of ugly hacks that had been objected to.
>
> I'll park this in the January commitfest now.
>
> --
> Álvaro Herrera  Valdivia, Chile  —
> https://www.EnterpriseDB.com/
> "Cómo ponemos nuestros dedos en la arcilla del otro. Eso es la amistad;
> jugar
> al alfarero y ver qué formas se pueden sacar del otro" (C. Halloway en
> La Feria de las Tinieblas, R. Bradbury)
>
Hi,

+   skipped_path = total - insert_path - update_path - delete_path;

Should there be an assertion that skipped_path is not negative ?

+* We maintain separate transaction tables for UPDATE/INSERT/DELETE
since
+* MERGE can run all three actions in a single statement. Note that
UPDATE
+* needs both old and new transition tables

Should the 'transaction' in the first line be transition ?

cheers


Re: Add additional information to src/test/ssl/README

2021-11-12 Thread Daniel Gustafsson
> On 12 Nov 2021, at 16:21, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> Attached is a small addition mentioning PG_TEST_NOCLEAN
> 
> Maybe could use a bit of copy-editing, say
> 
> Data directories will also be left behind for analysis when a test fails;
> they are named according to the test filename.  But if the environment
> variable PG_TEST_NOCLEAN is set, data directories will be retained
> regardless of test status.

Agreed, that reads better.  Pushed with those changes.

--
Daniel Gustafsson   https://vmware.com/





Re: [Patch] ALTER SYSTEM READ ONLY

2021-11-12 Thread Robert Haas
On Mon, Nov 8, 2021 at 8:20 AM Amul Sul  wrote:
> Attached is the rebased version of refactoring as well as the
> pg_prohibit_wal feature patches for the latest master head (commit #
> 39a3105678a).

I spent a lot of time today studying 0002, and specifically the
question of whether EndOfLog must be the same as
XLogCtl->replayEndRecPtr and whether EndOfLogTLI must be the same as
XLogCtl->replayEndTLI.

The answer to the former question is "no" because, if we don't enter
redo, XLogCtl->replayEndRecPtr won't be initialized at all. If we do
enter redo, then I think it has to be the same unless something very
weird happens. EndOfLog gets set like this:

XLogBeginRead(xlogreader, LastRec);
record = ReadRecord(xlogreader, PANIC, false, replayTLI);
EndOfLog = EndRecPtr;

In every case that exists in our regression tests, EndRecPtr is the
same before these three lines of code as it is afterward. However, if
you test with recovery_target=immediate, you can get it to be
different, because in that case we drop out of the redo loop after
calling recoveryStopsBefore() rather than after calling
recoveryStopsAfter(). Similarly I'm fairly sure that if you use
recovery_target_inclusive=off you can likewise get it to be different
(though I discovered the hard way that recovery_target_inclusive=off
is ignored when you use recovery_target_name). It seems like a really
bad thing that neither recovery_target=immediate nor
recovery_target_inclusive=off have any tests, and I think we ought to
add some.

Anyway, in effect, these three lines of code have the effect of
backing up the xlogreader by one record when we stop before rather
than after a record that we're replaying. What that means is that
EndOfLog is going to be the end+1 of the last record that we actually
replayed. There might be one more record that we read but did not
replay, and that record won't impact the value we end up with in
EndOfLog. Now, XLogCtl->replayEndRecPtr is also that end+1 of the last
record that we actually replayed. To put that another way, there's no
way to exit the main redo loop after we set XLogCtl->replayEndRecPtr
and before we change LastRec. So in the cases where
XLogCtl->replayEndRecPtr gets initialized at all, it can only be
different from EndOfLog if something different happens when we re-read
the last-replayed WAL record than what happened when we read it the
first time. That seems unlikely, and would be unfortunate if it did
happen. I am inclined to think that it might be better not to reread
the record at all, though. As far as this patch goes, I think we need
a solution that doesn't involve fetching EndOfLog from a variable
that's only sometimes initialized and then not doing anything with it
except in the cases where it was initialized.

As for EndOfLogTLI, I'm afraid I don't think that's the same thing as
XLogCtl->replayEndTLI. Now, it's hard to be sure, because I don't
think the regression tests contain any scenarios where we run recovery
and the values end up different. However, I think that the code sets
EndOfLogTLI to the TLI of the last WAL file that we read, and I think
XLogCtl->replayEndTLI gets set to the timeline from which that WAL
record originated. So imagine that we are looking for WAL that ought
to be in 000000010000000000000003 but we don't find it; instead we
find 000000020000000000000003 because our recovery target timeline is
2, or something that has 2 in its history. We will read the WAL for
timeline 1 from this file which has timeline 2 in the file name. I
think if recovery ends in this file before the timeline switch, these
values will be different. I did not try to construct a test case for
this today due to not having enough time, so it's possible that I'm
wrong about this, but that's how it looks to me from the code.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: simplifying foreign key/RI checks

2021-11-12 Thread Tom Lane
Amit Langote  writes:
> On Fri, Nov 12, 2021 at 8:19 AM Tom Lane  wrote:
>> Anyway, I think that (1) we should write some more test cases around
>> this behavior, (2) you need to establish the snapshot to use in two
>> different ways for the RI_FKey_check and ri_Check_Pk_Match cases,
>> and (3) something's going to have to be done to repair the behavior
>> in v14 (unless we want to back-patch this into v14, which seems a
>> bit scary).

> Okay, I'll look into getting 1 and 2 done for this patch and I guess
> work with Alvaro on 3.

Actually, it seems that DETACH PARTITION is broken for concurrent
serializable/repeatable-read transactions quite independently of
whether they attempt to make any FK checks [1].  If we do what
I speculated about there, namely wait out all such xacts before
detaching, it might be possible to fix (3) just by reverting the
problematic change in ri_triggers.c.  I'm thinking the wait would
render it unnecessary to get FK checks to do anything weird about
partition lookup.  But I might well be missing something.

regards, tom lane

[1] https://www.postgresql.org/message-id/1849918.1636748862%40sss.pgh.pa.us




Re: Is heap_page_prune() stats collector accounting wrong?

2021-11-12 Thread Peter Geoghegan
On Fri, Nov 12, 2021 at 11:29 AM Peter Geoghegan  wrote:
> We compensate here precisely because we are not running in VACUUM (it
> has to be an opportunistic prune in practice).

> If we're not running in VACUUM, and have to make a statistics
> collector call, then we don't want to forget about DEAD tuples that
> were pruned-away (i.e. no longer have tuple storage) when they still
> have an LP_DEAD stub item. There is obviously no justification for
> just ignoring LP_DEAD items there, because we don't know when VACUUM
> is going to run next (since we are not VACUUM).

Attached patch clears this up by adding some comments. It also moves
the call to pgstat_update_heap_dead_tuples() from heap_page_prune() to
heap_page_prune_opt(), which feels like a better place for it to me.

-- 
Peter Geoghegan


v1-0001-Clear-up-ndeleted-pgstats-accounting.patch
Description: Binary data


Re: [PATCH v2] use has_privs_for_role for predefined roles

2021-11-12 Thread Joshua Brindle
On Wed, Nov 10, 2021 at 12:45 PM Bossart, Nathan  wrote:
>
> On 11/8/21, 2:19 PM, "Joshua Brindle"  wrote:
> > Thanks for the review, attached is an update with that comment fixed
> > and also sgml documentation changes that I missed earlier.
>
> I think there are a number of documentation changes that are still
> missing.  I did a quick scan and saw the "is member of" language in
> func.sgml, monitoring.sgml, pgbuffercache.sgml, pgfreespacemap.sgml,
> pgrowlocks.sgml, pgstatstatements.sgml, and pgvisibility.sgml.

All of these and also adminpack.sgml are updated. I think that is all of
them, but docs broken across lines and irregular wording make it
difficult to be sure.

> 
>  By default, the pg_shmem_allocations view can be
> -   read only by superusers or members of the 
> pg_read_all_stats
> -   role.
> +   read only by superusers or roles with privilges of the
> +   pg_read_all_stats role.
> 
> 
>
> nitpick: "privileges" is misspelled.

Fixed, thanks for reviewing.


0001-use-has_privs_for_roles-for-predefined-role-checks.patch
Description: Binary data


ALTER TABLE DETACH PARTITION violates serializability

2021-11-12 Thread Tom Lane
I wasn't aware of $SUBJECT ... were you?

Here's a demonstration:

drop table if exists pk, fk, pk1, pk2;
create table pk (f1 int primary key) partition by list(f1);
create table pk1 partition of pk for values in (1);
create table pk2 partition of pk for values in (2);
insert into pk values(1);
insert into pk values(2);
create table fk (f1 int references pk);
insert into fk values(1);
insert into fk values(2);

In session 1, next do

regression=# begin isolation level serializable;
BEGIN
regression=*# select * from unrelated_table;   -- to set the xact snapshot
...

Now in session 2, do

regression=# delete from fk where f1=2;
DELETE 1
regression=# alter table pk detach partition pk2; 
ALTER TABLE

Back at session 1, we now see what's not only a serializability
violation, but a not-even-self-consistent view of the database:

regression=*# select * from fk;
 f1 
----
  1
  2
(2 rows)

regression=*# select * from pk;
 f1 
----
  1
(1 row)

This is slightly ameliorated by the fact that if session 1 has
already touched either pk or fk, locking considerations will
block the DETACH.  But only slightly.

(Dropping a partition altogether has the same issue, of course.)

AFAICS, the only real way to fix this is to acquire lock on
the target partition and then wait out any snapshots that are
older than the lock, just in case those transactions would look
at the partitioned table later.  I'm not sure if we want to go
there, but if we don't, we at least have to document this gotcha.

regards, tom lane




Re: Improving psql's \password command

2021-11-12 Thread Bossart, Nathan
On 11/12/21, 11:58 AM, "Tom Lane"  wrote:
> "Bossart, Nathan"  writes:
>> I haven't started on the SIGINT stuff yet, but I did extract the
>> PQuser() fix so that we can hopefully get that one out of the way.  I
>> made some small adjustments in an attempt to keep the branching in
>> this function from getting too complicated.
>
> Right, it makes sense to consider just this much as the back-patchable
> part, and leave the more complicated messing with simple_prompt()
> for HEAD only.  I made a couple further cosmetic tweaks and pushed it.

Thanks!

Nathan



Re: Improving psql's \password command

2021-11-12 Thread Tom Lane
"Bossart, Nathan"  writes:
> I haven't started on the SIGINT stuff yet, but I did extract the
> PQuser() fix so that we can hopefully get that one out of the way.  I
> made some small adjustments in an attempt to keep the branching in
> this function from getting too complicated.

Right, it makes sense to consider just this much as the back-patchable
part, and leave the more complicated messing with simple_prompt()
for HEAD only.  I made a couple further cosmetic tweaks and pushed it.

regards, tom lane




Re: Is heap_page_prune() stats collector accounting wrong?

2021-11-12 Thread Peter Geoghegan
On Fri, Nov 12, 2021 at 10:45 AM Peter Geoghegan  wrote:
> Let's assume that somehow I have it wrong. Even then, why should we
> compensate like this for the stats collector, but not for VACUUM?
> There's certainly no corresponding code in vacuumlazy.c that does a
> similar transformation with ndeleted.

I think that I figured it out myself, but it's very confusing. Could
definitely do with a better explanatory comment. Here's what I think
is actually going on here:

We compensate here precisely because we are not running in VACUUM (it
has to be an opportunistic prune in practice).

If we're running in VACUUM, then we are justified in ignoring
newly-pruned LP_DEAD items, because 1.) They're going to be turned
into LP_UNUSED items in the same VACUUM anyway (barring edge-cases),
and 2.) Newly-pruned LP_DEAD items are really no different to existing
LP_DEAD items that VACUUM found on the page before its own pruning
operation even began -- if VACUUM wants to count them or report on
them at all, then it had better count them itself, after pruning
(which it only began to do in Postgres 14).

If we're not running in VACUUM, and have to make a statistics
collector call, then we don't want to forget about DEAD tuples that
were pruned-away (i.e. no longer have tuple storage) when they still
have an LP_DEAD stub item. There is obviously no justification for
just ignoring LP_DEAD items there, because we don't know when VACUUM
is going to run next (since we are not VACUUM).

Does that sound correct?

-- 
Peter Geoghegan




Re: [BUG]Invalidate relcache when setting REPLICA IDENTITY

2021-11-12 Thread Euler Taveira
On Fri, Nov 12, 2021, at 3:10 AM, houzj.f...@fujitsu.com wrote:
> On Fri, Nov 12, 2021 1:33 PM Amit Kapila  wrote:
> > On Fri, Nov 12, 2021 at 10:50 AM Amit Kapila  
> > wrote:
> > > But won't that generate invalidation for the rel twice in the case
> > > (change Replica Identity from Nothing to some index) you mentioned in
> > > the previous email?
> > >
> > 
> > Oh, I see the point. I think this is okay because
> > AddRelcacheInvalidationMessage doesn't allow to add duplicate rel
> > invalidation. If that is the case I wonder why not simply register
> > invalidation without any check in the for loop as was the case with
> > Tang's original patch?
> 
> OK, I also think the code in Tang's original patch is fine.
> Attach the patch which register invalidation without any check in the for 
> loop.
WFM.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Is heap_page_prune() stats collector accounting wrong?

2021-11-12 Thread Peter Geoghegan
The following code appears at the end of heap_page_prune():

/*
 * If requested, report the number of tuples reclaimed to pgstats. This is
 * ndeleted minus ndead, because we don't want to count a now-DEAD root
 * item as a deletion for this purpose.
 */
if (report_stats && ndeleted > prstate.ndead)
pgstat_update_heap_dead_tuples(relation, ndeleted - prstate.ndead);

In general, heap_page_prune() has to know the exact definition of
ndeleted (which is its return value) -- and the definition is kind of
subtle. The definition is really "the number of tuples that had
storage before pruning, but not after pruning, because pruning
considered them DEAD and removed the tuple itself (though not
necessarily its line pointer, which could have just been marked
LP_DEAD rather than LP_UNUSED)". As far as I know, the handling for
all that at the end of heap_prune_chain() (at the point that we decide
some tuples from a HOT chain are considered DEAD) is correct. It
handles all the subtleties in a way that's consistent.

I can't see why we should need to compensate like this (by subtracting
prstate.ndead) at all, because, as I said, heap_page_prune() *already*
does everything for us. In particular, if the now-DEAD root item
started out as an LP_NORMAL tuple with storage (a root item that is a
plain heap tuple, or just a simple tuple on its own), then we already
make sure to count it in ndeleted here, at the end of
heap_prune_chain():

/*
 * If the root entry had been a normal tuple, we are deleting it, so
 * count it in the result.  But changing a redirect (even to DEAD
 * state) doesn't count.
 */
if (ItemIdIsNormal(rootlp))
ndeleted++;

So we're always going to count tuples that had storage but then got
pruned away in ndeleted -- we *do* want to count root items when
they're LP_NORMAL. We also avoid counting root items when that is
inappropriate: when they started out as an LP_REDIRECT, but became
LP_DEAD. It would be inappropriate in the case of an LP_REDIRECT root
item specifically, because it doesn't start out with tuple storage
anyway (heap-only tuples from the same HOT chain are another matter).
So why compensate when calling pgstat_update_heap_dead_tuples() at
all?

Let's assume that somehow I have it wrong. Even then, why should we
compensate like this for the stats collector, but not for VACUUM?
There's certainly no corresponding code in vacuumlazy.c that does a
similar transformation with ndeleted.

-- 
Peter Geoghegan




Re: RFC: Logging plan of the running query

2021-11-12 Thread Justin Pryzby
On Wed, Oct 13, 2021 at 05:28:30PM +0300, Ekaterina Sokolova wrote:
> Hi, hackers!
> 
> • pg_query_state is contrib with 2 patches for core (I hope someday
> Community will support adding this patches to PostgreSQL). It contains

I reviewed this version of the patch - I have some language fixes.

I didn't know about pg_query_state, thanks.

> To improve this situation, this patch adds
> pg_log_current_query_plan() function that requests to log the
> plan of the specified backend process.

To me, "current plan" seems to mean "plan of *this* backend" (which makes no
sense to log).  I think the user-facing function could be called
pg_log_query_plan().  It's true that the implementation is a request to another
backend to log its *own* query plan - but users shouldn't need to know about
the implementation.

> +Only superusers can request to log plan of the running query.

.. log the plan of a running query.

> +Note that nested statements (statements executed inside a function) are 
> not
> +considered for logging. Only the deepest nesting query's plan is logged.

Only the plan of the most deeply nested query is logged.

> + (errmsg("backend with PID %d is not running a 
> query",
> + MyProcPid)));

The extra parens around errmsg() are not needed since e3a87b499.

> + (errmsg("backend with PID %d is now holding a 
> page lock. Try again",

remove "now"

> + (errmsg("plan of the query running on backend with PID 
> %d is:\n%s",
> + MyProcPid, es->str->data),

Maybe this should say "query plan running on backend with PID 17793 is:"

> + * would cause lots of log messages and which can lead to denial of

remove "and"

> +  errmsg("must be a superuser to log information 
> about specified process")));

I think it should not say "specified", since that sounds like the user
might have access to log information about some other processes:
| must be a superuser to log information about processes

> +
> + proc = BackendPidGetProc(pid);
> +
> + /*
> +  * BackendPidGetProc returns NULL if the pid isn't valid; but by the 
> time
> +  * we reach kill(), a process for which we get a valid proc here might
> +  * have terminated on its own.  There's no way to acquire a lock on an
> +  * arbitrary process to prevent that. But since this mechanism is 
> usually
> +  * used to below purposes, it might end its own first and the 
> information

used for below purposes,

-- 
Justin




Re: Logical Replication - improve error message while adding tables to the publication in check_publication_add_relation

2021-11-12 Thread Euler Taveira
On Fri, Nov 12, 2021, at 9:41 AM, Daniel Gustafsson wrote:
> > On 4 Nov 2021, at 05:24, Bharath Rupireddy 
> >  wrote:
> > 
> > On Wed, Nov 3, 2021 at 6:21 PM Daniel Gustafsson  wrote:
> >> 
> >>> On 26 Jul 2021, at 09:33, Bharath Rupireddy 
> >>>  wrote:
> >> 
> >>> PSA v7.
> >> 
> >> This patch no longer applies on top of HEAD, please submit a rebased 
> >> version.
> > 
> > Here's a rebased v8 patch. Please review it.
> 
> This improves the user experience by increasing the granularity of error
> reporting, and maps well with the precedent set in 81d5995b4.  I'm marking 
> this
> Ready for Committer and will go ahead and apply this unless there are
> objections.
Shouldn't we modify errdetail_relkind_not_supported() to include relpersistence
as a 2nd parameter and move those messages to it? I experimented with this idea
in the attached patch. The idea is to provide a single function that reports
accurate detail messages.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From f068d4688a95c8e8c5a98d8d6f1847ddfafda43c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Thu, 4 Nov 2021 04:22:21 +
Subject: [PATCH v9] Improve publication error messages

Provide accurate messages for CREATE PUBLICATION commands. While at it,
modify errdetail_relkind_not_supported to distinguish tables based on
their persistence property. This provides accurate detail messages for
future work. It also includes tests to cover temporary, unlogged, system
and foreign tables.
---
 contrib/amcheck/verify_heapam.c|  2 +-
 contrib/pageinspect/rawpage.c  |  2 +-
 contrib/pg_surgery/heap_surgery.c  |  2 +-
 contrib/pg_visibility/pg_visibility.c  |  2 +-
 contrib/pgstattuple/pgstatapprox.c |  2 +-
 contrib/pgstattuple/pgstatindex.c  |  2 +-
 contrib/pgstattuple/pgstattuple.c  |  2 +-
 contrib/postgres_fdw/expected/postgres_fdw.out |  6 ++
 contrib/postgres_fdw/sql/postgres_fdw.sql  |  5 +
 src/backend/catalog/pg_class.c | 10 --
 src/backend/catalog/pg_publication.c   | 13 +++--
 src/backend/commands/comment.c |  2 +-
 src/backend/commands/indexcmds.c   |  2 +-
 src/backend/commands/lockcmds.c|  5 +++--
 src/backend/commands/seclabel.c|  2 +-
 src/backend/commands/sequence.c|  2 +-
 src/backend/commands/statscmds.c   |  2 +-
 src/backend/commands/tablecmds.c   |  9 +
 src/backend/commands/trigger.c |  6 +++---
 src/backend/executor/execReplication.c |  2 +-
 src/backend/parser/parse_utilcmd.c |  2 +-
 src/backend/rewrite/rewriteDefine.c|  4 ++--
 src/include/catalog/pg_class.h |  2 +-
 src/test/regress/expected/publication.out  | 16 
 src/test/regress/sql/publication.sql   | 14 ++
 25 files changed, 80 insertions(+), 38 deletions(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index bae5340111..43854d1361 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -314,7 +314,7 @@ verify_heapam(PG_FUNCTION_ARGS)
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("cannot check relation \"%s\"",
 		RelationGetRelationName(ctx.rel)),
- errdetail_relkind_not_supported(ctx.rel->rd_rel->relkind)));
+ errdetail_relkind_not_supported(ctx.rel->rd_rel->relkind, ctx.rel->rd_rel->relpersistence)));
 
 	/*
 	 * Sequences always use heap AM, but they don't show that in the catalogs.
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index 4bfa346c24..2259068d73 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -160,7 +160,7 @@ get_raw_page_internal(text *relname, ForkNumber forknum, BlockNumber blkno)
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("cannot get raw page from relation \"%s\"",
 		RelationGetRelationName(rel)),
- errdetail_relkind_not_supported(rel->rd_rel->relkind)));
+ errdetail_relkind_not_supported(rel->rd_rel->relkind, rel->rd_rel->relpersistence)));
 
 	/*
 	 * Reject attempts to read non-local temporary relations; we would be
diff --git a/contrib/pg_surgery/heap_surgery.c b/contrib/pg_surgery/heap_surgery.c
index 7edfe4f326..1c90c87c04 100644
--- a/contrib/pg_surgery/heap_surgery.c
+++ b/contrib/pg_surgery/heap_surgery.c
@@ -110,7 +110,7 @@ heap_force_common(FunctionCallInfo fcinfo, HeapTupleForceOption heap_force_opt)
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("cannot operate on relation \"%s\"",
 		RelationGetRelationName(rel)),
- errdetail_relkind_not_supported(rel->rd_rel->relkind)));
+ errdetail_relkind_not_supported(rel->rd_rel->relkind, rel->rd_rel->relpersistence)));
 
 	if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
 		ereport(ERROR,
diff --git a/contrib/pg_visibility/pg_visibility.c 

Re: support for MERGE

2021-11-12 Thread Alvaro Herrera
On 2021-Nov-12, Tomas Vondra wrote:

> On 11/12/21 18:57, Alvaro Herrera wrote:

> > Secondarily, and I'm now not sure that I really want to do it, is change
> > the representation for executor: instead of creating a fake join between
> > target and source, perhaps we should have just source, and give
> > optimizer a separate query to fetch tuples from target.
> 
> When you say you're not sure you want to change this, is that because you
> don't have time for that, or because you think the current approach is
> better?

I'm not sure that doing it the other way will be better.  We'll have two
queries to optimize: one is reading from the data source, and the other
is fetching tuples from the target table based on the conditions applied
to each row obtained from the data source.  Right now, we just apply a
join and let the optimizer do its thing.  It's simpler, but ...

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: XTS cipher mode for cluster file encryption

2021-11-12 Thread Bruce Momjian
On Mon, Nov  1, 2021 at 02:24:36PM -0400, Stephen Frost wrote:
> I can understand the general idea that we should be sure to engineer
> this in a way that multiple methods can be used, as surely one day folks
> will say that AES128 isn't acceptable any more.  In terms of what we'll
> do from the start, I would think providing the options of AES128 and
> AES256 would be good to ensure that we have the bits covered to support
> multiple methods and I don't think that would put us into a situation of
> having to really explain which to use to users (we don't for pgcrypto
> anyway, as an example).  I agree that we shouldn't be looking at adding
> in a whole new crypto library for this though, that's a large and
> independent effort (see the work on NSS happening nearby).

Since it has been two weeks since the last activity on this thread, I
have updated the TDE wiki to reflect the conclusions and discussions:

https://wiki.postgresql.org/wiki/Transparent_Data_Encryption

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: support for MERGE

2021-11-12 Thread Tomas Vondra

On 11/12/21 18:57, Alvaro Herrera wrote:

Here's a new version.  Many of the old complaints have been fixed;
particularly, the handling of partitioned tables is now much cleaner and
straightforward.  Amit Langote helped considerably in getting this part
to shape -- thanks for that.  Amit also helped correct the EvalPlanQual
behavior, which wasn't quite up to snuff.



Thanks!


There are a few things that can still be improved here.  For one, I need
to clean up the interactions with table AM (and thus heapam.c etc).
Secondarily, and I'm now not sure that I really want to do it, is change
the representation for executor: instead of creating a fake join between
target and source, perhaps we should have just source, and give
optimizer a separate query to fetch tuples from target.



When you say you're not sure you want to change this, is that because 
you don't have time for that, or because you think the current approach 
is better?



What I did do is change how the target table is represented from parse
analysis to executor.  For example, in the original code, there were two
RTEs that represented the target table; that is gone.  Now the target
table is always just the query's resultRelation.  This removes a good
number of ugly hacks that had been objected to.

I'll park this in the January commitfest now.




regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: increase size of pg_commit_ts buffers

2021-11-12 Thread Tomas Vondra

Hi,

I think this is ready to go. I was wondering why it merely doubles the 
number of buffers, as described by previous comments. That seems like a 
very tiny increase; considering how much hardware has grown over the
last few years, it'd probably fail to help some of the large boxes.


But it turns out that's not what the patch does. The change is this

> -  return Min(16, Max(4, NBuffers / 1024));
> +  return Min(256, Max(4, NBuffers / 512));

So it does two things - (a) it increases the maximum from 16 to 256 (so 
16x), and (b) it doubles how fast we get there. Until now we added 1 
buffer per 1024 shared buffers, and the maximum was reached at 128MB of 
shared buffers. The patch lowers the step to 512, so the maximum is now 
reached at 1GB.
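For concreteness, the effect of the formula change can be tabulated with a quick sketch (Python here just for the arithmetic; it mirrors the two C expressions quoted above, and assumes the default 8kB block size when converting shared_buffers to NBuffers):

```python
# Mirrors the commit_ts SLRU sizing expressions from the patch:
#   before: Min(16,  Max(4, NBuffers / 1024))
#   after:  Min(256, Max(4, NBuffers / 512))
BLCKSZ = 8192  # assumes the default 8kB block size

def nbuffers(shared_buffers_mb):
    # shared_buffers expressed as a number of BLCKSZ-sized pages
    return shared_buffers_mb * 1024 * 1024 // BLCKSZ

def commit_ts_buffers_old(nbuf):
    return min(16, max(4, nbuf // 1024))

def commit_ts_buffers_new(nbuf):
    return min(256, max(4, nbuf // 512))

for mb in (16, 128, 1024, 8192):
    nbuf = nbuffers(mb)
    print(f"shared_buffers={mb}MB: old={commit_ts_buffers_old(nbuf)} "
          f"new={commit_ts_buffers_new(nbuf)}")
```

This shows the old formula capping at 16 buffers once shared_buffers reaches 128MB, while the new one keeps growing up to 256 buffers at 1GB.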


So this actually increases the number of commit_ts buffers 16x, not 2x. 
That seems reasonable, I guess. The increase may be smaller for systems 
with less than 1GB shared buffers, but IMO that's a tiny minority of 
production systems busy enough for this patch to make a difference.


The other question is of course what overhead could this change have on 
workload that does not have issues with commit_ts buffers (i.e. it's 
using commit_ts, but would be fine with just the 16 buffers). But my 
guess is this is negligible, based on how simple the SLRU code is and my 
previous experiments with SLRU.


So +1 to just get this committed, as it is.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Parallel Append can break run-time partition pruning

2021-11-12 Thread Justin Pryzby
On Mon, Nov 02, 2020 at 01:50:57PM +1300, David Rowley wrote:
> On Tue, 27 Oct 2020 at 19:40, Amit Langote  wrote:
> > Some comments:
> 
> Thanks for having a look at this.
> 
> I've made some adjustments to those comments and pushed.

commit a929e17e5 doesn't appear in the v14 release notes, but I wanted to
mention that this appears to allow fixing a rowcount mis-estimate for us,
which I had reported before:
https://www.postgresql.org/message-id/20170326193344.GS31628%40telsasoft.com
https://www.postgresql.org/message-id/20170415002322.ga24...@telsasoft.com
https://www.postgresql.org/message-id/20170524211730.gm31...@telsasoft.com

And others have reported before:
https://www.postgresql.org/message-id/flat/7df51702-0f6a-4571-80bb-188aaef26...@gmail.com
https://www.postgresql.org/message-id/SG2PR01MB29673BE6F7AA24424FDBFF60BC670%40SG2PR01MB2967.apcprd01.prod.exchangelabs.com

For years, our reports have included a generated WHERE clause for each table
being queried, to allow each table's partitions to be properly pruned/excluded
(not just one table, as happened if we used a single WHERE clause).

That worked, but then the planner underestimates the rowcount, since it doesn't
realize that the conditions are redundant (since "equality classes" do not
handle the inequality conditions).

In v14, one WHERE clause per table still gives an underestimate; but, now
multiple WHERE clauses aren't required, because a single WHERE clause excludes
partitions from each table, and the rowcount from the elided partitions is
excluded from the Append rowcount at plan time.

Thanks for this feature !

-- 
Justin




Re: Logical replication timeout problem

2021-11-12 Thread Fabrice Chapuis
I made a mistake in the configuration of my test script, in fact I cannot
reproduce the problem at the moment.
Yes, on the original environment there is physical replication, that's why
for the lab I configured 2 nodes with physical replication.
I'll try new tests next week
Regards

On Fri, Nov 12, 2021 at 7:23 AM Amit Kapila  wrote:

> On Thu, Nov 11, 2021 at 11:15 PM Fabrice Chapuis
>  wrote:
> >
> > Hello,
> > Our lab is ready now. Amit,  I compile Postgres 10.18 with your
> patch.Tang, I used your script to configure logical replication between 2
> databases and to generate 10 million entries in an unreplicated foo table.
> On a standalone instance no error message appears in log.
> > I activate the physical replication between 2 nodes, and I got following
> error:
> >
> > 2021-11-10 10:49:12.297 CET [12126] LOG:  attempt to send keep alive
> message
> > 2021-11-10 10:49:12.297 CET [12126] STATEMENT:  START_REPLICATION
> 0/300 TIMELINE 1
> > 2021-11-10 10:49:15.127 CET [12064] FATAL:  terminating logical
> replication worker due to administrator command
> > 2021-11-10 10:49:15.127 CET [12036] LOG:  worker process: logical
> replication worker for subscription 16413 (PID 12064) exited with exit code
> 1
> > 2021-11-10 10:49:15.155 CET [12126] LOG:  attempt to send keep alive
> message
> >
> > This message look like strange because no admin command have been
> executed during data load.
> > I did not find any error related to the timeout.
> > The message coming from the modification made with the patch comes back
> all the time: attempt to send keep alive message. But there is no "sent
> keep alive message".
> >
> > Why logical replication worker exit when physical replication is
> configured?
> >
>
> I am also not sure why that happened; it may be due to
> max_worker_processes reaching its limit. This can happen because it
> seems you configured both publisher and subscriber in the same
> cluster. Tang, did you also see the same problem?
>
> BTW, why are you bringing physical standby configuration into the
> test? Does in your original setup where you observe the problem the
> physical standbys were there?
>
> --
> With Regards,
> Amit Kapila.
>


SKIP LOCKED assert triggered

2021-11-12 Thread Simon Riggs
The combination of these two statements in a transaction hits an
Assert in heapam.c at line 4770 on REL_14_STABLE

BEGIN;
SELECT * FROM queue LIMIT 1 FOR UPDATE SKIP LOCKED;
...
UPDATE queue SET status = 'UPDATED' WHERE id = :id;
COMMIT;

pgbench reliably finds this, running from inside a PL/pgSQL function.

Alvaro suggests we just ignore the Assert, which on testing appears to
be the right approach. Patch attached, which solves it for me.

There is no formal test that does lock then update, so I have
attempted to create one, but this has not successfully reproduced it
(attached anyway); this code is different from the pgbench test.

-- 
Simon Riggs    http://www.EnterpriseDB.com/


skip_locked_assert.v1.patch
Description: Binary data


skip_locked_then_update_test.v1.patch
Description: Binary data


Re: [PATCH] postgres_fdw: suppress explicit casts in text:text comparisons (was: column option to override foreign types)

2021-11-12 Thread Tom Lane
"Dian M Fay"  writes:
> Eminently reasonable all around! `git apply` insisted that the v8 patch
> didn't (apply, that is), but `patch -p1` liked it fine. I've put it
> through a few paces and it seems good; what needs to happen next?

I don't see anything else to do except shove it out into the light
of day and see what happens.  Hence, pushed.

As I remarked in the commit message:

>> One point that I (tgl) remain slightly uncomfortable with is that we
>> will ignore an implicit RelabelType when deciding if the non-Const input
>> is a plain Var.  That makes it a little squishy to argue that the remote
>> should resolve the Const as being of the same type as its Var, because
>> then our Const is not the same type as our Var.  However, if we don't do
>> that, then this hack won't work as desired if the user chooses to use
>> varchar rather than text to represent some remote column.  That seems
>> useful, so do it like this for now.  We might have to give up the
>> RelabelType-ignoring bit if any problems surface.

I think we can await complaints before doing more, but I wanted that
bit on record for anyone perusing the archives.

regards, tom lane




Re: Commitfest 2021-11 Patch Triage - Part 2

2021-11-12 Thread Daniel Gustafsson
> On 9 Nov 2021, at 18:43, Stephen Frost  wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Daniel Gustafsson  writes:
>>> 2773: libpq compression
>>> ===
>>> This patch intended to provide libpq connection compression to "replace SSL
>>> compression" which was doomed when the patch was written, and have since 
>>> been
>>> removed altogether.  The initial approach didn't get much traction but there
>>> was significant discussion and work, which has since fizzled out.  The patch
>>> has been updated but there hasn't been meaningful review the past months, 
>>> the
>>> last comments seem to imply there being a fair amount of questionmarks left 
>>> in
>>> here.  Robert, having been very involved in this do you have any thoughts on
>>> where we are and where to go (if at all IYO)?
>> 
>> I'm not Robert, but I still have an opinion here, and that it's that this
>> feature would at best be an attractive nuisance.  If you need compression
>> on a database session, it probably means that the connection is over the
>> open internet, which means that you need encryption even more.  And we
>> know that compression and encryption do not play well together.  The
>> reason compression was taken out of the latest TLS standards is not that
>> they wouldn't have liked to have it, nor that applying compression in a
>> separate code layer would be any safer.  I fear offering this would
>> merely lead people to build CVE-worthy setups.
> 
> I've got an opinion on this also and I don't agree that needing
> compression means you're on the open internet or that we shouldn't allow
> users the choice to decide if they want to use compression and
> encryption together or not.  Yes, there's potential risks there, but I
> don't think those risks would lead to CVEs against PG for supporting a
> mode where we allow compression and then also allow encryption- if
> that's a problem then it's an issue for the encryption library/method
> being used and isn't the fault of us for allowing that combination.

I'm not so sure; if the parts aren't vulnerable separately on their own but the
combination of them + libpq allows for weakening encryption in a TLS stream, I'd
say the CVE is on us.

> Further, the general issue with encryption+compression attacks is when
> the attacker is able to inject chosen plaintext content into what ends
> up being compressed and encrypted, and then is able to observe the
> compressed+encrypted traffic, and that's not nearly as easy to do when
> you're talking about a database connection vs. a case where an attacker
> is able to convince a victim to go to a malicious website and through
> that is able to do cross-site requests to inject the known plaintext
> into requests to a trusted webpage and see the results coming back.

I agree with this, the known weaknesses rely on mechanisms that aren't readily
applicable to postgres connections.  I don't remember if the patch already does
this, but if not we should ensure we only negotiate compression after
authentication like how OpenSSH does [0].
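As a concrete illustration of why the combination is risky, here is a minimal sketch of the length-oracle behind CRIME-style attacks, using plain zlib and entirely made-up values (no TLS or libpq involved; the secret, guesses, and filler are hypothetical): encryption hides content but not length, so if an attacker can inject plaintext that gets compressed alongside a secret, the ciphertext length reveals whether a guess matched.

```python
import zlib

# Hypothetical secret that ends up in the same compressed stream as
# attacker-controlled data (think: a credential inside a protocol message).
secret = b"password=correct-horse-battery&"
filler = b"plus some other unrelated payload bytes"

def observed_len(injected):
    # The attacker sees only the length of the compressed-then-encrypted
    # stream; compression happens before encryption, so matches against
    # the secret shrink the output.
    return len(zlib.compress(injected + secret + filler))

right_guess = secret                              # matches -> LZ77 back-reference
wrong_guess = b"password=q8Zr1Lx0vKbTm3WyPcN2j&"  # same length, no long match

# The matching guess compresses noticeably better, leaking information
# about the secret one probe at a time.
print(observed_len(right_guess), observed_len(wrong_guess))
```

Delaying compression until after authentication, as the OpenSSH draft above does, doesn't eliminate this class of leak by itself, but it does keep authentication secrets out of the compressed stream.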

--
Daniel Gustafsson   https://vmware.com/

[0] https://www.openssh.com/txt/draft-miller-secsh-compression-delayed-00.txt



Re: Test::More version

2021-11-12 Thread Andrew Dunstan


On 11/12/21 10:05, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 11/11/21 15:44, Tom Lane wrote:
>>> Yeah ... configure is already checking those versions, so maybe we could
>>> make it print them out along the way?  I'd been thinking of doing exactly
>>> this just for documentation purposes, so if there's a concrete need for
>>> it, let's get it done.
>> The attached seems to do the trick:
> LGTM (by eyeball, didn't test)



Thanks. pushed a slightly more robust version.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display

2021-11-12 Thread Ashutosh Sharma
On Thu, Nov 11, 2021 at 6:07 PM Alvaro Herrera 
wrote:

> On 2021-Nov-10, Bossart, Nathan wrote:
>
> > On 11/10/21, 9:43 AM, "Bharath Rupireddy" <
> bharath.rupireddyforpostg...@gmail.com> wrote:
> > > As discussed in [1], isn't it a better idea to add some of activity
> > > messages [2] such as recovery, archive, backup, streaming etc. to
> > > server logs at LOG level?
>
> > I think this would make the logs far too noisy for many servers.  For
> > archiving alone, this could cause tens of thousands more log messages
> > per hour on a busy system.  I think you can already see such
> > information at a debug level, anyway.
>
> Yeah.  If we had some sort of ring buffer in which to store these
> messages, the user could examine them through a view; they would still
> be accessible in a running server, but they would not be written to the
> server log.
>

That's a good idea. How about also adding some GUC(s) to control the archive-
and recovery-related log messages, just like we have for checkpoints,
autovacuum etc.? Maybe something like log_archive, log_recovery etc.

--
With Regards,
Ashutosh Sharma.


Re: Patch abstracts in the Commitfest app

2021-11-12 Thread David Steele

On 11/12/21 7:51 AM, Daniel Gustafsson wrote:


It should be a free-form textfield, and only apply to new submissions to avoid
a need to backfill.  The field should, IMHO, be mandatory.  The implementation
can be left to discussion on -www iff this list agrees that it would be a good
thing.


+1.

The patch title is often not enough to tell what the patch does, and 
patches also change over time. The abstract may also be helpful to the 
committer when generating a commit message.


Regards,
--
-David
da...@pgmasters.net




Re: Test::More version

2021-11-12 Thread Tom Lane
Peter Eisentraut  writes:
> On 11.11.21 21:04, Andrew Dunstan wrote:
>> But I'd really like it if we could shift the goalposts a bit and require
>> 0.96. subtests would be really nice to have available. Unfortunately I
>> don't have any idea what versions of Test::More are actually in use.

> My initial patch for TAP tests used subtests, and IIRC that was taken 
> out because the Perl that comes with CentOS 6 is too old.

I retired my RHEL6 installation a year ago, but I still have backups
at hand (...digs...)  Yeah, its Test/More.pm says $VERSION = '0.92';
so your recollection is accurate.

RHEL6 has been EOL for awhile, and that specific platform isn't in
the buildfarm, so it needn't constrain our decision.  But it may
well be that some of the older buildfarm members have similarly
ancient Test::More.  I await results from Andrew's patch.

regards, tom lane




Re: RecoveryInProgress() has critical side effects

2021-11-12 Thread Bossart, Nathan
On 11/11/21, 9:20 AM, "Robert Haas"  wrote:
> memory, and not much is lost if they are set up and not used. So, in
> 0001, I propose to move the call to InitXLogInsert() from
> InitXLOGAccess() to BaseInit(). That means every backend will do it
> the first time it starts up. It also means that CreateCheckPoint()
> will no longer need a special case call to InitXLogInsert(), which
> seems like a good thing.

0001 looks reasonable to me.

> connected. However, it does have the effect of introducing a branch
> into GetFullPageWritesInfo(), which I'm not 100% sure is cheap enough
> not to matter. I think the obvious alternative is to just not
> initialize RedoRecPtr and doPageWrites at all and document in the
> comments that the first time each backend does something with them
> it's going to end up retrying once; perhaps that is preferable. Going

Unless there's a demonstrable performance benefit from adding the
branch, my preference is to leave it out.  I could be off-base, but it
seems possible that future changes might end up depending on any side
effects from this new branch, which is exactly what you're trying to
fix.  Plus, always using the retry path is a good way to test that it
works as expected.

Nathan



Re: Add additional information to src/test/ssl/README

2021-11-12 Thread Tom Lane
Daniel Gustafsson  writes:
> Attached is a small addition mentioning PG_TEST_NOCLEAN

Maybe could use a bit of copy-editing, say

Data directories will also be left behind for analysis when a test fails;
they are named according to the test filename.  But if the environment
variable PG_TEST_NOCLEAN is set, data directories will be retained
regardless of test status.

regards, tom lane




Re: Add additional information to src/test/ssl/README

2021-11-12 Thread Daniel Gustafsson
> On 31 Oct 2021, at 23:13, Tom Lane  wrote:

> Done that way; feel free to add more material to perl/README.

Attached is a small addition mentioning PG_TEST_NOCLEAN

--
Daniel Gustafsson   https://vmware.com/



readme_retaindir.diff
Description: Binary data


Re: Test::More version

2021-11-12 Thread Tom Lane
Andrew Dunstan  writes:
> On 11/11/21 15:44, Tom Lane wrote:
>> Yeah ... configure is already checking those versions, so maybe we could
>> make it print them out along the way?  I'd been thinking of doing exactly
>> this just for documentation purposes, so if there's a concrete need for
>> it, let's get it done.

> The attached seems to do the trick:

LGTM (by eyeball, didn't test)

regards, tom lane




Re: Patch abstracts in the Commitfest app

2021-11-12 Thread Tom Lane
Julien Rouhaud  writes:
> On Fri, Nov 12, 2021 at 8:51 PM Daniel Gustafsson  wrote:
>> It should be a free-form textfield, and only apply to new submissions to 
>> avoid
>> a need to backfill.  The field should, IMHO, be mandatory.  The 
>> implementation
>> can be left to discussion on -www iff this list agrees that it would be a 
>> good
>> thing.

> +1 for the idea.  Not sure if that should be mandatory though, as some
> simple entries should be self explanatory.

Yeah, there are plenty of entries for which the title seems adequate.
No objection to making room for an additional optional description,
though.

> Really this is mainly
> helpful for long and lengthy threads, when people start to ask for a
> summary of the situation, so it should be editable and maybe
> automatically generate an email on the thread when that happens.

Editable certainly, but I don't think we want any auto-generated mail.

regards, tom lane




Re: Test::More version

2021-11-12 Thread Andrew Dunstan

On 11/11/21 15:44, Tom Lane wrote:
> Alvaro Herrera  writes:
>> On 2021-Nov-11, Andrew Dunstan wrote:
>>> Perhaps we could add a line to one of the TAP tests that would spit out
>>> the version on the log?
>> Maybe have it spit out the version of *all* the modules we require
>> (which aren't all that many), not just Test::More?
> Yeah ... configure is already checking those versions, so maybe we could
> make it print them out along the way?  I'd been thinking of doing exactly
> this just for documentation purposes, so if there's a concrete need for
> it, let's get it done.
>

The attached seems to do the trick:


checking for perl module IPC::Run 0.79... 20200505.0
checking for perl module Test::More 0.87... 1.302183
checking for perl module Time::HiRes 1.52... 1.9764


>
> I have no objection to updating prairiedog to whatever new minimum
> version we settle on.  But if there are other buildfarm animals with
> old versions, that might be a reason not to change it.
>
>   



Yeah, let's see what's actually in use and then we can decide.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/config/ax_prog_perl_modules.m4 b/config/ax_prog_perl_modules.m4
index 70b3230ebd..03c25c3ab2 100644
--- a/config/ax_prog_perl_modules.m4
+++ b/config/ax_prog_perl_modules.m4
@@ -55,12 +55,12 @@ if test "x$PERL" != x; then
 AC_MSG_CHECKING(for perl module $ax_perl_module)
 
 # Would be nice to log result here, but can't rely on autoconf internals
-$PERL -e "use $ax_perl_module; exit" > /dev/null 2>&1
+modversion=`$PERL -e "use $ax_perl_module; my \\\$x=q($ax_perl_module); \\\$x =~ s/ .*/::VERSION/; eval qq{print $\\\$x\\n}; exit;" 2>/dev/null`
 if test $? -ne 0; then
   AC_MSG_RESULT(no);
   ax_perl_modules_failed=1
else
-  AC_MSG_RESULT(ok);
+  AC_MSG_RESULT($modversion);
 fi
   done
 
diff --git a/configure b/configure
index 7d4e42dd62..d0f0f88a1b 100755
--- a/configure
+++ b/configure
@@ -19301,14 +19301,14 @@ if test "x$PERL" != x; then
 $as_echo_n "checking for perl module $ax_perl_module... " >&6; }
 
 # Would be nice to log result here, but can't rely on autoconf internals
-$PERL -e "use $ax_perl_module; exit" > /dev/null 2>&1
+modversion=`$PERL -e "use $ax_perl_module; my \\\$x=q($ax_perl_module); \\\$x =~ s/ .*/::VERSION/; eval qq{print $\\\$x\\n}; exit;" 2>/dev/null`
 if test $? -ne 0; then
   { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
 $as_echo "no" >&6; };
   ax_perl_modules_failed=1
else
-  { $as_echo "$as_me:${as_lineno-$LINENO}: result: ok" >&5
-$as_echo "ok" >&6; };
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $modversion" >&5
+$as_echo "$modversion" >&6; };
 fi
   done
 


Re: Patch abstracts in the Commitfest app

2021-11-12 Thread Dmitry Dolgov
> On Fri, Nov 12, 2021 at 03:36:43PM +0100, Daniel Gustafsson wrote:
> > On 12 Nov 2021, at 15:24, Justin Pryzby  wrote:
> >
> > On Fri, Nov 12, 2021 at 01:51:28PM +0100, Daniel Gustafsson wrote:
> >> While reading through and working with the hundreds of patches in the CF 
> >> app a
> >> small feature/process request struck me: it would be really helpful if the
> >> patch had a brief abstract outlining what it aims to add or fix (or 
> >> summary,
> >> description or something else; not sure what to call it).  Basically a
> >> two-sentence or so version of the email posting the patch to -hackers.
> >
> > This seems fine ; that purpose is partially served (and duplicated) by the
> > patch commit messages (if used).
>
> That's the problem, many patches are in diff format and don't have commit
> messages at all.  There are also many entries with patchsets containing
> multiple patches and thus commit messages, but the app will only link to one.

Perhaps encouraging the use of cover letters, generated for such patch
series and linked by the CF app, would be a good idea?




Re: Patch abstracts in the Commitfest app

2021-11-12 Thread Bossart, Nathan
On 11/12/21, 4:53 AM, "Daniel Gustafsson"  wrote:
> While reading through and working with the hundreds of patches in the CF app a
> small feature/process request struck me: it would be really helpful if the
> patch had a brief abstract outlining what it aims to add or fix (or summary,
> description or something else; not sure what to call it).  Basically a
> two-sentence or so version of the email posting the patch to -hackers.

+1

Nathan



Re: Should AT TIME ZONE be volatile?

2021-11-12 Thread Robert Haas
On Fri, Nov 12, 2021 at 8:42 AM Peter Eisentraut
 wrote:
> There are standards for sort order, and the major hiccups we had in the
> past were mostly moving from older versions of those standards to newer
> versions.  So at some point this should stabilize.

Only if they don't keep making new versions of the standards.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Patch abstracts in the Commitfest app

2021-11-12 Thread Daniel Gustafsson
> On 12 Nov 2021, at 15:24, Justin Pryzby  wrote:
> 
> On Fri, Nov 12, 2021 at 01:51:28PM +0100, Daniel Gustafsson wrote:
>> While reading through and working with the hundreds of patches in the CF app a
>> small feature/process request struck me: it would be really helpful if the
>> patch had a brief abstract outlining what it aims to add or fix (or summary,
>> description or something else; not sure what to call it).  Basically a
>> two-sentence or so version of the email posting the patch to -hackers.
> 
> This seems fine; that purpose is partially served (and duplicated) by the
> patch commit messages (if used).

That's the problem, many patches are in diff format and don't have commit
messages at all.  There are also many entries with patchsets containing
multiple patches and thus commit messages, but the app will only link to one.

> Actually, it's already possible to add an "annotation" to a given message-id.
> I've just added this here: https://commitfest.postgresql.org/35/2800/
> 
> I think that feature is rarely used, but I can see the value in using it
> more, for long threads.

I consider this feature very valuable, but I don't see it as a replacement for
a text field Abstract in the patch entry.

--
Daniel Gustafsson   https://vmware.com/





Re: simplifying foreign key/RI checks

2021-11-12 Thread Tom Lane
Amit Langote  writes:
> Whatever mechanism we will use would still need to involve setting a
> global Snapshot variable though, right?

In v14 we'll certainly still be passing the snapshot(s) to SPI, which will
eventually make the snapshot active.  With your patch, since we're just
handing the snapshot to the scan mechanism, it seems at least
theoretically possible that we'd not have to do PushActiveSnapshot on it.
Not doing so might be a bad idea however; if there is any user-defined
code getting called, it might have expectations about ActiveSnapshot being
relevant.  On the whole I'd be inclined to say that we still want the
RI test_snapshot to be the ActiveSnapshot while performing the test. 

regards, tom lane




Re: Patch abstracts in the Commitfest app

2021-11-12 Thread Justin Pryzby
On Fri, Nov 12, 2021 at 01:51:28PM +0100, Daniel Gustafsson wrote:
> While reading through and working with the hundreds of patches in the CF app a
> small feature/process request struck me: it would be really helpful if the
> patch had a brief abstract outlining what it aims to add or fix (or summary,
> description or something else; not sure what to call it).  Basically a
> two-sentence or so version of the email posting the patch to -hackers.

This seems fine; that purpose is partially served (and duplicated) by the
patch commit messages (if used).

Actually, it's already possible to add an "annotation" to a given message-id.
I've just added this here: https://commitfest.postgresql.org/35/2800/

I think that feature is rarely used, but I can see the value in using it
more, for long threads.

If you really wanted to make an "abstract" not associated with any message-id,
I think it would be implemented simply by removing the restriction that a
message-id must be specified and must match an existing message in the thread.

-- 
Justin




Re: simplifying foreign key/RI checks

2021-11-12 Thread Amit Langote
On Fri, Nov 12, 2021 at 10:58 AM Tom Lane  wrote:
> I wrote:
> > Anyway, I think that (1) we should write some more test cases around
> > this behavior, (2) you need to establish the snapshot to use in two
> > different ways for the RI_FKey_check and ri_Check_Pk_Match cases,
> > and (3) something's going to have to be done to repair the behavior
> > in v14 (unless we want to back-patch this into v14, which seems a
> > bit scary).
>
> I wrote that thinking that point (2), ie fix the choice of snapshots for
> these RI queries, would solve the brokenness in partitioned tables,
> so that (3) would potentially only require hacking up v14.
>
> However after thinking more I realize that (2) will break the desired
> behavior for concurrent partition detaches, because that's being driven
> off ActiveSnapshot.  So we really need a solution that decouples the
> partition detachment logic from ActiveSnapshot, in both branches.

ISTM that the latest snapshot would still have to be passed to
find_inheritance_children_extended() *somehow* by ri_trigger.c.  IIUC
the problem with using the ActiveSnapshot mechanism to do that is that
it causes the SPI query to see even user table rows that it shouldn't
be able to, so that is why you say it is too global a mechanism for
this hack.

Whatever mechanism we will use would still need to involve setting a
global Snapshot variable though, right?

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Test::More version

2021-11-12 Thread Peter Eisentraut

On 11.11.21 21:04, Andrew Dunstan wrote:

But I'd really like it if we could shift the goalposts a bit and require
0.96. subtests would be really nice to have available. Unfortunately I
don't have any idea what versions of Test::More are actually in use.


My initial patch for TAP tests used subtests, and IIRC that was taken 
out because the Perl that comes with CentOS 6 is too old.





Re: Should AT TIME ZONE be volatile?

2021-11-12 Thread Peter Eisentraut

On 11.11.21 18:32, Robert Haas wrote:

I agree with Tom that it sounds like a lot of work. And to be honest
it's work that I don't really feel very excited about. It would be
necessary to understand not only the bona fide sorting rules of every
human language out there, which might actually be sort of fun at least
for a while, but also to decide - probably according to some
incomprehensible standard - how Japanese katakana ought to sort in
comparison to, say, box-drawing characters, the Mongolian alphabet,
and smiley-face emojis. I think it's not particularly likely that
there are a whole lot of documents out there that include all of those
things, but the comparison algorithm has to return something, and
probably there are people who have strong feelings about what the
right answers are. That's a pretty unappealing thing to tackle, and I
am not volunteering.

On the other hand, if we don't do it, I'm suspicious that things will
never get any better. And that would be sad.


There are standards for sort order, and the major hiccups we had in the 
past were mostly moving from older versions of those standards to newer 
versions.  So at some point this should stabilize.






Re: Parallel vacuum workers prevent the oldest xmin from advancing

2021-11-12 Thread Alvaro Herrera
On 2021-Nov-11, Masahiko Sawada wrote:

> On Thu, Nov 11, 2021 at 12:53 PM Amit Kapila  wrote:
> >
> > On Thu, Nov 11, 2021 at 9:11 AM Andres Freund  wrote:

> > > This seems like an unnecessary optimization. 
> > > ProcArrayInstallRestoredXmin()
> > > only happens in the context of much more expensive operations.
> >
> > Fair point. I think that will also make the change in
> > ProcArrayInstallRestoredXmin() appear neat.
> 
> This makes me think that it'd be better to copy status flags in a
> separate function rather than ProcArrayInstallRestoredXmin().

To me, and this is perhaps just personal opinion, it seems conceptually
simpler to have ProcArrayInstallRestoredXmin acquire the lock exclusively
and do both things.  Why?  Because if you have two functions, you have to
be careful not to call the new function after ProcArrayInstallRestoredXmin;
otherwise you would create an instant during which you make an
Xmin-without-flag visible to other procs; this causes the computed xmin to
go backwards, which is verboten.

If I understand Amit correctly, his point is about the callers of
RestoreTransactionSnapshot, which are two: CreateReplicationSlot and
ParallelWorkerMain.  He wants your hypothetical new function called from
the latter but not the former.  Looking at both, it seems a bit strange
to make them responsible for a detail such as "copy ->statusFlags from
source proc to mine".  It seems more reasonable to add a third argument,
  RestoreTransactionSnapshot(Snapshot snapshot, void *source_proc, bool is_vacuum)
and if that is true, tell SetTransactionSnapshot to copy flags,
otherwise not.


... unrelated to this, and looking at CreateReplicationSlot, I wonder
why does it pass MyProc as the source_pgproc parameter.  What good is
that doing?  I mean, if the only thing we do with source_pgproc is to
copy stuff from source_pgproc to MyProc, then if source_pgproc is
MyProc, we're effectively doing nothing at all.  (You can't "fix" this
by merely passing NULL, because that would change the call of
ProcArrayInstallRestoredXmin into a call of
ProcArrayInstallImportedXmin, and that would presumably have different
behavior.)  I may be misreading the code of course, but it sounds like
the intention of CreateReplicationSlot is to "do nothing" with the
transaction snapshot in a complicated way.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Patch abstracts in the Commitfest app

2021-11-12 Thread Julien Rouhaud
On Fri, Nov 12, 2021 at 8:51 PM Daniel Gustafsson  wrote:
>
> While reading through and working with the hundreds of patches in the CF app a
> small feature/process request struck me: it would be really helpful if the
> patch had a brief abstract outlining what it aims to add or fix (or summary,
> description or something else; not sure what to call it).  Basically a
> two-sentence or so version of the email posting the patch to -hackers.
>
> The "title" field isn't always explanatory enough, and diving into a long
> thread to piece together clues can be quite daunting.  I'm especially thinking
> about new contributors who would like to contribute but who have a hard time
> finding something in the CF app (which is a complaint I've heard from multiple
> such contributors).
>
> It should be a free-form text field, and only apply to new submissions to avoid
> a need to backfill.  The field should, IMHO, be mandatory.  The implementation
> can be left to discussion on -www iff this list agrees that it would be a good
> thing.

+1 for the idea.  Not sure if that should be mandatory though, as some
simple entries should be self-explanatory.  Really this is mainly
helpful for long threads, when people start to ask for a summary of the
situation, so it should be editable and maybe automatically generate an
email on the thread when that happens.




Patch abstracts in the Commitfest app

2021-11-12 Thread Daniel Gustafsson
While reading through and working with the hundreds of patches in the CF app a
small feature/process request struck me: it would be really helpful if the
patch had a brief abstract outlining what it aims to add or fix (or summary,
description or something else; not sure what to call it).  Basically a
two-sentence or so version of the email posting the patch to -hackers.

The "title" field isn't always explanatory enough, and diving into a long
thread to piece together clues can be quite daunting.  I'm especially thinking
about new contributors who would like to contribute but who have a hard time
finding something in the CF app (which is a complaint I've heard from multiple
such contributors).

It should be a free-form text field, and only apply to new submissions to avoid
a need to backfill.  The field should, IMHO, be mandatory.  The implementation
can be left to discussion on -www iff this list agrees that it would be a good
thing.

Thoughts?

--
Daniel Gustafsson   https://vmware.com/





Re: Logical Replication - improve error message while adding tables to the publication in check_publication_add_relation

2021-11-12 Thread Daniel Gustafsson
> On 4 Nov 2021, at 05:24, Bharath Rupireddy wrote:
> 
> On Wed, Nov 3, 2021 at 6:21 PM Daniel Gustafsson  wrote:
>> 
>>> On 26 Jul 2021, at 09:33, Bharath Rupireddy wrote:
>> 
>>> PSA v7.
>> 
>> This patch no longer applies on top of HEAD, please submit a rebased version.
> 
> Here's a rebased v8 patch. Please review it.

This improves the user experience by increasing the granularity of error
reporting, and maps well with the precedent set in 81d5995b4.  I'm marking this
Ready for Committer and will go ahead and apply this unless there are
objections.

--
Daniel Gustafsson   https://vmware.com/





Re: Printing backtrace of postgres processes

2021-11-12 Thread Bharath Rupireddy
On Fri, Nov 12, 2021 at 5:15 PM Bharath Rupireddy wrote:
>
> On Thu, Nov 11, 2021 at 12:14 PM vignesh C  wrote:
> > Thanks for the comments, the attached v10 patch has the fixes for the same.
>
> Thanks for the patches. Here are some comments:
>
> 1) In the docs, let's have a similar description for pg_print_backtrace
> as for pg_log_backend_memory_contexts, just for continuity and
> readability.
>
> 2) I don't know what the  part looks like in the Server
> Signaling Functions table. I think here you can just say that it will emit
> a warning and return false if not supported by the installation. And
> you can give the  part after the description where you are
> showing a sample backtrace.
>
> +capture backtrace. If not available, the function will return false
> +and a warning is issued, for example:
> +
> +WARNING:  backtrace generation is not supported by this installation
> + pg_print_backtrace
> +
> + f
> +
> +   
> +  
>
> 3) Replace '!' with '.'.
> + * Note: this is called within a signal handler!  All we can do is set
>
> 4) It is not only the next CFI but also the process-specific interrupt
> handlers (in your 0002 patch), right?
> + * a flag that will cause the next CHECK_FOR_INTERRUPTS to invoke
>
> 5) I think you need to update CATALOG_VERSION_NO, mostly the committer
> will take care of it but just in case.
>
> 6) Be consistent with casing "Verify" and "Might"
> +# Verify that log output gets to the file
> +# might need to retry if logging collector process is slow...

Few more comments:

7) Do we need TAP tests for this function? I think it is sufficient to
test the function in misc_functions.sql, please remove
002_print_backtrace_validation.pl. Note that we don't do similar TAP
testing for pg_log_backend_memory_contexts as well.

8) I hope you have manually tested the pg_print_backtrace for the
startup process and other auxiliary processes.

9) I think we can have a similar description (to the patch [1]), with
links to each process definition in the glossary so that it will be
easier for the users to follow the links and know what each process
is. Especially, I didn't like that 0002 mentioned BackendPidGetProc or
AuxiliaryPidGetProc, as these are internal to the server and the users
don't care about them.

- * end on its own first and its backtrace are not logged is not a problem.
+ * BackendPidGetProc or AuxiliaryPidGetProc returns NULL if the pid isn't
+ * valid; but by the time we reach kill(), a process for which we get a
+ * valid proc here might have terminated on its own.  There's no way to
+ * acquire a lock on an arbitrary process to prevent that. But since this
+ * mechanism is usually used to debug a backend running and consuming lots
+ * of CPU cycles, that it might end on its own first and its backtrace is
+ * not logged is not a problem.
  */

Here's what I have written in the other patch [1]; if okay, please use this:

+Requests to log memory contexts of the backend or the
+WAL sender or the auxiliary process
+with the specified process ID. All of the
+auxiliary processes
+are supported except the logger
+and the statistics collector,
+as they are not connected to shared memory so the function cannot
+make requests to them.
+These memory contexts will be logged at LOG message level.
+They will appear in the server log based on the log configuration set
 (See  for more information),

I have no further comments on v10.

[1] - 
https://www.postgresql.org/message-id/CALj2ACU1nBzpacOK2q%3Da65S_4%2BOaz_rLTsU1Ri0gf7YUmnmhfQ%40mail.gmail.com

Regards,
Bharath Rupireddy.




Re: Printing backtrace of postgres processes

2021-11-12 Thread Bharath Rupireddy
On Thu, Nov 11, 2021 at 12:14 PM vignesh C  wrote:
> Thanks for the comments, the attached v10 patch has the fixes for the same.

Thanks for the patches. Here are some comments:

1) In the docs, let's have a similar description for pg_print_backtrace
as for pg_log_backend_memory_contexts, just for continuity and
readability.

2) I don't know what the  part looks like in the Server
Signaling Functions table. I think here you can just say that it will emit
a warning and return false if not supported by the installation. And
you can give the  part after the description where you are
showing a sample backtrace.

+capture backtrace. If not available, the function will return false
+and a warning is issued, for example:
+
+WARNING:  backtrace generation is not supported by this installation
+ pg_print_backtrace
+
+ f
+
+   
+  

3) Replace '!' with '.'.
+ * Note: this is called within a signal handler!  All we can do is set

4) It is not only the next CFI but also the process-specific interrupt
handlers (in your 0002 patch), right?
+ * a flag that will cause the next CHECK_FOR_INTERRUPTS to invoke

5) I think you need to update CATALOG_VERSION_NO, mostly the committer
will take care of it but just in case.

6) Be consistent with casing "Verify" and "Might"
+# Verify that log output gets to the file
+# might need to retry if logging collector process is slow...
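Items 3) and 4) above touch on the rule that a signal handler must only set a flag and defer the real work to a safe point such as CHECK_FOR_INTERRUPTS. As a rough illustration of that pattern (a generic shell sketch, not PostgreSQL code):

```shell
# The trap handler only sets a flag, mirroring a signal handler that must
# stay async-signal-safe; the real work runs later at a "safe point".
flag=0
trap 'flag=1' USR1

kill -USR1 $$    # deliver the signal to ourselves

# Safe point: now it is fine to do the expensive work.
if [ "$flag" -eq 1 ]; then
    echo "backtrace requested; emitting it now"
    flag=0
fi
```

Running this prints the message exactly once, because the handler itself did nothing but flip the flag.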


Regards,
Bharath Rupireddy.




Re: Allow users to choose what happens when recovery target is not reached

2021-11-12 Thread Julien Rouhaud
On Fri, Nov 12, 2021 at 6:14 PM Bharath Rupireddy wrote:
>
> Currently, the server shuts down with a FATAL error (added by commit
> [1]) when the recovery target isn't reached. This can cause a server
> availability problem, especially in case of disaster recovery (geo
> restores) where the primary was down and the user is doing a PITR on a
> server lying in another region that missed receiving a few of the last
> WAL files required to reach the recovery target. In this case, users
> might want the server to be available rather than no server at all.
> With the commit [1], there's no way to achieve what users want.

If users don't mind whether the recovery target is reached, isn't it
better to simply not specify a target and let the recovery go as far
as possible?




Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached"

2021-11-12 Thread Bharath Rupireddy
On Sat, Oct 23, 2021 at 1:46 AM Jeff Davis  wrote:
> What do you want to do after the timeout happens? If you want to issue
> a WARNING instead of failing outright, perhaps that makes sense for
> exploratory PITR cases. That could be a simple boolean GUC without
> needing to introduce the timeout logic into the server.

Thanks Jeff. I posted the patch in a separate thread [1] for a new GUC
(WARN + promotion, or shutdown with a FATAL error) in case the recovery
target isn't reached.

[1] - 
https://www.postgresql.org/message-id/flat/CALj2ACWR4iaph7AWCr5-V9dXqpf2p5B%3D3fTyvLfL8VD_E%2Bx0tA%40mail.gmail.com

Regards,
Bharath Rupireddy.




Allow users to choose what happens when recovery target is not reached

2021-11-12 Thread Bharath Rupireddy
Hi,

Currently, the server shuts down with a FATAL error (added by commit
[1]) when the recovery target isn't reached. This can cause a server
availability problem, especially in case of disaster recovery (geo
restores) where the primary was down and the user is doing a PITR on a
server lying in another region that missed receiving a few of the last
WAL files required to reach the recovery target. In this case, users
might want the server to be available rather than no server at all.
With the commit [1], there's no way to achieve what users want.

There can be many reasons for the last few WAL files not reaching the
target server where the user is performing the PITR. The primary may
have been down before archiving the last few WAL files to the archive
locations, the archive command may fail for whatever reason, there may
be network latency from the primary to the archive location or from the
archive location to the target server, the recovery command on the
target server may fail, or users may have chosen some wrong/futuristic
recovery targets, etc. If the PITR fails with a FATAL error, we may ask
users to restart the server, but imagine the wasted compute resources:
if there is 1 TB of WAL to be replayed and just the last 16MB WAL file
is missing, everything has to be replayed from the beginning.

Here's a proposal (and a patch) to have a GUC so that users can choose
either to emit a warning and promote, or to shut down with a FATAL
error (the default), when the recovery target isn't reached. In
practice, users can choose to shut down with a FATAL error if strict
consistency is a necessity; otherwise they can choose to get promoted
if availability is preferred. There is some discussion around this
idea in [2].
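To make the shape of the proposal concrete, the knob might look something
like this in postgresql.conf. The GUC name and values below are purely
hypothetical illustrations, not taken from the attached patch:

```
# Hypothetical sketch only; the actual GUC name and values are up to the patch.
# 'shutdown' keeps the current behavior (FATAL error when the recovery target
# is not reached); 'promote' would emit a WARNING and promote the server.
recovery_target_not_reached_action = 'shutdown'    # or 'promote'
```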

Thoughts?

[1] - commit dc788668bb269b10a108e87d14fefd1b9301b793
Author: Peter Eisentraut 
Date:   Wed Jan 29 15:43:32 2020 +0100

Fail if recovery target is not reached

Before, if a recovery target is configured, but the archive ended
before the target was reached, recovery would end and the server would
promote without further notice.  That was deemed to be pretty wrong.
With this change, if the recovery target is not reached, it is a fatal
error.

Based-on-patch-by: Leif Gunnar Erlandsen 
Reviewed-by: Kyotaro Horiguchi 
Discussion:
https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b65383...@lako.no

[2] - 
https://www.postgresql.org/message-id/b334d61396e6b0657a63dc38e16d429703fe9b96.camel%40j-davis.com

Regards,
Bharath Rupireddy.


v1-0001-Allow-users-to-choose-what-happens-when-recovery-.patch


RE: Logical replication timeout problem

2021-11-12 Thread tanghy.f...@fujitsu.com
On Friday, November 12, 2021 2:24 PM Amit Kapila wrote:
> 
> On Thu, Nov 11, 2021 at 11:15 PM Fabrice Chapuis wrote:
> >
> > Hello,
> > Our lab is ready now. Amit, I compiled Postgres 10.18 with your patch.
> > Tang, I used your script to configure logical replication between 2
> > databases and to generate 10 million entries in an unreplicated foo
> > table. On a standalone instance no error message appears in the log.
> > I activated physical replication between 2 nodes, and I got the
> > following error:
> >
> > 2021-11-10 10:49:12.297 CET [12126] LOG:  attempt to send keep alive message
> > 2021-11-10 10:49:12.297 CET [12126] STATEMENT:  START_REPLICATION 0/300 TIMELINE 1
> > 2021-11-10 10:49:15.127 CET [12064] FATAL:  terminating logical replication worker due to administrator command
> > 2021-11-10 10:49:15.127 CET [12036] LOG:  worker process: logical replication worker for subscription 16413 (PID 12064) exited with exit code 1
> > 2021-11-10 10:49:15.155 CET [12126] LOG:  attempt to send keep alive message
> >
> > This message looks strange because no admin command has been executed
> > during data load.
> > I did not find any error related to the timeout.
> > The message coming from the modification made with the patch comes back
> > all the time: "attempt to send keep alive message". But there is no
> > "sent keep alive message".
> >
> > Why does the logical replication worker exit when physical replication
> > is configured?
> >
> 
> I am also not sure why that happened; it may be due to
> max_worker_processes reaching its limit. This can happen because it
> seems you configured both publisher and subscriber in the same
> cluster. Tang, did you also see the same problem?
> 

No.
I used the default max_worker_processes value and ran logical replication
and physical replication at the same time. I also changed the data in the
table on the publisher, but didn't see the same problem.

Regards
Tang