Re: Teach pg_receivewal to use lz4 compression

2021-09-14 Thread Michael Paquier
On Fri, Sep 10, 2021 at 08:21:51AM +, gkokola...@pm.me wrote:
> Agreed. A default value of 5, which is the midpoint of the available
> options, has been defined and used.
> 
> In addition, the tests have been adjusted to mimic the newly added gzip tests.

Looking at lz4frame.h, there is LZ4F_flush(), which immediately compresses
any data buffered in the frame context but not yet compressed.  It seems
to me that dir_sync() should be extended to support LZ4.

 export GZIP_PROGRAM=$(GZIP)
+export LZ4
[...]
+PGAC_PATH_PROGS(LZ4, lz4)
+
 PGAC_PATH_BISON
The part of the test assigning LZ4 is fine, but I'd rather switch to a
logic à-la-gzip, where we just save "lz4" in Makefile.global.in,
saving cycles in ./configure.

+static bool
+is_xlogfilename(const char *filename, bool *ispartial,
+   WalCompressionMethod *wal_compression_method)
I like the set of simplifications you have done here to detect whether a
segment is partial and which compression method applies to it.

+   if (compression_method != COMPRESSION_ZLIB && compresslevel != 0)
+   {
+   pg_log_error("can only use --compress together with "
+"--compression-method=gzip");
+#ifndef HAVE_LIBLZ4
+   pg_log_error("this build does not support compression via gzip");
+#endif

s/HAVE_LIBLZ4/HAVE_LIBZ/.

+$primary->command_fails(
+   [
+ 'pg_receivewal', '-D', $stream_dir, '--compression-method', 'lz4',
+ '--compress', '1'
+   ],
+   'failure if --compression-method=lz4 specified with --compress');

This would fail when the code is not built with LZ4 with a non-zero
error code but with an error that is not what we expect.  I think that
you should use $primary->command_fails_like() instead.  That's quite
new, as of de1d4fe.  The matching error pattern will need to change
depending on if we build the code with LZ4 or not.  A simpler method
is to use --compression-method=none, to bypass the first round of
errors and make that build-independent, but that feels incomplete if
you want to tie that to LZ4.

+   pg_log_warning("compressed segment file \"%s\" has incorrect header size %lu, skipping",
+  dirent->d_name, consumed_len);
+   LZ4F_freeDecompressionContext(ctx);
I agree that skipping all those cases when calculating the streaming
start point is more consistent.

+   if (r < 0)
+   pg_log_error("could not read compressed file \"%s\": %m",
+fullpath);
+   else
+   pg_log_error("could not read compressed file \"%s\": read %d of %lu",
+fullpath, r, sizeof(buf));
Let's save some translation effort here by just using "could not read",
etc., removing the term "compressed".

+   pg_log_error("can only use --compress together with "
+"--compression-method=gzip");
Better to keep these in a single line to ease grepping.  We don't care
if error strings are longer than the 72-80 character limit.

+/* Size of lz4 input chunk for .lz4 */
+#define LZ4_IN_SIZE  4096
Why this choice?  Does it need to use LZ4_COMPRESSBOUND?

-   if (dir_data->compression > 0)
+   if (dir_data->compression_method == COMPRESSION_ZLIB)
gzclose(gzfp);
else
Hm.  The addition of the header in dir_open_for_write() uses
LZ4F_compressBegin.  Shouldn't we use LZ4F_compressEnd() if
fsync_fname() or fsync_parent_path() fail on top of closing the fd?
That would be more consistent IMO.  The patch does that in
dir_close().  You should do that additionally if there is a failure
when writing the header.

+   pg_log_error("invalid compression-method \"%s\"", optarg);
+   exit(1);
This could be "invalid value \"%s\" for option %s", see
option_parse_int() in fe_utils/option_utils.c.

After running the TAP tests, the LZ4 section is failing as follows:
pg_receivewal: stopped log streaming at 0/4001950 (timeline 1)
pg_receivewal: not renaming "00010004.partial", segment is not complete
pg_receivewal: error: could not close file "00010004": Undefined error: 0
ok 26 - streaming some WAL using --compression-method=lz4
The third log line I am quoting here looks unexpected to me.  Saying
that, the tests integrate nicely with the existing code.

As mentioned upthread, LZ4-compressed files don't store the file size
by default.  I think that we should document that better in the code
and the documentation, in two ways at least:
- Add some comments mentioning lz4 --content-size, with at least one
in FindStreamingStart().
- Add a new paragraph in the documentation of --compression-method.

The name of the compression method is "LZ4" with upper-case
characters.  Some comments in the code and the tests, as well as the
docs, are not careful about that.
--
Michael




Re: Increase value of OUTER_VAR

2021-09-14 Thread Andrey Lepikhov

On 14/9/21 16:37, Aleksander Alekseev wrote:

Hi Andrey,


only performance issues


That's interesting. Any chance you could share the hardware
description, the configuration file, and steps to reproduce with us?

I didn't measure execution time exactly, because it is a join of two 
empty tables. As far as I can see, this join used most of the 48GB of RAM 
and spent all day planning on a typical 6-core AMD machine.
I guess this is caused by sequential traversal of the partition list in 
some places in the optimizer.
If it makes practical sense, I could investigate the reasons for such poor 
performance.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Defer selection of asynchronous subplans until the executor initialization stage

2021-09-14 Thread Alexander Pyhalov

Etsuro Fujita wrote on 2021-08-30 12:52:

On Mon, Aug 30, 2021 at 5:36 PM Andrey V. Lepikhov

To allow async execution in a bit more cases, I modified the patch a
bit further: a ProjectionPath put directly above an async-capable
ForeignPath would also be considered async-capable as ForeignScan can
project and no separate Result is needed in that case, so I modified
mark_async_capable_plan() as such, and added test cases to the
postgres_fdw regression test.  Attached is an updated version of the
patch.



Hi.

The patch looks good to me and seems to work as expected.
--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: Asymmetric partition-wise JOIN

2021-09-14 Thread Andrey Lepikhov

On 14/9/21 11:37, Andrey V. Lepikhov wrote:

Thank you for this good catch!
The problem was in the adjust_child_relids_multilevel routine: the 
tmp_result variable sometimes points to the original required_outer.
This patch adds new ways in which the optimizer can generate plans. One 
possible way is that the optimizer reparameterizes an inner path by a plain 
relation from the outer side (perhaps as the result of a join between the 
plain relation and a partitioned relation). In this case we have to compare 
tmp_result with the original pointer to tell whether it was changed.
The attached patch fixes this problem. An additional regression test is 
added.


I thought about it more and realized that it isn't necessary to recurse in 
the adjust_child_relids_multilevel() routine if required_outer contains only 
normal_relids.
Also, the regression tests were improved a bit.

--
regards,
Andrey Lepikhov
Postgres Professional
>From ee627d07282629a85785a63341cf875bfb0decb2 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Fri, 2 Apr 2021 11:02:20 +0500
Subject: [PATCH] Asymmetric partitionwise join.

Teach optimizer to consider partitionwise join of non-partitioned
table with each partition of partitioned table.

Disallow asymmetric machinery for joining of two partitioned (or appended)
relations because it could cause huge consumption of CPU and memory
during reparameterization of NestLoop path.

Change the logic of the multilevel child relids adjustment, because this
feature allows the optimizer to plan in new ways.
---
 src/backend/optimizer/path/joinpath.c|   9 +
 src/backend/optimizer/path/joinrels.c| 187 
 src/backend/optimizer/plan/setrefs.c |  17 +-
 src/backend/optimizer/util/appendinfo.c  |  44 +-
 src/backend/optimizer/util/pathnode.c|   9 +-
 src/backend/optimizer/util/relnode.c |  19 +-
 src/include/optimizer/paths.h|   7 +-
 src/test/regress/expected/partition_join.out | 425 +++
 src/test/regress/sql/partition_join.sql  | 180 
 9 files changed, 867 insertions(+), 30 deletions(-)

diff --git a/src/backend/optimizer/path/joinpath.c 
b/src/backend/optimizer/path/joinpath.c
index 6407ede12a..32618ebbd5 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -335,6 +335,15 @@ add_paths_to_joinrel(PlannerInfo *root,
if (set_join_pathlist_hook)
set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
   jointype, &extra);
+
+   /*
+    * 7. If the outer relation is derived from a partitioned table, consider
+    * distributing the inner relation to every partition leaf before
+    * appending these leaves.
+    */
+   try_asymmetric_partitionwise_join(root, joinrel, outerrel, innerrel,
+                                     jointype, &extra);
 }
 
 /*
diff --git a/src/backend/optimizer/path/joinrels.c 
b/src/backend/optimizer/path/joinrels.c
index 8b69870cf4..9453258f83 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
 
 #include "miscadmin.h"
 #include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
 #include "optimizer/joininfo.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
@@ -1552,6 +1553,192 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo 
*rel1, RelOptInfo *rel2,
}
 }
 
+/*
+ * Build a RelOptInfo for the JOIN of each partition of the outer relation
+ * and the inner relation. Return a List of such RelOptInfos, or NIL if at
+ * least one of these JOINs is impossible to build.
+ */
+static List *
+extract_asymmetric_partitionwise_subjoin(PlannerInfo *root,
+                                         RelOptInfo *joinrel,
+                                         AppendPath *append_path,
+                                         RelOptInfo *inner_rel,
+                                         JoinType jointype,
+                                         JoinPathExtraData *extra)
+{
+   List*result = NIL;
+   ListCell*lc;
+
+   foreach (lc, append_path->subpaths)
+   {
+   Path*child_path = lfirst(lc);
+   RelOptInfo  *child_rel = child_path->parent;
+   Relids  child_joinrelids;
+   Relids  parent_relids;
+   RelOptInfo  *child_joinrel;
+   SpecialJoinInfo *child_sjinfo;
+   List*child_restrictlist;
+
+   child_joinrelids = bms_union(child_rel->relids, inner_rel->relids);
+   parent_relids = bms_

Re: Bug fix for tab completion of ALTER TABLE ... VALIDATE CONSTRAINT ...

2021-09-14 Thread David Fetter
On Fri, Sep 03, 2021 at 08:27:55PM +0200, Daniel Gustafsson wrote:
> > On 19 May 2021, at 09:53, Michael Paquier  wrote:
> > 
> > On Tue, Apr 27, 2021 at 12:58:52PM +0300, Aleksander Alekseev wrote:
> >> I've noticed there is no tab completion for ALTER TABLE xxx ADD. Here
> >> is an alternative version of the patch that fixes this as well. Not
> >> sure if this should be in the same commit though.
> > 
> > -   /* If we have ALTER TABLE  DROP, provide COLUMN or CONSTRAINT */
> > -   else if (Matches("ALTER", "TABLE", MatchAny, "DROP"))
> > +   /* If we have ALTER TABLE  ADD|DROP, provide COLUMN or CONSTRAINT */
> > +   else if (Matches("ALTER", "TABLE", MatchAny, "ADD|DROP"))
> > Seems to me that the behavior to not complete with COLUMN or
> > CONSTRAINT for ADD is intentional, as it is possible to specify a
> > constraint or column name without the object type first.  This
> > introduces an inconsistent behavior with what we do for columns with
> > ADD, for one.  So a more consistent approach would be to list columns,
> > constraints, COLUMN and CONSTRAINT in the list of options available
> > after ADD.
> > 
> > +   else if (Matches("ALTER", "TABLE", MatchAny, "VALIDATE", "CONSTRAINT"))
> > +   {
> > +   completion_info_charp = prev3_wd;
> > +   COMPLETE_WITH_QUERY(Query_for_nonvalid_constraint_of_table);
> > +   }
> > Specifying valid constraints is an authorized grammar, so it does not
> > seem that bad to keep things as they are, either.  I would leave that
> > alone.
> 
> This has stalled being marked Waiting on Author since May, and reading the
> above it sounds like marking it Returned with Feedback is the logical next
> step (patch also no longer applies).

Please find attached the next revision :)

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 716e9dde96ef9973fc6b0fc9ca9934df153af9da Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Tue, 14 Sep 2021 15:28:34 -0700
Subject: [PATCH v4] psql: Fix tab completion for ALTER TABLE...

... VALIDATE CONSTRAINT.

Completions now include only constraints which have not yet been
validated.
---
 src/bin/psql/tab-complete.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git src/bin/psql/tab-complete.c src/bin/psql/tab-complete.c
index 5cd5838668..b55166aa64 100644
--- src/bin/psql/tab-complete.c
+++ src/bin/psql/tab-complete.c
@@ -788,6 +788,15 @@ Query_for_index_of_table \
 "   and pg_catalog.quote_ident(c1.relname)='%s'"\
 "   and pg_catalog.pg_table_is_visible(c1.oid)"
 
+/* the silly-looking length condition is just to eat up the current word */
+#define Query_for_constraint_of_table_not_validated \
+"SELECT pg_catalog.quote_ident(conname) "\
+"  FROM pg_catalog.pg_class c1, pg_catalog.pg_constraint con "\
+" WHERE c1.oid=conrelid and (%d = pg_catalog.length('%s'))"\
+"   and pg_catalog.quote_ident(c1.relname)='%s'"\
+"   and pg_catalog.pg_table_is_visible(c1.oid)" \
+"   and NOT con.convalidated"
+
 #define Query_for_all_table_constraints \
 "SELECT pg_catalog.quote_ident(conname) "\
 "  FROM pg_catalog.pg_constraint c "\
@@ -2145,14 +2154,22 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH_ATTR(prev3_wd, "");
 
 	/*
-	 * If we have ALTER TABLE  ALTER|DROP|RENAME|VALIDATE CONSTRAINT,
+	 * If we have ALTER TABLE  ALTER|DROP|RENAME CONSTRAINT,
 	 * provide list of constraints
 	 */
-	else if (Matches("ALTER", "TABLE", MatchAny, "ALTER|DROP|RENAME|VALIDATE", "CONSTRAINT"))
+	else if (Matches("ALTER", "TABLE", MatchAny, "ALTER|DROP|RENAME", "CONSTRAINT"))
 	{
 		completion_info_charp = prev3_wd;
 		COMPLETE_WITH_QUERY(Query_for_constraint_of_table);
 	}
+	/*
+	 * ALTER TABLE  VALIDATE CONSTRAINT 
+	 */
+	else if (Matches("ALTER", "TABLE", MatchAny, "VALIDATE", "CONSTRAINT"))
+	{
+		completion_info_charp = prev3_wd;
+		COMPLETE_WITH_QUERY(Query_for_constraint_of_table_not_validated);
+	}
 	/* ALTER TABLE ALTER [COLUMN]  */
 	else if (Matches("ALTER", "TABLE", MatchAny, "ALTER", "COLUMN", MatchAny) ||
 			 Matches("ALTER", "TABLE", MatchAny, "ALTER", MatchAny))
-- 
2.31.1



Re: parallelizing the archiver

2021-09-14 Thread Julien Rouhaud
On Wed, Sep 15, 2021 at 4:14 AM Stephen Frost  wrote:
>
> > > I'm not proposing to remove the existing archive_command, just to
> > > deprecate its one-WAL-per-call form.
> >
> > Which is a big API break.
>
> We definitely need to stop being afraid of this.  We completely changed
> around how restores work and made pretty much all of the backup/restore
> tools have to make serious changes when we released v12.

I never said that we should avoid an API break at all costs, I said that
if we break the API we should introduce something better.  The
proposal to pass multiple file names to the archive command said
nothing about how to tell which ones were successfully archived and
which ones weren't, which is a big problem in my opinion.  But I think
we should also consider different approach, such as maintaining some
kind of daemon and asynchronously passing all the WAL file names,
waiting for answers.  Or maybe something else.  It's just that simply
"passing multiple file names" doesn't seem like a big enough win to
justify an API break to me.

> I definitely don't think that we should be making assumptions that
> changing archive command to start running things in parallel isn't
> *also* an API break too, in any case.  It is also a change and there's
> definitely a good chance that it'd break some of the archivers out
> there.  If we're going to make a change here, let's make a sensible one.

But parallel archiving can and should be controlled with a GUC, so if
your archive_command isn't compatible you can simply not use it (on top
of having parallel archiving disabled by default, at least for some
time).  It doesn't seem like a big problem.




Re: Signed vs Unsigned (take 2) (src/backend/storage/ipc/procarray.c)

2021-09-14 Thread Fujii Masao




On 2021/09/11 12:21, Fujii Masao wrote:



On 2021/07/23 20:07, Ranier Vilela wrote:

On Fri, Jul 23, 2021 at 07:02, Aleksander Alekseev  wrote:

    Hi hackers,

    The following review has been posted through the commitfest application:
    make installcheck-world:  tested, passed
    Implements feature:       tested, passed
    Spec compliant:           tested, passed
    Documentation:            tested, passed

    The patch was tested on MacOS against master `80ba4bb3`.

    The new status of this patch is: Ready for Committer


    The second patch seems fine too. I'm attaching both patches to trigger
    cfbot and to double-check them.

Thanks Aleksander, for reviewing this.


I looked at these patches because they are marked as ready for committer.
They don't change any actual behavior, but look valid to me in terms of coding.
Barring any objection, I will commit them.



No need to backpatch, because this patch is classified as
refactoring only.


I found this in the commit log of the patch. I agree that these patches
are refactoring ones. But I'm thinking that it's worth back-patching them,
to make future back-patching easier. Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: PG Docs for ALTER SUBSCRIPTION REFRESH PUBLICATION - copy_data option

2021-09-14 Thread Amit Kapila
On Wed, Sep 15, 2021 at 8:49 AM Peter Smith  wrote:
>
> On Tue, Sep 14, 2021 at 8:33 PM Amit Kapila  wrote:
> >
> > On Fri, Jun 25, 2021 at 9:20 AM Peter Smith  wrote:
> > >
> > > But I recently learned that when there are partitions in the
> > > publication, then toggling the value of the PUBLICATION option
> > > "publish_via_partition_root" [3] can also *implicitly* change the list
> > > published tables, and therefore that too might cause any ASRP to make
> > > use of the copy_data value for those implicitly added
> > > partitions/tables.
> > >
> >
> > I have tried the below example in this context but didn't see any
> > effect on changing via_root option.
>
> Thanks for trying to reproduce. I also thought your steps were the
> same as what I'd previously done but it seems like it was a bit
> different. Below are my steps to observe some unexpected COPY
> happening. Actually, now I am no longer sure if this is just a
> documentation issue; perhaps it is a bug.
>

Yeah, this looks odd to me as well.

-- 
With Regards,
Amit Kapila.




Re: Unicode 14.0.0 update

2021-09-14 Thread Michael Paquier
On Tue, Sep 14, 2021 at 05:30:27PM -0400, John Naylor wrote:
> I've attached the patch for including this update in our sources. I'll
> apply it on master after doing some sanity checks. The announcement can be
> found here:
> 
> http://blog.unicode.org/2021/09/announcing-unicode-standard-version-140.html

Thanks for picking this up!
--
Michael




Re: Column Filtering in Logical Replication

2021-09-14 Thread Amit Kapila
On Mon, Sep 6, 2021 at 11:21 PM Alvaro Herrera  wrote:
>
> I pushed the clerical part of this -- namely the addition of
> PublicationTable node and PublicationRelInfo struct.
>

One point to note here is that we are developing a generic grammar for
publications where not only tables but other objects like schema,
sequences, etc. can be specified, see [1]. So, there is some overlap
in the grammar modifications being made by this patch and the work
being done in that other thread. As both the patches are being
developed at the same time, it might be better to be in sync,
otherwise, some of the work needs to be changed. I can see that in the
patch [2] (v28-0002-Added-schema-level-support-for-publication) being
developed there, the changes made by the above commit need to be
changed again to represent a generic object for publication. It is
possible that we can do it some other way but I think it would be
better to coordinate the work in both threads. The other approach is
to continue independently and the later patch can adapt to the earlier
one which is fine too but it might be more work for the later one.

[1] - https://www.postgresql.org/message-id/877603.1629120678%40sss.pgh.pa.us
[2] - 
postgresql.org/message-id/CALDaNm0OudeDeFN7bSWPro0hgKx%3D1zPgcNFWnvU_G6w3mDPX0Q%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: PG Docs for ALTER SUBSCRIPTION REFRESH PUBLICATION - copy_data option

2021-09-14 Thread Peter Smith
On Tue, Sep 14, 2021 at 8:33 PM Amit Kapila  wrote:
>
> On Fri, Jun 25, 2021 at 9:20 AM Peter Smith  wrote:
> >
> > But I recently learned that when there are partitions in the
> > publication, then toggling the value of the PUBLICATION option
> > "publish_via_partition_root" [3] can also *implicitly* change the list
> > published tables, and therefore that too might cause any ASRP to make
> > use of the copy_data value for those implicitly added
> > partitions/tables.
> >
>
> I have tried the below example in this context but didn't see any
> effect on changing via_root option.

Thanks for trying to reproduce. I also thought your steps were the
same as what I'd previously done but it seems like it was a bit
different. Below are my steps to observe some unexpected COPY
happening. Actually, now I am no longer sure if this is just a
documentation issue; perhaps it is a bug.

STEP 1 - create partition tables on both sides
===

[PUB and SUB]

postgres=# create table troot (a int) partition by range(a);
CREATE TABLE
postgres=# create table tless10 partition of troot for values from (1) to (9);
CREATE TABLE
postgres=# create table tmore10 partition of troot for values from (10) to (99);
CREATE TABLE

STEP 2 - insert some data on pub-side
==

[PUB]

postgres=# insert into troot values (1),(2),(3);
INSERT 0 3
postgres=# insert into troot values (11),(12),(13);
INSERT 0 3

postgres=# select * from troot;
 a
----
  1
  2
  3
 11
 12
 13
(6 rows)

STEP 3 - create a publication on the partition root
==

[PUB]

postgres=# CREATE PUBLICATION pub1 FOR TABLE troot;
CREATE PUBLICATION
postgres=# \dRp+ pub1;
  Publication pub1
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root
--++-+-+-+---+--
 postgres | f  | t   | t   | t   | t | f
Tables:
"public.troot"


STEP 4 - create the subscriber
===

[SUB]

postgres=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=127.0.0.1
port=5432 dbname=postgres' PUBLICATION pub1;
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
postgres=# 2021-09-15 12:45:12.224 AEST [30592] LOG:  logical
replication apply worker for subscription "sub1" has started
2021-09-15 12:45:12.236 AEST [30595] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tless10" has
started
2021-09-15 12:45:12.247 AEST [30598] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tmore10" has
started
2021-09-15 12:45:12.326 AEST [30595] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tless10" has
finished
2021-09-15 12:45:12.332 AEST [30598] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tmore10" has
finished

postgres=# select * from troot;
 a
----
  1
  2
  3
 11
 12
 13
(6 rows)

// To this point, everything looks OK...

STEP 5 - toggle the publish_via_partition_root flag
==

[PUB]

postgres=# alter publication pub1 set (publish_via_partition_root = true);
ALTER PUBLICATION
postgres=# \dRp+ pub1;
  Publication pub1
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root
--++-+-+-+---+--
 postgres | f  | t   | t   | t   | t | t
Tables:
"public.troot"

// And then refresh the subscriber

[SUB]

postgres=# alter subscription sub1 refresh PUBLICATION;
ALTER SUBSCRIPTION
postgres=# 2021-09-15 12:48:37.927 AEST [3861] LOG:  logical
replication table synchronization worker for subscription "sub1",
table "troot" has started
2021-09-15 12:48:37.977 AEST [3861] LOG:  logical replication table
synchronization worker for subscription "sub1", table "troot" has
finished

// Notice above that another tablesync worker has launched and copied
everything again - BUG??

[SUB]

postgres=# select * from troot;
 a
----
  1
  2
  3
  1
  2
  3
 11
 12
 13
 11
 12
 13
(12 rows)

// At this point if I would keep toggling the
publish_via_partition_root then each time I do subscription REFRESH
PUBLICATION it will copy the data yet again. For example,

[PUB]

postgres=# alter publication pub1 set (publish_via_partition_root = false);
ALTER PUBLICATION

[SUB]

postgres=# alter subscription sub1 refresh PUBLICATION;
ALTER SUBSCRIPTION
postgres=# 2021-09-15 12:59:02.106 AEST [21709] LOG:  logical
replication table synchronization worker for subscription "sub1",
table "tless10" has started
2021-09-15 12:59:02.120 AEST [21711] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tmore10" has
started
2021-09-15 12:59:02.189 AEST [21709] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tless10" 

Re: Estimating HugePages Requirements?

2021-09-14 Thread Michael Paquier
On Tue, Sep 14, 2021 at 06:00:44PM +, Bossart, Nathan wrote:
> I think I see more support for shared_memory_size_in_huge_pages than
> for huge_pages_needed_for_shared_memory at the moment.  I'll update
> the patch set in the next day or two to use
> shared_memory_size_in_huge_pages unless something changes in the
> meantime.

I have been looking at the patch to add the new GUC flag and the new
sequence for postgres -C, and we could have some TAP tests.  There
were two places that made sense to me: pg_checksums/t/002_actions.pl
and recovery/t/017_shm.pl.  I have chosen the former as it has
coverage across more platforms, and used data_checksums for this
purpose, with an extra positive test to check for the case where a GUC
can be queried while the server is running.

There are four parameters that are updated here:
- shared_memory_size
- wal_segment_size
- data_checksums
- data_directory_mode
That looks sensible, after looking at the full set of GUCs.

Attached is a refreshed patch (commit message is the same as v9 for
now), with some minor tweaks and the tests.

Thoughts?
--
Michael
From e24d151efe45e1bc7048ec72ea66097c94f633be Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 15 Sep 2021 12:02:32 +0900
Subject: [PATCH v10] Provide useful values for 'postgres -C' with 
 runtime-computed GUCs

The -C option is handled before a small subset of GUCs that are
computed at runtime are initialized.  Unfortunately, we cannot move
this handling to after they are initialized without disallowing
'postgres -C' on a running server.  One notable reason for this is
that loadable libraries' _PG_init() functions are called before all
runtime-computed GUCs are initialized, and this is not guaranteed
to be safe to do on running servers.

In order to provide useful values for 'postgres -C' for runtime-
computed GUCs, this change adds a new section for handling just
these GUCs just before shared memory is initialized.  While users
won't be able to use -C for runtime-computed GUCs on running
servers, providing a useful value with this restriction seems
better than not providing a useful value at all.
---
 src/include/utils/guc.h   |  6 
 src/backend/postmaster/postmaster.c   | 50 +++
 src/backend/utils/misc/guc.c  |  8 ++---
 src/bin/pg_checksums/t/002_actions.pl | 21 ++-
 doc/src/sgml/ref/postgres-ref.sgml| 11 --
 5 files changed, 82 insertions(+), 14 deletions(-)

diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index a7c3a4958e..aa18d304ac 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -229,6 +229,12 @@ typedef enum
 
 #define GUC_EXPLAIN  0x10  /* include in explain */
 
+/*
+ * GUC_RUNTIME_COMPUTED is intended for runtime-computed GUCs that are only
+ * available via 'postgres -C' if the server is not running.
+ */
+#define GUC_RUNTIME_COMPUTED  0x20
+
 #define GUC_UNIT   (GUC_UNIT_MEMORY | GUC_UNIT_TIME)
 
 
diff --git a/src/backend/postmaster/postmaster.c 
b/src/backend/postmaster/postmaster.c
index b2fe420c3c..f5b91c7af7 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -896,15 +896,31 @@ PostmasterMain(int argc, char *argv[])
if (output_config_variable != NULL)
{
 		/*
-		 * "-C guc" was specified, so print GUC's value and exit.  No extra
-		 * permission check is needed because the user is reading inside the
-		 * data dir.
+		 * If this is a runtime-computed GUC, it hasn't yet been initialized,
+		 * and the present value is not useful.  However, this is a convenient
+		 * place to print the value for most GUCs because it is safe to run
+		 * postmaster startup to this point even if the server is already
+		 * running.  For the handful of runtime-computed GUCs that we can't
+		 * provide meaningful values for yet, we wait until later in
+		 * postmaster startup to print the value.  We won't be able to use -C
+		 * on running servers for those GUCs, but otherwise this option is
+		 * unusable for them.
 		 */
-		const char *config_val = GetConfigOption(output_config_variable,
-												 false, false);
+		int			flags = GetConfigOptionFlags(output_config_variable, true);
 
-		puts(config_val ? config_val : "");
-		ExitPostmaster(0);
+		if ((flags & GUC_RUNTIME_COMPUTED) == 0)
+		{
+			/*
+			 * "-C guc" was specified, so print GUC's value and exit.  No
+			 * extra permission check is needed because the user is reading
+			 * inside the data

Re: prevent immature WAL streaming

2021-09-14 Thread Kyotaro Horiguchi
At Tue, 14 Sep 2021 22:32:04 -0300, Alvaro Herrera wrote in 
> On 2021-Sep-14, Alvaro Herrera wrote:
> 
> > On 2021-Sep-08, Kyotaro Horiguchi wrote:
> > 
> > > Thanks!  As I understand it, the new record adds the ability to
> > > cross-check between a torn-off contrecord and the new record inserted
> > > after the torn-off record.  I didn't test this version myself, but
> > > the previous version implemented the essential machinery and that
> > > won't change fundamentally with the new record.
> > > 
> > > So I think the current patch deserves a check of whether the algorithm
> > > actually works against the problem.
> > 
> > Here's a version with the new record type.  It passes check-world, and
> > it seems to work correctly to prevent overwrite of the tail end of a
> > segment containing a broken record.  This is very much WIP still;
> > comments are missing and I haven't tried to implement any sort of
> > verification that the record being aborted is the right one.
> 
> Here's the attachment I forgot earlier.

(I missed the chance to complain about that:p)

Thanks for the patch!

-   CopyXLogRecordToWAL(rechdr->xl_tot_len, isLogSwitch, rdata,
-   StartPos, EndPos);
+   CopyXLogRecordToWAL(rechdr->xl_tot_len, isLogSwitch,
+   flags & XLOG_SET_ABORTED_PARTIAL,
+   rdata, StartPos, EndPos);

The new xlog flag XLOG_SET_ABORTED_PARTIAL is used only by
RM_XLOG_ID/XLOG_OVERWRITE_CONTRECORD records, so the flag value is
equivalent to the record type.  We might instead want a new flag
XLOG_SPECIAL_TREATED_RECORD or something similar to quickly distinguish
records that need special treatment, like XLOG_SWITCH.

  if (flags & XLOG_SPECIAL_TREATED_RECORD)
  {
     if (rechdr->xl_rmid == RM_XLOG_ID)
     {
        if (info == XLOG_SWITCH)
           isLogSwitch = true;
        if (info == XLOG_OVERWRITE_CONTRECORD)
           isOverwrite = true;
     }
  }
  ..
  CopyXLogRecordToWAL(.., isLogSwitch, isOverwrite, rdata, StartPos, EndPos);


+   /* XXX can only happen once in the loop. Verify? */
+   if (set_aborted_partial)
+       pagehdr->xlp_info |= XLP_FIRST_IS_ABORTED_PARTIAL;
+

I'm not sure about the reason for this change from the previous patch
(I might be missing something), but this sets the flag on the page *after*
the one where the record starts, so we shouldn't set the flag there in the
first place.  The page header flags of the first page are set by
AdvanceXLInsertBuffer.  If we want to set the flag in this function, we
need to find the page header for the beginning of the record and make sure
that the record is placed at the beginning of the page.  (That is the
reason I did it in AdvanceXLInsertBuffer.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: .ready and .done files considered harmful

2021-09-14 Thread Kyotaro Horiguchi
At Tue, 14 Sep 2021 18:07:31 +, "Bossart, Nathan"  
wrote in 
> On 9/14/21, 9:18 AM, "Bossart, Nathan"  wrote:
> > This is an interesting idea, but the "else" block here seems prone to
> > race conditions.  I think we'd have to hold arch_lck to prevent that.
> > But as I mentioned above, if we are okay with depending on the
> > fallback directory scans, I think we can remove the "else" block
> > completely.
> 
> Thinking further, we probably need to hold a lock even when we are
> creating the .ready file to avoid race conditions.

The race condition can surely happen, but even if it does, all
competing processes except one will detect the out-of-order case and
force a directory scan.  Still, I'm not sure how this behaves in more
complex situations, so I'm not sure I like that behavior.

We could just use another lock for the logic there, but instead
couldn't we merge PgArchGetLastReadySegNo and PgArchSetLastReadySegNo
into one atomic test-and-(check-and-)set function?  Like this.


    XLogFromFileName(xlog, &tli, &this_segno, wal_segment_size);
    if (!PgArchReadySegIsInOrder(this_segno))
        PgArchForceDirScan();

bool
PgArchReadySegIsInOrder(XLogSegNo this_segno)
{
    bool    in_order = true;

    SpinLockAcquire(&PgArch->arch_lck);
    if (PgArch->lastReadySegNo + 1 != this_segno)
        in_order = false;
    PgArch->lastReadySegNo = this_segno;
    SpinLockRelease(&PgArch->arch_lck);

    return in_order;
}


By the way, it seems to me that we only need to force a directory scan
when the notified segment number steps back.  If this is correct, we
can reduce the number of times we force directory scans.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: prevent immature WAL streaming

2021-09-14 Thread Alvaro Herrera
On 2021-Sep-14, Alvaro Herrera wrote:

> On 2021-Sep-08, Kyotaro Horiguchi wrote:
> 
> > Thanks!  As I understand it, the new record adds the ability to
> > cross-check between a torn-off contrecord and the new record inserted
> > after the torn-off record.  I didn't test this version myself, but
> > the previous version implemented the essential machinery, and that
> > won't change fundamentally with the new record.
> > 
> > So I think the current patch deserves a test to see whether the
> > algorithm actually works against the problem.
> 
> Here's a version with the new record type.  It passes check-world, and
> it seems to work correctly to prevent overwrite of the tail end of a
> segment containing a broken record.  This is very much WIP still;
> comments are missing and I haven't tried to implement any sort of
> verification that the record being aborted is the right one.

Here's the attachment I forgot earlier.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"[PostgreSQL] is a great group; in my opinion it is THE best open source
development communities in existence anywhere."(Lamar Owen)
From 3de9660b2e570604412d435023dcbd13355fc123 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 2 Sep 2021 17:21:46 -0400
Subject: [PATCH v5] Implement FIRST_IS_ABORTED_CONTRECORD

---
 src/backend/access/rmgrdesc/xlogdesc.c  | 11 +++
 src/backend/access/transam/xlog.c   | 91 +++--
 src/backend/access/transam/xloginsert.c |  3 +
 src/backend/access/transam/xlogreader.c | 39 ++-
 src/include/access/xlog.h   |  1 +
 src/include/access/xlog_internal.h  | 20 +-
 src/include/access/xlogreader.h |  3 +
 src/include/catalog/pg_control.h|  1 +
 8 files changed, 159 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index e6090a9dad..0be382f8a5 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -139,6 +139,14 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 		 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_OVERWRITE_CONTRECORD)
+	{
+		xl_overwrite_contrecord xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_overwrite_contrecord));
+		appendStringInfo(buf, "lsn %X/%X",
+		 LSN_FORMAT_ARGS(xlrec.overwritten_lsn));
+	}
 }
 
 const char *
@@ -178,6 +186,9 @@ xlog_identify(uint8 info)
 		case XLOG_END_OF_RECOVERY:
 			id = "END_OF_RECOVERY";
 			break;
+		case XLOG_OVERWRITE_CONTRECORD:
+			id = "OVERWRITE_CONTRECORD";
+			break;
 		case XLOG_FPI:
 			id = "FPI";
 			break;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e51a7a749d..73cfd64125 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -586,6 +586,8 @@ typedef struct XLogCtlData
 	XLogRecPtr	replicationSlotMinLSN;	/* oldest LSN needed by any slot */
 
 	XLogSegNo	lastRemovedSegNo;	/* latest removed/recycled XLOG segment */
+	XLogRecPtr	abortedContrecordPtr; /* LSN of incomplete record at end of
+	   * WAL */
 
 	/* Fake LSN counter, for unlogged relations. Protected by ulsn_lck. */
 	XLogRecPtr	unloggedLSN;
@@ -894,6 +896,7 @@ static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
 static void CreateEndOfRecoveryRecord(void);
+static XLogRecPtr CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn);
 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
 static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
 static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
@@ -950,6 +953,7 @@ static void rm_redo_error_callback(void *arg);
 static int	get_sync_bit(int method);
 
 static void CopyXLogRecordToWAL(int write_len, bool isLogSwitch,
+bool set_aborted_partial,
 XLogRecData *rdata,
 XLogRecPtr StartPos, XLogRecPtr EndPos);
 static void ReserveXLogInsertLocation(int size, XLogRecPtr *StartPos,
@@ -1120,8 +1124,9 @@ XLogInsertRecord(XLogRecData *rdata,
 		 * All the record data, including the header, is now ready to be
 		 * inserted. Copy the record in the space reserved.
 		 */
-		CopyXLogRecordToWAL(rechdr->xl_tot_len, isLogSwitch, rdata,
-			StartPos, EndPos);
+		CopyXLogRecordToWAL(rechdr->xl_tot_len, isLogSwitch,
+			flags & XLOG_SET_ABORTED_PARTIAL,
+			rdata, StartPos, EndPos);
 
 		/*
 		 * Unless record is flagged as not important, update LSN of last
@@ -1504,7 +1509,8 @@ checkXLogConsistency(XLogReaderState *record)
  * area in the WAL.
  */
 static void
-CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
+CopyXLogRecordToWAL(int write_len, bool isLogSwitch,
+	bool set_aborted_partial, XLogRecData *rdata,
 	XLogRecPtr StartPos, XLogRecPtr EndPos)
 {
 	char	   *currpos;
@@ -1560,6 +1566,10 @@ CopyXL

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2021-09-14 Thread Alvaro Herrera
Hello Melanie

On 2021-Sep-13, Melanie Plageman wrote:

> I also think it makes sense to rename the pg_stat_buffer_actions view to
> pg_stat_buffers and to name the columns using both the buffer action
> type and buffer type -- e.g. shared, strategy, local. This leaves open
> the possibility of counting buffer actions done on other non-shared
> buffers -- like those done while building indexes or those using local
> buffers. The third patch in the set does this (I wanted to see if it
> made sense before fixing it up into the first patch in the set).

What do you think of the idea of having the "shared/strategy/local"
attribute be a column?  So you'd have up to three rows per buffer action
type.  Users wishing to see an aggregate can just aggregate them, just
like they'd do with pg_buffercache.  I think that leads to an easy
decision with regards to this point:

> I attached a patch with the outline of this idea
> (buffer_type_enum_addition.patch). It doesn't work because
> pg_stat_get_buffer_actions() uses the BufferActionType as an index into
> the values array returned. If I wanted to use a combination of the two
> enums as an indexing mechanism (BufferActionType and BufferType), we
> would end up with a tuple having every combination of the two
> enums--some of which aren't valid. It might not make sense to implement
> this. I do think it is useful to think of these stats as a combination
> of a buffer action and a type of buffer.

Does that seem sensible?


(It's weird to have enum values that are there just to indicate what's
the maximum value.  I think that sort of thing is better done by having
a "#define LAST_THING" that takes the last valid value from the enum.
That would free you from having to handle the last value in switch
blocks, for example.  LAST_OCLASS in dependency.h is a precedent on this.)

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"That sort of implies that there are Emacs keystrokes which aren't obscure.
I've been using it daily for 2 years now and have yet to discover any key
sequence which makes any sense."(Paul Thomas)




Re: [Patch] ALTER SYSTEM READ ONLY

2021-09-14 Thread Mark Dilger


> On Jun 16, 2020, at 6:55 AM, amul sul  wrote:
> 
> (2) if the session is idle, we also need the top-level abort
> record to be written immediately, but can't send an error to the client until 
> the next
> command is issued without losing wire protocol synchronization. For now, we 
> just use
> FATAL to kill the session; maybe this can be improved in the future.

Andres,

I'd like to have a patch that tests the impact of a vacuum running for xid 
wraparound purposes, blocked on a pinned page held by the cursor, when another 
session disables WAL.  It would be very interesting to test how the vacuum 
handles that specific change.  I have not figured out the cleanest way to do 
this, though, as we don't as a project yet have a standard way of setting up 
xid exhaustion in a regression test, do we?  The closest I saw to it was your 
work in [1], but that doesn't seem to have made much headway recently, and is 
designed for the TAP testing infrastructure, which isn't useable from inside an 
isolation test.  Do you have a suggestion how best to continue developing out 
the test infrastructure?


Amul,

The most obvious way to test how your ALTER SYSTEM READ ONLY feature interacts 
with concurrent sessions is using the isolation tester in src/test/isolation/, 
but as it stands now, the first permutation that gets a FATAL causes the test 
to abort and all subsequent permutations to not run.  Attached patch v34-0009 
fixes that.

Attached patch v34-0010 adds a test of cursors opened FOR UPDATE interacting 
with a system that is set read-only by a different session.  The expected 
output is worth reviewing to see how this plays out.  I don't see anything in 
there which is obviously wrong, but some of it is a bit clunky.  For example, 
by the time the client sees an error "FATAL:  WAL is now prohibited", the 
system may already have switched back to read-write.  Also, it is a bit strange 
to get one of these errors on an attempted ROLLBACK.  Once again, not wrong as 
such, but clunky.





v34-0009-Make-isolationtester-handle-closed-sessions.patch
Description: Binary data


v34-0010-Test-ALTER-SYSTEM-READ-ONLY-against-cursors.patch
Description: Binary data


[1] 
https://www.postgresql.org/message-id/flat/CAP4vRV5gEHFLB7NwOE6_dyHAeVfkvqF8Z_g5GaCQZNgBAE0Frw%40mail.gmail.com#e10861372aec22119b66756ecbac581c

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: [PATCH] test/ssl: rework the sslfiles Makefile target

2021-09-14 Thread Daniel Gustafsson
> On 15 Sep 2021, at 00:14, Jacob Champion  wrote:
> On Mon, 2021-09-13 at 15:04 +0200, Daniel Gustafsson wrote:

>> -# Convert client.key to encrypted PEM (X.509 text) and DER (X.509 ASN.1) 
>> formats
>> -# to test libpq's support for the sslpassword= option.
>> -ssl/client-encrypted-pem.key: outform := PEM
>> -ssl/client-encrypted-der.key: outform := DER
>> +# Convert client.key to encrypted PEM (X.509 text) and DER (X.509 ASN.1)
>> +# formats to test libpq's support for the sslpassword= option.
>> ssl/client-encrypted-pem.key ssl/client-encrypted-der.key: ssl/client.key
>> -   openssl rsa -in $< -outform $(outform) -aes128 -passout 
>> 'pass:dUmmyP^#+' -out $@
>> +   openssl rsa -in $< -outform PEM -aes256 -passout 'pass:dUmmyP^#+' 
>> -out $@
>> +ssl/client-encrypted-der.key: ssl/client.key
>> +   openssl rsa -in $< -outform DER -passout 'pass:dUmmyP^#+' -out $@
> 
> 1. Should the DER key be AES256 as well?

It should, but then it fails to load by postgres, my email wasn't clear about
this, sorry.  The diff to revert from aes256 (and aes128 for that matter) is to
make the key load at all.

> 2. The ssl/client-encrypted-der.key target for the first recipe should
> be removed; I get a duplication warning from Make.

Interesting, I didn't see that, will check.

> 3. The new client key will need to be included in the patch; the one
> there now is still the AES128 version.

Good point, that's a reason to keep it aes128 until the encrypted DER key in
3.0.0 issue has been fixed.

> And one doc comment:
> 
>> ssl/ subdirectory. The Makefile also contains a rule, "make sslfiles", to
>> -recreate them if you need to make changes.
>> +recreate them if you need to make changes. "make sslfiles-clean" is required
>> +in order to recreate.
> 
> This is only true if you need to rebuild the entire tree; if you just
> want to recreate a single cert pair, you can just touch the config file
> for it (or remove the key, if you want to regenerate the pair) and
> `make sslfiles` again.

Correct.  In my head, "rebuild" means dealing with individually changed files
and "recreate" means rebuilding everything regardless.  That's just in my head
though, so clearly the wording should be expanded.  Will do.

--
Daniel Gustafsson   https://vmware.com/





Re: Remove duplicate static function check_permissions in slotfuncs.c and logicalfuncs.c

2021-09-14 Thread Michael Paquier
On Tue, Sep 14, 2021 at 12:57:47PM -0300, Alvaro Herrera wrote:
> The parentheses that commit e3a87b4991cc removed the requirement for are
> those that the committed code still has, starting at the errcode() line.
> The ones in errmsg() were redundant and have never been necessary.

Indeed, thanks!
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] test/ssl: rework the sslfiles Makefile target

2021-09-14 Thread Jacob Champion
On Mon, 2021-09-13 at 15:04 +0200, Daniel Gustafsson wrote:
> A few things noted (and fixed as per the attached, which is v6 squashed and
> rebased on HEAD; commitmessage hasn't been addressed yet) while reviewing:
> 
> * Various comment reflowing to fit within 79 characters
> 
> * Pass through the clean targets into sslfiles.mk rather than rewrite them to
> make clean (even if they are the same in sslfiles.mk).
> 
> * Moved the lists of keys/certs to be one object per line to match the style
> introduced in 01368e5d9.  The previous Makefile was violating that as well, 
> but
> when we're fixing this file for other things we might as well fix that too.

Looks good, thanks!

> * Bumped the password protected key output to AES256 to match the server 
> files,
> since it's more realistic to see than AES128 (and I was fiddling around here
> anyways testing different things, see below).

A few thoughts about this part of the diff:

> -# Convert client.key to encrypted PEM (X.509 text) and DER (X.509 ASN.1) 
> formats
> -# to test libpq's support for the sslpassword= option.
> -ssl/client-encrypted-pem.key: outform := PEM
> -ssl/client-encrypted-der.key: outform := DER
> +# Convert client.key to encrypted PEM (X.509 text) and DER (X.509 ASN.1)
> +# formats to test libpq's support for the sslpassword= option.
>  ssl/client-encrypted-pem.key ssl/client-encrypted-der.key: ssl/client.key
> -   openssl rsa -in $< -outform $(outform) -aes128 -passout 
> 'pass:dUmmyP^#+' -out $@
> +   openssl rsa -in $< -outform PEM -aes256 -passout 'pass:dUmmyP^#+' 
> -out $@
> +ssl/client-encrypted-der.key: ssl/client.key
> +   openssl rsa -in $< -outform DER -passout 'pass:dUmmyP^#+' -out $@

1. Should the DER key be AES256 as well?
2. The ssl/client-encrypted-der.key target for the first recipe should
be removed; I get a duplication warning from Make.
3. The new client key will need to be included in the patch; the one
there now is still the AES128 version.

And one doc comment:

>  ssl/ subdirectory. The Makefile also contains a rule, "make sslfiles", to
> -recreate them if you need to make changes.
> +recreate them if you need to make changes. "make sslfiles-clean" is required
> +in order to recreate.

This is only true if you need to rebuild the entire tree; if you just
want to recreate a single cert pair, you can just touch the config file
for it (or remove the key, if you want to regenerate the pair) and
`make sslfiles` again.

> The submitted patch works for 1.0.1, 1.0.2 and 1.1.1 when running the below
> sequence:
> 
> make check
> make sslfiles-clean
> make sslfiles
> make check
> 
> Testing what's in the tree, recreating the keys/certs and testing against the
> new ones.

Great, thanks!

> In OpenSSL 3.0.0, the final make check on the generated files breaks on the
> encrypted DER key.  The key generates fine, and running "openssl rsa ..
> -check" validates it, but it fails to load into postgres.  Removing the
> explicit selection of cipher makes the test pass again (also included in the
> attached).  I haven't been able to track down exactly what the issue is, but I
> have a suspicion that it's in OpenSSL rather than libpq.  This issue is 
> present
> in master too, so fixing it is orthogonal to this work, but it needs to be
> figured out.
> 
> Another point of interest here is that 3.0.0 put "openssl rsa" in maintenance
> mode, so maybe we'll have to look at supporting "openssl pkeyutl" as well for
> some parts should future bugs remain unfixed in the rsa command.

Good to know. Agreed that it should be a separate patch.

--Jacob


Re: BUG #15293: Stored Procedure Triggered by Logical Replication is Unable to use Notification Events

2021-09-14 Thread Tom Lane
I wrote:
> Hence, I present the attached, which also tweaks things to avoid an
> extra pq_flush in the end-of-command code path, and improves the
> comments to discuss the issue of NOTIFYs sent by procedures.

Hearing no comments, I pushed that.

> I'm inclined to think we should flat-out reject LISTEN in any process
> that is not attached to a frontend, at least until somebody takes the
> trouble to add infrastructure that would let it be useful.  I've not
> done that here though; I'm not quite sure what we should test for.

After a bit of looking around, it seems that MyBackendType addresses
that problem nicely, so I propose the attached.

regards, tom lane

diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 27fbf1f3aa..bf085aa93b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -803,6 +803,23 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 ListenStmt *stmt = (ListenStmt *) parsetree;
 
 CheckRestrictedOperation("LISTEN");
+
+/*
+ * We don't allow LISTEN in background processes, as there is
+ * no mechanism for them to collect NOTIFY messages, so they'd
+ * just block cleanout of the async SLRU indefinitely.
+ * (Authors of custom background workers could bypass this
+ * restriction by calling Async_Listen directly, but then it's
+ * on them to provide some mechanism to process the message
+ * queue.)  Note there seems no reason to forbid UNLISTEN.
+ */
+if (MyBackendType != B_BACKEND)
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+	/* translator: %s is name of a SQL command, eg LISTEN */
+			 errmsg("cannot execute %s within a background process",
+	"LISTEN")));
+
 Async_Listen(stmt->conditionname);
 			}
 			break;


Re: [EXTERNAL] Re: Don't clean up LLVM state when exiting in a bad way

2021-09-14 Thread Jelte Fennema
> So I wonder, isn't the fixed usage issue specific to LLVM 7

That's definitely possible. I was unable to reproduce the issue I shared in my 
original email when postgres was compiled with LLVM 10. 

That's also why I sent an email to the pgsql-pkg-yum mailing list about options 
to use a newer version of LLVM on CentOS 7: 
https://www.postgresql.org/message-id/flat/AM5PR83MB0178475D87EFA290A4D0793DF7FF9%40AM5PR83MB0178.EURPRD83.prod.outlook.com
 (no response so far though)



Re: parallelizing the archiver

2021-09-14 Thread Stephen Frost
Greetings,

* Julien Rouhaud (rjuju...@gmail.com) wrote:
> On Fri, Sep 10, 2021 at 2:03 PM Andrey Borodin  wrote:
> > > 10 сент. 2021 г., в 10:52, Julien Rouhaud  написал(а):
> > > Yes, but it also means that it's up to every single archiving tool to
> > > implement a somewhat hackish parallel version of an archive_command,
> > > hoping that core won't break it.

We've got too many archiving tools as it is, if you want my 2c on that.

> > I'm not proposing to remove existing archive_command. Just deprecate it 
> > one-WAL-per-call form.
> 
> Which is a big API break.

We definitely need to stop being afraid of this.  We completely changed
around how restores work and made pretty much all of the backup/restore
tools have to make serious changes when we released v12.

I definitely don't think that we should be making assumptions that
changing archive command to start running things in parallel isn't
*also* an API break too, in any case.  It is also a change and there's
definitely a good chance that it'd break some of the archivers out
there.  If we're going to make a change here, let's make a sensible one.

> > It's a very simplistic approach. If some GUC is set - archiver will just 
> > feed ready files to stdin of archive command. What fundamental design 
> > changes do we need?

Haven't really thought about this proposal but it does sound
interesting.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Getting ERROR "subplan "SubPlan 1" was not initialized" in EXISTS subplan when using for list partition.

2021-09-14 Thread Tom Lane
Zhihong Yu  writes:
> In the fix, isUsedSubplan is used to tell whether any given subplan is used.
> Since only one subplan is used, I wonder if the array can be replaced by
> specifying the subplan is used.

That doesn't seem particularly more convenient.  The point of the bool
array is to merge the results from examination of (possibly) many
AlternativeSubPlans.

regards, tom lane




Re: Getting ERROR "subplan "SubPlan 1" was not initialized" in EXISTS subplan when using for list partition.

2021-09-14 Thread Zhihong Yu
On Tue, Sep 14, 2021 at 10:44 AM Tom Lane  wrote:

> Rajkumar Raghuwanshi  writes:
> > I am getting "ERROR:  subplan "SubPlan 1" was not initialized" error with
> > below test case.
>
> > CREATE TABLE tbl ( c1 int, c2 int, c3 int ) PARTITION BY LIST (c1);
> > create table tbl_null PARTITION OF tbl FOR VALUES IN (null);
> > create table tbl_def PARTITION OF tbl DEFAULT;
> > insert into tbl values (8800,0,0);
> > insert into tbl values (1891,1,1);
> > insert into tbl values (3420,2,0);
> > insert into tbl values (9850,3,0);
> > insert into tbl values (7164,4,4);
> > analyze tbl;
> > explain (costs off) select count(*) from tbl t1 where (exists(select 1
> from
> > tbl t2 where t2.c1 = t1.c2) or c3 < 0);
>
> > postgres=# explain (costs off) select count(*) from tbl t1 where
> > (exists(select 1 from tbl t2 where t2.c1 = t1.c2) or c3 < 0);
> > ERROR:  subplan "SubPlan 1" was not initialized
>
> Nice example.  This is failing since 41efb8340.  It happens because
> we copy the AlternativeSubPlan for the EXISTS into the scan clauses
> for each of t1's partitions.  At setrefs.c time, when
> fix_alternative_subplan() looks at the first of these
> AlternativeSubPlans, it decides it likes the first subplan better,
> so it deletes SubPlan 2 from the root->glob->subplans list.  But when
> it comes to the next copy (which is attached to a partition with a
> different number of rows), it likes the second subplan better, so it
> deletes SubPlan 1 from the root->glob->subplans list.  Now we have
> SubPlan nodes in the tree with no referents in the global list of
> subplans, so kaboom.
>
> The easiest fix would just be to not try to delete unreferenced
> subplans.  The error goes away if I remove the "lfirst(lc2) = NULL"
> statements from fix_alternative_subplan().  However, this is a bit
> annoying since then we will still pay the cost of initializing
> subplans that (in most cases) will never be used.  I'm going to
> look into how painful it is to have setrefs.c remove unused subplans
> only at the end, after it's seen all the AlternativeSubPlans.
>
> regards, tom lane
>
>
> Hi,
In the fix, isUsedSubplan is used to tell whether any given subplan is used.
Since only one subplan is used, I wonder if the array can be replaced by
specifying the subplan is used.

Cheers


Re: Don't clean up LLVM state when exiting in a bad way

2021-09-14 Thread Alexander Lakhin
Hello Andres,
14.09.2021 08:05, Andres Freund wrote:
>
>> With LLVM 9 on the same Centos 7 I don't get such segfault. Also it
>> doesn't happen on different OSes with LLVM 7.
> That just like an llvm bug to me. Rather than the usage issue addressed in 
> this thread.
But Justin has seen this strangeness too:
>
> I couldn't crash on ubuntu either, so maybe they have a patch which
> fixes this,
> or maybe RH applied a patch which caused it...
>
The script that Justin presented:
> postgres=# CREATE TABLE t AS SELECT i FROM generate_series(1,99)i;
> VACUUM ANALYZE t;
> postgres=# SET client_min_messages=debug; SET statement_timeout=333;
> SET jit_above_cost=0; SET jit_optimize_above_cost=-1; SET
> jit_inline_above_cost=-1; explain analyze SELECT sum(i) FROM t a
> NATURAL JOIN t b;
causes a server crash on CentOS 7 with LLVM 7 (even with the fix
applied) and doesn't crash with LLVM 9 (I used llvm-toolset-9* and
devtoolset-9* packages from
https://buildlogs.centos.org/c7-llvm-toolset-9.0.x86_64/ and
https://buildlogs.centos.org/c7-devtoolset-9.x86_64/ repositories).

The another script:
>
> python3 -c "import pg; db=pg.DB('dbname=postgres host=/tmp
> port=5678'); db.query('SET jit_above_cost=0; SET
> jit_inline_above_cost=-1; SET jit=on; SET client_min_messages=debug');
> db.query('begin'); db.query_formatted('SELECT 1 FROM
> generate_series(1,99)a WHERE a=%s', [1], inline=False);"
>
also causes a server crash with LLVM 7 and doesn't crash with LLVM 9.

So I wonder: isn't the usage issue fixed here specific to LLVM 7, which
perhaps shouldn't be supported anyway, given its bugs?

Best regards,
Alexander




Re: Deduplicate code updating ControleFile's DBState.

2021-09-14 Thread Bossart, Nathan
On 9/13/21, 11:06 PM, "Amul Sul"  wrote:
> The patch is straightforward but the only concern is that in
> StartupXLOG(), SharedRecoveryState now gets updated only with spin
> lock; earlier it also had ControlFileLock in addition to that. AFAICU,
> I don't see any problem there, since until the startup process exits,
> other backends cannot connect and write a WAL record.

It looks like ebdf5bf intentionally made sure that we hold
ControlFileLock while updating SharedRecoveryInProgress (now
SharedRecoveryState after 4e87c48).  The thread for this change [0]
has some additional details.

As far as the patch goes, I'm not sure why SetControlFileDBState()
needs to be exported, and TBH I don't know if this change is really a
worthwhile improvement.  ISTM the main benefit is that it could help
avoid cases where we update the state but not the time.  However,
there's still nothing preventing that, and I don't get the idea that
it was really a big problem to begin with.

Nathan

[0] 
https://postgr.es/m/CAB7nPqTS5J3-G_zTow0Kc5oqZn877RDDN1Mfcqm2PscAS7FnAw%40mail.gmail.com



Re: SQL/JSON: JSON_TABLE - pg_stat_statements crash

2021-09-14 Thread Andrew Dunstan


On 9/14/21 2:04 PM, Erik Rijkers wrote:
> On 9/14/21 2:53 PM, Andrew Dunstan wrote:
>> On 9/13/21 5:41 AM, Erik Rijkers wrote:
>>> On 9/2/21 8:52 PM, Andrew Dunstan wrote:
>>>
>
> >> [0001-SQL-JSON-functions-v51.patch]
> >> [0002-JSON_TABLE-v51.patch]
> >> [0003-JSON_TABLE-PLAN-DEFAULT-clause-v51.patch]
> >> [0004-JSON_TABLE-PLAN-clause-v51.patch]
>
> Thanks, builds fine now.
>
> But I found that the server crashes on certain forms of SQL when
> postgresql.conf has a 'shared_preload_libraries' that contains module
> 'pg_stat_statements' (my value was:
> 'pg_stat_statements,auth_delay,auto_explain,passwordcheck').  Only
> pg_stat_statements seems to cause the problem.
>
> The offending SQL (I took it from the jsonb_sqljson.sql test file):
>
> testdb=# SELECT JSON_EXISTS(jsonb '{"a": 1, "b": 2}', '$.* ? (@ > $x
> && @ < $y)' PASSING 0 AS x, 2 AS y);
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> Time: 2.551 ms
> !?>
>
> (Of course, that SQL running during regression testing has no problems
> as there is then no pg_stat_statements.)
>
> The statement sometimes succeeds but never very often.
>
> The same crash was there before but I only now saw the connection with
> the 'shared_preload_libraries/pg_stat_statements'.
>
> I seem to remember some things changed in pg_stat_statements but I
> didn't follow and don't know who to CC for it.
>
>


Yeah, I had to make a change in that area, looks like I got it wrong.
I'll follow up. Thanks for the report.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-09-14 Thread Mark Rofail
Dear Alvaro,

We just need to rewrite the scope of the patch so I can work on the next
generation.  I do not know what was considered "out of scope" in the
current version.
/Mark

On Tue, 14 Sep 2021, 8:55 pm Alvaro Herrera, 
wrote:

> On 2021-Sep-14, Daniel Gustafsson wrote:
>
> > Given the above, and that nothing has happened on this thread since this
> note,
> > I think we should mark this Returned with Feedback and await a new
> submission.
> > Does that seem reasonable Alvaro?
>
> It seems reasonable to me.
>
> --
> Álvaro Herrera  Valdivia, Chile  —
> https://www.EnterpriseDB.com/
>


Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-09-14 Thread Daniel Gustafsson
> On 14 Sep 2021, at 20:54, Alvaro Herrera  wrote:
> 
> On 2021-Sep-14, Daniel Gustafsson wrote:
> 
>> Given the above, and that nothing has happened on this thread since this 
>> note,
>> I think we should mark this Returned with Feedback and await a new 
>> submission.
>> Does that seem reasonable Alvaro?
> 
> It seems reasonable to me.

Thanks, done that way now.

--
Daniel Gustafsson   https://vmware.com/





Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-09-14 Thread Alvaro Herrera
On 2021-Sep-14, Daniel Gustafsson wrote:

> Given the above, and that nothing has happened on this thread since this note,
> I think we should mark this Returned with Feedback and await a new submission.
> Does that seem reasonable Alvaro?

It seems reasonable to me.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-09-14 Thread Daniel Gustafsson
> On 14 Jul 2021, at 18:07, Alvaro Herrera  wrote:
> 
> On 2021-Jul-14, vignesh C wrote:
> 
>> The patch does not apply on Head anymore, could you rebase and post a
>> patch. I'm changing the status to "Waiting for Author".
> 
> BTW I gave a look at this patch in the March commitfest and concluded it
> still requires some major surgery that I didn't have time for.  I did so
> by re-reading early in the thread to understand what the actual
> requirements were for this feature to work, and it seemed to me that the
> patch started to derail at some point.  I suggest that somebody needs to
> write up exactly what we need, lest the patches end up implementing
> something else.
> 
> I don't have time for this patch during the current commitfest, and I'm
> not sure that I will during the next one.  If somebody else wants to
> spend time with it, ... be my guest.

Given the above, and that nothing has happened on this thread since this note,
I think we should mark this Returned with Feedback and await a new submission.
Does that seem reasonable Alvaro?

--
Daniel Gustafsson   https://vmware.com/





Re: SQL/JSON: JSON_TABLE - pg_stat_statements crash

2021-09-14 Thread Pavel Stehule
On Tue, Sep 14, 2021 at 8:04 PM Erik Rijkers  wrote:

> On 9/14/21 2:53 PM, Andrew Dunstan wrote:
> > On 9/13/21 5:41 AM, Erik Rijkers wrote:
> >> On 9/2/21 8:52 PM, Andrew Dunstan wrote:
> >>
>
>  >> [0001-SQL-JSON-functions-v51.patch]
>  >> [0002-JSON_TABLE-v51.patch]
>  >> [0003-JSON_TABLE-PLAN-DEFAULT-clause-v51.patch]
>  >> [0004-JSON_TABLE-PLAN-clause-v51.patch]
>
> Thanks, builds fine now.
>
> But I found that the server crashes on certain forms of SQL when
> postgresql.conf has a 'shared_preload_libraries' that contains module
> 'pg_stat_statements' (my value was:
> 'pg_stat_statements,auth_delay,auto_explain,passwordcheck').  Only
> pg_stat_statements seems to cause the problem.
>
> The offending SQL (I took it from the jsonb_sqljson.sql test file):
>
> testdb=# SELECT JSON_EXISTS(jsonb '{"a": 1, "b": 2}', '$.* ? (@ > $x &&
> @ < $y)' PASSING 0 AS x, 2 AS y);
> server closed the connection unexpectedly
>  This probably means the server terminated abnormally
>  before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> Time: 2.551 ms
> !?>
>
> (Of course, that SQL running during regression testing has no problems
> as there is then no pg_stat_statements.)
>
> The statement sometimes succeeds but never very often.
>
> The same crash was there before but I only now saw the connection with
> the 'shared_preload_libraries/pg_stat_statements'.
>
> I seem to remember some things changed in pg_stat_statements but I
> didn't follow and don't know who to CC for it.
>

These issues are easy to debug - you can run gdb in another terminal
and attach it to the backend process of the psql session. Then you can run the
query.

The problem is probably in the pg_stat_statements callbacks - the query
processing there may not know about some new nodes that this patch introduces.

Regards

Pavel



> Thanks,
>
> Erik Rijkers
>


Re: .ready and .done files considered harmful

2021-09-14 Thread Bossart, Nathan
On 9/14/21, 9:18 AM, "Bossart, Nathan"  wrote:
> This is an interesting idea, but the "else" block here seems prone to
> race conditions.  I think we'd have to hold arch_lck to prevent that.
> But as I mentioned above, if we are okay with depending on the
> fallback directory scans, I think we can remove the "else" block
> completely.

Thinking further, we probably need to hold a lock even when we are
creating the .ready file to avoid race conditions.

Nathan



Re: SQL/JSON: JSON_TABLE - pg_stat_statements crash

2021-09-14 Thread Erik Rijkers

On 9/14/21 2:53 PM, Andrew Dunstan wrote:

On 9/13/21 5:41 AM, Erik Rijkers wrote:

On 9/2/21 8:52 PM, Andrew Dunstan wrote:



>> [0001-SQL-JSON-functions-v51.patch]
>> [0002-JSON_TABLE-v51.patch]
>> [0003-JSON_TABLE-PLAN-DEFAULT-clause-v51.patch]
>> [0004-JSON_TABLE-PLAN-clause-v51.patch]

Thanks, builds fine now.

But I found that the server crashes on certain forms of SQL when 
postgresql.conf has a 'shared_preload_libraries' that contains module 
'pg_stat_statements' (my value was: 
'pg_stat_statements,auth_delay,auto_explain,passwordcheck').  Only 
pg_stat_statements seems to cause the problem.


The offending SQL (I took it from the jsonb_sqljson.sql test file):

testdb=# SELECT JSON_EXISTS(jsonb '{"a": 1, "b": 2}', '$.* ? (@ > $x && 
@ < $y)' PASSING 0 AS x, 2 AS y);

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Time: 2.551 ms
!?>

(Of course, that SQL running during regression testing has no problems 
as there is then no pg_stat_statements.)


The statement sometimes succeeds but never very often.

The same crash was there before but I only now saw the connection with 
the 'shared_preload_libraries/pg_stat_statements'.


I seem to remember some things changed in pg_stat_statements but I 
didn't follow and don't know who to CC for it.


Thanks,

Erik Rijkers




Re: Estimating HugePages Requirements?

2021-09-14 Thread Bossart, Nathan
On 9/13/21, 5:49 PM, "Kyotaro Horiguchi"  wrote:
> At Tue, 14 Sep 2021 00:30:22 +, "Bossart, Nathan"  
> wrote in
>> On 9/13/21, 1:25 PM, "Tom Lane"  wrote:
>> > Seems like "huge_pages_needed_for_shared_memory" would be sufficient.
>>
>> I think we are down to either shared_memory_size_in_huge_pages or
>> huge_pages_needed_for_shared_memory.  Robert's argument against
>> huge_pages_needed_for_shared_memory was that it might sound like only
>> part of shared memory uses huge pages and we're only giving the number
>> required for that.  Speaking of which, isn't that technically true?
>> For shared_memory_size_in_huge_pages, the intent is to make it sound
>> like we are providing shared_memory_size in terms of the huge page
>> size, but I think it could also be interpreted as "the amount of
>> shared memory that is currently stored in huge pages."
>>
>> I personally lean towards huge_pages_needed_for_shared_memory because
>> it feels the most clear and direct to me.  I'm not vehemently opposed
>> to shared_memory_size_in_huge_pages, though.  I don't think either one
>> is too misleading.
>
> I like 'in' slightly better than 'for' in this context. I agree with Michael
> that the name looks somewhat too long, especially considering that
> it won't be completed on shell command lines, but I won't fight
> it either.  On the other hand, a fully spelled-out name can be
> recalled from memory more easily than a name halfway shortened.

I think I see more support for shared_memory_size_in_huge_pages than
for huge_pages_needed_for_shared_memory at the moment.  I'll update
the patch set in the next day or two to use
shared_memory_size_in_huge_pages unless something changes in the
meantime.

Nathan



Re: prevent immature WAL streaming

2021-09-14 Thread Alvaro Herrera
On 2021-Sep-08, Kyotaro Horiguchi wrote:

> Thanks!  As I understand it, the new record adds the ability to
> cross-check between a torn-off contrecord and the new record inserted
> after the torn-off record.  I didn't test this version myself, but
> the previous version implemented the essential machinery, and that
> won't change fundamentally with the new record.
> 
> So I think the current patch deserves a check that the algorithm
> actually works against the problem.

Here's a version with the new record type.  It passes check-world, and
it seems to work correctly to prevent overwrite of the tail end of a
segment containing a broken record.  This is very much WIP still;
comments are missing and I haven't tried to implement any sort of
verification that the record being aborted is the right one.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"XML!" Exclaimed C++.  "What are you doing here? You're not a programming
language."
"Tell that to the people who use me," said XML.
https://burningbird.net/the-parable-of-the-languages/




Re: Getting ERROR "subplan "SubPlan 1" was not initialized" in EXISTS subplan when using for list partition.

2021-09-14 Thread Tom Lane
Rajkumar Raghuwanshi  writes:
> I am getting "ERROR:  subplan "SubPlan 1" was not initialized" error with
> below test case.

> CREATE TABLE tbl ( c1 int, c2 int, c3 int ) PARTITION BY LIST (c1);
> create table tbl_null PARTITION OF tbl FOR VALUES IN (null);
> create table tbl_def PARTITION OF tbl DEFAULT;
> insert into tbl values (8800,0,0);
> insert into tbl values (1891,1,1);
> insert into tbl values (3420,2,0);
> insert into tbl values (9850,3,0);
> insert into tbl values (7164,4,4);
> analyze tbl;
> explain (costs off) select count(*) from tbl t1 where (exists(select 1 from
> tbl t2 where t2.c1 = t1.c2) or c3 < 0);

> postgres=# explain (costs off) select count(*) from tbl t1 where
> (exists(select 1 from tbl t2 where t2.c1 = t1.c2) or c3 < 0);
> ERROR:  subplan "SubPlan 1" was not initialized

Nice example.  This is failing since 41efb8340.  It happens because
we copy the AlternativeSubPlan for the EXISTS into the scan clauses
for each of t1's partitions.  At setrefs.c time, when
fix_alternative_subplan() looks at the first of these
AlternativeSubPlans, it decides it likes the first subplan better,
so it deletes SubPlan 2 from the root->glob->subplans list.  But when
it comes to the next copy (which is attached to a partition with a
different number of rows), it likes the second subplan better, so it
deletes SubPlan 1 from the root->glob->subplans list.  Now we have
SubPlan nodes in the tree with no referents in the global list of
subplans, so kaboom.

The easiest fix would just be to not try to delete unreferenced
subplans.  The error goes away if I remove the "lfirst(lc2) = NULL"
statements from fix_alternative_subplan().  However, this is a bit
annoying since then we will still pay the cost of initializing
subplans that (in most cases) will never be used.  I'm going to
look into how painful it is to have setrefs.c remove unused subplans
only at the end, after it's seen all the AlternativeSubPlans.
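As an aside, the failure mode can be modeled outside the planner. The toy
sketch below (hypothetical names, not PostgreSQL code) shows why eagerly
pruning a rejected alternative dangles the reference held by another copy
that chose differently, while deferring pruning until all copies are
resolved keeps every referenced subplan alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: two "subplans" and two AlternativeSubPlan copies that
 * each prefer a different one.  Eager pruning nulls out the entry the
 * other copy selected; deferred pruning removes only entries that no
 * copy ended up using. */
#define NSUBPLANS 2

static const char *subplans_eager[NSUBPLANS] = {"SubPlan 1", "SubPlan 2"};
static const char *subplans_deferred[NSUBPLANS] = {"SubPlan 1", "SubPlan 2"};
static bool used[NSUBPLANS];

/* Copy A prefers subplan 0, copy B prefers subplan 1. */
static const int choices[] = {0, 1};

static void
fix_eager(void)
{
    for (size_t i = 0; i < sizeof(choices) / sizeof(choices[0]); i++)
    {
        int keep = choices[i];

        /* Eagerly delete the alternative this copy rejected. */
        subplans_eager[1 - keep] = NULL;
    }
}

static void
fix_deferred(void)
{
    /* First resolve every copy, only marking what is used. */
    for (size_t i = 0; i < sizeof(choices) / sizeof(choices[0]); i++)
        used[choices[i]] = true;

    /* Prune only after every copy has been resolved. */
    for (int p = 0; p < NSUBPLANS; p++)
        if (!used[p])
            subplans_deferred[p] = NULL;
}
```

With eager pruning both entries end up NULL even though each is still
referenced by one copy - the "kaboom" above; the deferred pass keeps both.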

regards, tom lane




Re: .ready and .done files considered harmful

2021-09-14 Thread Bossart, Nathan
On 9/14/21, 7:23 AM, "Dipesh Pandit"  wrote:
> I agree that when we are creating a .ready file we should compare 
> the current .ready file with the last .ready file to check if this file is 
> created out of order. We can store the state of the last .ready file 
> in shared memory and compare it with the current .ready file. I
> believe that archiver specific shared memory area can be used
> to store the state of the last .ready file unless I am missing
> something and this needs to be stored in a separate shared
> memory area.
>
> With this change, we have the flexibility to move the current archiver
> state out of shared memory and keep it local to archiver. I have 
> incorporated these changes and updated a new patch.

I wonder if this can be simplified even further.  If we don't bother
trying to catch out-of-order .ready files in XLogArchiveNotify() and
just depend on the per-checkpoint/restartpoint directory scans, we can
probably remove lastReadySegNo from archiver state completely.

+   /* Force a directory scan if we are archiving anything but a regular
+* WAL file or if this WAL file is being created out-of-order.
+*/
+   if (!IsXLogFileName(xlog))
+   PgArchForceDirScan();
+   else
+   {
+   TimeLineID tli;
+   XLogSegNo last_segno;
+   XLogSegNo this_segno;
+
+   last_segno = PgArchGetLastReadySegNo();
+   XLogFromFileName(xlog, &tli, &this_segno, wal_segment_size);
+
+   /*
+* Force a directory scan in case if this .ready file created 
out of
+* order.
+*/
+   last_segno++;
+   if (last_segno != this_segno)
+   PgArchForceDirScan();
+
+   PgArchSetLastReadySegNo(this_segno);
+   }

This is an interesting idea, but the "else" block here seems prone to
race conditions.  I think we'd have to hold arch_lck to prevent that.
But as I mentioned above, if we are okay with depending on the
fallback directory scans, I think we can remove the "else" block
completely.
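For illustration, a lock-protected version of that out-of-order check might
look like the sketch below. The names are hypothetical and a pthread mutex
stands in for arch_lck; the point is that the read of the last segment
number, the comparison, and the update must all happen under one lock, or
two concurrent notifiers can interleave and miss an out-of-order file.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogSegNo;

static pthread_mutex_t arch_lck = PTHREAD_MUTEX_INITIALIZER;
static XLogSegNo last_ready_segno = 0;  /* 0 = nothing marked yet */
static bool force_dir_scan = false;

/* Hypothetical sketch: mark a segment ready, forcing a directory scan
 * when the segment does not immediately follow the previous one.  The
 * whole check-and-update runs under the lock so concurrent callers
 * cannot interleave reads and writes of last_ready_segno.  (Resetting
 * force_dir_scan after the archiver's scan is not modeled here.) */
static void
notify_segment_ready(XLogSegNo this_segno)
{
    pthread_mutex_lock(&arch_lck);
    if (last_ready_segno != 0 && this_segno != last_ready_segno + 1)
        force_dir_scan = true;
    last_ready_segno = this_segno;
    pthread_mutex_unlock(&arch_lck);
}
```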

+   /* Initialize the current state of archiver */
+   xlogState.lastSegNo = MaxXLogSegNo;
+   xlogState.lastTli = MaxTimeLineID;

It looks like we have two ways to force a directory scan.  We can
either set force_dir_scan to true, or lastSegNo can be set to
MaxXLogSegNo.  Why not just set force_dir_scan to true here so that we
only have one way to force a directory scan?

+   /*
+* Failed to archive, make sure that 
archiver performs a
+* full directory scan in the next 
cycle to avoid missing
+* the WAL file which could not be 
archived due to some
+* failure in current cycle.
+*/
+   PgArchForceDirScan();

Don't we also need to force a directory scan in the other cases we
return early from pgarch_ArchiverCopyLoop()?  We will have already
advanced the archiver state in pgarch_readyXlog(), so I think we'd end
up skipping files if we didn't.  For example, if archive_command isn't
set, we'll just return, and the next call to pgarch_readyXlog() might
return the next file.

+   /* Continue directory scan until we find a regular WAL 
file */
+   SpinLockAcquire(&PgArch->arch_lck);
+   PgArch->force_dir_scan = true;
+   SpinLockRelease(&PgArch->arch_lck);

nitpick: I think we should just call PgArchForceDirScan() here.

Nathan



Re: Remove duplicate static function check_permissions in slotfuncs.c and logicalfuncs.c

2021-09-14 Thread Alvaro Herrera
On 2021-Sep-14, Michael Paquier wrote:

> On Mon, Sep 13, 2021 at 08:51:18AM +0530, Bharath Rupireddy wrote:
> > On Mon, Sep 13, 2021 at 8:07 AM Michael Paquier  wrote:
> >> On Sun, Sep 12, 2021 at 10:14:36PM -0300, Euler Taveira wrote:
> >>> On Sun, Sep 12, 2021, at 8:02 PM, Bossart, Nathan wrote:
>  nitpick: It looks like there's an extra set of parentheses around
>  errmsg().
> >>>
> >>> Indeed. Even the requirement for extra parenthesis around auxiliary 
> >>> function
> >>> calls was removed in v12 (e3a87b4991cc2d00b7a3082abb54c5f12baedfd1).
> 
> Applied.  Not using those extra parenthesis is the most common
> pattern, so tweaked this way.

The parentheses that commit e3a87b4991cc removed the requirement for are
those that the committed code still has, starting at the errcode() line.
The ones in errmsg() were redundant and have never been necessary.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"El sabio habla porque tiene algo que decir;
el tonto, porque tiene que decir algo" (Platon).




Re: refactoring basebackup.c

2021-09-14 Thread Sergei Kornilov
Hello

I found that in 0001 you propose to rename a few options. Perhaps we could 
rename another option for clarity? I think the FAST (is it about some bandwidth 
limit?) and WAIT (wait for what? a checkpoint?) option names are confusing.
Could we replace FAST with "CHECKPOINT [fast|spread]" and WAIT with 
WAIT_WAL_ARCHIVED? I think such names would be more descriptive.

-   if (PQserverVersion(conn) >= 10)
-   /* pg_recvlogical doesn't use an exported snapshot, so 
suppress */
-   appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");
+   /* pg_recvlogical doesn't use an exported snapshot, so suppress 
*/
+   if (use_new_option_syntax)
+   AppendStringCommandOption(query, use_new_option_syntax,
+  
"SNAPSHOT", "nothing");
+   else
+   AppendPlainCommandOption(query, use_new_option_syntax,
+
"NOEXPORT_SNAPSHOT");

In 0002, it looks like the condition for 9.x releases was lost?

Also, my gcc version 8.3.0 is not happy with 
v5-0007-Support-base-backup-targets.patch and produces:

basebackup.c: In function ‘parse_basebackup_options’:
basebackup.c:970:7: error: ‘target_str’ may be used uninitialized in this 
function [-Werror=maybe-uninitialized]
   errmsg("target '%s' does not accept a target detail",
   ^~

regards, Sergei




Re: Physical replication from x86_64 to ARM64

2021-09-14 Thread Andres Freund
Hi,


On September 14, 2021 7:11:25 AM PDT, Tom Lane  wrote:
>Aleksander Alekseev  writes:
>>> Initial experiments show no observable problems when copying PGDATA or in
>>> fact using physical streaming replication between the two CPU architectures.
>
>> That's an interesting result. The topic of physical replication
>> compatibility interested me much back in 2017 and I raised this question on
>> PGCon [1]. As I recall the compatibility is not guaranteed, nor tested, and
>> not going to be, because the community doesn't have resources for this.
>
>Yeah.  As far as the hardware goes, if you have the same endianness,
>struct alignment rules, and floating-point format [1], then physical
>replication ought to work.  Where things get far stickier is if the
>operating systems aren't identical, because then you have very great
>risk of text sorting rules not being the same, leading to index
>corruption [2].  In modern practice that tends to be a bigger issue
>than the hardware, and we don't have any good way to check for it.

I'd also be worried about subtle changes in floating-point math results, and 
those subsequently leading to index mismatches - whether because the hardware 
gives differing results, or because of libc differences.

Regards,

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [BUG] Unexpected action when publishing partition tables

2021-09-14 Thread vignesh C
On Tue, Sep 7, 2021 at 11:38 AM houzj.f...@fujitsu.com
 wrote:
>
> From Tues, Sep 7, 2021 12:02 PM Amit Kapila  wrote:
> > On Mon, Sep 6, 2021 at 1:49 PM houzj.f...@fujitsu.com 
> >  wrote:
> > >
> > > I can reproduce this bug.
> > >
> > > I think the reason is it didn't invalidate all the leaf partitions'
> > > relcache when add a partitioned table to the publication, so the
> > > publication info was not rebuilt.
> > >
> > > The following code only invalidate the target table:
> > > ---
> > > PublicationAddTables
> > > publication_add_relation
> > > /* Invalidate relcache so that publication info is 
> > > rebuilt. */
> > > CacheInvalidateRelcache(targetrel);
> > > ---
> > >
> > > In addition, this problem can happen in both ADD TABLE, DROP TABLE,
> > > and SET TABLE cases, so we need to invalidate the leaf partitions'
> > > recache in all these cases.
> > >
> >
> > Few comments:
> > =
> >   {
> > @@ -664,7 +673,13 @@ PublicationDropTables(Oid pubid, List *rels, bool
> > missing_ok)
> >
> >   ObjectAddressSet(obj, PublicationRelRelationId, prid);
> >   performDeletion(&obj, DROP_CASCADE, 0);
> > +
> > + relids = GetPubPartitionOptionRelations(relids, PUBLICATION_PART_LEAF,
> > + relid);
> >   }
> > +
> > + /* Invalidate relcache so that publication info is rebuilt. */
> > + InvalidatePublicationRels(relids);
> >  }
> >
> > We already register the invalidation for the main table in
> > RemovePublicationRelById which is called via performDeletion. I think it is
> > better to perform invalidation for partitions at that place.
> > Similarly is there a reason for not doing invalidations of partitions in
> > publication_add_relation()?
>
> Thanks for the comment. I originally intended to reduce the number of 
> invalidation messages when adding/dropping several tables, each with lots of 
> partitions, which could exceed MAX_RELCACHE_INVAL_MSGS. But that seems a rare 
> case, so I changed the code as suggested.
>
> Attached are new version patches which address the comment.

Thanks for fixing this issue. The bug is fixed by the patch; I did
not find any issues in my testing.
I just had one minor comment:

We could clean up the table at the end by calling DROP TABLE testpub_parted2:
+-- still fail, because parent's publication replicates updates
+UPDATE testpub_parted2 SET a = 2;
+ERROR:  cannot update table "testpub_parted2" because it does not
have a replica identity and publishes updates
+HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
+ALTER PUBLICATION testpub_forparted DROP TABLE testpub_parted;
+-- works again, because update is no longer replicated
+UPDATE testpub_parted2 SET a = 2;

Regards,
Vignesh




Re: refactoring basebackup.c

2021-09-14 Thread Dilip Kumar
On Mon, Sep 13, 2021 at 9:42 PM Robert Haas  wrote:
>
> On Mon, Sep 13, 2021 at 7:19 AM Dilip Kumar  wrote:
> > Seems like nothing has been done about the issue reported in [1]
> >
> > This one line change shall fix the issue,
>
> Oops. Try this version.

Thanks, this version works fine.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: .ready and .done files considered harmful

2021-09-14 Thread Dipesh Pandit
Thanks for the feedback.

> The latest post on this thread contained a link to this one, and it
> made me want to rewind to this point in the discussion. Suppose we
> have the following alternative scenario:
>
> Let's say step 1 looks for WAL file 10, but 10.ready doesn't exist
> yet.  The following directory scan ends up finding 12.ready.  Just
> before we update the PgArch state, XLogArchiveNotify() is called and
> creates 11.ready.  However, pg_readyXlog() has already decided to
> return WAL segment 12 and update the state to look for 13 next.
>
> Now, if I'm not mistaken, using <= doesn't help at all.
>
> In my opinion, the problem here is that the natural way to ask "is
> this file being archived out of order?" is to ask yourself "is the
> file that I'm marking as ready for archiving now the one that
> immediately follows the last one I marked as ready for archiving?" and
> then invert the result. That is, if I last marked 10 as ready, and now
> I'm marking 11 as ready, then it's in order, but if I'm now marking
> anything else whatsoever, then it's out of order. But that's not what
> this does. Instead of comparing what it's doing now to what it did
> last, it compares what it did now to what the archiver did last.

I agree that when we are creating a .ready file we should compare
the current .ready file with the last .ready file to check whether this file is
created out of order. We can store the state of the last .ready file
in shared memory and compare it with the current .ready file. I
believe that the archiver-specific shared memory area can be used
to store the state of the last .ready file, unless I am missing
something and it needs to be stored in a separate shared
memory area.

With this change, we have the flexibility to move the current archiver
state out of shared memory and keep it local to the archiver. I have
incorporated these changes and attached an updated patch.


> > And it's really not obvious that that's correct. I think that the
> > above argument actually demonstrates a flaw in the logic, but even if
> > not, or even if it's too small a flaw to be a problem in practice, it
> > seems a lot harder to reason about.
>
> I certainly agree that it's harder to reason about.  If we were to go
> the keep-trying-the-next-file route, we could probably minimize a lot
> of the handling for these rare cases by banking on the "fallback"
> directory scans.  Provided we believe these situations are extremely
> rare, some extra delay for an archive every once in a while might be
> acceptable.

+1. We are forcing a directory scan at each checkpoint, and it will make sure
that any missing file gets archived within the checkpoint boundaries.

Please find the attached patch.

Thanks,
Dipesh
From f05b223b368e40594f0ed8440c0704fb7b970ee0 Mon Sep 17 00:00:00 2001
From: Dipesh Pandit 
Date: Wed, 8 Sep 2021 21:47:16 +0530
Subject: [PATCH] keep trying the next file approach

---
 src/backend/access/transam/xlog.c|  20 +++
 src/backend/access/transam/xlogarchive.c |  25 
 src/backend/postmaster/pgarch.c  | 234 ++-
 src/include/access/xlogdefs.h|   2 +
 src/include/postmaster/pgarch.h  |   5 +
 5 files changed, 254 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e51a7a7..7dd4b96 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9285,6 +9285,16 @@ CreateCheckPoint(int flags)
 	RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);
 
 	/*
+	 * Force the archiver to perform a directory scan.
+	 *
+	 * Ordinarily, this should not be needed, but it seems like a good idea
+	 * to make sure we scan the archive_status directory every once in a
+	 * while to make sure we haven't left anything behind.  Calling it here
+	 * ensures we do a directory scan at least once per checkpoint.
+	 */
+	PgArchForceDirScan();
+
+	/*
 	 * Make more log segments if needed.  (Do this after recycling old log
 	 * segments, since that may supply some of the needed files.)
 	 */
@@ -9650,6 +9660,16 @@ CreateRestartPoint(int flags)
 	RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr);
 
 	/*
+	 * Force the archiver to perform a directory scan.
+	 *
+	 * Ordinarily, this should not be needed, but it seems like a good idea
+	 * to make sure we scan the archive_status directory every once in a
+	 * while to make sure we haven't left anything behind.  Calling it here
+	 * ensures we do a directory scan at least once per restartpoint.
+	 */
+	PgArchForceDirScan();
+
+	/*
 	 * Make more log segments if needed.  (Do this after recycling old log
 	 * segments, since that may supply some of the needed files.)
 	 */
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 26b023e..7756b87 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -489,6 +489,31 @@ XLogArchiveNotify(const char *xlog)
 		

Re: Physical replication from x86_64 to ARM64

2021-09-14 Thread Tom Lane
Aleksander Alekseev  writes:
>> Initial experiments show no observable problems when copying PGDATA or in
>> fact using physical streaming replication between the two CPU architectures.

> That's an interesting result. The topic of physical replication
> compatibility interested me much back in 2017 and I raised this question on
> PGCon [1]. As I recall the compatibility is not guaranteed, nor tested, and
> not going to be, because the community doesn't have resources for this.

Yeah.  As far as the hardware goes, if you have the same endianness,
struct alignment rules, and floating-point format [1], then physical
replication ought to work.  Where things get far stickier is if the
operating systems aren't identical, because then you have very great
risk of text sorting rules not being the same, leading to index
corruption [2].  In modern practice that tends to be a bigger issue
than the hardware, and we don't have any good way to check for it.

regards, tom lane

[1] all of which are checked by pg_control fields, btw
[2] https://wiki.postgresql.org/wiki/Locale_data_changes
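The hardware-level properties mentioned above - endianness and struct
alignment - can be probed with a few lines of C. This is purely an
illustrative sketch of what has to match on both ends of physical
replication, not how pg_control actually records these fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Detect byte order by looking at the first byte of a known pattern:
 * on a little-endian machine the low-order byte 0x02 comes first. */
static bool
is_little_endian(void)
{
    uint16_t probe = 0x0102;

    return *(uint8_t *) &probe == 0x02;
}

/* Measure the alignment padding the compiler inserts before a double:
 * the offset of 'd' equals the platform's alignment requirement, which
 * is one of the things that must agree for on-disk structs to match. */
struct align_probe
{
    char    c;
    double  d;
};

static size_t
double_alignment(void)
{
    return offsetof(struct align_probe, d);
}
```

If two machines disagree on either probe (x86_64 and ARM64 typically agree
on both), the same page images cannot be interpreted identically.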




Re: Supply restore_command to pg_rewind via CLI argument

2021-09-14 Thread Andrey Borodin



> On 14 Sep 2021, at 18:41, Daniel Gustafsson wrote:
> 
>>> Besides this patch looks good and is ready for committer IMV.
> 
> A variant of this patch was originally objected against by Michael, and as 
> this
> version is marked Ready for Committer I would like to hear his opinions on
> whether the new evidence changes anything.

I skimmed the thread for the reasoning. --target-restore-command was rejected on 
the following grounds:

> Do we actually need --target-restore-command at all? It seems to me
> that we have all we need with --restore-target-wal, and that's not
> really instinctive to pass down a command via another command..

Currently we know that --restore-target-wal is not enough if postgresql.conf 
does not reside within PGDATA.

Best regards, Andrey Borodin.



Re: Increase value of OUTER_VAR

2021-09-14 Thread Tom Lane
"Andrey V. Lepikhov"  writes:
> Also, as a test, I used two tables with 1E5 partitions each. I tried 
> plain SELECT, JOIN, and a join with a plain table. No errors found, only 
> performance issues. But that is a subject for further research.

Yeah, there's no expectation that the performance would be any
good yet ;-)

regards, tom lane




Re: Supply restore_command to pg_rewind via CLI argument

2021-09-14 Thread Daniel Gustafsson
>> Besides this patch looks good and is ready for committer IMV.

A variant of this patch was originally objected against by Michael, and as this
version is marked Ready for Committer I would like to hear his opinions on
whether the new evidence changes anything.

--
Daniel Gustafsson   https://vmware.com/





Re: Getting ERROR "subplan "SubPlan 1" was not initialized" in EXISTS subplan when using for list partition.

2021-09-14 Thread Ranier Vilela
On Tue, Sep 14, 2021 at 08:49 AM Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com> wrote:

> Hi,
>
> I am getting "ERROR:  subplan "SubPlan 1" was not initialized" error with
> below test case.
>
> CREATE TABLE tbl ( c1 int, c2 int, c3 int ) PARTITION BY LIST (c1);
> create table tbl_null PARTITION OF tbl FOR VALUES IN (null);
> create table tbl_def PARTITION OF tbl DEFAULT;
> insert into tbl values (8800,0,0);
> insert into tbl values (1891,1,1);
> insert into tbl values (3420,2,0);
> insert into tbl values (9850,3,0);
> insert into tbl values (7164,4,4);
> analyze tbl;
> explain (costs off) select count(*) from tbl t1 where (exists(select 1
> from tbl t2 where t2.c1 = t1.c2) or c3 < 0);
>
> postgres=# explain (costs off) select count(*) from tbl t1 where
> (exists(select 1 from tbl t2 where t2.c1 = t1.c2) or c3 < 0);
> ERROR:  subplan "SubPlan 1" was not initialized
>
Not sure if this helps, but below is the backtrace on 64-bit Windows.

00 postgres!ExecInitSubPlan(struct SubPlan * subplan = 0x`021b4ed8,
struct PlanState * parent = 0x`0219ff90)+0x93
[C:\dll\postgres\postgres_head\src\backend\executor\nodeSubplan.c @ 804]
01 postgres!ExecInitExprRec(struct Expr * node = 0x`021b4ed8,
struct ExprState * state = 0x`021a0ba0, unsigned int64 * resv =
0x`021a0ba8, bool * resnull = 0x`021a0ba5)+0x1447
[C:\dll\postgres\postgres_head\src\backend\executor\execExpr.c @ 1424]
02 postgres!ExecInitExprRec(struct Expr * node = 0x`021b4ea8,
struct ExprState * state = 0x`021a0ba0, unsigned int64 * resv =
0x`021a0ba8, bool * resnull = 0x`021a0ba5)+0x1176
[C:\dll\postgres\postgres_head\src\backend\executor\execExpr.c @ 1364]
03 postgres!ExecInitQual(struct List * qual = 0x`021b5198, struct
PlanState * parent = 0x`0219ff90)+0x197
[C:\dll\postgres\postgres_head\src\backend\executor\execExpr.c @ 256]
04 postgres!ExecInitSeqScan(struct SeqScan * node = 0x`021b3dd8,
struct EState * estate = 0x`0219f2c8, int eflags = 0n17)+0x105
[C:\dll\postgres\postgres_head\src\backend\executor\nodeSeqscan.c @ 171]
05 postgres!ExecInitNode(struct Plan * node = 0x`021b3dd8, struct
EState * estate = 0x`0219f2c8, int eflags = 0n17)+0x1bb
[C:\dll\postgres\postgres_head\src\backend\executor\execProcnode.c @ 209]
06 postgres!ExecInitAppend(struct Append * node = 0x`021b3c78,
struct EState * estate = 0x`0219f2c8, int eflags = 0n17)+0x301
[C:\dll\postgres\postgres_head\src\backend\executor\nodeAppend.c @ 232]
07 postgres!ExecInitNode(struct Plan * node = 0x`021b3c78, struct
EState * estate = 0x`0219f2c8, int eflags = 0n17)+0xf8
[C:\dll\postgres\postgres_head\src\backend\executor\execProcnode.c @ 181]
08 postgres!ExecInitAgg(struct Agg * node = 0x`021b4688, struct
EState * estate = 0x`0219f2c8, int eflags = 0n17)+0x559
[C:\dll\postgres\postgres_head\src\backend\executor\nodeAgg.c @ 3383]
09 postgres!ExecInitNode(struct Plan * node = 0x`021b4688, struct
EState * estate = 0x`0219f2c8, int eflags = 0n17)+0x58a
[C:\dll\postgres\postgres_head\src\backend\executor\execProcnode.c @ 340]
0a postgres!InitPlan(struct QueryDesc * queryDesc = 0x`021b5e48,
int eflags = 0n17)+0x490
[C:\dll\postgres\postgres_head\src\backend\executor\execMain.c @ 936]
0b postgres!standard_ExecutorStart(struct QueryDesc * queryDesc =
0x`021b5e48, int eflags = 0n17)+0x242
[C:\dll\postgres\postgres_head\src\backend\executor\execMain.c @ 265]
0c postgres!ExecutorStart(struct QueryDesc * queryDesc =
0x`021b5e48, int eflags = 0n1)+0x4a
[C:\dll\postgres\postgres_head\src\backend\executor\execMain.c @ 144]
0d postgres!ExplainOnePlan(struct PlannedStmt * plannedstmt =
0x`021b5db8, struct IntoClause * into = 0x`, struct
ExplainState * es = 0x`021831f8, char * queryString =
0x`00999348 "explain (costs off) select count(*) from tbl t1 where
(exists(select 1 from tbl t2 where t2.c1 = t1.c2) or c3 < 0);", struct
ParamListInfoData * params = 0x`, struct QueryEnvironment *
queryEnv = 0x`, union _LARGE_INTEGER * planduration =
0x`007ff160 {5127}, struct BufferUsage * bufusage =
0x`)+0x197
[C:\dll\postgres\postgres_head\src\backend\commands\explain.c @ 582]
0e postgres!ExplainOneQuery(struct Query * query = 0x`0099a5d0, int
cursorOptions = 0n2048, struct IntoClause * into = 0x`,
struct ExplainState * es = 0x`021831f8, char * queryString =
0x`00999348 "explain (costs off) select count(*) from tbl t1 where
(exists(select 1 from tbl t2 where t2.c1 = t1.c2) or c3 < 0);", struct
ParamListInfoData * params = 0x`, struct QueryEnvironment *
queryEnv = 0x`)+0x210
[C:\dll\postgres\postgres_head\src\backend\commands\explain.c @ 413]
0f postgres!ExplainQuery(struct ParseState * pstate = 0x`0099de20,
struct ExplainStmt * 

Re: Small documentation improvement for ALTER SUBSCRIPTION

2021-09-14 Thread Daniel Gustafsson
> On 14 Sep 2021, at 11:57, Amit Kapila  wrote:

> LGTM as well. Peter E., Daniel, is any one of you intending to
> push this? If not, I can take care of it.

No worries, I can pick it up.

--
Daniel Gustafsson   https://vmware.com/





Re: Physical replication from x86_64 to ARM64

2021-09-14 Thread Aleksander Alekseev
Hi Jan,

> Initial experiments show no observable problems when copying PGDATA or in
fact using physical streaming replication between the two CPU architectures.

That's an interesting result. The topic of physical replication
compatibility interested me a lot back in 2017, and I raised this question at
PGCon [1]. As I recall, compatibility is not guaranteed, not tested, and not
going to be, because the community doesn't have the resources for this. The
consensus was that to migrate without downtime the user has to use logical
replication. Thus what you observe should be considered a hack, and if
something goes wrong, you are on your own.
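
To spell out that route: a minimal sketch of a logical-replication migration
(illustrative object and connection names, not from this thread) would be:

```sql
-- On the old (x86_64) primary: publish everything.
CREATE PUBLICATION migrate_pub FOR ALL TABLES;

-- On the new (ARM64) cluster, after restoring the schema
-- (e.g. via pg_dump --schema-only): subscribe; the initial
-- table copy runs automatically.
CREATE SUBSCRIPTION migrate_sub
    CONNECTION 'host=old-primary dbname=mydb user=repl'
    PUBLICATION migrate_pub;

-- Once the subscription has caught up, stop writes on the old
-- side, wait for lag to reach zero, then switch clients over.
```

Note that logical replication does not carry over sequences, DDL, or large
objects, so those need separate handling during the cutover.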

Of course, there is a possibility that something has changed in the past
four years. I'm sure somebody on the mailing list will correct me if that is
the case.

[1] https://wiki.postgresql.org/wiki/PgCon_2017_Developer_Meeting

-- 
Best regards,
Aleksander Alekseev


Getting ERROR "subplan "SubPlan 1" was not initialized" in EXISTS subplan when using for list partition.

2021-09-14 Thread Rajkumar Raghuwanshi
Hi,

I am getting an "ERROR:  subplan "SubPlan 1" was not initialized" error with
the below test case.

CREATE TABLE tbl ( c1 int, c2 int, c3 int ) PARTITION BY LIST (c1);
create table tbl_null PARTITION OF tbl FOR VALUES IN (null);
create table tbl_def PARTITION OF tbl DEFAULT;
insert into tbl values (8800,0,0);
insert into tbl values (1891,1,1);
insert into tbl values (3420,2,0);
insert into tbl values (9850,3,0);
insert into tbl values (7164,4,4);
analyze tbl;
explain (costs off) select count(*) from tbl t1 where (exists(select 1 from
tbl t2 where t2.c1 = t1.c2) or c3 < 0);

postgres=# explain (costs off) select count(*) from tbl t1 where
(exists(select 1 from tbl t2 where t2.c1 = t1.c2) or c3 < 0);
ERROR:  subplan "SubPlan 1" was not initialized

Thanks & Regards,
Rajkumar Raghuwanshi


Re: Increase value of OUTER_VAR

2021-09-14 Thread Aleksander Alekseev
Hi Andrey,

> only performance issues

That's interesting. Any chance you could share the hardware
description, the configuration file, and steps to reproduce with us?

-- 
Best regards,
Aleksander Alekseev




Physical replication from x86_64 to ARM64

2021-09-14 Thread Jan Mußler
Fellow Postgres Admins and Developers,

With the arrival of ARM compute nodes on AWS and an existing fleet of
Postgres clusters running on x86_64 nodes, the question arises of how to
migrate existing Postgres clusters to ARM64 nodes, ideally with the zero
downtime one is used to.

Initial experiments show no observable problems when copying PGDATA or in
fact using physical streaming replication between the two CPU architectures.
In our case Postgres runs in Docker, based on Ubuntu 18.04 base images and
PGDG packages for Postgres 13. On top of that, we checked existing indexes
with the amcheck extension, which did not reveal any issues.
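
For reference, an amcheck pass of the kind described can be sketched as
follows (a sketch only; bt_index_check() with heapallindexed requires
amcheck on PostgreSQL 11 or later):

```sql
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Run bt_index_check() over every valid B-tree index in the current
-- database; any structural corruption raises an ERROR.
SELECT c.relname, bt_index_check(index => c.oid, heapallindexed => true)
FROM pg_index i
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_am a ON c.relam = a.oid
WHERE a.amname = 'btree' AND i.indisvalid;
```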

However, experiments alone cannot rule out all corner cases, so we are
curious to hear other input on the matter; we believe this is relevant to a
wider audience, and ARM is likely to become available on other, non-AWS
platforms going forward.

It is our understanding that AWS RDS in fact allows the change from x86
nodes to ARM nodes on the fly for Postgres 12 and Postgres 13, which gives us
some indication that, if done right, both platforms are indeed compatible.

Looking forward to your input and discussion points!


-- 
Jan Mußler
Engineering Manager - Team Acid & Team Aruha | Zalando SE


Re: resowner module README needs update?

2021-09-14 Thread Michael Paquier
On Tue, Sep 14, 2021 at 12:18:21PM +0900, Amit Langote wrote:
> Patch updated.  Given the new text, I thought it might be better to
> move the paragraph right next to the description of the ResourceOwner
> API at the beginning of the section, because the context seems clearer
> that way.  Thoughts?

No objections here.
--
Michael




Re: PG Docs for ALTER SUBSCRIPTION REFRESH PUBLICATION - copy_data option

2021-09-14 Thread Amit Kapila
On Fri, Jun 25, 2021 at 9:20 AM Peter Smith  wrote:
>
> But I recently learned that when there are partitions in the
> publication, then toggling the value of the PUBLICATION option
> "publish_via_partition_root" [3] can also *implicitly* change the list
> of published tables, and therefore that too might cause any ASRP to
> make use of the copy_data value for those implicitly added
> partitions/tables.
>

I have tried the below example in this context but didn't see any
effect on changing via_root option.

Set up on both publisher and subscriber:
=
CREATE TABLE tab2 (a int PRIMARY KEY, b text) PARTITION BY LIST (a);
CREATE TABLE tab2_1 (b text, a int NOT NULL);
ALTER TABLE tab2 ATTACH PARTITION tab2_1 FOR VALUES IN (0, 1, 2, 3);
CREATE TABLE tab2_2 PARTITION OF tab2 FOR VALUES IN (5, 6);

Publisher:
==
CREATE PUBLICATION pub_viaroot FOR TABLE tab2_2;
postgres=# INSERT INTO tab2 VALUES (1), (0), (3), (5);
INSERT 0 4
postgres=# select * from tab2_1;
 b | a
---+---
   | 1
   | 0
   | 3
(3 rows)
postgres=# select * from tab2_2;
 a | b
---+---
 5 |
(1 row)


Subscriber:
==
CREATE SUBSCRIPTION sub_viaroot CONNECTION 'host=localhost port=5432
dbname=postgres' PUBLICATION pub_viaroot;
postgres=# select * from tab2_2;
 a | b
---+---
 5 |
(1 row)
postgres=# select * from tab2_1;
 b | a
---+---
(0 rows)

So, by this step, we can see that the partition which is not subscribed is
not copied. Now, let's toggle the via_root option.
Publisher
=
Alter Publication pub_viaroot Set (publish_via_partition_root = true);

Subscriber
==
postgres=# Alter Subscription sub_viaroot Refresh Publication;
ALTER SUBSCRIPTION
postgres=# select * from tab2_2;
 a | b
---+---
 5 |
(1 row)
postgres=# select * from tab2_1;
 b | a
---+---
(0 rows)

As per your explanation, one would expect data in tab2_1 after the last
step. Can you explain with an example?

-- 
With Regards,
Amit Kapila.




Re: Small documentation improvement for ALTER SUBSCRIPTION

2021-09-14 Thread Amit Kapila
On Wed, Sep 8, 2021 at 5:11 PM Masahiko Sawada  wrote:
>
> On Tue, Sep 7, 2021 at 9:01 PM Daniel Gustafsson  wrote:
> >
> > > On 7 Sep 2021, at 13:36, Peter Eisentraut 
> > >  wrote:
> > >
> > > On 12.08.21 04:52, Masahiko Sawada wrote:
> > >> On Wed, Aug 11, 2021 at 5:42 PM Daniel Gustafsson  
> > >> wrote:
> > >>>
> >  On 11 Aug 2021, at 09:57, Masahiko Sawada  
> >  wrote:
> > >>>
> >  Additionally, refresh options as described in
> >  refresh_option of
> >  REFRESH PUBLICATION may be specified,
> >  except in the case of DROP PUBLICATION.
> > >>>
> > >>> Since this paragraph is under the literal option “refresh”, which takes 
> > >>> a
> > >>> value, I still find your original patch to be the clearest.
> > >> Yeah, I prefer my original patch over this idea. On the other hand, I
> > >> can see the point of review comment on it that Amit pointed out[1].
> > >
> > > How about this:
> > >
> > > -  Additionally, refresh options as described
> > > -  under REFRESH PUBLICATION may be specified.
> > > +  Additionally, the options described under REFRESH
> > > +  PUBLICATION may be specified, to control the implicit 
> > > refresh
> > > +  operation.
> >
> > LGTM.
>
> +1
>
> Attached the patch.
>

LGTM as well. Peter E., Daniel, is any one of you intending to push this?
If not, I can take care of it.

-- 
With Regards,
Amit Kapila.




Re: [BUG] Failed Assertion in ReorderBufferChangeMemoryUpdate()

2021-09-14 Thread Amit Kapila
On Mon, Sep 13, 2021 at 10:42 PM Tom Lane  wrote:
>
> Alvaro Herrera  writes:
> > Am I the only that that thinks this code is doing far too much in a
> > PG_CATCH block?
>
> You mean the one in ReorderBufferProcessTXN?  Yeah, that is mighty
> ugly.  It might be all right given that it almost immediately does
> AbortCurrentTransaction, since that should basically clean up
> whatever is wrong.  But I'm still full of questions after a brief
> look at it.
>
> * What is the sense in calling RollbackAndReleaseCurrentSubTransaction
> after having done AbortCurrentTransaction?
>

It is required for the cases where we start an internal subtransaction
(when decoding via a SQL SRF). The first AbortCurrentTransaction sets the
current subtransaction's state to TBLOCK_SUBABORT, and then
RollbackAndReleaseCurrentSubTransaction pops and releases the current
subtransaction. Note that the same sequence is performed outside the catch
block, and if we comment out RollbackAndReleaseCurrentSubTransaction, there
are a lot of failures in test_decoding. I have manually debugged and checked
that if we don't call RollbackAndReleaseCurrentSubTransaction in the catch
block, the transaction is retained in a bad state, which on the next
operation leads to a FATAL error.

> * Is it really sane that we re-throw the error in some cases and
> not others?  What is likely to catch that thrown error, and is it
> going to be prepared for us having already aborted the transaction?
>

There are two different paths that deal with the re-thrown error. For
WALSender processes, we don't start an internal subtransaction, and on
error they simply exit. For SQL SRFs, we start internal subtransactions,
which are released in the catch block, and then the top transaction is
aborted after we re-throw the error.

The cases where we don't immediately rethrow are specifically for
streaming transactions where we might have already sent some changes
to the subscriber and we want to continue processing and send the
abort to subscribers.

-- 
With Regards,
Amit Kapila.




Re: Added schema level support for publication.

2021-09-14 Thread vignesh C
On Tue, Sep 14, 2021 at 6:31 AM houzj.f...@fujitsu.com
 wrote:
>
> From Sun, Sept 12, 2021 11:13 PM vignesh C  wrote:
> > On Fri, Sep 10, 2021 at 11:21 AM Hou Zhijie  wrote:
> > > Attach the without-flag version and add comments about the pubobj_name.
> >
> > Thanks for the changes, the suggested changes make the parsing code simpler.
> > I have merged the changes to the main patch. Attached v27 patch has the
> > changes for the same.
>
> Hi,
>
> I have some suggestions for the documents and comments of the new syntax.
>
> IMO, the document could be clearer in the following format.
> --
> Synopsis
> CREATE PUBLICATION name
> [ FOR ALL TABLES
>   | FOR publication object [, ... ] ]
> [ WITH ( publication_parameter [= value] [, ... ] ) ]
>
> where publication object is one of:
>
> TABLE [ ONLY ] table_name [ * ] [, ... ]
> ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ]
> --
>
> Attach a diff(based on v27-*) which change the doc and comments like the
> following.

Thanks for the comments and the changes, I have made a few changes and
merged it into the v28 patch attached at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm0OudeDeFN7bSWPro0hgKx%3D1zPgcNFWnvU_G6w3mDPX0Q%40mail.gmail.com

Regards,
Vignesh




Re: Added schema level support for publication.

2021-09-14 Thread vignesh C
On Mon, Sep 13, 2021 at 7:06 PM tanghy.f...@fujitsu.com
 wrote:
>
> On Sunday, September 12, 2021 11:13 PM vignesh C  wrote:
> >
> > Thanks for the changes, the suggested changes make the parsing code
> > simpler. I have merged the changes to the main patch. Attached v27
> > patch has the changes for the same.
> >
>
> Thanks for your new patch. I had a look at the changes related to the
> documentation and tried the patch. Here are several comments.
>
> 1.
>
> You must own the publication to use ALTER PUBLICATION.
> Adding a table to a publication additionally requires owning that table.
> +   The ADD SCHEMA and SET SCHEMA to a
> +   publication requires the invoking user to be a superuser.
> To alter the owner, you must also be a direct or indirect member of the 
> new
> owning role. The new owner must have CREATE privilege 
> on
> the database.  Also, the new owner of a FOR ALL TABLES
> publication must be a superuser.  However, a superuser can change the
> ownership of a publication regardless of these restrictions.
>
>
> ADD SCHEMA
> ->
> ADD ALL TABLES IN SCHEMA
>
> SET SCHEMA
> ->
> SET ALL TABLES IN SCHEMA

Modified

> 2.
> +  
> +   ADD some tables and schemas to the publication:
> +
> +ALTER PUBLICATION production_publication ADD TABLE users, departments, ALL 
> TABLES IN SCHEMA production;
> +
> +  
>
> ADD some tables and schemas to the publication:
> ->
> Add some tables and schemas to the publication:

Modified

> 3.
> +  
> +   Drop some schema from the publication:
> +
> +ALTER PUBLICATION production_quarterly_publication DROP ALL TABLES IN SCHEMA 
> production_july;
> +
> +  
>
> Drop some schema from the publication:
> ->
> Drop some schemas from the publication:

Modified

> 4.
> +   The catalog pg_publication_namespace contains the
> +   mapping between schemas and publications in the database.  This is a
> +   many-to-many mapping.
>
> There are two spaces at the end of the paragraph.

I felt this is ok, as both single and double spaces are used in various
places.

> 5.
> Adding a table and the schema the table belongs to is not supported, but
> no error message is reported when I try to add them in the same statement
> using 'ALTER PUBLICATION'.
>
> For example:
> postgres=# create publication pub;
> CREATE PUBLICATION
> postgres=# alter publication pub add  all tables in schema s1, table s1.tbl;
> ALTER PUBLICATION
> postgres=# \dRp+
>   Publication pub
>   Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root
> --++-+-+-+---+--
>  postgres | f  | t   | t   | t   | t | f
> Tables:
> "s1.tbl"
> Tables from schemas:
> "s1"
>
> It didn't check whether table 's1.tbl' is a member of schema 's1'.

Modified.

> 6.
> I think if I use 'ALTER PUBLICATION ... SET', both the list of tables and the
> list of all tables in schemas should be reset. The publication should only
> contain the tables and the schemas which the user specified. If the user
> only specified all tables in schema, and didn't specify tables, the tables
> which used to be part of the publication should be dropped, too. But
> currently, if I don't specify tables, the list of tables isn't set to
> empty. Thoughts?
>
> For example:
> postgres=# create publication pub for table s1.tbl;
> CREATE PUBLICATION
> postgres=# alter publication pub set all tables in schema s2;
> ALTER PUBLICATION
> postgres=# \dRp+
>   Publication pub
>   Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root
> --++-+-+-+---+--
>  postgres | f  | t   | t   | t   | t | f
> Tables:
> "s1.tbl"
> Tables from schemas:
> "s2"

Modified

I have fixed the comments in the v28 patch attached at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm0OudeDeFN7bSWPro0hgKx%3D1zPgcNFWnvU_G6w3mDPX0Q%40mail.gmail.com

Regards,
Vignesh