Re: O(n^2) system calls in RemoveOldXlogFiles()

2021-01-11 Thread Michael Paquier
On Tue, Jan 12, 2021 at 07:15:21PM +1300, Thomas Munro wrote:
> Hah, I even knew that, apparently, but forgot.  Adding Michael who
> wrote a patch.  It'd be nice to fix this, at least in 14.

Yeah, this rings a bell.  I never went back to it, even though the patch
looks rather clean at a quick glance (not tested), but I may be able
to spend some cycles on that.  I don't think it's critical enough
for a backpatch, so doing something only on HEAD is fine by me.
What's your take?
--
Michael


signature.asc
Description: PGP signature


RE: Disable WAL logging to speed up data loading

2021-01-11 Thread tsunakawa.ta...@fujitsu.com
From: Osumi, Takamichi/大墨 昂道 
> Oops, sorry for this careless mistake.
> Fixed. And again, the regression test produces no failures.

Now correct.  (Remains ready for committer.)


Regards
Takayuki Tsunakawa



Re: Fix a typo in SearchCatCache function comment

2021-01-11 Thread Michael Paquier
On Tue, Jan 12, 2021 at 12:17:20PM +0530, Bharath Rupireddy wrote:
> While reviewing the patch in [1], I found that SearchCatCache function
> uses SearchCatCacheInternal as its function name in the comment.  It
> looks like a typo; attaching a very small patch to correct it.

Good catch.  I'll go fix it tomorrow if nobody objects.
--
Michael


signature.asc
Description: PGP signature


Re: Reduce the number of special cases to build contrib modules on windows

2021-01-11 Thread Michael Paquier
On Wed, Dec 30, 2020 at 10:07:29AM +1300, David Rowley wrote:
> -#ifdef LOWER_NODE
> +/*
> + * Below we ignore the fact that LOWER_NODE is defined when compiling with
> + * MSVC.  The reason for this is that earlier versions of the MSVC build
> + * scripts failed to define LOWER_NODE.  More recent versions of the MSVC
> + * build scripts parse makefiles which results in LOWER_NODE now being
> + * defined.  We check for _MSC_VER here so as not to break pg_upgrade when
> + * upgrading from MSVC versions where LOWER_NODE was not defined.
> + */
> +#if defined(LOWER_NODE) && !defined(_MSC_VER)
>  #include <ctype.h>
>  #define TOLOWER(x)   tolower((unsigned char) (x))
>  #else

While on it, do you think that it would be more readable if we remove
completely LOWER_NODE and use only a check based on _MSC_VER for those
two files in ltree?  This could also be handled as a separate change.
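
For what it's worth, a rough sketch of that simplification in ltree's crc32.c,
assuming (on my part) that the #else branch keeps TOLOWER as the identity, as
in the existing file:

#ifndef _MSC_VER
#include <ctype.h>
#define TOLOWER(x)	tolower((unsigned char) (x))
#else
#define TOLOWER(x)	(x)
#endif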

> + foreach my $line (split /\n/, $mf)
> + {
> + if ($line =~ /^[A-Za-z0-9_]*\.o:\s(.*)/)
> + {
> + foreach my $file (split /\s+/, $1)
> + {
> + foreach my $proj (@projects)
> + {
> + 
> $proj->AddFileConditional("$subdir/$n/$file");
> + }
> + }
> + }
> + }
Looking closer at this change, I don't think that this is completely
correct and that could become a trap.  This is adding quite a bit of
complexity to take care of contrib_extrasource getting empty, and it
actually overlaps with the handling of OBJS done in AddDir(), no?
--
Michael


signature.asc
Description: PGP signature


RE: Disable WAL logging to speed up data loading

2021-01-11 Thread osumi.takami...@fujitsu.com
On Tuesday, January 12, 2021 12:52 PM Takayuki/綱川 貴之 
 wrote:
> From: Osumi, Takamichi/大墨 昂道 
> > I updated the patch following this discussion, and fixed the
> > documentation as well.
> 
> 
> +   (rmid == RM_GIST_ID && info == RM_GIST_ID) ||
> +   (rmid == RM_GENERIC_ID)))
> 
> info is wrong for GiST, and one pair of parentheses is unnecessary.  The above
> would be:
> 
> +   (rmid == RM_GIST_ID && info == XLOG_GIST_ASSIGN_LSN) ||
> +   rmid == RM_GENERIC_ID))
Oops, sorry for this careless mistake.
Fixed. And again, the regression test produces no failures.

Best Regards,
Takamichi Osumi


disable_WAL_logging_v07.patch
Description: disable_WAL_logging_v07.patch


Fix a typo in SearchCatCache function comment

2021-01-11 Thread Bharath Rupireddy
Hi,

While reviewing the patch in [1], I found that SearchCatCache function
uses SearchCatCacheInternal as its function name in the comment.  It
looks like a typo; attaching a very small patch to correct it.

Thoughts?

[1] - 
https://www.postgresql.org/message-id/CAD21AoAwxbd-zXXUAeJ2FBRHr%2B%3DbfMUHoN7xJuXcwu1sFi1-sQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v1-0001-Use-correct-function-name-in-SearchCatCache-comme.patch
Description: Binary data


RE: Add Nullif case for eval_const_expressions_mutator

2021-01-11 Thread Hou, Zhijie
> > I think this patch should be about a tenth the size.  Try modeling it
> > on the T_SubscriptingRef-etc case, ie, use ece_generic_processing and
> > then ece_evaluate_expr to cover the generic cases.  OpExpr is common
> > enough to deserve specially optimized code, but NullIf isn't, so shorter
> is better.
> 
> Thanks for the review.
> 
> Attaching the v2 patch, which follows the suggestion to use
> ece_generic_processing and ece_evaluate_expr to simplify the code.
> 
> Please have a check.

Sorry, I found the code could still be simplified further.
Attaching the new patch.

Best regards,
houzj





v2_1-0001-add-nullif-case-for-eval_const_expressions.patch
Description: v2_1-0001-add-nullif-case-for-eval_const_expressions.patch


Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-11 Thread Amit Kapila
On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy
 wrote:
>
> On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila  wrote:
> >
> > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
> >  wrote:
> > >
> > > Hi,
> > >
> > > While providing thoughts on the design in [1], I found a strange
> > > behaviour with the $subject. The use case is shown below as a sequence
> > > of steps that need to be run on publisher and subscriber to arrive at
> > > the strange behaviour.  In step 5, the table is dropped from the
> > > publication and in step 6, the refresh publication is run on the
> > > subscriber, from here onwards, the expectation is that no further
> > > inserts into the publisher table have to be replicated on to the
> > > subscriber, but the opposite happens i.e. the inserts are still
> > > replicated to the subscriber. ISTM as a bug. Let me know if I'm
> > > missing anything.
> > >
> >
> > Did you try to investigate what's going on? Can you please check what
> > is the behavior if, after step-5, you restart the subscriber and
> > separately try creating a new subscription (maybe on a different
> > server) for that publication after step-5 and see if that allows the
> > relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> > remove such dropped rels and stop their corresponding apply workers
> > which should stop the further replication of such relations but that
> > doesn't seem to be happening in your case.
>
> Here's my analysis:
> 1) in the publisher, alter publication drop table successfully
> removes(PublicationDropTables) the table from the catalogue
> pg_publication_rel
> 2) in the subscriber, alter subscription refresh publication
> successfully removes the table from the catalogue pg_subscription_rel
> (AlterSubscription_refresh->RemoveSubscriptionRel)
> so far so good
>

Here, it should register the worker to stop on commit, and then on
commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
Once the apply worker is stopped, the corresponding WALSender will
also be stopped. Something here is not happening as expected.
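
One way to observe whether that actually happens is to check the worker and
walsender state on both sides, for example:

-- On the subscriber: is an apply worker still running for the subscription?
SELECT subname, pid, relid, received_lsn FROM pg_stat_subscription;

-- On the publisher: is the corresponding walsender still active?
SELECT pid, application_name, state FROM pg_stat_replication;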

> 3) after the insertion into the table in the publisher (remember that
> it's dropped from the publication in (1)), the walsender process is
> unable to detect that the table has been dropped from the publication,
> i.e. it doesn't look at the pg_publication_rel catalogue (or any
> other); it only does the is_publishable_relation() check, which returns
> true in pgoutput_change(). Maybe the walsender should look at the
> catalogue pg_publication_rel in is_publishable_relation()?
>

We must be checking pg_publication_rel somewhere before sending the
decoded change, because otherwise we would have sent changes for
tables that are not even part of this publication. I think you can
try to create a separate table that is not part of the publication
under test and see how the changes for that are filtered.
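
A minimal sketch of such a check (table, publication, and connection names
here are made up):

-- Publisher: t1 is in the publication, t2 is not.
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);
CREATE PUBLICATION pub_test FOR TABLE t1;

-- Subscriber: same table definitions, then subscribe.
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);
CREATE SUBSCRIPTION sub_test
  CONNECTION 'host=publisher dbname=postgres'
  PUBLICATION pub_test;

-- Publisher: only the t1 row should reach the subscriber.
INSERT INTO t1 VALUES (1);
INSERT INTO t2 VALUES (1);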

-- 
With Regards,
Amit Kapila.




Re: O(n^2) system calls in RemoveOldXlogFiles()

2021-01-11 Thread Thomas Munro
On Tue, Jan 12, 2021 at 6:55 PM Andres Freund  wrote:
> I found this before as well: 
> https://postgr.es/m/CAB7nPqTB3VcKSSrW2Qj59tYYR2H4+n=5pzbdwou+x9iqvnm...@mail.gmail.com

Hah, I even knew that, apparently, but forgot.  Adding Michael who
wrote a patch.  It'd be nice to fix this, at least in 14.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-11 Thread Bharath Rupireddy
On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila  wrote:
>
> On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
>  wrote:
> >
> > Hi,
> >
> > While providing thoughts on the design in [1], I found a strange
> > behaviour with the $subject. The use case is shown below as a sequence
> > of steps that need to be run on publisher and subscriber to arrive at
> > the strange behaviour.  In step 5, the table is dropped from the
> > publication and in step 6, the refresh publication is run on the
> > subscriber, from here onwards, the expectation is that no further
> > inserts into the publisher table have to be replicated on to the
> > subscriber, but the opposite happens i.e. the inserts are still
> > replicated to the subscriber. ISTM as a bug. Let me know if I'm
> > missing anything.
> >
>
> Did you try to investigate what's going on? Can you please check what
> is the behavior if, after step-5, you restart the subscriber and
> separately try creating a new subscription (maybe on a different
> server) for that publication after step-5 and see if that allows the
> relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> remove such dropped rels and stop their corresponding apply workers
> which should stop the further replication of such relations but that
> doesn't seem to be happening in your case.

Here's my analysis:
1) in the publisher, alter publication drop table successfully
removes(PublicationDropTables) the table from the catalogue
pg_publication_rel
2) in the subscriber, alter subscription refresh publication
successfully removes the table from the catalogue pg_subscription_rel
(AlterSubscription_refresh->RemoveSubscriptionRel)
so far so good
3) after the insertion into the table in the publisher (remember that
it's dropped from the publication in (1)), the walsender process is
unable to detect that the table has been dropped from the publication,
i.e. it doesn't look at the pg_publication_rel catalogue (or any
other); it only does the is_publishable_relation() check, which returns
true in pgoutput_change(). Maybe the walsender should look at the
catalogue pg_publication_rel in is_publishable_relation()?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: O(n^2) system calls in RemoveOldXlogFiles()

2021-01-11 Thread Andres Freund
Hi,

On 2021-01-11 16:35:56 +1300, Thomas Munro wrote:
> I noticed that RemoveXlogFile() has this code:
> 
> /*
>  * Before deleting the file, see if it can be recycled as a future log
>  * segment. Only recycle normal files, pg_standby for example can 
> create
>  * symbolic links pointing to a separate archive directory.
>  */
> if (wal_recycle &&
> endlogSegNo <= recycleSegNo &&
> lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
> InstallXLogFileSegment(&endlogSegNo, path,
>true,
> recycleSegNo, true))
> {
> ereport(DEBUG2,
> (errmsg("recycled write-ahead log file 
> \"%s\"",
> segname)));
> CheckpointStats.ckpt_segs_recycled++;
> /* Needn't recheck that slot on future iterations */
> endlogSegNo++;
> }
> 
> I didn't check the migration history of this code but it seems that
> endlogSegNo doesn't currently have the right scoping to achieve the
> goal of that last comment, so checkpoints finish up repeatedly search
> for the next free slot, starting at the low end each time, like so:
> 
> stat("pg_wal/0001004F", {st_mode=S_IFREG|0600,
> st_size=16777216, ...}) = 0
> ...
> stat("pg_wal/00010073", 0x7fff98b9e060) = -1 ENOENT
> (No such file or directory)
> 
> stat("pg_wal/0001004F", {st_mode=S_IFREG|0600,
> st_size=16777216, ...}) = 0
> ...
> stat("pg_wal/00010074", 0x7fff98b9e060) = -1 ENOENT
> (No such file or directory)
> 
> ... and so on until we've recycled all our recyclable segments.  Ouch.
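
To see the pattern in isolation, here is a standalone toy (not the PostgreSQL
code): if the probe position is reset to the low end for every file instead of
being remembered across files, the number of probes grows quadratically.

#include <stdbool.h>
#include <stdio.h>

#define NSEGS 64

static bool slot_taken[NSEGS];
static int	nprobes;

/* Find a free future slot starting at *start; stands in for the stat() loop. */
static void
recycle_one(int *start)
{
	for (int s = *start; s < NSEGS; s++)
	{
		nprobes++;
		if (!slot_taken[s])
		{
			slot_taken[s] = true;
			*start = s + 1;		/* remember where to resume next time */
			return;
		}
	}
}

int
main(void)
{
	/* Quadratic: the probe position is reset for every recycled file. */
	for (int i = 0; i < NSEGS; i++)
	{
		int			start = 0;	/* per-file scope, as in the reported behaviour */

		recycle_one(&start);
	}
	printf("per-file start position: %d probes\n", nprobes);

	/* Linear: the probe position survives across files. */
	for (int s = 0; s < NSEGS; s++)
		slot_taken[s] = false;
	nprobes = 0;

	int			start = 0;

	for (int i = 0; i < NSEGS; i++)
		recycle_one(&start);
	printf("shared start position:   %d probes\n", nprobes);

	return 0;
}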

I found this before as well: 
https://postgr.es/m/CAB7nPqTB3VcKSSrW2Qj59tYYR2H4+n=5pzbdwou+x9iqvnm...@mail.gmail.com

I did put a hastily rebased version of that commit in my aio branch
during development: 
https://github.com/anarazel/postgres/commit/b3cc8adacf7860add8cc62ec373ac955d9d12992

Greetings,

Andres Freund




"has_column_privilege()" issue with attnums and non-existent columns

2021-01-11 Thread Ian Lawrence Barwick
Greetings

Consider a table set up as follows:

CREATE TABLE foo (id INT, val TEXT);
ALTER TABLE foo DROP COLUMN val;

When specifying the name of a dropped or non-existent column, the function
"has_column_privilege()" works as expected:

postgres=# SELECT has_column_privilege('foo', 'val', 'SELECT');
ERROR:  column "val" of relation "foo" does not exist

postgres=# SELECT has_column_privilege('foo', 'bar', 'SELECT');
ERROR:  column "bar" of relation "foo" does not exist

However when specifying a dropped or non-existent attnum, it returns
"TRUE", which is unexpected:

postgres=# SELECT has_column_privilege('foo', 2::int2, 'SELECT');
 has_column_privilege
----------------------
 t
(1 row)

postgres=# SELECT has_column_privilege('foo', 999::int2, 'SELECT');
 has_column_privilege
----------------------
 t
(1 row)

The comment on the relevant code section in "src/backend/utils/adt/acl.c"
(related to "column_privilege_check()") indicates that NULL is the intended
return value in these cases:

 Likewise, the variants that take an integer attnum
 return NULL (rather than throwing an error) if there is no such
 pg_attribute entry.  All variants return NULL if an attisdropped
 column is selected.

The unexpected "TRUE" value is a result of "column_privilege_check()" returning
TRUE if the user has table-level privileges.  This returns a valid result with
the function variants where the column name is specified, as the calling
function will have already performed a check of the column through its call to
"convert_column_name()".  However when the attnum is specified, the status of
the column never gets checked.

The attached patch resolves this by having "column_privilege_check()" callers
indicate whether they have checked the column. If this is the case, and if the
user has table-level privileges, "column_privilege_check()" can return as before
with no further lookups needed.

If the user has table-level privileges but the column was not checked (i.e. an
attnum was provided), the column lookup is performed.
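
With the patch applied, the attnum variants should instead return NULL for
such columns (expected behaviour, per the comment quoted above):

postgres=# SELECT has_column_privilege('foo', 999::int2, 'SELECT');
 has_column_privilege
----------------------
 
(1 row)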

The regression tests did contain checks for dropped/non-existent attnums;
however, one group was acting on a table where the user had no table-level
privilege, giving a fall-through to the column-level check, and the other group
contained a check which was just plain wrong.

This issue appears to have been present since "has_column_privilege()" was
introduced in PostgreSQL 8.4 (commit 7449427a1e6a099bc7e76164cb99a01d5e87237b),
so it falls into the "obscure, extreme corner-case" category; OTOH it seems worth
backpatching to supported versions.

The second patch adds a bunch of missing static prototypes to "acl.c", on
general principles.

I'll add this to the next commitfest.


Regards

Ian Barwick


-- 
EnterpriseDB: https://www.enterprisedb.com
commit 7e22f2d245888c41b66ae214c226b4a101f27a89
Author: Ian Barwick 
Date:   Tue Jan 12 13:40:31 2021 +0900

Fix has_column_privilege() with attnums and non-existent columns

The existence of a column should be confirmed even if the user has
the table-level privilege, otherwise the function will happily report
the user has privilege on a dropped or non-existent column if an
invalid attnum is provided.

diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index c7f029e218..be5649b7ac 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -2447,7 +2447,7 @@ has_any_column_privilege_id_id(PG_FUNCTION_ARGS)
  */
 static int
 column_privilege_check(Oid tableoid, AttrNumber attnum,
-	   Oid roleid, AclMode mode)
+	   Oid roleid, AclMode mode, bool column_checked)
 {
 	AclResult	aclresult;
 	HeapTuple	attTuple;
@@ -2472,7 +2472,11 @@ column_privilege_check(Oid tableoid, AttrNumber attnum,
 
 	aclresult = pg_class_aclcheck(tableoid, roleid, mode);
 
-	if (aclresult == ACLCHECK_OK)
+	/*
+	 * Table-level privilege is present and the column has been checked by the
+	 * caller.
+	 */
+	if (aclresult == ACLCHECK_OK && column_checked == true)
 		return true;
 
 	/*
@@ -2493,6 +2497,16 @@ column_privilege_check(Oid tableoid, AttrNumber attnum,
 	}
 	ReleaseSysCache(attTuple);
 
+	/*
+	 * If the table level privilege exists, and the existence of the column has
+	 * been confirmed, we can skip the per-column check.
+	 */
+	if (aclresult == ACLCHECK_OK)
+		return true;
+
+	/*
+	 * No table privilege, so try per-column privileges.
+	 */
 	aclresult = pg_attribute_aclcheck(tableoid, attnum, roleid, mode);
 
 	return (aclresult == ACLCHECK_OK);
@@ -2521,7 +2535,7 @@ has_column_privilege_name_name_name(PG_FUNCTION_ARGS)
 	colattnum = convert_column_name(tableoid, column);
 	mode = convert_column_priv_string(priv_type_text);
 
-	privresult = column_privilege_check(tableoid, colattnum, roleid, mode);
+	privresult = column_privilege_check(tableoid, colattnum, roleid, mode, true);
 	if (privresult < 0)
 		PG_RETURN_NULL();
 	PG_RETURN_BOOL(privresult);
@@ -2548,7 +2562,8 

Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-11 Thread japin


On Tue, 12 Jan 2021 at 13:39, Amit Kapila wrote:
> On Tue, Jan 12, 2021 at 9:58 AM japin  wrote:
>>
>>
>> On Tue, 12 Jan 2021 at 11:37, Amit Kapila wrote:
>> > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
>> >  wrote:
>> >>
>> >> Hi,
>> >>
>> >> While providing thoughts on the design in [1], I found a strange
>> >> behaviour with the $subject. The use case is shown below as a sequence
>> >> of steps that need to be run on publisher and subscriber to arrive at
>> >> the strange behaviour.  In step 5, the table is dropped from the
>> >> publication and in step 6, the refresh publication is run on the
>> >> subscriber, from here onwards, the expectation is that no further
>> >> inserts into the publisher table have to be replicated on to the
>> >> subscriber, but the opposite happens i.e. the inserts are still
>> >> replicated to the subscriber. ISTM as a bug. Let me know if I'm
>> >> missing anything.
>> >>
>> >
>> > Did you try to investigate what's going on? Can you please check what
>> > is the behavior if, after step-5, you restart the subscriber and
>> > separately try creating a new subscription (maybe on a different
>> > server) for that publication after step-5 and see if that allows the
>> > relation to be replicated? AFAIU, in AlterSubscription_refresh, we
>> > remove such dropped rels and stop their corresponding apply workers
>> > which should stop the further replication of such relations but that
>> > doesn't seem to be happening in your case.
>>
>> If we restart the subscriber after step-5, it will not replicate the records.
>>
>> As I said in [1], if we don't insert a new data in step-5, it will not
>> replicate the records.
>>
>
> Hmm, but in Bharath's test, it is replicating the Inserts in step-7
> and step-9 as well. Are you seeing something different?
>

Yes; however, if we don't do the Inserts in step-5, the Inserts in step-7 and
step-9 will not be replicated.


-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: libpq compression

2021-01-11 Thread Konstantin Knizhnik




On 11.01.2021 20:38, Tomas Vondra wrote:

On 1/11/21 2:53 PM, Konstantin Knizhnik wrote:

...

New version of libpq compression patch is attached.
It can also be found at g...@github.com:postgrespro/libpq_compression.git


Seems it bit-rotted already, so here's a slightly fixed version.

1) Fixes the MSVC makefile. The list of files is sorted alphabetically,
so I've added the file at the end.

2) Fixes duplicate OID. It's a good practice to assign OIDs from the end
of the range, to prevent collisions during development.


Thank you


Other than that, I wonder what's the easiest way to run all tests with
compression enabled. ISTM it'd be nice to add pg_regress option forcing
a particular compression algorithm to be used, or something similar. I'd
like a convenient way to pass this through valgrind, for example. Or
how do we get this tested on a buildfarm?
I run regression tests with the PG_COMPRESSION environment variable set to
"true".
Do we need some other way (like a pg_regress option) to run tests with
compression enabled?



I'm not convinced it's very user-friendly to not have a psql option
enabling compression. It's true it can be enabled in a connection
string, but I doubt many people will notice that.
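
For example, assuming the patch's connection-string option is spelled
"compression" (an assumption on my part, not something settled), enabling it
would look like:

    psql "host=localhost dbname=postgres compression=on"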

The sgml docs need a bit more love / formatting. The lines in libpq.sgml
are far too long, and there are no tags whatsoever. Presumably zlib/zstd
should be marked as , and so on.


regards


Thank you, I will fix it.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-11 Thread Amit Kapila
On Tue, Jan 12, 2021 at 9:58 AM japin  wrote:
>
>
> On Tue, 12 Jan 2021 at 11:37, Amit Kapila wrote:
> > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
> >  wrote:
> >>
> >> Hi,
> >>
> >> While providing thoughts on the design in [1], I found a strange
> >> behaviour with the $subject. The use case is shown below as a sequence
> >> of steps that need to be run on publisher and subscriber to arrive at
> >> the strange behaviour.  In step 5, the table is dropped from the
> >> publication and in step 6, the refresh publication is run on the
> >> subscriber, from here onwards, the expectation is that no further
> >> inserts into the publisher table have to be replicated on to the
> >> subscriber, but the opposite happens i.e. the inserts are still
> >> replicated to the subscriber. ISTM as a bug. Let me know if I'm
> >> missing anything.
> >>
> >
> > Did you try to investigate what's going on? Can you please check what
> > is the behavior if, after step-5, you restart the subscriber and
> > separately try creating a new subscription (maybe on a different
> > server) for that publication after step-5 and see if that allows the
> > relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> > remove such dropped rels and stop their corresponding apply workers
> > which should stop the further replication of such relations but that
> > doesn't seem to be happening in your case.
>
> If we restart the subscriber after step-5, it will not replicate the records.
>
> As I said in [1], if we don't insert a new data in step-5, it will not
> replicate the records.
>

Hmm, but in Bharath's test, it is replicating the Inserts in step-7
and step-9 as well. Are you seeing something different?

> In both cases, AlterSubscription_refresh() calls RemoveSubscriptionRel()
> and logicalrep_worker_stop_at_commit().  However, if we insert data in
> step-5, it doesn't work as expected.  Any thoughts?
>

I think the data inserted in step-5 might be visible because we have
stopped the apply process after that, but it is not clear why the data
inserted in steps 7 and 9 is getting replicated. I think that for some
reason the apply worker is not getting stopped even after the Refresh
Publication statement is finished, due to which the data keeps being
replicated after that as well; and after a restart the apply worker won't
be restarted, so the data replication doesn't happen.

-- 
With Regards,
Amit Kapila.




Re: Deleting older versions in unique indexes to avoid page splits

2021-01-11 Thread Peter Geoghegan
On Sun, Jan 10, 2021 at 4:06 PM Peter Geoghegan  wrote:
> Attached is v13, which has this tweak, and other miscellaneous cleanup
> based on review from both Victor and Heikki. I consider this version
> of the patch to be committable. I intend to commit something close to
> it in the next week, probably no later than Thursday. I still haven't
> got to the bottom of the shellsort question raised by Heikki. I intend
> to do further performance validation before committing the patch.

I benchmarked the patch with one array and without the shellsort
specialization using two patches on top of v13, both of which are
attached.

This benchmark was similar to other low cardinality index benchmarks
I've run in the past (with indexes named fiver, tenner, score, plus a
pgbench_accounts INCLUDE unique index instead of the regular primary
key). I used pgbench scale 500, for 30 minutes, no rate limit. One run
with 16 clients, another with 32 clients.

Original v13:

patch.r1c32.bench.out: "tps = 32709.772257 (including connections
establishing)" "latency average = 0.974 ms" "latency stddev = 0.514
ms"
patch.r1c16.bench.out: "tps = 34670.929998 (including connections
establishing)" "latency average = 0.461 ms" "latency stddev = 0.314
ms"

v13 + attached simplifying patches:

patch.r1c32.bench.out: "tps = 31848.632770 (including connections
establishing)" "latency average = 1.000 ms" "latency stddev = 0.548
ms"
patch.r1c16.bench.out: "tps = 33864.099333 (including connections
establishing)" "latency average = 0.472 ms" "latency stddev = 0.399
ms"

Clearly the optimization work still has some value, since we're
looking at about a 2% - 3% increase in throughput here. This seems
like it makes the cost of retaining the optimizations acceptable.

The difference is much less visible with a rate-limit, which is rather
more realistic. To some extent the sort is hot here because the
patched version of Postgres updates tuples as fast as it can, and must
therefore delete from the index as fast as it can. The sort itself was
consistently near the top consumer of CPU cycles according to "perf
top", even if it didn't get as bad as what I saw in earlier versions
of the patch months ago.

There are actually two sorts involved here (not just the heapam.c
shellsort). We need to sort the deltids array twice -- once in
heapam.c, and a second time (to restore the original leaf-page-wise
order) in nbtpage.c, using qsort(). I'm pretty sure that the latter
sort also matters, though it matters less than the heapam.c shellsort.

I'm going to proceed with committing the original version of the patch
-- I feel that this settles it. The code would be a bit tidier without
two arrays or the shellsort, but it really doesn't make that much
difference. Whereas the benefit is quite visible, and will be
something that all varieties of index tuple deletion see a performance
benefit from (not just bottom-up deletion).

-- 
Peter Geoghegan


0006-Experiment-One-array-again.patch
Description: Binary data


0007-Experiment-Remove-shellsort-just-use-qsort.patch
Description: Binary data


RE: Add Nullif case for eval_const_expressions_mutator

2021-01-11 Thread Hou, Zhijie
> > I notice that there is no NullIf case in eval_const_expressions.
> > Since NullIf is similar to OpExpr and is easy to implement, I tried adding
> > this case in eval_const_expressions_mutator.
> 
> I think this patch should be about a tenth the size.  Try modeling it on
> the T_SubscriptingRef-etc case, ie, use ece_generic_processing and then
> ece_evaluate_expr to cover the generic cases.  OpExpr is common enough to
> deserve specially optimized code, but NullIf isn't, so shorter is better.

Thanks for the review.

Attaching the v2 patch, which follows the suggestion
to use ece_generic_processing and ece_evaluate_expr to simplify the code.

Please have a check.

Best regards,
houzj





v2-0001-add-nullif-case-for-eval_const_expressions.patch
Description: v2-0001-add-nullif-case-for-eval_const_expressions.patch


Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-11 Thread japin


On Tue, 12 Jan 2021 at 11:37, Amit Kapila wrote:
> On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
>  wrote:
>>
>> Hi,
>>
>> While providing thoughts on the design in [1], I found a strange
>> behaviour with the $subject. The use case is shown below as a sequence
>> of steps that need to be run on publisher and subscriber to arrive at
>> the strange behaviour.  In step 5, the table is dropped from the
>> publication and in step 6, the refresh publication is run on the
>> subscriber, from here onwards, the expectation is that no further
>> inserts into the publisher table have to be replicated on to the
>> subscriber, but the opposite happens i.e. the inserts are still
>> replicated to the subscriber. ISTM as a bug. Let me know if I'm
>> missing anything.
>>
>
> Did you try to investigate what's going on? Can you please check what
> is the behavior if, after step-5, you restart the subscriber and
> separately try creating a new subscription (maybe on a different
> server) for that publication after step-5 and see if that allows the
> relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> remove such dropped rels and stop their corresponding apply workers
> which should stop the further replication of such relations but that
> doesn't seem to be happening in your case.

If we restart the subscriber after step-5, it will not replicate the records.

As I said in [1], if we don't insert new data in step-5, it will not
replicate the records.

In both cases, AlterSubscription_refresh() calls RemoveSubscriptionRel()
and logicalrep_worker_stop_at_commit().  However, if we insert data in
step-5, it doesn't work as expected.  Any thoughts?

[1] 
https://www.postgresql.org/message-id/a7a618fb-f87c-439c-90a3-93cf9e734...@hotmail.com

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: pg_upgrade test for binary compatibility of core data types

2021-01-11 Thread Justin Pryzby
On Mon, Jan 11, 2021 at 03:28:08PM +0100, Peter Eisentraut wrote:
> On 2020-12-27 20:07, Justin Pryzby wrote:
> > On Wed, Dec 16, 2020 at 11:22:23AM -0600, Justin Pryzby wrote:
> > > On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote:
> > > > I meant to notice if the binary format is accidentally changed again, 
> > > > which was
> > > > what happened here:
> > > > 7c15cef86 Base information_schema.sql_identifier domain on name, not 
> > > > varchar.
> > > > 
> > > > I added a table to the regression tests so it's processed by pg_upgrade 
> > > > tests,
> > > > run like:
> > > > 
> > > > | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 
> > > > oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin
> > > 
> > > Per cfbot, this avoids testing ::xml (support for which may not be 
> > > enabled)
> > > And also now tests oid types.
> > > 
> > > I think the per-version hacks should be grouped by logical change, rather 
> > > than
> > > by version.  Which I've started doing here.
> > 
> > rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f
> 
> I think these patches could use some in-place documentation of what they are
> trying to achieve and how they do it.  The required information is spread
> over a lengthy thread.  No one wants to read that.  Add commit messages to
> the patches.

The 0001 patch fixes pg_upgrade/test.sh, which was dysfunctional.
Portions of the first patch were independently handled by commits 52202bb39,
fa744697c, and 091866724, so this is rebased on those.
I guess updating this script should be part of a beta checklist somewhere,
since nobody will want to backpatch changes for testing older releases.

0002 allows detecting the information_schema problem that was introduced at:
7c15cef86 Base information_schema.sql_identifier domain on name, not varchar.

+-- Create a table with different data types, to exercise binary compatibility
+-- during pg_upgrade test

If binary compatibility is changed, I expect this will error, crash, or at least
return wrong data, and thereby fail the tests.

-- 
Justin

On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote:
> I checked that if I cherry-pick 0002 to v11, and comment out
> old_11_check_for_sql_identifier_data_type_usage(), then pg_upgrade/test.sh
> detects the original problem:
> pg_dump: error: Error message from server: ERROR:  invalid memory alloc 
> request size 18446744073709551613
> 
> I understand the buildfarm has its own cross-version-upgrade test, which I
> think would catch this on its own.
> 
> These all seem to complicate use of pg_upgrade/test.sh, so 0001 is needed to
> allow testing upgrade from older releases.
> 
> e78900afd217fa3eaa77c51e23a94c1466af421c Create by default sql/ and expected/ 
> for output directory in pg_regress
> 40b132c1afbb4b1494aa8e48cc35ec98d2b90777 In the pg_upgrade test suite, don't 
> write to src/test/regress.
> fc49e24fa69a15efacd5b8958115ed9c43c48f9a Make WAL segment size configurable 
> at initdb time.
> c37b3d08ca6873f9d4eaf24c72a90a550970cbb8 Allow group access on PGDATA
> da9b580d89903fee871cf54845ffa2b26bda2e11 Refactor dir/file permissions

>From b3f829ab0fd880962d43eac0222bdaab2b8070f4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 5 Dec 2020 22:31:19 -0600
Subject: [PATCH v4 1/3] WIP: pg_upgrade/test.sh: changes needed to allow
 testing upgrade to v14dev from v9.5-v13

---
 src/bin/pg_upgrade/test.sh | 93 +++---
 1 file changed, 86 insertions(+), 7 deletions(-)

diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh
index ca923ba01b..b36fca4233 100644
--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -177,18 +177,97 @@ if "$MAKE" -C "$oldsrc" installcheck-parallel; then
 		esac
 		fix_sql="$fix_sql
  DROP FUNCTION IF EXISTS
-	public.oldstyle_length(integer, text);	-- last in 9.6
+	public.oldstyle_length(integer, text);"	# last in 9.6 -- commit 5ded4bd21
+		fix_sql="$fix_sql
  DROP FUNCTION IF EXISTS
-	public.putenv(text);	-- last in v13
- DROP OPERATOR IF EXISTS	-- last in v13
-	public.#@# (pg_catalog.int8, NONE),
-	public.#%# (pg_catalog.int8, NONE),
-	public.!=- (pg_catalog.int8, NONE),
+	public.putenv(text);"	# last in v13
+		# last in v13 commit 76f412ab3
+		# public.!=- This one is only needed for v11+ ??
+		# Note, until v10, operators could only be dropped one at a time
+		fix_sql="$fix_sql
+ DROP OPERATOR IF EXISTS
+	public.#@# (pg_catalog.int8, NONE);"
+		fix_sql="$fix_sql
+ DROP OPERATOR IF EXISTS
+	public.#%# (pg_catalog.int8, NONE);"
+		fix_sql="$fix_sql
+ DROP OPERATOR IF EXISTS
+	public.!=- (pg_catalog.int8, NONE);"
+		fix_sql="$fix_sql
+ DROP OPERATOR IF EXISTS
 	public.#@%# (pg_catalog.int8, NONE);"
+
+		# commit 068503c76511cdb0080bab689662a20e86b9c845
+		case $oldpgversion in
+			10)
+fix_sql="$fix_sql
+	DROP TRANSFORM FOR integer LANGUAGE sql CASCADE;"
+;;
+		esac
+
+		# commit 

Re: [POC] Fast COPY FROM command for the table with foreign partitions

2021-01-11 Thread Andrey V. Lepikhov

On 1/11/21 11:16 PM, Tomas Vondra wrote:

Hi Andrey,

Unfortunately, this no longer applies :-( I tried to apply this on top
of c532d15ddd (from 2020/12/30) but even that has non-trivial conflicts.

Can you send a rebased version?

regards


Applied on 044aa9e70e.

--
regards,
Andrey Lepikhov
Postgres Professional
>From f8e0cd305c691108313c2365cc4576e4d5e0bd38 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Tue, 12 Jan 2021 08:54:45 +0500
Subject: [PATCH 2/2] Fast COPY FROM into the foreign or sharded table.

This feature enables bulk COPY into a foreign table in the case where
multi-insert is possible and the foreign table has a non-zero number of columns.

FDWAPI was extended by next routines:
* BeginForeignCopy
* EndForeignCopy
* ExecForeignCopy

BeginForeignCopy and EndForeignCopy initialize and free
the CopyState of the bulk COPY. The ExecForeignCopy routine sends a
'COPY ... FROM STDIN' command to the foreign server, iteratively sends
tuples via the CopyTo() machinery, and finally sends EOF on this connection.

The code that constructed the list of columns for a given foreign relation
in the deparseAnalyzeSql() routine is separated out into deparseRelColumnList().
It is reused in deparseCopyFromSql().

Added TAP-tests on the specific corner cases of COPY FROM STDIN operation.

By analogy with CopyFrom(), the CopyState structure was extended
with a data_dest_cb callback. It is used to send the text representation
of a tuple to a custom destination.
The PgFdwModifyState structure is extended with a cstate field.
It is needed to avoid repeated initialization of the CopyState. Also for this
reason the CopyTo() routine was split into the set of routines CopyToStart()/
CopyTo()/CopyToFinish().

The CopyInsertMethod enum is removed. This logic is now implemented by the
ri_usesMultiInsert field of the ResultRelInfo structure.

Discussion:
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote
---
 contrib/postgres_fdw/deparse.c|  60 ++--
 .../postgres_fdw/expected/postgres_fdw.out|  46 ++-
 contrib/postgres_fdw/postgres_fdw.c   | 130 ++
 contrib/postgres_fdw/postgres_fdw.h   |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql |  45 ++
 doc/src/sgml/fdwhandler.sgml  |  73 ++
 src/backend/commands/copy.c   |   4 +-
 src/backend/commands/copyfrom.c   | 126 ++---
 src/backend/commands/copyto.c |  84 ---
 src/backend/executor/execMain.c   |   8 +-
 src/backend/executor/execPartition.c  |  27 +++-
 src/include/commands/copy.h   |   8 +-
 src/include/foreign/fdwapi.h  |  15 ++
 13 files changed, 533 insertions(+), 94 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 3cf7b4eb1e..b1ca479a65 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -184,6 +184,8 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 	deparse_expr_cxt *context);
+static List *deparseRelColumnList(StringInfo buf, Relation rel,
+  bool enclose_in_parens);
 
 /*
  * Helper functions
@@ -1763,6 +1765,20 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 		 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * Deparse COPY FROM into given buf.
+ * We need to use list of parameters at each query.
+ */
+void
+deparseCopyFromSql(StringInfo buf, Relation rel)
+{
+	appendStringInfoString(buf, "COPY ");
+	deparseRelation(buf, rel);
+	(void) deparseRelColumnList(buf, rel, true);
+
+	appendStringInfoString(buf, " FROM STDIN ");
+}
+
 /*
  * deparse remote UPDATE statement
  *
@@ -2066,6 +2082,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
  */
 void
 deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+{
+	appendStringInfoString(buf, "SELECT ");
+	*retrieved_attrs = deparseRelColumnList(buf, rel, false);
+
+	/* Don't generate bad syntax for zero-column relation. */
+	if (list_length(*retrieved_attrs) == 0)
+		appendStringInfoString(buf, "NULL");
+
+	/*
+	 * Construct FROM clause
+	 */
+	appendStringInfoString(buf, " FROM ");
+	deparseRelation(buf, rel);
+}
+
+/*
+ * Construct the list of columns of given foreign relation in the order they
+ * appear in the tuple descriptor of the relation. Ignore any dropped columns.
+ * Use column names on the foreign server instead of local names.
+ *
+ * Optionally enclose the list in parentheses.
+ */
+static List *
+deparseRelColumnList(StringInfo buf, Relation rel, bool enclose_in_parens)
 {
 	Oid			relid = RelationGetRelid(rel);
 	TupleDesc	tupdesc = RelationGetDescr(rel);
@@ -2074,10 +2114,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List 

Re: logical replication worker accesses catalogs in error context callback

2021-01-11 Thread Masahiko Sawada
On Mon, Jan 11, 2021 at 4:46 PM Bharath Rupireddy
 wrote:
>
> On Mon, Jan 11, 2021 at 10:56 AM Masahiko Sawada  
> wrote:
> > Agreed. Attached the updated patch.
>
> Thanks for the updated patch. Looks like the comment crosses the 80
> char limit, I have corrected it. And also changed the variable name
> from remotetypeid to remotetypid, so that logicalrep_typmap_gettypname
> will not cross the 80 char limit. And also added a commit message.
> Attaching v3 patch, please have a look. Both make check and make
> check-world passes.

Thanks! The change looks good to me.

>
> > > I quickly searched in places where error callbacks are being used, I
> > > think we need a similar kind of fix for conversion_error_callback in
> > > postgres_fdw.c, because get_attname and get_rel_name are being used
> > > which do catalogue lookups. ISTM that all the other error callbacks
> > > are good in the sense that they are not doing sys catalogue lookups.
> >
> > Indeed. If we need to disallow the catalog lookup during executing
> > error callbacks we might want to have an assertion checking that in
> > SearchCatCacheInternal(), in addition to Assert(IsTransactionState()).
>
> I tried to add(as attached in
> v1-0001-Avoid-Sys-Cache-Lookups-in-Error-Callbacks.patch) the
> Assert(!error_context_stack); in SearchCatCacheInternal, initdb itself
> fails [1]. That means, we are doing a bunch of catalogue access from
> error context callbacks. Given this, I'm not quite sure whether we can
> have such an assertion in SearchCatCacheInternal.

I think checking !error_context_stack is not a correct check for whether we're
executing an error context callback, since it's a stack of registered
callbacks. It can be non-NULL merely by setting an error callback; see
setup_parser_errposition_callback() for instance. Probably we need to
check if (recursion_depth > 0) and elevel. Attached a patch for that
as an example.

>
> Although unrelated to what we are discussing here -  when I looked at
> SearchCatCacheInternal, I found that the function SearchCatCache has
> SearchCatCacheInternal in the function comment, I think we should
> correct it. Thoughts? If okay, I will post a separate patch for this
> minor comment fix.

Perhaps you mean this?

/*
 *  SearchCatCacheInternal
 *
 *  This call searches a system cache for a tuple, opening the relation
 *  if necessary (on the first access to a particular cache).
 *
 *  The result is NULL if not found, or a pointer to a HeapTuple in
 *  the cache.  The caller must not modify the tuple, and must call
 *  ReleaseCatCache() when done with it.
 *
 * The search key values should be expressed as Datums of the key columns'
 * datatype(s).  (Pass zeroes for any unused parameters.)  As a special
 * exception, the passed-in key for a NAME column can be just a C string;
 * the caller need not go to the trouble of converting it to a fully
 * null-padded NAME.
 */
HeapTuple
SearchCatCache(CatCache *cache,

Looking at commit 141fd1b66 it intentionally changed to
SearchCatCacheInternal from SearchCatCache but it seems to me that
it's a typo.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/


avoid_sys_cache_lookup_in_error_callback_v2.patch
Description: Binary data


RE: Disable WAL logging to speed up data loading

2021-01-11 Thread tsunakawa.ta...@fujitsu.com
From: Osumi, Takamichi/大墨 昂道 
> I updated the patch following this discussion, and fixed the documentation as
> well.


+ (rmid == RM_GIST_ID && info == RM_GIST_ID) ||
+ (rmid == RM_GENERIC_ID)))

info is wrong for GiST, and one pair of parentheses is unnecessary.  The above
would be:

+ (rmid == RM_GIST_ID && info == XLOG_GIST_ASSIGN_LSN) ||
+ rmid == RM_GENERIC_ID))


Regards
Takayuki Tsunakawa



Re: [POC] Fast COPY FROM command for the table with foreign partitions

2021-01-11 Thread Andrey V. Lepikhov

On 1/11/21 4:59 PM, Tang, Haiying wrote:

Hi Andrey,

I had a general look at this feature; I think it's beneficial for
some PostgreSQL application scenarios. So I ran 7 performance test cases on
your patch (v13). The results are really good. As you can see below, we can get
a 7-10x improvement with this patch.

PSA test_copy_from.sql, which shows my test case details (I didn't attach my
data file since it's too big).

Below are the test results:
'Test No' corresponds to the number (0, 1, ..., 6) in the attached test_copy_from.sql.
%reg = (Patched - Unpatched)/Unpatched. The unit is milliseconds.

|Test No| Test Case                                                                                   | Patched(ms) | Unpatched(ms) | %reg |
|-------|---------------------------------------------------------------------------------------------|-------------|---------------|------|
|0      | COPY FROM insertion into the partitioned table (partition is foreign table)                  | 102483.223  | 1083300.907   | -91% |
|1      | COPY FROM insertion into the partitioned table (partition is foreign partition)              | 104779.893  | 1207320.287   | -91% |
|2      | COPY FROM insertion into the foreign table (without partition)                               | 100268.730  | 1077309.158   | -91% |
|3      | COPY FROM insertion into the partitioned table (part of foreign partitions)                  | 104110.620  | 1134781.855   | -91% |
|4      | COPY FROM insertion into the partitioned table with constraint (part of foreign partition)   | 136356.201  | 1238539.603   | -89% |
|5      | COPY FROM insertion into the foreign table with constraint (without partition)               | 136818.262  | 1189921.742   | -89% |
|6      | \copy insertion into the partitioned table with constraint                                   | 140368.072  | 1242689.924   | -89% |

If there is any question on my tests, please feel free to ask.

Best Regards,
Tang

Thank you for this work.
Some time ago I suggested an additional optimization [1] which can
speed up COPY by another 2-4 times. Maybe you can run the
benchmark for this solution too?


[1] 
https://www.postgresql.org/message-id/da7ed3f5-b596-2549-3710-4cc2a602ec17%40postgrespro.ru


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Movement of restart_lsn position movement of logical replication slots is very slow

2021-01-11 Thread Jammie
Hi Amit,
Thanks for the response.
Can you please let me know what pg_current_wal_lsn() returns?

Is this position the LSN of the next log record to be created, or is it the
LSN of the last log record already created and inserted into the log?

The documentation says it returns the current WAL write location.
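
For reference, both can be looked at directly:

-- Current WAL write location on the primary.
SELECT pg_current_wal_lsn();

-- Per-slot positions, including restart_lsn and catalog_xmin.
SELECT slot_name, restart_lsn, confirmed_flush_lsn, catalog_xmin
FROM pg_replication_slots;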

Regards
Shailesh

On Thu, 24 Dec, 2020, 7:43 pm Amit Kapila,  wrote:

> On Thu, Dec 24, 2020 at 7:30 PM Jammie  wrote:
> >
> > Sorry, I don't have the debug setup handy. However, the SQL commands now
> > work to move the restart_lsn of the slots in standalone code from
> > psql.
> >
> >  A few followup questions.
> >
> > What is catalog_xmin in pg_replication_slots, and how does it play a
> > role in moving the restart_lsn of the slot?
> >
> > I am just checking the possibility that a special transaction can cause a
> > private slot to become stale?
> >
>
> Yeah, it is possible if some old transaction is active in the
> system. The restart_lsn is the LSN required by the oldest txn. But it is
> strange that it affects only one of the slots.
>
> --
> With Regards,
> Amit Kapila.
>


Re: Key management with tests

2021-01-11 Thread Neil Chen
Hi Stephen,

On Tue, Jan 12, 2021 at 10:47 AM Stephen Frost  wrote:

>
> This is an interesting question but ultimately I don't think we should
> be looking at this from the perspective of allowing arbitrary changes to
> the page format.  The challenge is that much of the page format, today,
> is defined by a C struct and changing the way that works would require a
> great deal of code to be modified and turn this into a massive effort,
> assuming we wish to have the same compiled binary able to work with both
> unencrypted and encrypted clusters, which I do believe is a requirement.
>
> The thought that I had was to, instead, try to figure out if we could
> fudge some space by, say, putting a 128-bit 'hole' at the end of the
> page and just move pd_special back, effectively making the page seem
> 'smaller' to all of the code that uses it, except for the code that
> knows how to do the decryption.  I ran into some trouble with that but
> haven't quite sorted out what happened yet.  Other ideas would be to put
> it before pd_special, or maybe somewhere else, but a lot depends on the
> code's expectations.
>
>
I agree that we should not make too many changes to affect the use of
unencrypted clusters. But as a personal opinion only, I don't think it's a
good idea to add some "implicit" tricks. As an idea, can we
add a flag to mark whether the page format has been changed:

--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -181,8 +185,9 @@ typedef PageHeaderData *PageHeader;
 #define PD_PAGE_FULL 0x0002 /* not enough free space for new tuple? */
 #define PD_ALL_VISIBLE 0x0004 /* all tuples on page are visible to
  * everyone */
+#define PD_PAGE_ENCRYPTED 0x0008 /* Is page encrypted? */

-#define PD_VALID_FLAG_BITS 0x0007 /* OR of all valid pd_flags bits */
+#define PD_VALID_FLAG_BITS 0x000F /* OR of all valid pd_flags bits */

 /*
  * Page layout version number 0 is for pre-7.3 Postgres releases.
@@ -389,6 +394,13 @@ PageValidateSpecialPointer(Page page)
 #define PageClearAllVisible(page) \
  (((PageHeader) (page))->pd_flags &= ~PD_ALL_VISIBLE)

+#define PageIsEncrypted(page) \
+ (((PageHeader) (page))->pd_flags & PD_PAGE_ENCRYPTED)
+#define PageSetEncrypted(page) \
+ (((PageHeader) (page))->pd_flags |= PD_PAGE_ENCRYPTED)
+#define PageClearEncrypted(page) \
+ (((PageHeader) (page))->pd_flags &= ~PD_PAGE_ENCRYPTED)
+
 #define PageIsPrunable(page, oldestxmin) \
 ( \
  AssertMacro(TransactionIdIsNormal(oldestxmin)), \


In this way, I think it has little effect on the unencrypted cluster, and
we can also modify the page format as we wish. Of course, it's also
possible that I didn't understand your design correctly, or there's
something wrong with my idea. :D

-- 
There is no royal road to learning.
HighGo Software Co.


Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-11 Thread Amit Kapila
On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
 wrote:
>
> Hi,
>
> While providing thoughts on the design in [1], I found a strange
> behaviour with the $subject. The use case is shown below as a sequence
> of steps that need to be run on publisher and subscriber to arrive at
> the strange behaviour.  In step 5, the table is dropped from the
> publication and in step 6, the refresh publication is run on the
> subscriber, from here onwards, the expectation is that no further
> inserts into the publisher table have to be replicated on to the
> subscriber, but the opposite happens i.e. the inserts are still
> replicated to the subscriber. ISTM as a bug. Let me know if I'm
> missing anything.
>

Did you try to investigate what's going on? Can you please check what
is the behavior if, after step-5, you restart the subscriber and
separately try creating a new subscription (maybe on a different
server) for that publication after step-5 and see if that allows the
relation to be replicated? AFAIU, in AlterSubscription_refresh, we
remove such dropped rels and stop their corresponding apply workers
which should stop the further replication of such relations but that
doesn't seem to be happening in your case.

-- 
With Regards,
Amit Kapila.




Re: [Patch] Optimize dropping of relation buffers using dlist

2021-01-11 Thread Amit Kapila
On Fri, Jan 8, 2021 at 7:03 AM Kyotaro Horiguchi
 wrote:
>
> At Thu, 7 Jan 2021 09:25:22 +, "k.jami...@fujitsu.com" 
>  wrote in:
> > > Thanks for the detailed tests. NBuffers/32 seems like an appropriate
> > > value for the threshold based on these results. I would like to
> > > slightly modify part of the commit message in the first patch as below
> > > [1], otherwise, I am fine with the changes. Unless you or anyone else
> > > has any more comments, I am planning to push the 0001 and 0002
> > > sometime next week.
> > >
> > > [1]
> > > "The recovery path of DropRelFileNodeBuffers() is optimized so that
> > > scanning of the whole buffer pool can be avoided when the number of
> > > blocks to be truncated in a relation is below a certain threshold. For
> > > such cases, we find the buffers by doing lookups in BufMapping table.
> > > This improves the performance by more than 100 times in many cases
> > > when several small tables (tested with 1000 relations) are truncated
> > > and where the server is configured with a large value of shared
> > > buffers (greater than 100GB)."
> >
> > Thank you for taking a look at the results of the tests. And it's also
> > consistent with the results from Tang too.
> > The commit message LGTM.
>
> +1.
>

I have pushed the 0001.

-- 
With Regards,
Amit Kapila.




RE: Disable WAL logging to speed up data loading

2021-01-11 Thread osumi.takami...@fujitsu.com
Hi

On Mon, Jan 11, 2021 9:14 AM Tsunakawa, Takayuki  
wrote:
> From: Masahiko Sawada 
> > I think it's better to have index AM (and perhaps table AM) control it
> > instead of filtering in XLogInsert(). Because otherwise third-party
> > access methods that use LSN like gist indexes cannot write WAL records
> > at all when wal_level = none even if they want.
> 
> Hm, third-party extensions use RM_GENERIC_ID, so it should probably be
> allowed in XLogInsert() as well instead of allowing control in some other way.
> It's not unnatural that WAL records for in-core AMs are filtered in 
> XLogInsert().
I updated the patch following this discussion,
and fixed the documentation as well.
No failures were found during make check-world.


Best Regards,
Takamichi Osumi


disable_WAL_logging_v06.patch
Description: disable_WAL_logging_v06.patch


RE: POC: postgres_fdw insert batching

2021-01-11 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> Attached is a v6 of this patch, rebased to current master and with some minor
> improvements (mostly comments and renaming the "end" struct field to
> "values_end" which I think is more descriptive).

Thank you very much.  In fact, my initial patches used values_end, and I
changed it to len considering that it may be used for bulk UPDATE and DELETE in
the future.  But I agree that values_end makes its role easier to understand, too.


Regards
Takayuki Tsunakawa



Re: pg_dump, ATTACH, and independently restorable child partitions

2021-01-11 Thread Tom Lane
Justin Pryzby  writes:
> [ v5-0001-pg_dump-output-separate-object-for-ALTER-TABLE.AT.patch ]

Pushed with mostly cosmetic edits.

One thing I changed that isn't cosmetic is that I set the ArchiveEntry's
owner to be the owner of the child table.  Although we aren't going to
do any sort of ALTER OWNER on this, it's still important that the owner
be marked as someone who has the right permissions to issue the ALTER.
The default case is that the original user will issue the ATTACH, which
basically only works if you run the restore as superuser.  It looks to
me like you copied this decision from the INDEX ATTACH code, which is
just as broken --- I'm going to go fix/backpatch that momentarily.

Another thing that bothers me still is that it's not real clear that
this code plays nicely with selective dumps, because it's not doing
anything to set the dobj.dump field in a non-default way (which in
turn means that the dobj.dump test in dumpTableAttach can never fire).
It seems like it might be possible to emit a TABLE ATTACH object
even though one or both of the referenced tables didn't get dumped.
In some desultory testing I couldn't get that to actually happen, but
maybe I just didn't push on it in the right way.  I'd be happier about
this if we set the flags with something along the lines of

attachinfo->dobj.dump = attachinfo->parentTbl->dobj.dump &
attachinfo->partitionTbl->dobj.dump;

regards, tom lane




Re: Moving other hex functions to /common

2021-01-11 Thread Michael Paquier
On Mon, Jan 11, 2021 at 11:27:30AM -0500, Bruce Momjian wrote:
> Sure, I realize the elog/pg_log is odd, but the alternatives seem worse.

I guess that it depends on the use cases.  If there is no need to
worry about shared libraries, elog/pg_log would do just fine.

> You can take ownership of my hex patch and just add to it.  I know you
> already did the hex length part, and have other ideas of what you want.

OK, thanks.  I have been looking at it, and tweaked the patch as per
the attached.  That's basically what you did on the following points:
- Use size_t for the arguments, uint64 as return result.
- Leave ECPG out of the equation.
- Use pg_log()/elog() to report failures in src/common/, rather than
error codes.
- Renamed the arguments of encode.c to src, srclen, dst, dstlen.

The only two things that were not present are the set of checks for
overflows, and the adjustments for varlena.c.  The first point makes
the code of encode.c safer, as previously the code would issue a FATAL
*after* writing out-of-bound data.  Now it issues an ERROR before any
overwrite is done, and I have added some assertions as an extra safety
net.  For the second, I think that it makes the allocation pattern
easier to follow, similarly to checksum manifests.
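
To make the ordering concrete, here is a simplified sketch of the check-first
idea (this is not the code from the attached patch; frontend callers in
src/common/ would report the failure with pg_log() rather than elog()):

#include "postgres.h"

/*
 * Simplified sketch only: validate the destination length before any byte
 * is written, so a too-small buffer can never be overrun.
 */
uint64
sketch_hex_encode(const char *src, size_t srclen, char *dst, size_t dstlen)
{
    const char *hextbl = "0123456789abcdef";
    size_t      i;

    if (srclen > dstlen / 2)
        elog(ERROR, "destination buffer too small for hex encoding");

    for (i = 0; i < srclen; i++)
    {
        unsigned char c = (unsigned char) src[i];

        dst[2 * i] = hextbl[c >> 4];
        dst[2 * i + 1] = hextbl[c & 0xF];
    }
    return (uint64) srclen * 2;
}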

Thoughts?
--
Michael
From 3e5c9fa42ff981933bc6eeede92335ae89e94e21 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 12 Jan 2021 11:21:42 +0900
Subject: [PATCH v5] Refactor hex code

---
 src/include/common/hex.h  |  25 +++
 src/include/common/hex_decode.h   |  16 --
 src/include/utils/builtins.h  |   3 -
 src/backend/replication/backup_manifest.c |  28 ++--
 src/backend/utils/adt/encode.c|  96 ++-
 src/backend/utils/adt/varlena.c   |  16 +-
 src/common/Makefile   |   2 +-
 src/common/hex.c  | 196 ++
 src/common/hex_decode.c   | 106 
 src/tools/msvc/Mkvcbuild.pm   |   2 +-
 10 files changed, 308 insertions(+), 182 deletions(-)
 create mode 100644 src/include/common/hex.h
 delete mode 100644 src/include/common/hex_decode.h
 create mode 100644 src/common/hex.c
 delete mode 100644 src/common/hex_decode.c

diff --git a/src/include/common/hex.h b/src/include/common/hex.h
new file mode 100644
index 00..3c3c956bb6
--- /dev/null
+++ b/src/include/common/hex.h
@@ -0,0 +1,25 @@
+/*
+ *
+ *	hex.h
+ *	  Encoding and decoding routines for hex strings.
+ *
+ *	Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ *	Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *		  src/include/common/hex.h
+ *
+ *
+ */
+
+#ifndef COMMON_HEX_H
+#define COMMON_HEX_H
+
+extern uint64 pg_hex_decode(const char *src, size_t srclen,
+			char *dst, size_t dstlen);
+extern uint64 pg_hex_encode(const char *src, size_t srclen,
+			char *dst, size_t dstlen);
+extern uint64 pg_hex_enc_len(size_t srclen);
+extern uint64 pg_hex_dec_len(size_t srclen);
+
+#endif			/* COMMON_HEX_H */
diff --git a/src/include/common/hex_decode.h b/src/include/common/hex_decode.h
deleted file mode 100644
index 29ab248458..00
--- a/src/include/common/hex_decode.h
+++ /dev/null
@@ -1,16 +0,0 @@
-/*
- *	hex_decode.h
- *		hex decoding
- *
- *	Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
- *	Portions Copyright (c) 1994, Regents of the University of California
- *
- *	src/include/common/hex_decode.h
- */
-#ifndef COMMON_HEX_DECODE_H
-#define COMMON_HEX_DECODE_H
-
-extern uint64 hex_decode(const char *src, size_t len, char *dst);
-
-
-#endif			/* COMMON_HEX_DECODE_H */
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 6f739a8822..27d2f2ffb3 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -31,9 +31,6 @@ extern void domain_check(Datum value, bool isnull, Oid domainType,
 extern int	errdatatype(Oid datatypeOid);
 extern int	errdomainconstraint(Oid datatypeOid, const char *conname);
 
-/* encode.c */
-extern uint64 hex_encode(const char *src, size_t len, char *dst);
-
 /* int.c */
 extern int2vector *buildint2vector(const int16 *int2s, int n);
 
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index 8af94610b3..0df5186828 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -13,11 +13,11 @@
 #include "postgres.h"
 
 #include "access/timeline.h"
+#include "common/hex.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "replication/backup_manifest.h"
-#include "utils/builtins.h"
 #include "utils/json.h"
 
 static void AppendStringToManifest(backup_manifest_info *manifest, char *s);
@@ -150,10 +150,12 @@ 

Re: Key management with tests

2021-01-11 Thread Bruce Momjian
On Tue, Jan 12, 2021 at 09:32:54AM +0900, Masahiko Sawada wrote:
> On Tue, Jan 12, 2021 at 3:23 AM Stephen Frost  wrote:
> > Right, or ensure that the actual IV used is distinct (such as by using
> > another bit in the IV to distinguish logged-vs-unlogged), but it seems
> > saner to just use a different key, ultimately.
> 
> Agreed.
> 
> I think we also need to consider how to make sure the nonce is unique when
> making a page dirty by updating hint bits. A hint bit update changes the
> page contents but doesn't change the page LSN if we have already written a
> full-page image. In the PoC patch, I logged a dummy WAL record
> (XLOG_NOOP) just to move the page LSN forward, but since this is
> required even when the change is not the first one to the page since the
> last checkpoint, we might end up logging too many dummy WAL records.

This says:


https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#Other_requirements

wal_log_hints will be enabled automatically in encryption mode. 

Does that help?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion

2021-01-11 Thread Fujii Masao
On Tue, Jan 12, 2021 at 10:00 AM Masahiko Sawada  wrote:
>
> On Mon, Jan 11, 2021 at 11:00 PM Peter Eisentraut
>  wrote:
> >
> > On 2021-01-05 10:56, Masahiko Sawada wrote:
> > > BTW according to the documentation, the options of DECLARE statement
> > > (BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive.
> > >
> > > DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]
> > >  CURSOR [ { WITH | WITHOUT } HOLD ] FOR query
> > >
> > > But I realized that these options are actually order-insensitive. For
> > > instance, we can declare a cursor like:
> > >
> > > =# declare abc scroll binary cursor for select * from pg_class;
> > > DECLARE CURSOR
> > >
> > > Both the parser code and the documentation have been unchanged since 2003.
> > > Is it a documentation bug?
> >
> > According to the SQL standard, the ordering of the cursor properties is
> > fixed.  Even if the PostgreSQL parser offers more flexibility, I think
> > we should continue to encourage writing the clauses in the standard order.
>
> Thanks for your comment. Agreed.
>
> So regarding the tab completion for DECLARE statement, perhaps it
> would be better to follow the documentation?

IMO yes because it's less confusing to make the document and
tab-completion consistent.

Regards,

-- 
Fujii Masao




Re: POC: postgres_fdw insert batching

2021-01-11 Thread Tomas Vondra

Hi,

Attached is a v6 of this patch, rebased to current master and with some 
minor improvements (mostly comments and renaming the "end" struct field 
to "values_end" which I think is more descriptive).


The one thing that keeps bugging me is convert_prep_stmt_params - it 
does the right thing, but the code is somewhat confusing.



AFAICS the discussions about making this use COPY and/or libpq 
pipelining (neither of which is committed yet) ended with the conclusion 
that those changes are somewhat independent, and that it's worth getting 
this committed in the current form. Barring objections, I'll push this 
within the next couple days.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 1e4a99c6d4a5221dadc9e7a9922bdd9e3ebe1310 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Tue, 12 Jan 2021 01:36:01 +0100
Subject: [PATCH] Add bulk insert for foreign tables

---
 contrib/postgres_fdw/deparse.c|  43 ++-
 .../postgres_fdw/expected/postgres_fdw.out| 116 ++-
 contrib/postgres_fdw/option.c |  14 +
 contrib/postgres_fdw/postgres_fdw.c   | 291 ++
 contrib/postgres_fdw/postgres_fdw.h   |   5 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql |  91 ++
 doc/src/sgml/fdwhandler.sgml  |  89 +-
 doc/src/sgml/postgres-fdw.sgml|  13 +
 src/backend/executor/execPartition.c  |  11 +
 src/backend/executor/nodeModifyTable.c| 161 ++
 src/backend/nodes/list.c  |  15 +
 src/include/executor/execPartition.h  |   1 +
 src/include/foreign/fdwapi.h  |  10 +
 src/include/nodes/execnodes.h |   6 +
 src/include/nodes/pg_list.h   |  15 +
 15 files changed, 815 insertions(+), 66 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 3cf7b4eb1e..2d38ab25cb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1711,7 +1711,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
  Index rtindex, Relation rel,
  List *targetAttrs, bool doNothing,
  List *withCheckOptionList, List *returningList,
- List **retrieved_attrs)
+ List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1754,6 +1754,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
@@ -1763,6 +1764,46 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 		 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * rebuild remote INSERT statement
+ *
+ */
+void
+rebuildInsertSql(StringInfo buf, char *orig_query,
+ int values_end_len, int num_cols,
+ int num_rows)
+{
+	int			i, j;
+	int			pindex;
+	bool		first;
+
+	/* Copy up to the end of the first record from the original query */
+	appendBinaryStringInfo(buf, orig_query, values_end_len);
+
+	/* Add records to VALUES clause */
+	pindex = num_cols + 1;
+	for (i = 0; i < num_rows; i++)
+	{
+		appendStringInfoString(buf, ", (");
+
+		first = true;
+		for (j = 0; j < num_cols; j++)
+		{
+			if (!first)
+appendStringInfoString(buf, ", ");
+			first = false;
+
+			appendStringInfo(buf, "$%d", pindex);
+			pindex++;
+		}
+
+		appendStringInfoChar(buf, ')');
+	}
+
+	/* Copy stuff after VALUES clause from the original query */
+	appendStringInfoString(buf, orig_query + values_end_len);
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c11092f8cc..96bad17ded 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8911,7 +8911,7 @@ DO $d$
 END;
 $d$;
 ERROR:  invalid option "password"
-HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size
+HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, 

Re: list of extended statistics on psql

2021-01-11 Thread Tatsuro Yamada

Hi Tomas,

On 2021/01/09 9:01, Tomas Vondra wrote:

On 1/8/21 1:14 AM, Tomas Vondra wrote:

On 1/8/21 12:52 AM, Tatsuro Yamada wrote:

On 2021/01/08 0:56, Tomas Vondra wrote:

On 1/7/21 3:47 PM, Alvaro Herrera wrote:

On 2021-Jan-07, Tomas Vondra wrote:

On 1/7/21 1:46 AM, Tatsuro Yamada wrote:



I overlooked the check for MCV in the logic building the query
because I created the patch as a new feature for PG14.
I'm not sure whether we should back-patch or not. However, I'll
add the check in the next patch because it is useful if you decide to
back-patch to PG10, 11, 12, and 13.


BTW perhaps a quick look at the other \d commands would show if
there are
precedents. I didn't have time for that.


Yes, we do promise that new psql works with older servers.



Yeah, makes sense. That means we need to add the check for 12 / MCV.



Ah, I got it.
I fixed the patch to work with older servers by adding version
checks. And I tested the \dX command on older servers (PG10 - 13).
The results look fine.

0001:
   Added the check code to handle pre-PG12, which has neither MCV nor
   pg_statistic_ext_data.
0002:
   This patch is the same as the previous patch (not changed).

Please find the attached files.



OK, thanks. I'll take a look and probably push tomorrow. FWIW I plan to
squash the patches into a single commit.



Attached is a patch I plan to commit - 0001 is the last submitted
version with a couple minor tweaks, mostly in docs/comments, and small
rework of branching to be more like the other functions in describe.c.


Thanks for revising the patch.
I reviewed 0001, and the branching and comments look good to me.
However, I added an alias name in processSQLNamePattern() in the patch:
s/"stxname"/"es.stxname"/



While working on that, I realized that 'defined' might be a bit
ambiguous; I initially thought it meant 'NOT NULL' (which it does not).
I propose to change it to 'requested' instead. Tatsuro, do you agree, or
do you think 'defined' is better?


Regarding the status of extended stats, I think the following:

 - "defined": it only shows that the extended stats are defined. We can't
   know whether they still need ANALYZE or not. I agree this name was
   ambiguous, so we should replace it with a more suitable name.
 - "requested": it shows that the extended stats need something. Of course,
   we know they need ANALYZE because we can create the patch.
   However, I feel there is a little ambiguity for the DBA.
   To solve this, it would be better to write an explanation of
   the statuses in the documentation. For example:

==
The column for each kind of extended stats (e.g. Ndistinct) shows its status.
"requested" means that data needs to be gathered by ANALYZE. "built" means ANALYZE
has finished, and the planner can use the statistics. NULL means that they do not exist.
==

What do you think? :-D


Thanks,
Tatsuro Yamada
>From 3d2f4ef2ecba9fd7987df665237add6fc4ec03c1 Mon Sep 17 00:00:00 2001
From: Tatsuro Yamada 
Date: Thu, 7 Jan 2021 14:28:20 +0900
Subject: [PATCH 1/2] psql \dX: list extended statistics objects

The new command lists extended statistics objects, possibly with their
sizes. All past releases with extended statistics are supported.

Author: Tatsuro Yamada
Reviewed-by: Julien Rouhaud, Alvaro Herrera, Tomas Vondra
Discussion: 
https://postgr.es/m/c027a541-5856-75a5-0868-341301e1624b%40nttcom.co.jp_1
---
 doc/src/sgml/ref/psql-ref.sgml  |  14 +++
 src/bin/psql/command.c  |   3 +
 src/bin/psql/describe.c | 150 
 src/bin/psql/describe.h |   3 +
 src/bin/psql/help.c |   1 +
 src/bin/psql/tab-complete.c |   4 +-
 src/test/regress/expected/stats_ext.out |  94 +++
 src/test/regress/sql/stats_ext.sql  |  31 +
 8 files changed, 299 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 221a967bfe..d01acc92b8 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1918,6 +1918,20 @@ testdb=
 
 
   
+  
+  
+\dX [ pattern ]
+
+
+Lists extended statistics.
+If pattern
+is specified, only those extended statistics whose names match the
+pattern are listed.
+If + is appended to the command name, each extended
+statistics is listed with its size.
+
+
+  
 
   
 \dy[+] [ pattern ]
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 303e7c3ad8..c5ebc1c3f4 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -928,6 +928,9 @@ exec_command_d(PsqlScanState scan_state, bool 
active_branch, const char *cmd)
else
success = listExtensions(pattern);
break;
+  

Re: A failure of standby to follow timeline switch

2021-01-11 Thread Fujii Masao
On Sat, Jan 9, 2021 at 5:08 AM Alvaro Herrera  wrote:
>
> Masao-san: Are you intending to act as committer for these?  Since the
> bug is mine I can look into it, but since you already did all the
> reviewing work, I'm good with you giving it the final push.

Thanks! I'm thinking of pushing the patch.


> 0001 looks good to me; let's get that one committed quickly so that we
> can focus on the interesting stuff.  While the implementation of
> find_in_log is quite dumb (not this patch's fault), it seems sufficient
> to deal with small log files.  We can improve the implementation later,
> if needed, but we have to get the API right on the first try.
>
> 0003: The fix looks good to me.  I verified that the test fails without
> the fix, and it passes with the fix.

Yes.


> The test added in 0002 is a bit optimistic regarding timing, as well as
> potentially slow; it loops 1000 times and sleeps 100 milliseconds each
> time.  In a very slow server (valgrind or clobber_cache animals) this
> could not be sufficient time, while on fast servers it may end up
> waiting longer than needed.  Maybe we can do something like this:

On second thought, I think that the regression test should be in
004_timeline_switch.pl instead of 001_stream_rep.pl because it's
the test about timeline switch. Also I'm thinking that it's better to
test the timeline switch by checking whether some data is successfully
replicated, like the existing regression test for timeline switch in
004_timeline_switch.pl does, instead of finding the specific message
in the log file. I attached the POC patch. Thoughts?

Regards,

-- 
Fujii Masao


v5-0001-Move-TAP-log-searching-feature-to-common-modules.patch
Description: Binary data


Re: libpq compression

2021-01-11 Thread Justin Pryzby
On Mon, Jan 11, 2021 at 04:53:51PM +0300, Konstantin Knizhnik wrote:
> On 09.01.2021 23:31, Justin Pryzby wrote:
> > I suggest that there should be an enum of algorithms, which is constant 
> > across
> > all servers.  They would be unconditionally included and not #ifdef 
> > depending
> > on compilation options.
> 
> I do not think that it is possible (even right now, it is possible to build
> Postgres without zlib support).
> Also, if new compression algorithms are added, then in any case we have to
> somehow handle the situation when an
> old client is connected to a new server and vice versa.

I mean an enum of all compression supported in the master branch, starting with
ZLIB = 1.  I think this applies to libpq compression (which requires client and
server to handle a common compression algorithm), and pg_dump (output of which
needs to be read by pg_restore), but maybe not TOAST, the patch for which
supports extensions, with dynamically allocated OIDs.
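
As a sketch of what I mean (the names here are invented, not taken from the
patch), the identifiers would stay fixed across all builds, even when a
particular library is not compiled in:

typedef enum pg_compression_algorithm
{
    PG_COMPRESSION_NONE = 0,
    PG_COMPRESSION_ZLIB = 1,    /* keeps this value even in builds
                                 * configured --without-zlib */
    PG_COMPRESSION_ZSTD = 2
    /* future algorithms always take the next value */
} pg_compression_algorithm;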

> > In config.sgml, it says that libpq_compression defaults to on (on the server
> > side), but in libpq.sgml it says that it defaults to off (on the client 
> > side).
> > Is that what's intended ?  I would've thought the defaults would match, or 
> > that
> > the server would enforce a default more conservative than the client's (the 
> > DBA
> > should probably have to explicitly enable compression, and need to "opt-in"
> > rather than "opt-out").
> 
> Yes, it is intended behavior: the libpq_compression GUC allows prohibiting
> compression requests from clients if, for some reason (security, CPU
> consumption), that is not desired.
> But by default the server should support compression if it is requested by
> the client.

It's not clear to me if that's true...  It may be what's convenient for you,
especially during development, but that doesn't mean it's safe or efficient or
what's generally desirable to everyone.

> But the client should not request compression by default: it makes sense only for
> queries returning large result sets or transferring a lot of data (like
> COPY).

I think you're making assumptions about everyone's use of the tools, and it's
better if the DBA makes that determination.  The clients aren't generally under
the admin's control, and if they don't request compression, then it's not used.
If they request compression, then the DBA still has control over whether it's
allowed.

We agree that it should be disabled by default, but I suggest that it's most
flexible if the client makes the request and the server is allowed to decide.  By
default the server should deny/ignore the request, with the client gracefully
falling back to no compression.

Compression would have little effect on most queries, especially at default
level=1.

> > Maybe instead of a boolean, this should be a list of permitted compression
> > algorithms.  This allows the admin to set a "policy" rather than using the
> > server's hard-coded preferences.  This could be important to disable an
> > algorithm at run-time if there's a vulnerability found, or performance 
> > problem,
> > or buggy client, or for diagnostics, or performance testing.  Actually, I 
> > think
> > it may be important to allow the admin to disable not just an algorithm, but
> > also its options.  It should be possible for the server to allow "zstd"
> > compression but not "zstd --long", or zstd:99 or arbitrarily large window
> > sizes.  This seems similar to SSL cipher strings.  I think we'd want to be 
> > able
> > to allow (or prohibit) at least alg:* (with options) or alg (with default
> > options).
> 
> Sorry, maybe you are not looking at the latest version of the patch?
> Right now the "compression" parameter accepts not only boolean values but also
> a list of suggested algorithms with an optional compression level, like
> "zstd:1,zlib"

You're talking about the client compression param.  I'm suggesting that the
server should allow fine-grained control of what compression algs are permitted
at *runtime*.  This would allow distributions to compile with all compression
libraries enabled at configure time, and still allow a DBA to disable one
without recompiling.

> > Your patch documents "libpq_compression=auto" but that doesn't work:
> > WARNING: none of specified algirthms auto is supported by client
> > I guess you mean "any" which is what's implemented.
> > I suggest to just remove that.
> That is an inconsistency with the documentation.
> It seems to me that I have already fixed it. "auto" was renamed to "any",
> > I think maybe your patch should include a way to trivially change the 
> > client's
> > compile-time default:
> > "if (!conn->compression) conn->compression = DefaultCompression"
> 
> It can be done using PG_COMPRESSION environment variable.
> Do we need some other mechanism for it?

That's possible with environment variable or connection string.

I'm proposing to simplify change of its default when recompiling, just like 
this one:
src/interfaces/libpq/fe-connect.c:#define DefaultHost   "localhost"

Think 

Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion

2021-01-11 Thread Masahiko Sawada
On Mon, Jan 11, 2021 at 11:00 PM Peter Eisentraut
 wrote:
>
> On 2021-01-05 10:56, Masahiko Sawada wrote:
> > BTW according to the documentation, the options of DECLARE statement
> > (BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive.
> >
> > DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]
> >  CURSOR [ { WITH | WITHOUT } HOLD ] FOR query
> >
> > But I realized that these options are actually order-insensitive. For
> > instance, we can declare a cursor like:
> >
> > =# declare abc scroll binary cursor for select * from pg_class;
> > DECLARE CURSOR
> >
> > Both the parser code and the documentation have been unchanged since 2003.
> > Is it a documentation bug?
>
> According to the SQL standard, the ordering of the cursor properties is
> fixed.  Even if the PostgreSQL parser offers more flexibility, I think
> we should continue to encourage writing the clauses in the standard order.

Thanks for your comment. Agreed.

So regarding the tab completion for the DECLARE statement, perhaps it
would be better to follow the documentation? In the currently proposed
patch, we complete it with the options in any order.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: Huge memory consumption on partitioned table with FKs

2021-01-11 Thread Keisuke Kuroda
Hi Amit-san,

> Thanks for checking.  Indeed, it should have been added to the January
> commit-fest.  I've added it to the March one:
>
> https://commitfest.postgresql.org/32/2930/

Thank you for your quick response!

-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Re: Key management with tests

2021-01-11 Thread Masahiko Sawada
On Tue, Jan 12, 2021 at 3:23 AM Stephen Frost  wrote:
>
> Greetings,
>
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Mon, Jan 11, 2021 at 12:54:49PM -0500, Stephen Frost wrote:
> > > Although, another approach and one that I've discussed a bit with Bruce,
> > > is to have more keys- such as a key for temporary files, and perhaps
> > > even a key for logged relations and a different for unlogged..  Or
> >
> > Yes, we have to make sure the nonce (computed as LSN/pageno) is never
> > reused, so if we have several LSN usage "spaces", they need different
> > data keys.
>
> Right, or ensure that the actual IV used is distinct (such as by using
> another bit in the IV to distinguish logged-vs-unlogged), but it seems
> saner to just use a different key, ultimately.

Agreed.

I think we also need to consider how to make sure the nonce is unique when
making a page dirty by updating hint bits. A hint bit update changes the
page contents but doesn't change the page LSN if we have already written a
full-page image. In the PoC patch, I logged a dummy WAL record
(XLOG_NOOP) just to move the page LSN forward, but since this is
required even when the change is not the first one to the page since the
last checkpoint, we might end up logging too many dummy WAL records.
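
For what it's worth, the dummy record boils down to something like the
following (illustrative only, not the exact code in the PoC; the caller must
of course hold an exclusive lock on the buffer and mark it dirty):

#include "postgres.h"
#include "access/xlog_internal.h"
#include "access/xloginsert.h"
#include "storage/bufpage.h"

/* Illustrative only: bump the page LSN so the page gets a fresh nonce. */
static void
bump_page_lsn(Page page)
{
    char        dummy = 0;
    XLogRecPtr  recptr;

    XLogBeginInsert();
    XLogRegisterData(&dummy, sizeof(dummy));
    recptr = XLogInsert(RM_XLOG_ID, XLOG_NOOP);
    PageSetLSN(page, recptr);
}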

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Outdated replication protocol error?

2021-01-11 Thread Jeff Davis
Commit 5ee2197767 (about 4 years old) introduced the error:

  "IDENTIFY_SYSTEM has not been run before START_REPLICATION"

But it seems like running START_REPLICATION first works (at least in
the simple case).

We should either:

1. Document that IDENTIFY_SYSTEM must always be run before
START_REPLICATION, and always issue a WARNING if that's not done (an
ERROR might break existing applications); or

2. If the commit is out of date and no longer needed, or if it's easy
enough to fix, just remove the error (and Assert a valid
ThisTimeLineID).

Regards,
Jeff Davis







Re: Implement for window functions

2021-01-11 Thread Tom Lane
I started to look through this patch, and the first thing I'm wondering
about is why bother with a new pg_proc column, ie why not just apply the
behavior to any window function?  The fact that the SQL committee
restricts this syntax to a few window functions is just their usual
design tic of creating one-off syntax that could be much more general.
We've not hesitated to generalize in similar situations in the past.

The main thing I can see against that position is that it's not very
clear what to do if the window function has more than one window-ized
argument --- or at least, the only plausible interpretation would be
to ignore rows in which any of those arguments is null, which this
implementation is incapable of doing (since we don't know exactly
which arguments the function will try to treat as window-ized).
However, having a pronulltreatment column isn't helping that
situation at all: somebody could mark a multi-input window function
as ignoring nulls, and we'd silently do the wrong thing in any case
where those inputs weren't nulls at exactly the same rows.

My thought, therefore, is to drop pg_proc.pronulltreatment and instead
enforce an implementation restriction that when IGNORE NULLS is specified, 
WinGetFuncArgInPartition and WinGetFuncArgInFrame throw an error if
asked about any argument position other than the first one.  As long
as only the first argument is window-ized, the implementation you have
here will act correctly.  If anybody ever finds that annoying, they can
figure out how to relax the restriction at that time.
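
Concretely, the check at the top of those two functions could look something
like this (a sketch only; the ignore_nulls flag and the error wording are
invented for illustration):

    if (winobj->ignore_nulls && argno != 0)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("IGNORE NULLS is only supported for the first argument of a window function")));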

The need for a TREAT NULLS option to CREATE FUNCTION would thereby
also go away, which is good because I don't think this patch has
fully implemented that (notably, I don't see any pg_dump changes).

As far as the actual implementation goes:

* The undocumented relationship between "relpos" (which used to be
constant and now isn't) and "target" and "step" makes my head hurt.
I'm sure this could be redesigned to be simpler, or if not, at
least it should be commented a lot more thoroughly.

* I'm quite concerned about performance; it looks like this will
degrade to O(N^2) in practical situations, which isn't going to
satisfy anyone.  I think we need to track how many nulls we've
already seen so that we aren't re-visiting earlier rows over and
over.  That should make it possible to un-disable the set_mark
optimization, which is something that's independently catastrophic
for performance.  While I've not stopped to design this fully, maybe
we could keep state along the lines of "there are j rows with null
values of the window-ized argument before row k of the partition."
Updating that by dead reckoning as we navigate would be enough to
fix the O(N^2) problem for typical scenarios.  I think.
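
For instance, the dead-reckoning state could be as small as this (names
invented for illustration only, assuming the usual postgres.h typedefs):

typedef struct IgnoreNullsState
{
    int64       examined_upto;  /* "k": rows of the partition examined so far */
    int64       nulls_before;   /* "j": null window-ized arguments among them */
} IgnoreNullsState;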

regards, tom lane




Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2021-01-11 Thread Tomas Vondra




On 1/11/21 10:00 PM, Anastasia Lubennikova wrote:

On 11.01.2021 01:35, Tomas Vondra wrote:

Hi,

I started looking at this patch again, hoping to get it committed in 
this CF, but I think there's a regression in handling TOAST tables 
(compared to the v3 patch submitted by Pavan in February 2019).


The test I'm running is a very simple one (see test.sql):

1) start a transaction
2) create a table with a text column
3) copy freeze data into it
4) use pg_visibility to see how many blocks are all_visible both in the
   main table and its TOAST table

For v3 patch (applied on top of 278584b526 and 
s/hi_options/ti_options) I get this:


   pages   NOT all_visible
  --
  main   637 0
  toast    50001 3

There was some discussion about relcache invalidations causing a
couple of TOAST pages not being marked as all_visible, etc.


However, for this patch on master I get this

   pages   NOT all_visible
  --
  main   637 0
  toast    50001 50001

So no pages in TOAST are marked as all_visible. I haven't investigated 
what's causing this, but IMO that needs fixing to make this patch RFC.


Attached is the test script I'm using, and a v5 of the patch - rebased 
on current master, with some minor tweaks to comments etc.




Thank you for attaching the test script. I reproduced the problem. This 
regression occurs because TOAST internally uses heap_insert().

You have asked upthread about adding this optimization to heap_insert().

I wrote a quick fix, see the attached patch 0002. The TOAST test passes 
now, but I haven't tested performance or any other use-cases yet.

I'm going to test it properly in a couple of days and share results.



Thanks. I think it's important to make this work for TOAST tables - it 
often stores most of the data, and it was working in v3 of the patch. I 
haven't looked into the details, but if it's really just due to TOAST 
using heap_insert, I'd say it just confirms the importance of tweaking 
heap_insert too.


With this change a lot of new code is repeated in heap_insert() and 
heap_multi_insert(). I think it's fine, because these functions already 
have a lot in common.




Understood. IMHO a bit of redundancy is not a big issue, but I haven't 
looked at the code yet. Let's get it working first, then we can decide 
if some refactoring is appropriate.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Deleting older versions in unique indexes to avoid page splits

2021-01-11 Thread Victor Yegorov
пн, 11 янв. 2021 г. в 22:10, Peter Geoghegan :

> I imagine that this happened because you have track_io_timing=on in
> postgresql.conf. Doesn't the same failure take place with the current
> master branch?
>
> I have my own way of running the locally installed Postgres when I
> want "make installcheck" to pass that specifically accounts for this
> (and a few other similar things).
>

Oh, right, haven't thought of this. Thanks for pointing that out.

Now everything looks good!


-- 
Victor Yegorov


Re: Deleting older versions in unique indexes to avoid page splits

2021-01-11 Thread Peter Geoghegan
On Mon, Jan 11, 2021 at 12:19 PM Victor Yegorov  wrote:
> (see attached diff). It doesn't look like the fault of this patch, though.
>
> I suppose you plan to send another revision before committing this.
> Therefore I didn't perform any tests here, will wait for the next version.

I imagine that this happened because you have track_io_timing=on in
postgresql.conf. Doesn't the same failure take place with the current
master branch?

I have my own way of running the locally installed Postgres when I
want "make installcheck" to pass that specifically accounts for this
(and a few other similar things).

-- 
Peter Geoghegan




Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2021-01-11 Thread Anastasia Lubennikova

On 11.01.2021 01:35, Tomas Vondra wrote:

Hi,

I started looking at this patch again, hoping to get it committed in 
this CF, but I think there's a regression in handling TOAST tables 
(compared to the v3 patch submitted by Pavan in February 2019).


The test I'm running is a very simple one (see test.sql):

1) start a transaction
2) create a table with a text column
3) copy freeze data into it
4) use pg_visibility to see how many blocks are all_visible both in the
   main table and its TOAST table

For v3 patch (applied on top of 278584b526 and 
s/hi_options/ti_options) I get this:


   pages   NOT all_visible
  --
  main   637 0
  toast    50001 3

There was some discussion about relcache invalidations causing a
couple of TOAST pages not being marked as all_visible, etc.


However, for this patch on master I get this

   pages   NOT all_visible
  --
  main   637 0
  toast    50001 50001

So no pages in TOAST are marked as all_visible. I haven't investigated 
what's causing this, but IMO that needs fixing to make this patch RFC.


Attached is the test script I'm using, and a v5 of the patch - rebased 
on current master, with some minor tweaks to comments etc.




Thank you for attaching the test script. I reproduced the problem. This 
regression occurs because TOAST internally uses heap_insert().

You have asked upthread about adding this optimization to heap_insert().

I wrote a quick fix, see the attached patch 0002. The TOAST test passes 
now, but I haven't tested performance or any other use-cases yet.

I'm going to test it properly in a couple of days and share results.

With this change a lot of new code is repeated in heap_insert() and 
heap_multi_insert(). I think it's fine, because these functions already 
have a lot in common.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From 6ccaa4f526b38fbdd2f3a38ac3bd51e96fb140b6 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 10 Jan 2021 20:30:29 +0100
Subject: [PATCH] Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE

Make sure COPY FREEZE marks the pages as PD_ALL_VISIBLE and updates the
visibility map. Until now it only marked individual tuples as frozen,
but page-level flags were not updated.

This is a fairly old patch, and multiple people worked on it. The first
version was written by Jeff Janes, and then reworked by Pavan Deolasee
and Anastasia Lubennikova.

Author: Pavan Deolasee, Anastasia Lubennikova, Jeff Janes
Reviewed-by: Kuntal Ghosh, Jeff Janes, Tomas Vondra, Masahiko Sawada, Andres Freund, Ibrar Ahmed, Robert Haas, Tatsuro Ishii
Discussion: https://postgr.es/m/caboikdn-ptgv0mzntrk2q8otfuuajqaymgmkdu1dckftuxv...@mail.gmail.com
Discussion: https://postgr.es/m/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com
---
 .../pg_visibility/expected/pg_visibility.out  | 64 +++
 contrib/pg_visibility/sql/pg_visibility.sql   | 77 +++
 src/backend/access/heap/heapam.c  | 76 --
 src/backend/access/heap/hio.c | 17 
 src/include/access/heapam_xlog.h  |  3 +
 5 files changed, 229 insertions(+), 8 deletions(-)

diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index ca4b6e186b..0017e3415c 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -179,6 +179,69 @@ select pg_truncate_visibility_map('test_partition');
  
 (1 row)
 
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen 
+---+-+
+ 0 | t   | t
+ 1 | t   | t
+ 2 | t   | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid 
+
+(0 rows)
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen 
+---+-+
+ 0 | f   | f
+ 1 | f   | f
+ 2 | t   | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid 
+
+(0 rows)
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate 

Re: Key management with tests

2021-01-11 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Mon, Jan 11, 2021 at 02:19:22PM -0500, Stephen Frost wrote:
> > outputs from the GCM encryption and is what provides the integrity /
> > authentication of the encrypted data to be able to detect if it's been
> > modified.  Unfortunately, while the page checksum will continue to be
> > used and available for checking against disk corruption, it's not
> > sufficient.  Hence, ideally, we'd find a spot to stick the 128-bit tag
> > on each page.
> 
> Agreed.  Would checksums be of any value with GCM?

The value would be to allow testing of the database integrity, to the
amount allowed by the checksum, to be done without having access to the
encryption keys, and because there's not much else we'd be using those
bits for if we didn't.

> > Given that, clearly, it's not possible to go from an unencrypted cluster
> > to an encrypted cluster without rewriting the entire cluster, we aren't
> > bound to maintain the on-disk page format, we should be able to
> > accommodate including the tag somewhere.  Unfortunately, it doesn't seem
> > quite as trivial as I'd hoped since there are parts of the code which
> > make assumptions about the page beyond perhaps what they should be, but
> > I'm still hopeful that it won't be *too* hard to do.
> 
> OK, thanks.  Are there other page improvements we should make when we
> are requiring a page rewrite?

This is an interesting question but ultimately I don't think we should
be looking at this from the perspective of allowing arbitrary changes to
the page format.  The challenge is that much of the page format, today,
is defined by a C struct and changing the way that works would require a
great deal of code to be modified and turn this into a massive effort,
assuming we wish to have the same compiled binary able to work with both
unencrypted and encrypted clusters, which I do believe is a requirement.

The thought that I had was to, instead, try to figure out if we could
fudge some space by, say, putting a 128-bit 'hole' at the end of the
page and just move pd_special back, effectively making the page seem
'smaller' to all of the code that uses it, except for the code that
knows how to do the decryption.  I ran into some trouble with that but
haven't quite sorted out what happened yet.  Other ideas would be to put
it before pd_special, or maybe somewhere else, but a lot depends on the
code's expectations.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposal: Global Index

2021-01-11 Thread Bruce Momjian
On Mon, Jan 11, 2021 at 12:05:43PM -0800, Peter Geoghegan wrote:
> On Mon, Jan 11, 2021 at 11:25 AM Bruce Momjian  wrote:
> > Once you layer on all the places a global index will be worse than just
> > creating a single large table, or a partitioned table with an index per
> > child, there might not be much usefulness left.  A POC patch might tell
> > us that, and might allow us to mark it as "not wanted".
> 
> I'm confused. Of course it's true to some degree that having a global
> index "defeats the purpose" of having a partitioned table. But only to
> a degree. And for some users it will make the difference between using
> partitioning and not using partitioning -- they simply won't be able
> to tolerate not having it available (e.g. because of a requirement for
> a unique constraint that does not cover the partitioning key).

Yes, that is a good point.  For those cases, I think we need to look at
the code complexity/overhead of supporting that feature.  There are
going to be a few cases where it is a win, but will the code complexity be
worth it?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Key management with tests

2021-01-11 Thread Bruce Momjian
On Mon, Jan 11, 2021 at 02:19:22PM -0500, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Mon, Jan 11, 2021 at 01:23:27PM -0500, Stephen Frost wrote:
> > > Yes, and it avoids the issue of using a single key for too much, which
> > > is also a concern.  The remaining larger issues are to figure out a
> > > place to put the tag for each page, and the relatively simple matter of
> > > programming a mechanism to cache the keys we're commonly using (current
> > > key for encryption, recently used keys for decryption) since we'll
> > > eventually get to a point of having written out more data than we are
> > > going to keep keys in memory for.
> > 
> > I thought the LSN range would be stored with the keys, so there is no
> > need to tag the LSN on each page.
> 
> Yes, LSN range would be stored with the keys in some fashion (maybe just
> the start of a particular LSN range would be in the filename of the key
> for that range...).  The 'tag' that I'm referring to there is one of the

Oh, that tag, yes, we need to add that to each page.  I thought you meant
an LSN-range-key tag.

> outputs from the GCM encryption and is what provides the integrity /
> authentication of the encrypted data to be able to detect if it's been
> modified.  Unfortunately, while the page checksum will continue to be
> used and available for checking against disk corruption, it's not
> sufficient.  Hence, ideally, we'd find a spot to stick the 128-bit tag
> on each page.

Agreed.  Would checksums be of any value with GCM?

> Given that, clearly, it's not possible to go from an unencrypted cluster
> to an encrypted cluster without rewriting the entire cluster, we aren't
> bound to maintain the on-disk page format, we should be able to
> accommodate including the tag somewhere.  Unfortunately, it doesn't seem
> quite as trivial as I'd hoped since there are parts of the code which
> make assumptions about the page beyond perhaps what they should be, but
> I'm still hopeful that it won't be *too* hard to do.

OK, thanks.  Are there other page improvements we should make when we
are requiring a page rewrite?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Deleting older versions in unique indexes to avoid page splits

2021-01-11 Thread Victor Yegorov
пн, 11 янв. 2021 г. в 01:07, Peter Geoghegan :

>
> Attached is v13, which has this tweak, and other miscellaneous cleanup
> based on review from both Victor and Heikki. I consider this version
> of the patch to be committable. I intend to commit something close to
> it in the next week, probably no later than Thursday. I still haven't
> got to the bottom of the shellsort question raised by Heikki. I intend
> to do further performance validation before committing the patch. I
> will look into the shellsort thing again as part of this final
> performance validation work -- perhaps I can get rid of the
> specialized shellsort implementation entirely, simplifying the state
> structs added to tableam.h. (As I said before, it seems best to
> address this last of all to avoid making performance validation even
> more complicated.)
>

I've checked this version quickly. It applies and compiles without issues.
`make check` and `make check-world` reported no issue.

But `make installcheck-world` failed on:
…
test explain  ... FAILED   22 ms
test event_trigger... ok  178 ms
test fast_default ... ok  262 ms
test stats... ok  586 ms


 1 of 202 tests failed.


(see attached diff). It doesn't look like the fault of this patch, though.

I suppose you plan to send another revision before committing this.
Therefore I didn't perform any tests here, will wait for the next version.


-- 
Victor Yegorov


20210111-v13-regression.diffs
Description: Binary data


Re: Proposal: Global Index

2021-01-11 Thread Peter Geoghegan
On Mon, Jan 11, 2021 at 11:25 AM Bruce Momjian  wrote:
> Once you layer on all the places a global index will be worse than just
> creating a single large table, or a partitioned table with an index per
> child, there might not be much usefulness left.  A POC patch might tell
> us that, and might allow us to mark it as "not wanted".

I'm confused. Of course it's true to some degree that having a global
index "defeats the purpose" of having a partitioned table. But only to
a degree. And for some users it will make the difference between using
partitioning and not using partitioning -- they simply won't be able
to tolerate not having it available (e.g. because of a requirement for
a unique constraint that does not cover the partitioning key).

-- 
Peter Geoghegan




Re: Proposal: Global Index

2021-01-11 Thread Bruce Momjian
On Mon, Jan 11, 2021 at 11:01:20AM -0800, Peter Geoghegan wrote:
> However, it probably would be okay if a global index feature performed
> poorly in scenarios where partitions get lots of UPDATEs that produce
> lots of index bloat and cause lots of LP_DEAD line pointers to
> accumulate in heap pages. It is probably reasonable to just expect
> users to not do that if they want to get acceptable performance while
> using a global index. Especially since it probably is not so bad if
> the index bloat situation gets out of hand for just one of the
> partitions (say the most recent one) every once in a while. You at
> least don't have the same crazy I/O multiplier effect that you
> described.

Once you layer on all the places a global index will be worse than just
creating a single large table, or a partitioned table with an index per
child, there might not be much usefulness left.  A POC patch might tell
us that, and might allow us to mark it as "not wanted".

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Proposal: Global Index

2021-01-11 Thread Bruce Momjian
On Mon, Jan 11, 2021 at 01:37:02PM -0500, Robert Haas wrote:
> However, there is a VACUUM amplification effect to worry about here
...
> That's not necessarily a death sentence for every use case, but it's
> going to be pretty bad for tables that are big and heavily updated.

Yeah, I had not really gotten that far in my thinking, but someone is
going to need to create a POC and then we need to test it to see if it
offers a reasonably valuable feature.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Key management with tests

2021-01-11 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Mon, Jan 11, 2021 at 01:23:27PM -0500, Stephen Frost wrote:
> > Yes, and it avoids the issue of using a single key for too much, which
> > is also a concern.  The remaining larger issues are to figure out a
> > place to put the tag for each page, and the relatively simple matter of
> > programming a mechanism to cache the keys we're commonly using (current
> > key for encryption, recently used keys for decryption) since we'll
> > eventually get to a point of having written out more data than we are
> > going to keep keys in memory for.
> 
> I thought the LSN range would be stored with the keys, so there is no
> need to tag the LSN on each page.

Yes, LSN range would be stored with the keys in some fashion (maybe just
the start of a particular LSN range would be in the filename of the key
for that range...).  The 'tag' that I'm referring to there is one of the
outputs from the GCM encryption and is what provides the integrity /
authentication of the encrypted data to be able to detect if it's been
modified.  Unfortunately, while the page checksum will continue to be
used and available for checking against disk corruption, it's not
sufficient.  Hence, ideally, we'd find a spot to stick the 128-bit tag
on each page.

Given that, clearly, it's not possible to go from an unencrypted cluster
to an encrypted cluster without rewriting the entire cluster, we aren't
bound to maintain the on-disk page format, we should be able to
accommodate including the tag somewhere.  Unfortunately, it doesn't seem
quite as trivial as I'd hoped since there are parts of the code which
make assumptions about the page beyond perhaps what they should be, but
I'm still hopeful that it won't be *too* hard to do.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Key management with tests

2021-01-11 Thread Bruce Momjian
On Mon, Jan 11, 2021 at 01:23:27PM -0500, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Mon, Jan 11, 2021 at 12:54:49PM -0500, Stephen Frost wrote:
> > > Although, another approach and one that I've discussed a bit with Bruce,
> > > is to have more keys- such as a key for temporary files, and perhaps
> > > even a key for logged relations and a different for unlogged..  Or
> > 
> > Yes, we have to make sure the nonce (computed as LSN/pageno) is never
> > reused, so if we have several LSN usage "spaces", they need different
> > data keys. 
> 
> Right, or ensure that the actual IV used is distinct (such as by using
> another bit in the IV to distinguish logged-vs-unlogged), but it seems
> saner to just use a different key, ultimately.

Yes, we have eight unused bits in the nonce right now.

> > > perhaps sets of keys for each which automatically are rotating every X
> > > number of GB based on the LSN...  Which is a big part of why key
> > > management is such an important part of this effort.
> > 
> > Yes, this would avoid the need to failover to a standby for data key
> > rotation.
> 
> Yes, and it avoids the issue of using a single key for too much, which
> is also a concern.  The remaining larger issues are to figure out a
> place to put the tag for each page, and the relatively simple matter of
> programming a mechanism to cache the keys we're commonly using (current
> key for encryption, recently used keys for decryption) since we'll
> eventually get to a point of having written out more data than we are
> going to keep keys in memory for.

I thought the LSN range would be stored with the keys, so there is no
need to tag the LSN on each page.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Proposal: Global Index

2021-01-11 Thread Peter Geoghegan
On Mon, Jan 11, 2021 at 10:37 AM Robert Haas  wrote:
> I actually think the idea of lazily deleting the index entries is
> pretty good, but it won't work if the way the global index is
> implemented is by adding a tableoid column.

Perhaps there is an opportunity to apply some of the infrastructure
that Masahiko Sawada has been working on, that makes VACUUM more
incremental in certain specific scenarios:

https://postgr.es/m/cad21aod0ske11fmw4jd4renawbmcw1wasvnwpjvw3tvqpoq...@mail.gmail.com

I think that VACUUM can be taught to skip the ambulkdelete() step for
indexes in many common scenarios. Global indexes might be one place in
which that's almost essential.

> However, there is a VACUUM amplification effect to worry about here
> which Wenjing seems not to be considering.

> That's not necessarily a death sentence for every use case, but it's
> going to be pretty bad for tables that are big and heavily updated.

The main way in which index vacuuming is currently a death sentence
for this design (as you put it) is that it's an all-or-nothing thing.
Presumably you'll need to VACUUM the entire global index for each
partition that receives even one UPDATE. That seems pretty extreme,
and probably not acceptable. In a way it's not really a new problem,
but the fact remains: it makes global indexes much less valuable.

However, it probably would be okay if a global index feature performed
poorly in scenarios where partitions get lots of UPDATEs that produce
lots of index bloat and cause lots of LP_DEAD line pointers to
accumulate in heap pages. It is probably reasonable to just expect
users to not do that if they want to get acceptable performance while
using a global index. Especially since it probably is not so bad if
the index bloat situation gets out of hand for just one of the
partitions (say the most recent one) every once in a while. You at
least don't have the same crazy I/O multiplier effect that you
described.

-- 
Peter Geoghegan




Re: Proposal: Global Index

2021-01-11 Thread Robert Haas
On Mon, Jan 11, 2021 at 12:46 PM Bruce Momjian  wrote:
> > For 1) The DETACH old child table can be finished immediately, global index 
> > can be kept valid after DETACH is completed, and the cleanup of garbage 
> > data in global index can be deferred to VACUUM.
> > This is similar to the global index optimization done by Oracle12c.
> > For 2) ATTACH new empty child table can also be completed immediately.
> > If this is the case, many of the advantages of partitioned tables will be 
> > retained, while the advantages of global indexes will be gained.
>
> Yes, we can keep the index rows for the deleted partition and clean them
> up later, but what is the advantage of partitioning then?  Just heap
> deletion quickly?  Is that enough of a value?

I actually think the idea of lazily deleting the index entries is
pretty good, but it won't work if the way the global index is
implemented is by adding a tableoid column. Because then, I might
detach a partition and later reattach it and the old index entries are
still there but the table contents might have changed. Worse yet, the
table might be dropped and the table OID reused for a completely
unrelated table with completely unrelated contents, which could then
be attached as a new partition.

One of the big selling points of global indexes is that they allow you
to enforce uniqueness on a column unrelated to the partitioning
column. Another is that you can look up a value by doing a single
index scan on the global index rather than an index scan per
partition. Those things are no less valuable for performing index
deletion lazily.

However, there is a VACUUM amplification effect to worry about here
which Wenjing seems not to be considering. Suppose I have a table
which is not partitioned and it is 1TB in size with an index that is
128GB in size. To vacuum the table, I need to do 1TB + 128GB of I/O.
Now, suppose I now partition the table into 1024 partitions each with
its own local index. Each partition is 1GB in size and the index on
each partition is 128MB in size. To vacuum an individual partition
requires 1GB + 128MB of I/O, so to vacuum all the partitions requires
the same amount of total I/O as before. But, now suppose that I have a
single global index instead of a local index per partition. First, how
big will that index be? It will not be 128GB, but somewhat bigger,
because it needs extra space for every indexed tuple. Let's say 140GB.
Furthermore, it will need to be vacuumed whenever any child is
vacuumed, because it contains some index entries from every child. So
the total I/O to vacuum all partitions is now 1GB * 1024 + 140GB *
1024 = 141TB, which is a heck of a lot worse than the 1.125TB I
required with the unpartitioned table or the locally partitioned
table.

That's not necessarily a death sentence for every use case, but it's
going to be pretty bad for tables that are big and heavily updated.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Proposal: Global Index

2021-01-11 Thread Tom Lane
Bruce Momjian  writes:
> On Mon, Jan 11, 2021 at 07:40:18PM +0800, 曾文旌 wrote:
>> This is indeed a typical scenario for a partitioned table.
>> there are two basic operations
>> 1) Monthly DETACH old child table
>> 2) Monthly ATTACH new child table
>> 
>> For 1) The DETACH old child table can be finished immediately, global index 
>> can be kept valid after DETACH is completed, and the cleanup of garbage data 
>> in global index can be deferred to VACUUM.

> Yes, we can keep the index rows for the deleted partition and clean them
> up later, but what is the advantage of partitioning then?  Just heap
> deletion quickly?  Is that enough of a value?

More to the point, you still have a massive index cleanup operation to do.
Deferred or not, that's going to take a lot of cycles, and it will leave
you with a bloated global index.  I find it hard to believe that this
approach will seem like an improvement over doing partitioning the way
we do it now.

regards, tom lane




Re: Key management with tests

2021-01-11 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Mon, Jan 11, 2021 at 12:54:49PM -0500, Stephen Frost wrote:
> > Although, another approach and one that I've discussed a bit with Bruce,
> > is to have more keys- such as a key for temporary files, and perhaps
> > even a key for logged relations and a different for unlogged..  Or
> 
> Yes, we have to make sure the nonce (computed as LSN/pageno) is never
> reused, so if we have several LSN usage "spaces", they need different
> data keys. 

Right, or ensure that the actual IV used is distinct (such as by using
another bit in the IV to distinguish logged-vs-unlogged), but it seems
saner to just use a different key, ultimately.

> > perhaps sets of keys for each which automatically are rotating every X
> > number of GB based on the LSN...  Which is a big part of why key
> > management is such an important part of this effort.
> 
> Yes, this would avoid the need to failover to a standby for data key
> rotation.

Yes, and it avoids the issue of using a single key for too much, which
is also a concern.  The remaining larger issues are to figure out a
place to put the tag for each page, and the relatively simple matter of
programming a mechanism to cache the keys we're commonly using (current
key for encryption, recently used keys for decryption) since we'll
eventually get to a point of having written out more data than we are
going to keep keys in memory for.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [POC] Fast COPY FROM command for the table with foreign partitions

2021-01-11 Thread Tomas Vondra
Hi Andrey,

Unfortunately, this no longer applies :-( I tried to apply this on top
of c532d15ddd (from 2020/12/30) but even that has non-trivial conflicts.

Can you send a rebased version?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Key management with tests

2021-01-11 Thread Bruce Momjian
On Mon, Jan 11, 2021 at 12:54:49PM -0500, Stephen Frost wrote:
> Although, another approach and one that I've discussed a bit with Bruce,
> is to have more keys- such as a key for temporary files, and perhaps
> even a key for logged relations and a different for unlogged..  Or

Yes, we have to make sure the nonce (computed as LSN/pageno) is never
reused, so if we have several LSN usage "spaces", they need different
data keys. 

> perhaps sets of keys for each which automatically are rotating every X
> number of GB based on the LSN...  Which is a big part of why key
> management is such an important part of this effort.

Yes, this would avoid the need to failover to a standby for data key
rotation.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Key management with tests

2021-01-11 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Mon, Jan 11, 2021 at 08:12:00PM +0900, Masahiko Sawada wrote:
> > Looking at the patch, it supports three algorithms but only
> > PG_CIPHER_AES_KWP is used in the core for now:
> > 
> > +/*
> > + * Supported symmetric encryption algorithm. These identifiers are passed
> > + * to pg_cipher_ctx_create() function, and then actual encryption
> > + * implementations need to initialize their context of the given encryption
> > + * algorithm.
> > + */
> > +#define PG_CIPHER_AES_GCM  0
> > +#define PG_CIPHER_AES_KW   1
> > +#define PG_CIPHER_AES_KWP  2
> > +#define PG_MAX_CIPHER_ID   3
> > 
> > Are we in the process of experimenting which algorithms are better? If
> > we support one algorithm that is actually used in the core, we would
> > reduce the tests as well.
> 
> I think we are only using KWP (Key Wrap with Padding) because that is
> for wrapping keys:
> 
>   
> https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Algorithm-Validation-Program/documents/mac/KWVS.pdf

Yes.

> I am not sure about KW.  I think we are using GCM for the WAL/heap/index
> pages.  Stephen would know more.

KW was more-or-less 'for free' and there were tests for it, which is why
it was included.  Yes, GCM would be for WAL/heap/index pages, it
wouldn't be appropriate to use KW or KWP for that.  Using KW/KWP for the
key wrapping also makes the API simpler- and therefore easier for other
implementations to be written which provide the same API.
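
For readers who want to see what key wrapping looks like at the OpenSSL level, here is a minimal, illustrative sketch of wrapping a data key under a KEK with AES-256-KWP.  This is not code from the posted patches, just the generic OpenSSL EVP calls, with error handling omitted:

    #include <openssl/evp.h>

    /* Wrap "key" (keylen bytes) under a 32-byte KEK using AES-256-KWP.
     * "out" must have room for keylen + 16 bytes; returns wrapped length. */
    static int
    wrap_key_kwp(const unsigned char *kek,
                 const unsigned char *key, int keylen,
                 unsigned char *out)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int         len, outlen;

        /* wrap modes must be explicitly allowed on the context */
        EVP_CIPHER_CTX_set_flags(ctx, EVP_CIPHER_CTX_FLAG_WRAP_ALLOW);
        EVP_EncryptInit_ex(ctx, EVP_aes_256_wrap_pad(), NULL, kek, NULL);
        EVP_EncryptUpdate(ctx, out, &len, key, keylen);
        outlen = len;
        EVP_EncryptFinal_ex(ctx, out + len, &len);
        outlen += len;
        EVP_CIPHER_CTX_free(ctx);
        return outlen;
    }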

> > FWIW, I've written a PoC patch for buffer encryption to make sure the
> > kms patch would be workable with other components using the encryption
> > key managed by kmgr.
> 
> Wow, it is a small patch --- nice.

I agree that the actual encryption patch, for just the main heap/index,
won't be too bad.  The larger part will be dealing with all of the
temporary files we create that have user data in them...  I've been
contemplating a way to try and make that part of the patch smaller
though and hopefully that will bear fruit and we can avoid having to
change a lot of, e.g., reorderbuffer.c and pgstat.c.

There's a few places where we need to be sure to be updating the LSN for
both logged and unlogged relations properly, including dealing with
things like the magic GIST "GistBuildLSN" fake-LSN too, and we will
absolutely need to have a bit used in the IV to distinguish if it's a
real LSN or an unlogged LSN.
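
As an illustration of that point (the exact layout below is an assumption made for the sketch, not the layout used in the posted patches): a 96-bit AES-GCM IV can hold the 64-bit LSN plus the 32-bit block number, with one spare bit of the LSN field reserved to mark fake/unlogged LSNs so the two spaces can never collide:

    /* Sketch only: pack (LSN, block number, unlogged flag) into a 12-byte GCM IV. */
    static void
    build_page_iv(uint8 iv[12], uint64 lsn, uint32 blkno, bool fake_lsn)
    {
        uint64      lsn_field = lsn;
        int         i;

        /* use the top bit of the LSN field to distinguish real vs. fake LSNs */
        if (fake_lsn)
            lsn_field |= UINT64CONST(1) << 63;

        for (i = 0; i < 8; i++)
            iv[i] = (uint8) (lsn_field >> (56 - 8 * i));
        for (i = 0; i < 4; i++)
            iv[8 + i] = (uint8) (blkno >> (24 - 8 * i));
    }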

Although, another approach and one that I've discussed a bit with Bruce,
is to have more keys- such as a key for temporary files, and perhaps
even a key for logged relations and a different for unlogged..  Or
perhaps sets of keys for each which automatically are rotating every X
number of GB based on the LSN...  Which is a big part of why key
management is such an important part of this effort.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposal: Global Index

2021-01-11 Thread Bruce Momjian
On Mon, Jan 11, 2021 at 07:40:18PM +0800, 曾文旌 wrote:
> >> In addition you mentioned: "It is still unclear if these use-cases justify 
> >> the architectural changes needed to enable global indexes."
> >> Please also describe the problems you see, I will confirm each specific 
> >> issue one by one.
> > 
> > One example is date partitioning.  People frequently need to store
> > only the most recent data.  For instance doing a monthly partitioning
> > and dropping the oldest partition every month once you hit the wanted
> > retention is very efficient for that use case, as it should be almost
> > instant (provided that you can acquire the necessary locks
> > immediately).  But if you have a global index, you basically lose the
> > advantage of partitioning as it'll require heavy changes on that
> > index.
> If the global index removes all the major benefits of partitioned tables, 
> then it is not worth having it.
> 
> This is indeed a typical scenario for a partitioned table.
> there are two basic operations
> 1) Monthly DETACH old child table
> 2) Monthly ATTACH new child table
> 
> For 1) The DETACH old child table can be finished immediately, global index 
> can be kept valid after DETACH is completed, and the cleanup of garbage data 
> in global index can be deferred to VACUUM.
> This is similar to the global index optimization done by Oracle12c.
> For 2) ATTACH new empty child table can also be completed immediately.
> If this is the case, many of the advantages of partitioned tables will be 
> retained, while the advantages of global indexes will be gained.

Yes, we can keep the index rows for the deleted partition and clean them
up later, but what is the advantage of partitioning then?  Just heap
deletion quickly?  Is that enough of a value?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: libpq compression

2021-01-11 Thread Tomas Vondra
On 1/11/21 2:53 PM, Konstantin Knizhnik wrote:
> 
> ...
> 
> New version of libpq compression patch is attached.
> It can be also be found at g...@github.com:postgrespro/libpq_compression.git
> 

Seems it bit-rotted already, so here's a slightly fixed version.

1) Fixes the MSVC makefile. The list of files is sorted alphabetically,
so I've added the file at the end.

2) Fixes duplicate OID. It's a good practice to assign OIDs from the end
of the range, to prevent collisions during development.


Other than that, I wonder what's the easiest way to run all tests with
compression enabled. ISTM it'd be nice to add a pg_regress option forcing
a particular compression algorithm to be used, or something similar. I'd
like a convenient way to pass this through valgrind, for example. Or
how do we get this tested on a buildfarm?

I'm not convinced it's very user-friendly to not have a psql option
enabling compression. It's true it can be enabled in a connection
string, but I doubt many people will notice that.

The sgml docs need a bit more love / formatting. The lines in libpq.sgml
are far too long, and there are no tags whatsoever. Presumably zlib/zstd
should be marked as , and so on.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 14690d8a877244fbc839f97274ccf7d84e971e71 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Mon, 11 Jan 2021 18:01:11 +0100
Subject: [PATCH] v29

---
 configure |  81 +++
 configure.ac  |  21 +
 .../postgres_fdw/expected/postgres_fdw.out|   2 +-
 doc/src/sgml/config.sgml  |  16 +
 doc/src/sgml/libpq.sgml   |  25 +
 doc/src/sgml/protocol.sgml|  93 +++
 src/Makefile.global.in|   1 +
 src/backend/Makefile  |   8 +
 src/backend/catalog/system_views.sql  |   9 +
 src/backend/libpq/pqcomm.c| 247 +--
 src/backend/postmaster/pgstat.c   |  30 +
 src/backend/postmaster/postmaster.c   |  10 +
 src/backend/utils/adt/pgstatfuncs.c   |  50 +-
 src/backend/utils/misc/guc.c  |  10 +
 src/common/Makefile   |   3 +-
 src/common/zpq_stream.c   | 684 ++
 src/include/catalog/pg_proc.dat   |  18 +-
 src/include/common/zpq_stream.h   |  94 +++
 src/include/libpq/libpq-be.h  |   3 +
 src/include/libpq/libpq.h |   1 +
 src/include/libpq/pqcomm.h|   1 +
 src/include/pg_config.h.in|   3 +
 src/include/pgstat.h  |   7 +
 src/interfaces/libpq/Makefile |  14 +
 src/interfaces/libpq/fe-connect.c |  92 ++-
 src/interfaces/libpq/fe-exec.c|   4 +-
 src/interfaces/libpq/fe-misc.c|  55 +-
 src/interfaces/libpq/fe-protocol3.c   | 165 -
 src/interfaces/libpq/libpq-int.h  |  21 +
 src/test/regress/expected/rules.out   |  14 +-
 src/tools/msvc/Mkvcbuild.pm   |   2 +-
 31 files changed, 1701 insertions(+), 83 deletions(-)
 create mode 100644 src/common/zpq_stream.c
 create mode 100644 src/include/common/zpq_stream.h

diff --git a/configure b/configure
index b917a2a1c9..feb9420106 100755
--- a/configure
+++ b/configure
@@ -699,6 +699,7 @@ LD
 LDFLAGS_SL
 LDFLAGS_EX
 with_zlib
+with_zstd
 with_system_tzdata
 with_libxslt
 XML2_LIBS
@@ -866,6 +867,7 @@ with_libxml
 with_libxslt
 with_system_tzdata
 with_zlib
+with_zstd
 with_gnu_ld
 enable_largefile
 '
@@ -8571,6 +8573,85 @@ fi
 
 
 
+#
+# ZStd
+#
+
+
+
+# Check whether --with-zstd was given.
+if test "${with_zstd+set}" = set; then :
+  withval=$with_zstd;
+  case $withval in
+yes)
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
+  ;;
+  esac
+
+else
+  with_zstd=no
+
+fi
+
+
+
+
+if test "$with_zstd" = yes ; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5
+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_zstd_ZSTD_compress=yes
+else
+  ac_cv_lib_zstd_ZSTD_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext

Re: Key management with tests

2021-01-11 Thread Bruce Momjian
On Mon, Jan 11, 2021 at 08:12:00PM +0900, Masahiko Sawada wrote:
> On Sun, Jan 10, 2021 at 11:51 PM Bruce Momjian  wrote:
> > OK, here they are with numeric prefixes.  It was actually tricky to
> > figure out how to create a squashed format-patch based on another branch.
> 
> Thank you for attaching the patches. It passes all cfbot tests, great.

Yeah, I saw that.  :-)  I had to learn a lot about how to create
squashed format-patches on non-master branches.  I have now automated it
so it will be easy going forward.

> Looking at the patch, it supports three algorithms but only
> PG_CIPHER_AES_KWP is used in the core for now:
> 
> +/*
> + * Supported symmetric encryption algorithm. These identifiers are passed
> + * to pg_cipher_ctx_create() function, and then actual encryption
> + * implementations need to initialize their context of the given encryption
> + * algorithm.
> + */
> +#define PG_CIPHER_AES_GCM  0
> +#define PG_CIPHER_AES_KW   1
> +#define PG_CIPHER_AES_KWP  2
> +#define PG_MAX_CIPHER_ID   3
> 
> Are we in the process of experimenting which algorithms are better? If
> we support one algorithm that is actually used in the core, we would
> reduce the tests as well.

I think we are only using KWP (Key Wrap with Padding) because that is
for wrapping keys:


https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Algorithm-Validation-Program/documents/mac/KWVS.pdf

I am not sure about KW.  I think we are using GCM for the WAL/heap/index
pages.  Stephen would know more.

> FWIW, I've written a PoC patch for buffer encryption to make sure the
> kms patch would be workable with other components using the encryption
> key managed by kmgr.

Wow, it is a small patch --- nice.
 
-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Moving other hex functions to /common

2021-01-11 Thread Bruce Momjian
On Mon, Jan 11, 2021 at 04:45:14PM +0900, Michael Paquier wrote:
> On Wed, Jan 06, 2021 at 08:58:23AM -0500, Bruce Momjian wrote:
> > Fine.  Do you want to add the overflow to the patch I posted, for all
> > encoding types?
> 
> Yeah.  It looks that it would be good to be consistent as well for
> escape case, so as it is possible to add a dstlen argument to struct
> pg_encoding for the encoding and decoding routines.  I would also

Sure.

> prefer the option to remove the argument "data" from the encode and
> decode length routines for the hex routines part of src/common/, even
> if it forces the creation of two small wrappers in encode.c to call
> the routines of src/common/.  Would you prefer if I send a patch by

Agreed.  Having an argument that does nothing is odd.

> myself?  Please note that anything I'd send would use directly elog()
> and pg_log() instead of returning status codes for the src/common/
> routines, and of course not touch ECPG, as that's the approach you are
> favoring.

Sure, I realize the elog/pg_log is odd, but the alternatives seem worse.
You can take ownership of my hex patch and just add to it.  I know you
already did the hex length part, and have other ideas of what you want.

My key management patch needs the hex encoding in /common, so it will
wait for the application of your patch.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Add Nullif case for eval_const_expressions_mutator

2021-01-11 Thread Tom Lane
"Hou, Zhijie"  writes:
> I notice that there are no Nullif case in eval_const_expression.
> Since Nullif is similar to Opexpr and is easy to implement,
> I try add this case in eval_const_expressions_mutator.

I think this patch should be about a tenth the size.  Try modeling
it on the T_SubscriptingRef-etc case, ie, use ece_generic_processing
and then ece_evaluate_expr to cover the generic cases.  OpExpr is
common enough to deserve specially optimized code, but NullIf isn't,
so shorter is better.
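
For illustration, a sketch of roughly that shape (not a tested patch; beyond the two helpers named above, ece_all_arguments_const, ece_function_is_safe and set_opfuncid are assumed from the existing clauses.c machinery):

    case T_NullIfExpr:
        {
            NullIfExpr *expr;

            /* Copy the node and const-simplify its arguments */
            expr = (NullIfExpr *) ece_generic_processing(node);
            /* Make sure the comparison function OID is filled in */
            set_opfuncid((OpExpr *) expr);
            /* If both inputs are Consts and the operator is safe to
             * call, fold the whole expression to a constant. */
            if (ece_all_arguments_const(expr) &&
                ece_function_is_safe(expr->opfuncid, context))
                return ece_evaluate_expr(expr);
            return (Node *) expr;
        }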

regards, tom lane




Re: popcount

2021-01-11 Thread David Fetter
On Mon, Jan 11, 2021 at 03:50:54PM +0100, Peter Eisentraut wrote:
> On 2020-12-30 17:41, David Fetter wrote:
> > > The input may have more than 2 billion bits set to 1. The biggest possible
> > > result should be 8 billion for bytea (1 GB with all bits set to 1).
> > > So shouldn't this function return an int8?
> > It does now, and thanks for looking at this.
> 
> The documentation still reflects the previous int4 return type (in two
> different spellings, too).

Thanks for looking this over!

Please find attached the next version with corrected documentation.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From f0f8e0639a4d08db7d454d5181aa9d98d31264a3 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 30 Dec 2020 02:51:46 -0800
Subject: [PATCH v3] popcount
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.29.2"

This is a multi-part message in MIME format.
--2.29.2
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


Now it's accessible to SQL for the BIT VARYING and BYTEA types.

diff --git doc/src/sgml/func.sgml doc/src/sgml/func.sgml
index 02a37658ad..1d86d610dd 100644
--- doc/src/sgml/func.sgml
+++ doc/src/sgml/func.sgml
@@ -4069,6 +4069,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');

   
 
+  
+   
+
+ popcount
+
+popcount ( bytes bytea )
+bigint
+   
+   
+Counts the number of bits set in a binary string.
+   
+   
+popcount('\xdeadbeef'::bytea)
+24
+   
+  
+
   

 
@@ -4830,6 +4847,24 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 101010001010101010

   
+
+  
+   
+
+ popcount
+
+popcount ( bits bit )
+bigint
+   
+   
+Counts the bits set in a bit string.
+   
+   
+popcount(B'101010101010101010')
+9
+   
+  
+
  
 

diff --git src/include/catalog/pg_proc.dat src/include/catalog/pg_proc.dat
index d7b55f57ea..2d53e0d46d 100644
--- src/include/catalog/pg_proc.dat
+++ src/include/catalog/pg_proc.dat
@@ -1446,6 +1446,9 @@
 { oid => '752', descr => 'substitute portion of string',
   proname => 'overlay', prorettype => 'bytea',
   proargtypes => 'bytea bytea int4', prosrc => 'byteaoverlay_no_len' },
+{ oid => '8436', descr => 'count set bits',
+  proname => 'popcount', prorettype => 'int8', proargtypes => 'bytea',
+  prosrc => 'byteapopcount'},
 
 { oid => '725',
   proname => 'dist_pl', prorettype => 'float8', proargtypes => 'point line',
@@ -3865,6 +3868,9 @@
 { oid => '3033', descr => 'set bit',
   proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4',
   prosrc => 'bitsetbit' },
+{ oid => '8435', descr => 'count set bits',
+  proname => 'popcount', prorettype => 'int8', proargtypes => 'bit',
+  prosrc => 'bitpopcount'},
 
 # for macaddr type support
 { oid => '436', descr => 'I/O',
diff --git src/backend/utils/adt/varbit.c src/backend/utils/adt/varbit.c
index 2235866244..a6a44f3f4f 100644
--- src/backend/utils/adt/varbit.c
+++ src/backend/utils/adt/varbit.c
@@ -36,6 +36,7 @@
 #include "libpq/pqformat.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
+#include "port/pg_bitutils.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/varbit.h"
@@ -1878,3 +1879,29 @@ bitgetbit(PG_FUNCTION_ARGS)
 	else
 		PG_RETURN_INT32(0);
 }
+
+/*
+ * bitpopcount
+ *
+ * Returns the number of bits set in a bit array.
+ *
+ */
+Datum
+bitpopcount(PG_FUNCTION_ARGS)
+{
+	VarBit		*arg1 = PG_GETARG_VARBIT_P(0);
+	bits8		*p;
+	int			len;
+	int64		popcount;
+
+	/*
+	 * ceil(VARBITLEN(ARG1)/BITS_PER_BYTE)
+	 * done to minimize branches and instructions.
+	 */
+	len = (VARBITLEN(arg1) + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
+	p = VARBITS(arg1);
+
+	popcount = pg_popcount((const char *)p, len);
+
+	PG_RETURN_INT64(popcount);
+}
diff --git src/backend/utils/adt/varlena.c src/backend/utils/adt/varlena.c
index 4838bfb4ff..da3ba769c4 100644
--- src/backend/utils/adt/varlena.c
+++ src/backend/utils/adt/varlena.c
@@ -3433,6 +3433,22 @@ bytea_overlay(bytea *t1, bytea *t2, int sp, int sl)
 	return result;
 }
 
+/*
+ * popcount
+ */
+Datum
+byteapopcount(PG_FUNCTION_ARGS)
+{
+	bytea	*t1 = PG_GETARG_BYTEA_PP(0);
+	int		len;
+	int64	result;
+
+	len = VARSIZE_ANY_EXHDR(t1);
+	result = pg_popcount(VARDATA_ANY(t1), len);
+
+	PG_RETURN_INT64(result);
+}
+
 /*
  * byteapos -
  *	  Return the position of the specified substring.
diff --git src/test/regress/expected/bit.out src/test/regress/expected/bit.out
index a7f95b846d..228f992ced 100644
--- src/test/regress/expected/bit.out
+++ src/test/regress/expected/bit.out
@@ -710,6 +710,19 @@ SELECT overlay(B'0101011100' placing '001' from 20);
  

Re: pg_upgrade test for binary compatibility of core data types

2021-01-11 Thread Bruce Momjian
On Mon, Jan 11, 2021 at 03:28:08PM +0100, Peter Eisentraut wrote:
> On 2020-12-27 20:07, Justin Pryzby wrote:
> > rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f
> 
> I think these patches could use some in-place documentation of what they are
> trying to achieve and how they do it.  The required information is spread
> over a lengthy thread.  No one wants to read that.  Add commit messages to
> the patches.

Oh, I see that now, and agree that you need to explain each item with a
comment.  pg_upgrade is doing some odd things, so documenting everything
it does is a big win.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Multiple hosts in connection string failed to failover in non-hot standby mode

2021-01-11 Thread Tom Lane
Hubert Zhang  writes:
> As for patch 0004, find ':' after "could not connect to" may failed when 
> error message like:
> "could not connect to host "localhost" (::1), port 12345: Connection 
> refused", where p2 will point to "::1" instead of ": Connection refused". But 
> since it's only used for test case, we don't need to filter the error message 
> precisely.

Excellent point, and I think that could happen on a Windows installation.
We can make it look for ": " instead of just ':', and that'll reduce the
odds of trouble.
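
Concretely, that amounts to a one-line change in the filter quoted elsewhere in this thread (sketch):

    char   *p1 = strstr(linebuf.data, "could not connect to ");

    if (p1)
    {
        /* look for ": " rather than ':' so the ':' inside "(::1)" can't match */
        char   *p2 = strstr(p1, ": ");

        if (p2)
            memmove(p1 + 17, p2, strlen(p2) + 1);
    }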

regards, tom lane




Re: popcount

2021-01-11 Thread Peter Eisentraut

On 2020-12-30 17:41, David Fetter wrote:

The input may have more than 2 billion bits set to 1. The biggest possible
result should be 8 billion for bytea (1 GB with all bits set to 1).
So shouldn't this function return an int8?

It does now, and thanks for looking at this.


The documentation still reflects the previous int4 return type (in two 
different spellings, too).





Re: Multiple hosts in connection string failed to failover in non-hot standby mode

2021-01-11 Thread Hubert Zhang
Hi Tom,

I agree to get detailed error message for each failed host as your patch 0001.

As for patch 0004, finding ':' after "could not connect to" may fail when the error 
message looks like:
"could not connect to host "localhost" (::1), port 12345: Connection refused", 
where p2 will point to "::1" instead of ": Connection refused". But since it's 
only used for test cases, we don't need to filter the error message precisely.

```
ecpg_filter_stderr(const char *resultfile, const char *tmpfile)
{
..
char   *p1 = strstr(linebuf.data, "could not connect to ");
if (p1)
{
char   *p2 = strchr(p1, ':');
if (p2)
memmove(p1 + 17, p2, strlen(p2) + 1);
}
}
```

Thanks,
Hubert


From: Tom Lane 
Sent: Monday, January 11, 2021 10:56 AM
To: Hubert Zhang 
Cc: tsunakawa.ta...@fujitsu.com ; 
pgsql-hack...@postgresql.org 
Subject: Re: Multiple hosts in connection string failed to failover in non-hot 
standby mode

I wrote:
> I feel pretty good about 0001: it might be committable as-is.  0002 is
> probably subject to bikeshedding, plus it has a problem in the ECPG tests.
> Two of the error messages are now unstable because they expose
> chosen-at-random socket paths:
> ...
> I don't have any non-hacky ideas what to do about that.  The extra detail
> seems useful to end users, but we don't have any infrastructure that
> would allow filtering it out in the ECPG tests.

So far the only solution that comes to mind is to introduce some
infrastructure to do that filtering.  0001-0003 below are unchanged,
0004 patches up the ecpg test framework with a rather ad-hoc filtering
function.  I'd feel worse about this if there weren't already a very
ad-hoc filtering function there ;-)

This set passes check-world for me; we'll soon see what the cfbot
thinks.

regards, tom lane



Re: multi-install PostgresNode

2021-01-11 Thread Peter Eisentraut

On 2020-12-20 18:09, Andrew Dunstan wrote:

On 12/19/20 11:19 AM, Andrew Dunstan wrote:

This turns out to be remarkably short, with the use of a little eval magic.

Give the attached, this test program works just fine:

 #!/bin/perl
 use PostgresNodePath;
 $ENV{PG_REGRESS} =
 '/home/andrew/pgl/vpath.12/src/test/regress/pg_regress';
 my $node = get_new_node('/home/andrew/pgl/inst.12.5711','blurfl');
 print $node->info;
 print $node->connstr(),"\n";
 $node->init();



This time with a typo removed.


What is the proposed destination of this file?  Is it meant to be part 
of a patch?  Is it meant to be installed?  Is it meant for the buildfarm 
code?





Re: pg_upgrade test for binary compatibility of core data types

2021-01-11 Thread Peter Eisentraut

On 2020-12-27 20:07, Justin Pryzby wrote:

On Wed, Dec 16, 2020 at 11:22:23AM -0600, Justin Pryzby wrote:

On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote:

I meant to notice if the binary format is accidentally changed again, which was
what happened here:
7c15cef86 Base information_schema.sql_identifier domain on name, not varchar.

I added a table to the regression tests so it's processed by pg_upgrade tests,
run like:

| time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 
oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin


Per cfbot, this avoids testing ::xml (support for which may not be enabled)
And also now tests oid types.

I think the per-version hacks should be grouped by logical change, rather than
by version.  Which I've started doing here.


rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f


I think these patches could use some in-place documentation of what they 
are trying to achieve and how they do it.  The required information is 
spread over a lengthy thread.  No one wants to read that.  Add commit 
messages to the patches.





Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion

2021-01-11 Thread Peter Eisentraut

On 2021-01-05 10:56, Masahiko Sawada wrote:

BTW according to the documentation, the options of DECLARE statement
(BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive.

DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]
 CURSOR [ { WITH | WITHOUT } HOLD ] FOR query

But I realized that these options are actually order-insensitive. For
instance, we can declare a cursor like:

=# declare abc scroll binary cursor for select * from pg_class;
DECLARE CURSOR

Both the parser code and the documentation have been unchanged since 2003.
Is it a documentation bug?


According to the SQL standard, the ordering of the cursor properties is 
fixed.  Even if the PostgreSQL parser offers more flexibility, I think 
we should continue to encourage writing the clauses in the standard order.





Re: libpq compression

2021-01-11 Thread Konstantin Knizhnik



On 09.01.2021 23:31, Justin Pryzby wrote:

On Thu, Dec 17, 2020 at 05:54:28PM +0300, Konstantin Knizhnik wrote:

I am maintaining this code in
g...@github.com:postgrespro/libpq_compression.git repository.
I will be pleased if anybody, who wants to suggest any bug
fixes/improvements of libpq compression, create pull requests: it will be
much easier for me to merge them.

Thanks for working on this.
I have a patch for zstd compression in pg_dump so I looked at your patch.
I'm attaching some language fixes.


Thank you very much.
I applied your patch on top of the pull request from Daniil Zakhlystov, who has 
implemented support for using different compressors in the two directions.


Frankly speaking, I am still very skeptical about too much flexibility 
in the compression configuration:

- toggling compression on the fly
- using different compression algorithms in the two directions

According to Daniil's results there is only a 30% difference in 
compression ratio between zstd:1 and zstd:19.
But by making it possible to specify an arbitrary compression level, we give 
the user a simple tool to attack the server (causing CPU/memory exhaustion).





+zstd_create(int level, zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg, 
char* rx_data, size_t rx_data_size)
+zlib_create(int level, zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg, 
char* rx_data, size_t rx_data_size)
+build_compressors_list(PGconn *conn, char** client_compressors, bool 
build_descriptors)

Are you able to run pg_indent to fix all the places where "*" is before the
space ?  (And similar style issues).


Also done by Daniil.



There are several compression patches in the commitfest, I'm not sure how much
they need to be coordinated, but for sure we should coordinate the list of
compressions available at compile time.

Maybe there should be a central structure for this, or maybe just the ID
numbers of compressors should be a common enum/define.  In my patch, I have:

+struct compressLibs {
+   const CompressionAlgorithm alg;
+   const char  *name;  /* Name in -Z alg= */
+   const char  *suffix;/* file extension */
+   const int   defaultlevel;   /* Default compression level */
+};

Maybe we'd also want to store the "magic number" of each compression library.
Maybe there'd also be a common parsing of compression options.

You're supporting a syntax like zlib,zstd:5, but zstd also supports long-range,
checksum, and rsyncable modes (rsyncable is relevant to pg_dump, but not to
libpq).


There are at least three places in Postgres where compression is used 
right now:

1. TOAST and extended attribute compression
2. pg_basebackup compression
3. pg_dump compression

And there are also patches for
4. page compression
5. protocol compression


It is awful that all these five places are using compression in their own 
way.
It seems to me that compression (as well as all other system-dependent 
stuff such as socket IO, file IO, synchronization primitives, ...) should be 
extracted into a SAL (system-abstraction layer)
where it can then be used by the backend, frontend and utilities, 
including external utilities like pg_probackup, pg_bouncer, Odyssey, ...
Unfortunately such refactoring requires so much effort that it only has 
a chance of being committed if this work is coordinated by one 
of the core committers.




I think your patch has an issue here.  You have this:

src/interfaces/libpq/fe-connect.c

+pqGetc(&resp, conn);
+index = resp;
+if (index == (char)-1)
+{
+    appendPQExpBuffer(&conn->errorMessage,
+                      libpq_gettext("server is not supported requested compression algorithms %s\n"),
+                      conn->compression);
+    goto error_return;
+}
+Assert(!conn->zstream);
+conn->zstream = zpq_create(conn->compressors[index].impl,
+                           conn->compressors[index].level,
+                           (zpq_tx_func)pqsecure_write, (zpq_rx_func)pqsecure_read, conn,
+                           &conn->inBuffer[conn->inCursor], conn->inEnd - conn->inCursor);

This takes the "index" returned by the server and then accesses
conn->compressors[index] without first checking if the index is out of range,
so a malicious server could (at least) crash the client by returning index=666.


Thank you for pointing out this problem. I will add the check, although I 
think that the problem of a malicious server
is less critical than a malicious client. Also, such a "byzantine" server may 
return wrong data, which in many cases
is more fatal than a crash of the client.
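
For illustration, the added check could look roughly like this (a sketch only; the field holding the number of client-side compressors is an assumption and may be named differently in the actual patch):

    /* after the existing check for index == (char) -1 ("none accepted"): */
    if (index < 0 || index >= conn->n_compressors)
    {
        appendPQExpBuffer(&conn->errorMessage,
                          libpq_gettext("server returned invalid compression algorithm index: %d\n"),
                          index);
        goto error_return;
    }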

I suggest that there should be an enum of algorithms, which is constant across
all 

Re: Added schema level support for publication.

2021-01-11 Thread Bharath Rupireddy
On Mon, Jan 11, 2021 at 5:28 PM Bharath Rupireddy
 wrote:
>
> On Mon, Jan 11, 2021 at 5:25 PM Amit Kapila  wrote:
> >
> > On Mon, Jan 11, 2021 at 4:29 PM Li Japin  wrote:
> > >
> > >
> > > On Jan 11, 2021, at 5:06 PM, Bharath Rupireddy 
> > >  wrote:
> > >
> > > On Mon, Jan 11, 2021 at 1:29 PM japin  wrote:
> > >
> > > Sorry, I misunderstood. After the test (ce6a71fa530). I found that if we 
> > > do not insert data
> > > between step (5) and (6), it will not ship the new records, however, if 
> > > we insert
> > > data between step (5) and (6), it will ship the new records.
> > >
> > >
> > ..
> > > It might be a bug.
> > >
> >
> > Can you check pg_publication_rel and pg_subscription_rel? Also, this
> > is not related to the feature proposed in this thread, so it is better
> > to start a new thread to conclude whether this is a bug or not.
>
> Thanks Amit, sure I will verify and start a new thread.

I started a new thread [1] for this, please have a look.

[1] - 
https://www.postgresql.org/message-id/CALj2ACV%2B0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-11 Thread Bharath Rupireddy
Hi,

While providing thoughts on the design in [1], I found a strange
behaviour with the $subject. The use case is shown below as a sequence
of steps that need to be run on publisher and subscriber to arrive at
the strange behaviour.  In step 5, the table is dropped from the
publication and in step 6, the refresh publication is run on the
subscriber, from here onwards, the expectation is that no further
inserts into the publisher table have to be replicated on to the
subscriber, but the opposite happens i.e. the inserts are still
replicated to the subscriber. ISTM as a bug. Let me know if I'm
missing anything.

Thoughts?

step 1) on the publisher:
DROP TABLE t1;
DROP PUBLICATION mypub1;
CREATE TABLE t1 (a int);
INSERT INTO t1 VALUES (1);
CREATE PUBLICATION mypub1 FOR TABLE t1;
postgres=# SELECT r1.*, r2.relname, r3.* FROM pg_publication_rel r1,
pg_class r2, pg_publication r3 WHERE r1.prrelid = r2.oid AND
r1.prpubid = r3.oid;
  oid  | prpubid | prrelid | relname |  oid  | pubname | pubowner |
puballtables | pubinsert | pubupdate | pubdelete | pubtruncate |
pubviaroot
---+-+-+-+---+-+--+--+---+---+---+-+
 16462 |   16461 |   16458 | t1  | 16461 | mypub1  |   10 | f
  | t | t | t | t   | f
(1 row)

step 2) on the subscriber:
DROP TABLE t1;
DROP SUBSCRIPTION mysub1;
CREATE TABLE t1 (a int);
CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost dbname=postgres
user=bharath port=5432' PUBLICATION mypub1;
postgres=# SELECT r1.*, r2.relname, r3.* FROM pg_subscription_rel r1,
pg_class r2, pg_subscription r3 WHERE r1.srrelid = r2.oid AND
r1.srsubid = r3.oid;
 srsubid | srrelid | srsubstate | srsublsn | relname |  oid  | subdbid
| subname | subowner | subenabled | subbinary | substream |
 subconninfo  | subslotname | subsynccommit |
subpublications
-+-++--+-+---+-+-+--++---+---+-
--+-+---+-
   16446 |   16443 | i  |  | t1  | 16446 |   12872
| mysub1  |   10 | t  | f | f |
host=localhost dbnam
e=postgres user=bharath port=5432 | mysub1  | off   | {mypub1}
(1 row)
postgres=# SELECT * FROM t1;
 a
---
 1
(1 row)

step 3) on the publisher:
INSERT INTO t1 VALUES (2);

step 4) on the subscriber:
postgres=# SELECT * FROM t1;
 a
---
 1
 2
(2 rows)

step 5) on the publisher:
ALTER PUBLICATION mypub1 DROP TABLE t1;
postgres=# SELECT r1.*, r2.relname, r3.* FROM pg_publication_rel r1,
pg_class r2, pg_publication r3 WHERE r1.prrelid = r2.oid AND
r1.prpubid = r3.oid;
 oid | prpubid | prrelid | relname | oid | pubname | pubowner |
puballtables | pubinsert | pubupdate | pubdelete | pubtruncate |
pubviaroot
-+-+-+-+-+-+--+--+---+---+---+-+
(0 rows)
INSERT INTO t1 VALUES (3);

step 6) on the subscriber:
postgres=# SELECT * FROM t1;
 a
---
 1
 2
 3
(3 rows)
ALTER SUBSCRIPTION mysub1 REFRESH PUBLICATION;
postgres=# SELECT r1.*, r2.relname, r3.* FROM pg_subscription_rel r1,
pg_class r2, pg_subscription r3 WHERE r1.srrelid = r2.oid AND
r1.srsubid = r3.oid;
 srsubid | srrelid | srsubstate | srsublsn | relname | oid | subdbid |
subname | subowner | subenabled | subbinary | substream | subconninfo
| subslotn
ame | subsynccommit | subpublications
-+-++--+-+-+-+-+--++---+---+-+-
+---+-
(0 rows)

step 7) on the publisher:
INSERT INTO t1 VALUES (4);

step 8) on the subscriber:
postgres=# SELECT * FROM t1;
 a
---
 1
 2
 3
 4
(4 rows)

step 9) on the publisher:
INSERT INTO t1 SELECT * FROM generate_series(5,100);

step 10) on the subscriber:
postgres=# SELECT count(*) FROM t1;
 count
---
   100
(1 row)

[1] - 
https://www.postgresql.org/message-id/CAA4eK1L5TejNHNctyPB3GVuEriRQw6xxU32iMyv%3Dh4tCJKkLew%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: adding partitioned tables to publications

2021-01-11 Thread Mark Zhao
Thanks for your reply. The patch is exactly what I want.
My English name is Mark Zhao, which should be the current email name.


Thanks,
Mark Zhao




--Original--
From: "Amit Kapila";

Re: [Bug Fix] Logical replication on partition table is very slow and CPU is 99%

2021-01-11 Thread Ashutosh Bapat
On Mon, Jan 4, 2021 at 7:07 PM 赵锐 <875941...@qq.com> wrote:

>
>
> This patch adds the missing decrement to resolve the problem.
>

>
> Previous discussion is here: 
> https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_ejmoo...@mail.gmail.com
> And I believe patch #83fd453 introduced this problem.
>


Thanks for the report and fix. The fix looks straightforward and
correct to me. Please add the patch to the next CF so that it's not
forgotten. But since this is a bug fix it need not wait for CF.

Also please report the results of your experiment with the patch
applied so as to know this bug's contribution to the slow down.

-- 
Best Wishes,
Ashutosh Bapat




Re: Proposal: Global Index

2021-01-11 Thread 曾文旌


> On Jan 7, 2021, at 23:04, Robert Haas wrote:
> 
> On Thu, Jan 7, 2021 at 4:44 AM 曾文旌  wrote:
>> I've been following this topic for a long time. It's been a year since the 
>> last response.
>> It was clear that our customers wanted this feature as well, and a large 
>> number of them mentioned it.
>> 
>> So, I wish the whole feature to mature as soon as possible.
>> I summarized the scheme mentioned in the email and completed the POC 
>> patch(base on PG_13).
> 
> You need to summarize the basic design choices you've made here. Like,
> what did you do about the fact that TIDs have to be 6 bytes?

These are the basic choices, and most of them come from discussions in previous 
emails.

1 Definition of global index
Obviously, we need to expand Index address info(CTID) to include child table 
info in GlobalIndexTuple.

1.1 As mentioned in the previous email, Bruce suggested having the OID
instead of relfilenode as relfilenode can be duplicated across tablespaces. 
I agree with that.

1.2 And Heikki pointed me to include heap specific information using the 
INCLUDE keyword so that heap information
is stored with each index node as data.

So, in the POC stage, I chose to use the INCLUDE keyword to include the tableoid 
in the global index. This will add 4 bytes to each IndexTuple.

Considering that a single partitioned table will usually not exceed 65535 child 
tables, perhaps two bytes for tracking which child table the data belongs to would 
be sufficient.

2. Maintenance of global index by partition table DML.
The DML of each child table of the partitioned table needs to maintain the 
global index on the partitioned table.

3. Global index scan
Planner: 
Process predicates on the primary partition, generating paths and plans for 
the global index.
The cost model of the global index may need to be considered: we need to make 
sure the global index or the local indexes are chosen in their respective 
advantageous scenarios.

Executor: 
The index scan fetches an index tuple, gets the tableoid from it, and verifies 
the visibility of the data in the corresponding child table.
If a child table has been DETACHed, then the index entries for that table are 
ignored during the index scan until VACUUM finishes cleaning up the global index 
(a sketch of this check follows the list below).

4. VACUUM of the partitioned table maintains the global index.
Old data in the global index also needs to be cleaned up, and VACUUM is 
suitable for that.
When each child table is vacuumed, besides vacuuming its own indexes, it also 
vacuums the global index on the partitioned table.

5. Other
The global index indexes all of the child tables, which makes the global index 
large, with many levels. 
Following this technical route, partitioned (global) indexes are a further target.
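
As an illustration of the executor-side check in item 3 (a sketch only; names such 
as tableoid_attno and child_is_attached are placeholders, not part of the POC patch):

    /* inside the global-index scan loop, after fetching an index tuple */
    bool        isnull;
    Oid         childoid;
    Relation    childrel;

    childoid = DatumGetObjectId(index_getattr(itup, tableoid_attno,
                                              RelationGetDescr(indexrel),
                                              &isnull));

    if (!child_is_attached(childoid))
        continue;           /* entry belongs to a DETACHed child; ignore it
                             * until VACUUM removes it from the global index */

    childrel = table_open(childoid, AccessShareLock);
    /* fetch the heap tuple via the stored ctid and test its visibility ... */
    table_close(childrel, AccessShareLock);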

This is my basic idea for implementing global index.
Looking forward to your feedback.

Thanks!

Wenjing

> 
> -- 
> Robert Haas
> EDB: http://www.enterprisedb.com



smime.p7s
Description: S/MIME cryptographic signature


RE: [POC] Fast COPY FROM command for the table with foreign partitions

2021-01-11 Thread Tang, Haiying
Hi Andrey,

I had a general look at this feature, and I think it's beneficial for some 
PostgreSQL application scenarios. So I ran 7 performance test cases on your 
patch (v13). The results are really good: as you can see below, we get a 
7-10x improvement with this patch.

PSA test_copy_from.sql, which shows the details of my test cases (I didn't 
attach my data file since it's too big). 

Below are the test results:
'Test No' corresponds to the numbers (0, 1, ..., 6) in the attached test_copy_from.sql.
%reg = (Patched - Unpatched)/Unpatched; times are in milliseconds.

|Test No| Test Case                                                                                    | Patched(ms) | Unpatched(ms) | %reg |
|-------|----------------------------------------------------------------------------------------------|-------------|---------------|------|
|0      | COPY FROM insertion into the partitioned table (partition is a foreign table)                 | 102483.223  | 1083300.907   | -91% |
|1      | COPY FROM insertion into the partitioned table (partition is a foreign partition)             | 104779.893  | 1207320.287   | -91% |
|2      | COPY FROM insertion into the foreign table (without partition)                                | 100268.730  | 1077309.158   | -91% |
|3      | COPY FROM insertion into the partitioned table (part of foreign partitions)                   | 104110.620  | 1134781.855   | -91% |
|4      | COPY FROM insertion into the partitioned table with constraint (part of foreign partitions)   | 136356.201  | 1238539.603   | -89% |
|5      | COPY FROM insertion into the foreign table with constraint (without partition)                | 136818.262  | 1189921.742   | -89% |
|6      | \copy insertion into the partitioned table with constraint                                    | 140368.072  | 1242689.924   | -89% |

If there is any question on my tests, please feel free to ask.

Best Regard,
Tang




test_copy_from.sql
Description: test_copy_from.sql


Re: Added schema level support for publication.

2021-01-11 Thread Bharath Rupireddy
On Mon, Jan 11, 2021 at 5:25 PM Amit Kapila  wrote:
>
> On Mon, Jan 11, 2021 at 4:29 PM Li Japin  wrote:
> >
> >
> > On Jan 11, 2021, at 5:06 PM, Bharath Rupireddy 
> >  wrote:
> >
> > On Mon, Jan 11, 2021 at 1:29 PM japin  wrote:
> >
> > Sorry, I misunderstood. After the test (ce6a71fa530). I found that if we do 
> > not insert data
> > between step (5) and (6), it will not ship the new records, however, if we 
> > insert
> > data between step (5) and (6), it will ship the new records.
> >
> >
> ..
> > It might be a bug.
> >
>
> Can you check pg_publication_rel and pg_subscription_rel? Also, this
> is not related to the feature proposed in this thread, so it is better
> to start a new thread to conclude whether this is a bug or not.

Thanks Amit, sure I will verify and start a new thread.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Added schema level support for publication.

2021-01-11 Thread Amit Kapila
On Mon, Jan 11, 2021 at 4:29 PM Li Japin  wrote:
>
>
> On Jan 11, 2021, at 5:06 PM, Bharath Rupireddy 
>  wrote:
>
> On Mon, Jan 11, 2021 at 1:29 PM japin  wrote:
>
> Sorry, I misunderstood. After the test (ce6a71fa530). I found that if we do 
> not insert data
> between step (5) and (6), it will not ship the new records, however, if we 
> insert
> data between step (5) and (6), it will ship the new records.
>
>
..
> It might be a bug.
>

Can you check pg_publication_rel and pg_subscription_rel? Also, this
is not related to the feature proposed in this thread, so it is better
to start a new thread to conclude whether this is a bug or not.

-- 
With Regards,
Amit Kapila.




Re: adding partitioned tables to publications

2021-01-11 Thread Amit Kapila
On Wed, Dec 30, 2020 at 8:03 PM 赵锐 <875941...@qq.com> wrote:
>
> The first file of Amit's patch not only rearranges the code, but also fixes a 
> hidden bug.
> To make it easy to see, I attach another patch.
> "RelationIdGetRelation" will take a reference on owner->relrefarr; without 
> "RelationClose", the owner->relrefarr will keep enlarging and re-hashing.
> When the capacity of owner->relrefarr grows beyond 10 million, enlarging and 
> re-hashing takes several hours. And what's worse, taking a reference will also 
> take minutes, as the hash collision resolution is based on looking up an array 
> in order.
> When we want to publish 10 billion rows under one partitioned table, it takes 
> several days to take the references, enlarge and re-hash, and the CPU is always at 99%.
> After applying my patch, the 10 billion rows are published in 10 minutes.
>

It is a clear relation descriptor leak. The proposed fix seems correct
to me. The patch wasn't getting applied to HEAD. So, I have prepared
the separate patches for HEAD and 13. There are minor modifications in
the patch like I have used RelationIsValid before closing the
relation. I have not added any test because I see that there is
already a test in src/test/subscription/t/013_partition.
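
For readers following along, the fix boils down to this pattern (a minimal sketch 
of the idea, not the exact hunk; the variable names are placeholders):

    Relation    ancestor = RelationIdGetRelation(ancestor_oid);    /* takes a refcount */

    /* ... use the relation descriptor ... */

    if (RelationIsValid(ancestor))
        RelationClose(ancestor);    /* release the refcount, so the resource
                                     * owner's reference array cannot grow
                                     * without bound */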

Kindly let me know your English name so that I can give you credit as
a co-author?

-- 
With Regards,
Amit Kapila.


v1-0001-Fix-relation-descriptor-leak.HEAD.patch
Description: Binary data


v1-0001-Fix-relation-descriptor-leak.13.patch
Description: Binary data


Re: Proposal: Global Index

2021-01-11 Thread 曾文旌


> On Jan 8, 2021, at 16:26, Julien Rouhaud wrote:
> 
> On Fri, Jan 8, 2021 at 4:02 PM 曾文旌  wrote:
>> 
>>> On Jan 7, 2021, at 22:16, Bruce Momjian wrote:
>>> 
>>> On Thu, Jan  7, 2021 at 05:44:01PM +0800, 曾文旌 wrote:
 I've been following this topic for a long time. It's been a year since the 
 last response.
 It was clear that our customers wanted this feature as well, and a large 
 number of them mentioned it.
 
 So, I wish the whole feature to mature as soon as possible.
 I summarized the scheme mentioned in the email and completed the POC 
 patch(base on PG_13).
>>> 
>>> I think you need to address the items mentioned in this blog, and the
>>> email link it mentions:
>>> 
>>>  https://momjian.us/main/blogs/pgblog/2020.html#July_1_2020
>> 
>> Thank you for your reply.
>> I read your blog and it helped me a lot.
>> 
>> The blog mentions a specific problem: "A large global index might also 
>> reintroduce problems that prompted the creation of partitioning in the first 
>> place. "
>> I don't quite understand, could you give some specific information?
>> 
>> In addition you mentioned: "It is still unclear if these use-cases justify 
>> the architectural changes needed to enable global indexes."
>> Please also describe the problems you see, I will confirm each specific 
>> issue one by one.
> 
> One example is date partitioning.  People frequently need to store
> only the most recent data.  For instance doing a monthly partitioning
> and dropping the oldest partition every month once you hit the wanted
> retention is very efficient for that use case, as it should be almost
> instant (provided that you can acquire the necessary locks
> immediately).  But if you have a global index, you basically lose the
> advantage of partitioning as it'll require heavy changes on that
> index.
If the global index removes all the major benefits of partitioned tables, then 
it is not worth having it.

This is indeed a typical scenario for a partitioned table.
There are two basic operations:
1) Monthly DETACH of the old child table
2) Monthly ATTACH of a new child table

For 1), DETACHing the old child table can be finished immediately; the global index 
can be kept valid after the DETACH is completed, and the cleanup of garbage data in 
the global index can be deferred to VACUUM.
This is similar to the global index optimization done by Oracle 12c.
For 2), ATTACHing a new empty child table can also be completed immediately.
If this is the case, many of the advantages of partitioned tables will be 
retained, while the advantages of global indexes will be gained.



smime.p7s
Description: S/MIME cryptographic signature


Re: Key management with tests

2021-01-11 Thread Masahiko Sawada
On Sun, Jan 10, 2021 at 11:51 PM Bruce Momjian  wrote:
>
> On Sun, Jan 10, 2021 at 06:04:12PM +1300, Thomas Munro wrote:
> > On Sun, Jan 10, 2021 at 3:45 PM Bruce Momjian  wrote:
> > > Does anyone know why the cfbot applied the patch listed second first
> > > here?
> > >
> > > http://cfbot.cputube.org/patch_31_2925.log
> > >
> > > Specifically, it applied hex..key.diff.gz before hex.diff.gz.  I assumed
> > > it would apply attachments in the order they appear in the email.
> >
> > It sorts the filenames (in this case after decompressing step removes
> > the .gz endings).  That works pretty well for the patches that "git
> > format-patch" spits out, but it's a bit hit and miss with cases like
> > yours.
>
> OK, here they are with numeric prefixes.  It was actually tricky to
> figure out how to create a squashed format-patch based on another branch.
>

Thank you for attaching the patches. It passes all cfbot tests, great.

Looking at the patch, it supports three algorithms but only
PG_CIPHER_AES_KWP is used in the core for now:

+/*
+ * Supported symmetric encryption algorithm. These identifiers are passed
+ * to pg_cipher_ctx_create() function, and then actual encryption
+ * implementations need to initialize their context of the given encryption
+ * algorithm.
+ */
+#define PG_CIPHER_AES_GCM  0
+#define PG_CIPHER_AES_KW   1
+#define PG_CIPHER_AES_KWP  2
+#define PG_MAX_CIPHER_ID   3

Are we in the process of experimenting which algorithms are better? If
we support one algorithm that is actually used in the core, we would
reduce the tests as well.

FWIW, I've written a PoC patch for buffer encryption to make sure the
kms patch would be workable with other components using the encryption
key managed by kmgr.

Overall it’s good. While the buffer encryption patch is still PoC
quality and there are some problems regarding nonce generation we need
to deal with, it easily can use the relation key managed by the kmgr
to encrypt/decrypt buffers.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/


0003-Poc-buffer-encryption.patch
Description: Binary data


Re: Added schema level support for publication.

2021-01-11 Thread Li Japin

On Jan 11, 2021, at 5:06 PM, Bharath Rupireddy 
<bharath.rupireddyforpostg...@gmail.com> wrote:

On Mon, Jan 11, 2021 at 1:29 PM japin <japi...@hotmail.com> wrote:
Say a user has created a publication for a schema with hundreds of
tables in it, at some point later, can he stop replicating a single or
some tables from that schema?


There is no provision for this currently.

The documentation [1] says, we can ALTER PUBLICATION testpub DROP
TABLE t1; which removes the table from the list of published tables,
but looks like it requires ALTER SUBSCRIPTION testsub REFRESH
PUBLICATION; for the changes to become effective on the subscriber. I
have done some testing for this case:
1) created publication for table t1, see \d+ t1, the associated
publication is visible in the output
2) created subscription on the subscriber, initial available data from
the publisher for table t1 is received
3) insert into table t1 on the publisher
4) inserted data in (3) is received in the subscriber table t1
5) alter publication to drop the table t1 on the publisher, see \d+
t1, there will not be any associated publication in the output
6) execute alter subscription refresh publication on the subscriber,
with the expectation that it should not receive the data from the
publisher for the table t1 since it's dropped from the publication in
(5)
7) insert into table t1 on the publisher
8) still the newly inserted data in (7) from the publisher, will be
received into the table t1 in the subscriber

IIUC, the behaviour of ALTER PUBLICATION DROP TABLE from the docs and
the above use case, it looks like a bug to me. If I'm wrong, can
someone correct me?


Yes, if we modify the publication, we should refresh the subscription on
each subscriber.  It looks strange to me, especially for partitioned
tables [1].

Thoughts?


Can we track the difference between the publication and the subscription, and
auto-refresh the subscription on the subscriber?

[1]
https://www.postgresql.org/message-id/flat/1D6DCFD2-0F44-4A18-BF67-17C2697B1631%40hotmail.com

As Amit stated in your thread [1], DDLs like creation of the new
tables or partitions, schema changes etc. on the publisher can not be
replicated automatically by the logical replication framework to the
subscriber. Users have to perform those DDLs on the subscribers by
themselves.

Yeah, DDL is not supported now. On the publisher, the partitions are added to the
publication automatically.  However, even if we create the partitions on the 
subscriber, it will not sync the new partitions, because they are treated like 
normal tables; we must execute
ALTER SUBSCRIPTION my_test REFRESH PUBLICATION;
I would prefer that new partitions be added to the subscription automatically 
when they are created, if the partitioned table is already in the publication.

If your point is to at least issue the ALTER SUBSCRIPTION testsub
REFRESH PUBLICATION; from the publication whenever the publication is
altered i.e. added or dropped tables, IMO, we cannot do this, because
running this command on the subscriber only makes sense, after user
runs the same DDLs (which were run on the publisher) also on the
subscriber. To illustrate this:
1) create a new table or partition on the publisher and add it to
publisher, note that the same table has not yet been created on the
subscriber
2) imagine the publisher issuing an auto refresh command to all the
subscribers, then, no point in that right, because the new table or
the partition is not yet created on all the subscribers.

So, IMO, we can not have an auto refresh mechanism, until we have the
feature to replicate the DDL changes to all the subscribers.

Thanks for clarification.

What I stated in my earlier mail [1] is that even though we drop the
table from the publication in the publisher and run a refresh
publication on the subscriber, still the data is being replicated from
the publisher to the subscriber table. I just wanted to know whether
this is the expected behaviour or what exactly means. a user running
ALTER PUBLICATION mypub DROP TABLE mytable;

[1] - 
https://www.postgresql.org/message-id/CALj2ACWAxO3vSToT0o5nXL%3Drz5cNx90zaV-at%3DcvM14Tag4%3DcQ%40mail.gmail.com

Sorry, I misunderstood. After testing (on ce6a71fa530), I found that if we do not 
insert data between steps (5) and (6), it will not ship the new records; however, 
if we insert data between steps (5) and (6), it will ship the new records.

(1) created publication for table t1, t2
postgres[8765]=# CREATE TABLE t1 (a int);
CREATE TABLE
postgres[8765]=# CREATE TABLE t2 (a int);
CREATE TABLE
postgres[8765]=# INSERT INTO t1 VALUES (1);
INSERT 0 1
postgres[8765]=# INSERT INTO t2 VALUES (1);
INSERT 0 1
postgres[8765]=# CREATE PUBLICATION mypub1 FOR TABLE t1;
CREATE PUBLICATION
postgres[8765]=# CREATE PUBLICATION mypub2 FOR TABLE t2;
CREATE PUBLICATION

(2) created subscription on the subscriber
postgres[9812]=# CREATE TABLE t1 (a int);
CREATE TABLE
postgres[9812]=# CREATE TABLE t2 (a int);
CREATE TABLE
postgres[9812]=# CREATE SUBSCRIPTION mysub1 CONNECTION 

Re: Single transaction in the tablesync worker?

2021-01-11 Thread Amit Kapila
On Mon, Jan 11, 2021 at 3:53 PM Ajin Cherian  wrote:
>
> On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila  wrote:
>
> > > BTW, I have analyzed whether we need any modifications to
> > > pg_dump/restore for this patch as this changes the state of one of the
> > > fields in the system table and concluded that we don't need any
> > > change. For subscriptions, we don't dump any of the information from
> > > pg_subscription_rel, rather we just dump subscriptions with the
> > > connect option as false which means users need to enable the
> > > subscription and refresh publication after restore. I have checked
> > > this in the code and tested it as well. The related information is
> > > present in pg_dump doc page [1], see from "When dumping logical
> > > replication subscriptions ".
> > >
> >
> > I have further analyzed that we don't need to do anything w.r.t
> > pg_upgrade as well because it uses pg_dump/pg_dumpall to dump the
> > schema info of the old cluster and then restore it to the new cluster.
> > And, we know that pg_dump ignores the info in pg_subscription_rel, so
> > we don't need to change anything as our changes are specific to the
> > state of one of the columns in pg_subscription_rel. I have not tested
> > this but we should test it by having some relations in not_ready state
> > and then allow the old cluster (<=PG13) to be upgraded to new (pg14)
> > both with and without this patch and see if there is any change in
> > behavior.
>
> I have tested this scenario, stopped a server running PG_13 when
> subscription table sync was in progress.
>

Thanks for the test. This confirms my analysis and we don't need any
change in pg_dump or pg_upgrade for this patch.
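
For reference, the dumped form of a subscription looks roughly like the
following (illustrative; connection string and other options abbreviated), so
nothing from pg_subscription_rel is carried over:

CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost port=6972 dbname=postgres' PUBLICATION tap_pub WITH (connect = false, slot_name = 'tap_sub');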

-- 
With Regards,
Amit Kapila.




Re: Single transaction in the tablesync worker?

2021-01-11 Thread Ajin Cherian
On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila  wrote:

> > BTW, I have analyzed whether we need any modifications to
> > pg_dump/restore for this patch as this changes the state of one of the
> > fields in the system table and concluded that we don't need any
> > change. For subscriptions, we don't dump any of the information from
> > pg_subscription_rel, rather we just dump subscriptions with the
> > connect option as false which means users need to enable the
> > subscription and refresh publication after restore. I have checked
> > this in the code and tested it as well. The related information is
> > present in pg_dump doc page [1], see from "When dumping logical
> > replication subscriptions ".
> >
>
> I have further analyzed that we don't need to do anything w.r.t
> pg_upgrade as well because it uses pg_dump/pg_dumpall to dump the
> schema info of the old cluster and then restore it to the new cluster.
> And, we know that pg_dump ignores the info in pg_subscription_rel, so
> we don't need to change anything as our changes are specific to the
> state of one of the columns in pg_subscription_rel. I have not tested
> this but we should test it by having some relations in not_ready state
> and then allow the old cluster (<=PG13) to be upgraded to new (pg14)
> both with and without this patch and see if there is any change in
> behavior.

I have tested this scenario, stopped a server running PG_13 when
subscription table sync was in progress.
One of the tables in pg_subscription_rel was still in 'd' state (DATASYNC)

postgres=# select * from pg_subscription_rel;
 srsubid | srrelid | srsubstate |  srsublsn
---------+---------+------------+------------
   16424 |   16384 | d          |
   16424 |   16390 | r          | 0/247A63D8
   16424 |   16395 | r          | 0/247A6410
   16424 |   16387 | r          | 0/247A6448
(4 rows)

then initiated the pg_upgrade to PG_14 with the patch and without the patch:
I see that the subscription exists but is not enabled:

postgres=# select * from pg_subscription;
  oid  | subdbid | subname | subowner | subenabled | subbinary | substream |                subconninfo                | subslotname | subsynccommit | subpublications
-------+---------+---------+----------+------------+-----------+-----------+-------------------------------------------+-------------+---------------+-----------------
 16407 |   16401 | tap_sub |       10 | f          | f         | f         | host=localhost port=6972 dbname=postgres | tap_sub     | off           | {tap_pub}
(1 row)

and looking at the pg_subscription_rel:

postgres=# select * from pg_subscription_rel;
 srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+----------
(0 rows)

As can be seen, none of the data in the pg_subscription_rel has been
copied over. Same behaviour is seen with the patch and without the
patch.

regards,
Ajin Cherian
Fujitsu Australia




Re: [HACKERS] Custom compression methods

2021-01-11 Thread Dilip Kumar
On Mon, Jan 11, 2021 at 12:21 PM Justin Pryzby  wrote:
>
> On Mon, Jan 11, 2021 at 12:11:54PM +0530, Dilip Kumar wrote:
> > On Mon, Jan 11, 2021 at 11:00 AM Dilip Kumar  wrote:
> > > On Sun, Jan 10, 2021 at 10:59 PM Justin Pryzby  
> > > wrote:
> > > >
> > > > On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote:
> > > > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby  
> > > > > wrote:
> > > > > > And fails pg_upgrade check, apparently losing track of the 
> > > > > > compression (?)
> > > > > >
> > > > > >  CREATE TABLE public.cmdata2 (
> > > > > > -f1 text COMPRESSION lz4
> > > > > > +f1 text
> > > > > >  );
> > > > >
> > > > > I did not get this?  pg_upgrade check is passing for me.
> > > >
> > > > I realized that this was failing in your v16 patch sent Dec 25.
> > > > It's passing on current patches because they do "DROP TABLE cmdata2", 
> > > > but
> > > > that's only masking the error.
> >
> > I tested specifically pg_upgrade by removing all the DROP table and MV
> > and it is passing.  I don't see the reason why should it fail.  I mean
> > after the upgrade why COMPRESSION lz4 is missing?
>
> How did you test it ?
>
> I'm not completely clear how this is intended to work... has it been tested
> before ?  According to the comments, in binary upgrade mode, there's an ALTER
> which is supposed to SET COMPRESSION, but that's evidently not happening.

I am able to reproduce this issue.  If I run pg_dump in
binary-upgrade mode then I can see the issue (./pg_dump
--binary-upgrade -d Postgres).  Yes, you are right that to fix
this the dump should emit an ALTER ... SET COMPRESSION statement.
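
Roughly, for the cmdata2 example above, the --binary-upgrade dump would need
to emit something like this (a sketch, assuming the SET COMPRESSION syntax
from this patch):

ALTER TABLE public.cmdata2 ALTER COLUMN f1 SET COMPRESSION lz4;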

> > > > I found that's the AM's OID in the old clsuter:
> > > > regression=# SELECT * FROM pg_am WHERE oid=36447;
> > > >   oid  | amname |  amhandler  | amtype
> > > > ---++-+
> > > >  36447 | pglz2  | pglzhandler | c
> > > >
> > > > But in the new cluster, the OID has changed.  Since that's written into 
> > > > table
> > > > data, I think you have to ensure that the compression OIDs are 
> > > > preserved on
> > > > upgrade:
> > > >
> > > >  16755 | pglz2  | pglzhandler  | c
> > >
> > > Yeah, basically we are storing am oid in the compressed data so Oid
> > > must be preserved.  I will look into this and fix it.
> >
> > On further analysis, if we are dumping and restoring then we will
> > compress the data back while inserting it, so why would we need the old
> > OID?  I mean in the new cluster we are inserting data again so it will
> > be compressed again and now it will store the new OID.  Am I missing
> > something here?
>
> I'm referring to pg_upgrade which uses pg_dump, but does *not* re-insert data,
> but rather recreates catalogs only and then links to the old tables (either
> with copy, link, or clone).  Test with make -C src/bin/pg_upgrade (which is
> included in make check-world).

Got this as well.

I will fix these two issues and post the updated patch by tomorrow.

Thanks for your findings.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Added schema level support for publication.

2021-01-11 Thread Bharath Rupireddy
On Mon, Jan 11, 2021 at 1:29 PM japin  wrote:
> >> > Say a user has created a publication for a schema with hundreds of
> >> > tables in it, at some point later, can he stop replicating a single or
> >> > some tables from that schema?
> >> >
> >>
> >> There is no provision for this currently.
> >
> > The documentation [1] says, we can ALTER PUBLICATION testpub DROP
> > TABLE t1; which removes the table from the list of published tables,
> > but looks like it requires ALTER SUBSCRIPTION testsub REFRESH
> > PUBLICATION; for the changes to become effective on the subscriber. I
> > have done some testing for this case:
> > 1) created publication for table t1, see \d+ t1, the associated
> > publication is visible in the output
> > 2) created subscription on the subscriber, initial available data from
> > the publisher for table t1 is received
> > 3) insert into table t1 on the publisher
> > 4) inserted data in (3) is received in the subscriber table t1
> > 5) alter publication to drop the table t1 on the publisher, see \d+
> > t1, there will not be any associated publication in the output
> > 6) execute alter subscription refresh publication on the subscriber,
> > with the expectation that it should not receive the data from the
> > publisher for the table t1 since it's dropped from the publication in
> > (5)
> > 7) insert into table t1 on the publisher
> > 8) still the newly inserted data in (7) from the publisher, will be
> > received into the table t1 in the subscriber
> >
> > IIUC, the behaviour of ALTER PUBLICATION DROP TABLE from the docs and
> > the above use case, it looks like a bug to me. If I'm wrong, can
> > someone correct me?
> >
>
> Yes, if we modify the publication, we should refresh the subscription on
> each subscriber.  It looks strange to me, especially for partitioned
> tables [1].
>
> > Thoughts?
> >
>
> Can we trace the difference between publication and subscription, and
> auto-refresh subscription on subscriber?
>
> [1]
> https://www.postgresql.org/message-id/flat/1D6DCFD2-0F44-4A18-BF67-17C2697B1631%40hotmail.com

As Amit stated in your thread [1], DDLs like creation of the new
tables or partitions, schema changes etc. on the publisher can not be
replicated automatically by the logical replication framework to the
subscriber. Users have to perform those DDLs on the subscribers by
themselves.

If your point is to at least issue ALTER SUBSCRIPTION testsub
REFRESH PUBLICATION; from the publisher whenever the publication is
altered i.e. tables are added or dropped, IMO, we cannot do this, because
running this command on the subscriber only makes sense after the user
runs the same DDLs (which were run on the publisher) also on the
subscriber. To illustrate this:
1) create a new table or partition on the publisher and add it to the
publication; note that the same table has not yet been created on the
subscriber
2) imagine the publisher issuing an auto-refresh command to all the
subscribers; there is no point in that, because the new table or
partition has not yet been created on all the subscribers.

So, IMO, we can not have an auto refresh mechanism, until we have the
feature to replicate the DDL changes to all the subscribers.
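
As a minimal sketch of the problem (table and subscription names are made up,
and the error text is approximate):

-- on the publisher
CREATE TABLE t3 (a int);
ALTER PUBLICATION mypub ADD TABLE t3;

-- if the subscriber auto-refreshed at this point, the refresh itself
-- would fail, because t3 does not exist on the subscriber yet:
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
ERROR:  relation "public.t3" does not exist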

What I stated in my earlier mail [1] is that even though we drop the
table from the publication on the publisher and run a refresh
publication on the subscriber, the data is still being replicated from
the publisher to the subscriber table. I just wanted to know whether
this is the expected behaviour, or what exactly a user running
ALTER PUBLICATION mypub DROP TABLE mytable; should expect.

[1] - 
https://www.postgresql.org/message-id/CALj2ACWAxO3vSToT0o5nXL%3Drz5cNx90zaV-at%3DcvM14Tag4%3DcQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Improper use about DatumGetInt32

2021-01-11 Thread Michael Paquier
On Sat, Jan 09, 2021 at 01:41:39PM +0100, Peter Eisentraut wrote:
> If we had a way to do such testing then we wouldn't need these markers. But
> AFAICT, we don't.

Not sure I am following your point here.  Taking your patch, it is
possible to trigger the version of get_raw_page() <= 1.8 just with
something like the following:
create extension pageinspect version "1.8";
select get_raw_page('pg_class', 0);

There are no in-core regression tests that check the compatibility of
extensions with older versions, but it is technically possible to do
so.
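
For instance, a test script along these lines would exercise the old version
before moving to the latest one (just a sketch; expected output omitted):

CREATE EXTENSION pageinspect VERSION "1.8";
SELECT get_raw_page('pg_class', 0);   -- runs the 1.8 definition
ALTER EXTENSION pageinspect UPDATE;
SELECT get_raw_page('pg_class', 0);   -- runs the current definition
DROP EXTENSION pageinspect;
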
--
Michael


signature.asc
Description: PGP signature


Re: Added schema level support for publication.

2021-01-11 Thread japin


On Mon, 11 Jan 2021 at 14:15, Bharath Rupireddy wrote:
> On Sun, Jan 10, 2021 at 11:21 PM vignesh C  wrote:
>> On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy
>>  wrote:
>> > I think this feature can be useful, in case a user has a lot of tables
>> > to publish inside a schema. Having said that, I wonder if this feature
>> > mandates users to create the same schema with same
>> > permissions/authorizations manually on the subscriber, because logical
>> > replication doesn't propagate any ddl's so are the schema or schema
>> > changes? Or is it that the list of tables from the publisher can go
>> > into a different schema on the subscriber?
>> >
>>
>> DDL's will not be propagated to the subscriber. Users have to create
>> the schema & tables in the subscriber. No change in
>> Permissions/authorizations handling, it will be the same as the
>> existing behavior for relations.
>
> Looks like the existing behaviour already requires users to create the
> schema on the subscriber when publishing the tables from that schema.
> Otherwise, an error is thrown on the subscriber [1].
>
> [1] on publisher:
> CREATE SCHEMA myschema;
> CREATE TABLE myschema.t1(a1 int, b1 int);
> INSERT INTO myschema.t1_myschema SELECT i, i+10 FROM generate_series(1,10) i;
> CREATE PUBLICATION testpub FOR TABLE myschema.t1;
>
> on subscriber:
> postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost
> dbname=postgres user=bharath port=5432' PUBLICATION testpub;
> ERROR:  schema "myschema" does not exist
> CREATE SCHEMA myschema;
> CREATE TABLE myschema.t1(a1 int, b1 int);
> postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost
> dbname=postgres user=bharath port=5432' PUBLICATION testpub;
> NOTICE:  created replication slot "testsub" on publisher
> CREATE SUBSCRIPTION
>
>> > Since the schema can have other objects such as data types, functions,
>> > operators, I'm sure with your feature, non-table objects will be
>> > skipped.
>> >
>>
>> Yes, only table data will be sent to subscribers, non-table objects
>> will be skipped.
>
> Looks like the existing CREATE PUBLICATION FOR ALL TABLES, which is
> for all the tables in the database, does this i.e. skips non-table
> objects and temporary tables, foreign tables and so on. So, your
> feature also can behave the same way, but within the scope of the
> given schema/s.
>
>> > As Amit pointed out earlier, the behaviour when schema dropped, I
>> > think we should also consider when schema is altered, say altered to a
>> > different name, maybe we should change that in the publication too.
>> >
>>
>> I agree that when schema is altered the renamed schema should be
>> reflected in the publication.
>
> I think, it's not only making sure that the publisher side has the new
> altered schema, but also the subscriber needs those alters. Having
> said that, since these alters come under DDL changes and in logical
> replication we don't publish the schema changes to the subscriber, we
> may not need to do anything extra to inform the subscriber about the
> schema alters from the publisher; the users might have to do the same
> schema alter on the subscriber, and then an ALTER SUBSCRIPTION testsub
> REFRESH PUBLICATION;  should work for them? If this understanding is
> correct, then we should document this.
>
>> > In general, what happens if we have some temporary tables or foreign
>> > tables inside the schema, will they be allowed to send the data to
>> > subscribers?
>> >
>>
>> Temporary tables & foreign tables will not be added to the publications.
>
> Yes the existing logical replication framework doesn't allow
> replication of temporary, unlogged, foreign tables and other non-table
> relations such as materialized views, indexes etc [1]. The CREATE
> PUBLICATION statement either fails in check_publication_add_relation
> or before that.
>
> CREATE PUBLICATION testpub FOR TABLE tab1, throwing the error if the
> single table tab1 is any of the above restricted tables, seems fine.
> But, if there's a list of tables with CREATE PUBLICATION testpub FOR
> TABLE normal_tab1, temp_tab2, normal_tab3, foreign_tab4,
> unlogged_tab5, normal_tab6, normal_tab7 ...; this query fails at the
> first encounter of a restricted table, say at temp_tab2. Whereas,
> CREATE PUBLICATION testpub FOR ALL TABLES; would skip the restricted
> tables and continue to add the accepted tables into the publication
> within the database.
>
> IMHO, if there's a list of tables specified with FOR TABLE, then
> instead of throwing an error in case of any restricted table, we can
> issue a warning and continue with the addition of accepted tables into
> the publication. If done, this behaviour will be in sync with FOR ALL
> TABLES;
>
> Thoughts? If okay, I can work on a patch.
>
> Related to error messages: when a foreign table is specified in a CREATE
> PUBLICATION statement, "ERROR:  "f1" is not a table" is thrown
> [1]; how about the error message "ERROR: foreign table "f1" cannot be
> replicated". In general, it