RE: Data is copied twice when specifying both child and parent table in publication

2023-03-31 Thread shiy.f...@fujitsu.com
On Fri, Mar 31, 2023 2:16 AM Jacob Champion  wrote:
> 
> On Wed, Mar 29, 2023 at 2:00 AM Amit Kapila 
> wrote:
> > Pushed.
> 
> While rebasing my logical-roots patch over the top of this, I ran into
> another situation where mixed viaroot settings can duplicate data. The
> key idea is to subscribe to two publications with mixed settings, as
> before, and add a partition root that's already been replicated with
> viaroot=false to the other publication with viaroot=true.
> 
>   pub=# CREATE TABLE part (a int) PARTITION BY RANGE (a);
>   pub=# CREATE PUBLICATION pub_all FOR ALL TABLES;
>   pub=# CREATE PUBLICATION pub_other FOR TABLE other WITH
> (publish_via_partition_root);
>   -- populate with data, then switch to subscription side
>   sub=# CREATE SUBSCRIPTION sub CONNECTION ... PUBLICATION pub_all,
> pub_other;
>   -- switch back to publication
>   pub=# ALTER PUBLICATION pub_other ADD TABLE part;
>   -- and back to subscription
>   sub=# ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
>   -- data is now duplicated
> 
> (Standalone reproduction attached.)
> 
> This is similar to what happens if you alter the
> publish_via_partition_root setting for an existing publication, but
> I'd argue it's easier to hit by accident. Is this part of the same
> class of bugs, or is it different (or even expected) behavior?
> 

I noticed that a similar problem has been discussed in this thread, see [1] [2]
[3] [4]. It seems complicated to fix this if we want the code to automatically
skip tables that have been synchronized previously, and that may be overkill in
some cases (e.g. the target table on the subscriber is not a partitioned table,
and the user wants to synchronize all data in the partitioned table from the
publisher). Besides, it does not seem to be a common case. So I'm not sure we
should fix it. Maybe we can just add some documentation for it, as Peter
mentioned.

[1] https://www.postgresql.org/message-id/CAJcOf-eQR_%3Dq0f4ZVHd342QdLvBd_995peSr4xCU05hrS3TeTg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/OS0PR01MB5716C756312959F293A822C794869%40OS0PR01MB5716.jpnprd01.prod.outlook.com (the second issue in it)
[3] https://www.postgresql.org/message-id/CA%2BHiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ%40mail.gmail.com
[4] https://www.postgresql.org/message-id/CAA4eK1%2BNWreG%3D2sKiMz8vFzTsFhEHCjgQMyAu6zj3sdLmcheYg%40mail.gmail.com

Regards,
Shi Yu


RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-22 Thread shiy.f...@fujitsu.com
On Wed, Mar 22, 2023 2:53 PM Önder Kalacı  wrote:
> 
> We don't really need to, if you check the first patch, we don't have DROP for 
> generated case. I mostly
> wanted to make the test a little more interesting, but it also seems to be a 
> little confusing.
> 
> Now attaching v2 where we do not drop the columns. I don't have strong 
> preference on
> which patch to proceed with, mostly wanted to attach this version to progress 
> faster (in case
> you/Amit considers this one better).
> 

Thanks for updating the patches.
The v2 patch LGTM.

Regards,
Shi Yu



RE: Allow logical replication to copy tables in binary format

2023-03-21 Thread shiy.f...@fujitsu.com
On Wed Mar 22, 2023 7:29 AM Peter Smith  wrote:
> 
> Thanks for all the patch updates. Patch v19 LGTM.
> 

+1

Regards,
Shi Yu


RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-21 Thread shiy.f...@fujitsu.com
On Tuesday, March 21, 2023 8:03 PM Önder Kalacı  wrote:
> 
> Attached patches again.
> 

Thanks for updating the patch.

@@ -408,15 +412,18 @@ $node_subscriber->wait_for_subscription_sync;
 $node_publisher->safe_psql(
'postgres', qq(
ALTER TABLE dropped_cols DROP COLUMN b_drop;
+   ALTER TABLE generated_cols DROP COLUMN c_drop;
 ));
 $node_subscriber->safe_psql(
'postgres', qq(
ALTER TABLE dropped_cols DROP COLUMN b_drop;
+   ALTER TABLE generated_cols DROP COLUMN c_drop;
 ));

Is there any reason why we drop a column here? The dropped-column case has
already been tested on table dropped_cols, and on my machine the
generated-column problem can be detected without dropping any columns.

Regards,
Shi Yu


RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-21 Thread shiy.f...@fujitsu.com
On Tue, Mar 21, 2023 4:51 PM Önder Kalacı  wrote:
> 
> Hi Amit, Shi Yu
> 
> Now attaching the similar patches for generated columns.
> 

Thanks for your patches. Here are some comments.

1.
 $node_publisher->safe_psql(
'postgres', qq(
ALTER TABLE dropped_cols DROP COLUMN b_drop;
+   ALTER TABLE generated_cols DROP COLUMN b_gen;
 ));
 $node_subscriber->safe_psql(
'postgres', qq(
ALTER TABLE dropped_cols DROP COLUMN b_drop;
+   ALTER TABLE generated_cols DROP COLUMN b_gen;
 ));

I think we want to test generated columns here, so we shouldn't drop these
columns. Otherwise the generated-column problem can't be detected.

2. 
# The bug was that when the REPLICA IDENTITY FULL is used with dropped columns,
# we fail to apply updates and deletes

Maybe we should mention generated columns in the comment of the test.

3.
I ran pgindent and it modified some lines. Maybe we can improve the patch
as follows.

@@ -292,8 +292,8 @@ tuples_equal(TupleTableSlot *slot1, TupleTableSlot *slot2,
att = TupleDescAttr(slot1->tts_tupleDescriptor, attrnum);
 
/*
-* Ignore dropped and generated columns as the publisher
-* doesn't send those
+* Ignore dropped and generated columns as the publisher doesn't send
+* those
 */
if (att->attisdropped || att->attgenerated)
continue;

Regards,
Shi Yu


RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-20 Thread shiy.f...@fujitsu.com
On Fri, Mar 17, 2023 11:29 PM Önder Kalacı  wrote:
> 
> Thanks for sharing. Fixed
> 
> 
> This time I was able to run all the tests with all the patches applied.
> 
> Again, the generated column fix also has some minor differences
> per version. So, overall we have 6 patches with very minor 
> differences :) 

Thanks for updating the patches. It seems you forgot to attach the
dropped-column patches for HEAD and pg15; I think they are the same as v2.

On HEAD, we can re-use the clusters from other test cases, which can save some
time (see commit fccaf259f22f4a).

In the patches for pg12 and pg11, I am not sure why the test isn't added at the
end of 100_bugs.pl. I think it would be better to be consistent with the other
versions.

The attached patches modify these two points. Besides, I made some minor
changes and ran pgindent and pgperltidy. These are the patches for dropped
columns, because I think they should be submitted first; we can discuss the fix
for generated columns later.

Regards,
Shi Yu


v4-pg12-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch
Description:  v4-pg12-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch


v4-pg13-pg14-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch
Description:  v4-pg13-pg14-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch


v4-pg15-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch
Description:  v4-pg15-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch


v4-HEAD-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FULL.patch
Description:  v4-HEAD-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FULL.patch


v4-pg11-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch
Description:  v4-pg11-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch


RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-17 Thread shiy.f...@fujitsu.com
On Friday, March 17, 2023 3:38 PM Önder Kalacı  wrote:
> 
> Hi Amit, all
>  
> You can first submit the fix for dropped columns with patches till
> v10. Once that is committed, then you can send the patches for
> generated columns.
> 
> Alright, attaching 2 patches for dropped columns, the names of the files 
> shows which 
> versions the patch can be applied to:
> v2-0001-Ignore-dropped-columns-HEAD-REL_15-REL_14-REL_13.patch
> v2-0001-Ignore-dropped-columns-REL_12-REL_11.patch 
> 
> And, then on top of that, you can apply the patch for generated columns on 
> all applicable
> versions (HEAD, 15, 14, 13 and 12). It applies cleanly.  The name of the file
> is: v2-0001-Ignore-generated-columns.patch
> 
> 
> But unfortunately I couldn't test the patch with PG 12 and below. I'm getting 
> some
> unrelated compile errors and Postgres CI is not available on these versions 
> . I'll try
> to fix that, but I thought it would still be good to share the patches as you 
> might
> already have the environment to run the tests. 
> 

Thanks for updating the patch.

I couldn't apply v2-0001-Ignore-dropped-columns-HEAD-REL_15-REL_14-REL_13.patch
cleanly in v13 and v14. It looks like the patch needs some changes for these
versions.

```
Checking patch src/backend/executor/execReplication.c...
Hunk #1 succeeded at 243 (offset -46 lines).
Hunk #2 succeeded at 263 (offset -46 lines).
Checking patch src/test/subscription/t/100_bugs.pl...
error: while searching for:
$node_publisher->stop('fast');
$node_subscriber->stop('fast');

done_testing();

error: patch failed: src/test/subscription/t/100_bugs.pl:373
Applied patch src/backend/executor/execReplication.c cleanly.
Applying patch src/test/subscription/t/100_bugs.pl with 1 reject...
Rejected hunk #1.
```

Besides, I tried v2-0001-Ignore-dropped-columns-REL_12-REL_11.patch in v12. The
test failed and here's some information.

```
Can't locate object method "new" via package "PostgreSQL::Test::Cluster" (perhaps you forgot to load "PostgreSQL::Test::Cluster"?) at t/100_bugs.pl line 74.
# Looks like your test exited with 2 just after 1.
```

+my $node_publisher_d_cols = PostgreSQL::Test::Cluster->new('node_publisher_d_cols');

It seems this usage is not supported in v12; we should use get_new_node() like
the other test cases do.

Regards,
Shi Yu


RE: Allow logical replication to copy tables in binary format

2023-03-16 Thread shiy.f...@fujitsu.com
On Thu, Mar 16, 2023 9:20 PM Melih Mutlu   wrote:
> 
> Hi,
> 
> Please see the attached v16.
> 

Thanks for updating the patch.

+# Cannot sync due to type mismatch
+$node_subscriber->wait_for_log(qr/ERROR: ( [A-Z0-9]+:)? incorrect binary data format/);

+# Ensure the COPY command is executed in text format on the publisher
+$node_publisher->wait_for_log(qr/LOG: ( [A-Z0-9]+:)? statement: COPY (.+)? TO STDOUT\n/);

It looks like you forgot to pass `offset` into wait_for_log().

Regards,
Shi Yu


RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-16 Thread shiy.f...@fujitsu.com
On Thu, Mar 16, 2023 5:23 PM Amit Kapila  wrote:
> 
> On Mon, Mar 13, 2023 at 6:26 PM Önder Kalacı 
> wrote:
> >
> > Attaching v2
> >
> 
> Can we change the comment to: "Ignore dropped and generated columns as
> the publisher doesn't send those."? After your change, att =
> TupleDescAttr(slot1->tts_tupleDescriptor, attrnum); is done twice in
> the same function.
> 
> In test cases, let's change the comment to: "The bug was that when the
> REPLICA IDENTITY FULL is used with dropped or generated columns, we
> fail to apply updates and deletes.". Also, I think we don't need to
> provide the email link as anyway commit message will have a link to
> the discussion.
> 
> Did you check this in the back branches?
> 

I tried to reproduce this bug in the back branches.

Generated columns were introduced in PG12, and I reproduced the generated-column
problem in PG12~PG15.

I reproduced the dropped-column problem in PG10~PG15 (logical replication was
introduced in PG10).

So I think we should backpatch the fix for generated columns to PG12, and
backpatch the fix for dropped columns to PG10.

Regards,
Shi Yu


RE: Allow logical replication to copy tables in binary format

2023-03-15 Thread shiy.f...@fujitsu.com
On Thu, Mar 16, 2023 2:26 AM Melih Mutlu  wrote:
> 
> Right, it needs to be ordered. Fixed.
> 

Hi,

Thanks for updating the patch. I tested some cases, like TOAST data and
combinations of row filters and column lists, and it works well.

Here is a comment:

+# Ensure the COPY command is executed in binary format on the publisher
+$node_publisher->wait_for_log(qr/LOG: ( [a-z0-9]+:)? COPY (.+)? TO STDOUT WITH \(FORMAT binary\)/);

The test failed with `log_error_verbosity = verbose` because it couldn't match
the following log:
2023-03-16 09:45:50.096 CST [2499415] pg_16398_sync_16391_7210954376230900539 LOG:  00000: statement: COPY public.test_arrays (a, b, c) TO STDOUT WITH (FORMAT binary)

I think we should make it pass, see commit 19408aae7f.
Should it be changed to:

$node_publisher->wait_for_log(qr/LOG: ( [A-Z0-9]+:)? statement: COPY (.+)? TO STDOUT WITH \(FORMAT binary\)/);

Besides, for the same reason, this line also needs to be modified.
+$node_publisher->wait_for_log(qr/LOG: ( [a-z0-9]+:)? COPY (.+)? TO STDOUT\n/);

Regards,
Shi Yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-14 Thread shiy.f...@fujitsu.com
On Mon, Mar 13, 2023 10:16 PM Önder Kalacı  wrote:
> 
> Attaching v47.
> 

Thanks for updating the patch. Here are some comments.

1.
in RemoteRelContainsLeftMostColumnOnIdx():

+   if (indexInfo->ii_NumIndexAttrs < 1)
+   return false;

Did you see any case where this condition is true? I think there is always at
least one column in the index. If so, we can use an Assert() instead.

+   if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol))
+   return false;

Similarly, I think `attrmap->maplen` is the number of columns and it is always
greater than keycol. If you agree, we can check it with an Assert(). Besides, it
seems we don't need AttrNumberGetAttrOffset().

2.
+# make sure that the subscriber has the correct data after the update UPDATE

"update UPDATE" seems to be a typo.

3.
+# now, drop the index with the expression, and re-create index on column lastname

The comment says "re-create index on column lastname", but it seems we don't do
that. Should it be modified to something like:
# now, drop the index with the expression, we will use sequential scan

Besides these, the patch LGTM.

Regards,
Shi Yu


RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-13 Thread shiy.f...@fujitsu.com
On Sun, Mar 12, 2023 4:00 AM Önder Kalacı  wrote:
> 
> Attaching a patch that could possibly solve the problem. 
> 

Thanks for your patch. I tried it and it worked well.
Here are some minor comments.

1.
@@ -243,6 +243,17 @@ tuples_equal(TupleTableSlot *slot1, TupleTableSlot *slot2,
Form_pg_attribute att;
TypeCacheEntry *typentry;
 
+
+   Form_pg_attribute attr = TupleDescAttr(slot1->tts_tupleDescriptor, attrnum);
+

I think we can use "att" instead of a new variable. They have the same value.
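For example, a sketch of what I mean, reusing "att" (based on the surrounding
code in tuples_equal()):

    att = TupleDescAttr(slot1->tts_tupleDescriptor, attrnum);

    /*
     * Ignore dropped and generated columns as the publisher doesn't send
     * those.
     */
    if (att->attisdropped || att->attgenerated)
        continue;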

2. 
+# The bug was that when when the REPLICA IDENTITY FULL is used with dropped

There is an extra "when".

Regards,
Shi Yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-12 Thread shiy.f...@fujitsu.com
On Fri, Mar 10, 2023 8:17 PM Amit Kapila  wrote:
> 
> On Fri, Mar 10, 2023 at 5:16 PM Önder Kalacı  wrote:
> >
> >>
> >> wip_for_optimize_index_column_match
> >> +static bool
> >> +IndexContainsAnyRemoteColumn(IndexInfo  *indexInfo,
> >> + LogicalRepRelation  *remoterel)
> >> +{
> >> + for (int i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
> >> + {
> >>
> >> Wouldn't it be better to just check if the first column is not part of
> >> the remote column then we can skip that index?
> >
> >
> > Reading [1], I think I can follow what you suggest. So, basically,
> > if the leftmost column is not filtered, we have the following:
> >
> >>  but the entire index would have to be scanned, so in most cases the 
> >> planner
> would prefer a sequential table scan over using the index.
> >
> >
> > So, in our case, we could follow a similar approach. If the leftmost column 
> > of
> the index
> > is not sent over the wire from the pub, we can prefer the sequential scan.
> >
> > Is my understanding of your suggestion accurate?
> >
> 
> Yes. I request an opinion from Shi-San who has reported the problem.
> 

I also agree with this.
And I think we can mention this in the comments if we do so.

Regards,
Shi Yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-07 Thread shiy.f...@fujitsu.com
On Tue, Mar 7, 2023 9:47 PM Önder Kalacı  wrote:
> 
> I'm attaching v35. 
> 

I noticed that if the index column only exists on the subscriber side, that
index can also be chosen. This seems a bit odd because the index column isn't
sent from the publisher.

e.g.
-- pub
CREATE TABLE test_replica_id_full (x int, y int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
-- sub
CREATE TABLE test_replica_id_full (x int, y int, z int);
CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(z);
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;

I didn't see the behavior change in any case after applying the patch, which
looks good. Besides, I tested the performance for such a case.

Steps:
1. create tables, index, publication, and subscription
-- pub
create table tbl (a int);
alter table tbl replica identity full;
create publication pub for table tbl;
-- sub
create table tbl (a int, b int);
create index idx_b on tbl(b);
create subscription sub connection 'dbname=postgres port=5432' publication pub;
2. setup synchronous replication
3. execute SQL:
truncate tbl;
insert into tbl select i from generate_series(0,1)i;
update tbl set a=a+1;

The time of UPDATE (take the average of 10 runs):
master: 1356.06 ms
patched: 3968.14 ms

In cases where all values of the extra columns on the subscriber are NULL, an
index scan can't do better than a sequential scan. This is not a realistic
scenario, and I think it only degrades when there are many NULL values in the
index column, so this is probably not a case to worry about. I just share this
case so that we can discuss whether we should pick an index which only contains
the extra columns on the subscriber.

Regards,
Shi Yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-06 Thread shiy.f...@fujitsu.com
On Monday, Mar 6, 2023 7:19 PM Önder Kalacı  wrote:
> 
> Yeah, seems easier to follow to me as well. Reflected it in the comment as 
> well.  
> 

Thanks for updating the patch. Here are some comments on the v33-0001 patch.

1.
+   if (RelationReplicaIdentityFullIndexScanEnabled(localrel) &&
+   remoterel->replident == REPLICA_IDENTITY_FULL)

RelationReplicaIdentityFullIndexScanEnabled() is introduced in the 0002 patch,
so the call to it should be moved to the 0002 patch, I think.

2.
+#include "optimizer/cost.h"

Do we need this in the latest patch? I tried and it looks like it can be removed
from src/backend/replication/logical/relation.c.

3.
+# now, create a unique index and set the replica
+$node_publisher->safe_psql('postgres',
+   "CREATE UNIQUE INDEX test_replica_id_full_unique ON test_replica_id_full(x);");
+$node_subscriber->safe_psql('postgres',
+   "CREATE UNIQUE INDEX test_replica_id_full_unique ON test_replica_id_full(x);");
+

Should the comment be "now, create a unique index and set the replica identity"?

4.
+$node_publisher->safe_psql('postgres',
+   "ALTER TABLE test_replica_id_full REPLICA IDENTITY USING INDEX test_replica_id_full_unique;");
+$node_subscriber->safe_psql('postgres',
+   "ALTER TABLE test_replica_id_full REPLICA IDENTITY USING INDEX test_replica_id_full_unique;");
+
+# wait for the synchronization to finish
+$node_subscriber->wait_for_subscription_sync;

There are no new tables that need to be synchronized here; should we remove the
call to wait_for_subscription_sync?

Regards,
Shi Yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-02 Thread shiy.f...@fujitsu.com
On Wed, Mar 1, 2023 9:22 PM Önder Kalacı  wrote:
> 
> Hi Andres, Amit, Shi Yu, all
> 
> Andres Freund , 28 Şub 2023 Sal, 21:39 tarihinde 
> şunu yazdı:
> Hi,
> 
> On 2023-02-25 16:00:05 +0530, Amit Kapila wrote:
> > On Tue, Feb 21, 2023 at 7:55 PM Önder Kalacı  
> > wrote:
> > >> I think this overhead seems to be mostly due to the need to perform
> > >> tuples_equal multiple times for duplicate values.
> 
> I think more work needs to be done to determine the source of the
> overhead. It's not clear to me why there'd be an increase in tuples_equal()
> calls in the tests upthread.
> 
> You are right, looking closely, in fact, we most of the time do much less 
> tuples_equal() with index scan.
> 
> I've done some profiling with perf, and created flame graphs for the apply 
> worker, with the
> test described above: -- case 1 (All values are duplicated). I used the 
> following commands:
> - perf record -F 99 -p 122555 -g -- sleep 60
> -  perf script | ./stackcollapse-perf.pl > out.perf-folded
> -  ./flamegraph.pl out.perf-folded > perf_[index|seq]_scan.svg
> 
> I attached both flame graphs. I do not see anything specific regarding what 
> the patch does, but
> instead the difference mostly seems to come down to index scan vs sequential 
> scan related
> functions. As I continue to investigate, I thought it might be useful to 
> share the flame graphs
> so that more experienced hackers could comment on the difference.   
> 
> Regarding my own end-to-end tests: In some runs, the sequential scan is 
> indeed faster for case-1. But, 
> when I execute update tbl set a=a+1; for 50 consecutive times, and measure 
> end to end performance, I see
> much better results for index scan, only case-1 is on-par as mostly I'd 
> expect.
> 
> Case-1, running the update 50 times and waiting all changes applied
> • index scan: 2 minutes 36 seconds
> • sequential scan: 2 minutes 30 seconds
> Case-2, running the update 50 times and waiting all changes applied
> • index scan: 1 minute 2 seconds
> • sequential scan: 2 minutes 30 seconds
> Case-7, running the update 50 times and waiting all changes applied
> • index scan: 6 seconds
> • sequential scan: 2 minutes 26 seconds
> 
> 
> > # Result
> The time executing update (the average of 3 runs is taken, the unit is
> milliseconds):
> 
> Shi Yu, could it be possible for you to re-run the tests with some more runs, 
> and share the average?
> I suspect maybe your test results have a very small pool size, and some runs 
> are making
> the average slightly problematic.
>  
> In my tests, I shared the total time, which is probably also fine.
>

Thanks for your reply, I re-tested (based on
v25_0001_use_index_on_subs_when_pub_rep_ident_full.patch) and took the average
of 100 runs. The results are as follows. The unit is milliseconds.

case1
sequential scan: 1348.57
index scan: 3785.15

case2
sequential scan: 1350.26
index scan: 1754.01

case3
sequential scan: 1350.13
index scan: 1340.97

There was still some degradation in the first two cases, and there are some gaps
between our respective test results. Some information about my test is as follows.

a. Some parameters specified in postgresql.conf.
shared_buffers = 8GB
checkpoint_timeout = 30min
max_wal_size = 20GB
min_wal_size = 10GB
autovacuum = off

b. Executed SQL.
I executed TRUNCATE and INSERT before each UPDATE. I am not sure if you did the
same, or just executed 50 consecutive UPDATEs. If the latter, there would be
lots of old tuples, and this might have a bigger impact on the sequential scan.
I tried that case (50 consecutive UPDATEs) and also saw that the overhead is
smaller than before.


Besides, I looked into the regression of this patch with `gprof`. Some results
are as follows. I think that while holding a single buffer lock, a sequential
scan can read multiple tuples (see heapgettup()), while an index scan fetches
only one tuple per lock. So in case 1, which has lots of duplicate values and
therefore more tuples to scan, the index scan takes longer.

- results of `gprof`
case1:
master
  %   cumulative   self  self total   
 time   seconds   secondscalls  ms/call  ms/call  name
  1.37  0.66 0.01   654312 0.00 0.00  LWLockAttemptLock
  0.00  0.73 0.00   573358 0.00 0.00  LockBuffer
  0.00  0.73 0.0010014 0.00 0.06  heap_getnextslot

patched
  %   cumulative   self  self total   
 time   seconds   secondscalls  ms/call  ms/call  name
  9.70  1.27 0.36 50531459 0.00 0.00  LWLockAttemptLock
  3.23  2.42 0.12 100259200 0.00 0.00  LockBuffer
  6.20  1.50 0.23 50015101 0.00 0.00  heapam_index_fetch_tuple
  4.04  2.02 0.15 50015101 0.00 0.00  index_fetch_heap
  1.35  3.21 0.0510119 0.00 0.00  index_getnext_slot

case7:
master
  %   cumulative   self  self total   
 time   seconds   

RE: Allow logical replication to copy tables in binary format

2023-02-23 Thread shiy.f...@fujitsu.com
On Thu, Feb 23, 2023 12:40 PM Kuroda, Hayato/黒田 隼人  
wrote:
> 
> > > I'm not sure the combination of "copy_format = binary" and "copy_data =
> false"
> > > should be accepted or not. How do you think?
> >
> > It seems quite useless indeed to specify the format of a copy that won't
> happen.
> 
> I understood that the combination of "copy_format = binary" and "copy_data =
> false"
> should be rejected in parse_subscription_options() and AlterSubscription(). 
> Is it
> right?
> I'm expecting that is done in next version.
> 

The copy_data option only takes effect once, in the CREATE SUBSCRIPTION or
ALTER SUBSCRIPTION REFRESH PUBLICATION command, but the copy_format option can
take effect multiple times if the subscription is refreshed multiple times.
Even if the subscription is created with copy_data=false, copy_format can take
effect when executing ALTER SUBSCRIPTION REFRESH PUBLICATION. So, I am not sure
we want to reject this usage.

Besides, here are my comments on the v9 patch.
1.
src/bin/pg_dump/pg_dump.c
if (fout->remoteVersion >= 16)
-   appendPQExpBufferStr(query, " s.suborigin\n");
+   {
+   appendPQExpBufferStr(query, " s.suborigin,\n");
+   appendPQExpBufferStr(query, " s.subcopyformat\n");
+   }
else
-   appendPQExpBuffer(query, " '%s' AS suborigin\n", LOGICALREP_ORIGIN_ANY);
+   {
+   appendPQExpBuffer(query, " '%s' AS suborigin,\n", LOGICALREP_ORIGIN_ANY);
+   appendPQExpBuffer(query, " '%c' AS subcopyformat\n", LOGICALREP_COPY_AS_TEXT);
+   }

src/bin/psql/describe.c
if (pset.sversion >= 16)
+   {
appendPQExpBuffer(&buf,
                  ", suborigin AS \"%s\"\n",
                  gettext_noop("Origin"));
+   /* Copy format is only supported in v16 and higher */
+   appendPQExpBuffer(&buf,
+                     ", subcopyformat AS \"%s\"\n",
+                     gettext_noop("Copy Format"));
+   }

I think we can call appendPQExpBuffer() only once for the two options that are
supported in v16. For example,

if (pset.sversion >= 16)
{
        appendPQExpBuffer(&buf,
                          ", suborigin AS \"%s\"\n"
                          ", subcopyformat AS \"%s\"\n",
                          gettext_noop("Origin"),
                          gettext_noop("Copy Format"));
}

2.
src/bin/psql/tab-complete.c
@@ -1926,7 +1926,7 @@ psql_completion(const char *text, int start, int end)
/* ALTER SUBSCRIPTION <name> SET ( */
else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) && TailMatches("SET", "("))
COMPLETE_WITH("binary", "disable_on_error", "origin", "slot_name",
-             "streaming", "synchronous_commit");
+             "streaming", "synchronous_commit", "copy_format");
/* ALTER SUBSCRIPTION <name> SKIP ( */
else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) && TailMatches("SKIP", "("))
COMPLETE_WITH("lsn");
@@ -3269,7 +3269,8 @@ psql_completion(const char *text, int start, int end)
else if (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", "("))
COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
              "disable_on_error", "enabled", "origin", "slot_name",
-             "streaming", "synchronous_commit", "two_phase");
+             "streaming", "synchronous_commit", "two_phase",
+             "copy_format");



The options should be listed in alphabetical order. See commit d547f7cf5ef.

Regards,
Shi Yu


RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-23 Thread shiy.f...@fujitsu.com
On Wed, Feb 22, 2023 9:48 PM Kuroda, Hayato/黒田 隼人  
wrote:
> 
> Thank you for reviewing! PSA new version.
> 

Thanks for your patch. Here is a comment.

+   elog(DEBUG2, "time-delayed replication for txid %u, delay_time = %d ms, remaining wait time: %ld ms",
+        ctx->write_xid, (int) ctx->min_send_delay,
+        remaining_wait_time_ms);

I tried it and saw that the xid here looks wrong; what it got was the xid of the
previous transaction. It seems `ctx->write_xid` has not been updated yet, so we
can't use it.

Regards,
Shi Yu


RE: "out of relcache_callback_list slots" after multiple calls to pg_logical_slot_get_binary_changes

2023-02-22 Thread shiy.f...@fujitsu.com
On Wed, Feb 22, 2023 2:20 PM Michael Paquier  wrote:
> 
> On Wed, Feb 22, 2023 at 12:07:06PM +0900, Kyotaro Horiguchi wrote:
> > At Wed, 22 Feb 2023 12:29:59 +1100, Peter Smith 
> wrote in
> >> If you are going to do that, then won't just copying the
> >> CacheRegisterSyscacheCallback(PUBLICATIONOID...  into function
> >> init_rel_sync_cache() be effectively the same as doing that?
> >
> > I'm not sure if it has anything to do with the relation sync cache.
> > On the other hand, moving all the content of init_rel_sync_cache() up
> > to pgoutput_startup() doesn't seem like a good idea.. Another option,
> > as you see, was to separate callback registration code.
> 
> Both are kept separate in the code, so keeping this separation makes
> sense to me.
> 
> +   /* Register callbacks if we didn't do that. */
> +   if (!callback_registered)
> +   CacheRegisterSyscacheCallback(PUBLICATIONOID,
> + publication_invalidation_cb,
> + (Datum) 0);
> 
> /* Initialize relation schema cache. */
> init_rel_sync_cache(CacheMemoryContext);
> +   callback_registered = true;
> [...]
> +   /* Register callbacks if we didn't do that. */
> +   if (!callback_registered)
> 
> I am a bit confused by the use of one single flag called
> callback_registered to track both the publication callback and the
> relation callbacks.  Wouldn't it be cleaner to use two flags?  I don't
> think that we'll have soon a second code path calling
> init_rel_sync_cache(), but if we do then the callback load could again
> be messed up.
> 

Thanks for your reply. Using two flags makes sense to me.
Attached the updated patch.
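For reference, the idea looks roughly like this (a sketch only, not the exact
patch; the flag names are mine, and the callbacks are the ones pgoutput already
registers):

    static bool publication_callback_registered = false;
    static bool relation_callbacks_registered = false;

    /* In pgoutput_startup(): register the publication callback only once. */
    if (!publication_callback_registered)
    {
        CacheRegisterSyscacheCallback(PUBLICATIONOID,
                                      publication_invalidation_cb,
                                      (Datum) 0);
        publication_callback_registered = true;
    }

    /* In init_rel_sync_cache(): register the relation callbacks only once. */
    if (!relation_callbacks_registered)
    {
        CacheRegisterRelcacheCallback(rel_sync_cache_relation_cb, (Datum) 0);
        CacheRegisterSyscacheCallback(PUBLICATIONRELMAP,
                                      rel_sync_cache_publication_cb,
                                      (Datum) 0);
        relation_callbacks_registered = true;
    }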

Regards,
Shi Yu


v3-0001-Avoid-duplicate-registration-of-callbacks-in-pgou.patch
Description:  v3-0001-Avoid-duplicate-registration-of-callbacks-in-pgou.patch


RE: "out of relcache_callback_list slots" after multiple calls to pg_logical_slot_get_binary_changes

2023-02-21 Thread shiy.f...@fujitsu.com
On Mon, Feb 20, 2023 11:31 PM Tom Lane  wrote:
> 
> Kyotaro Horiguchi  writes:
> > I'm pretty sure that everytime an output plugin is initialized on a
> > process, it installs the same set of syscache/relcache callbacks each
> > time.  Do you think we could simply stop duplicate registration of
> > those callbacks by using a static boolean?  It would be far simpler.
> 
> Yeah, I think that's the way it's done elsewhere.  Removing and
> re-registering your callback seems expensive, and it also destroys
> any reasoning that anyone might have made about the order in which
> different callbacks will get called.  (Admittedly, that's probably not
> important for invalidation callbacks, but it does matter for e.g.
> process exit callbacks.)
> 

Thanks for your reply. I agree that's expensive. Attached a new patch that adds
a static boolean to avoid duplicate registration.

Regards,
Shi Yu


v2-0001-Avoid-duplicate-registration-of-callbacks-in-pgou.patch
Description:  v2-0001-Avoid-duplicate-registration-of-callbacks-in-pgou.patch


RE: Allow logical replication to copy tables in binary format

2023-02-20 Thread shiy.f...@fujitsu.com
On Thu, Feb 16, 2023 8:48 PM Amit Kapila  wrote:
> 
> On Mon, Jan 30, 2023 at 4:19 PM Melih Mutlu 
> wrote:
> >
> > Logical replication between different types like int and smallint is 
> > already not
> working properly on HEAD too.
> > Yes, the scenario you shared looks like working. But you didn't create the
> subscription with binary=true. The patch did not change subscription with
> binary=false case. I believe what you should experiment is binary=true case
> which already fails in the apply phase on HEAD.
> >
> > Well, with this patch, it will begin to fail in the table copy phase. But I 
> > don't
> think this is a problem because logical replication in binary format is 
> already
> broken for replications between different data types.
> >
> 
> So, doesn't this mean that there is no separate failure mode during
> the initial copy? I am clarifying this to see if the patch really
> needs a separate copy_format option for initial sync?
> 

If a data type doesn't have a binary output function, the column is sent in
text format during the apply phase (see logicalrep_write_tuple()) and it works
fine. But with copy_format = binary, the walsender exits with an error.

For example:
-- create table on publisher and subscriber
CREATE TYPE myvarchar;
CREATE FUNCTION myvarcharin(cstring, oid, integer) RETURNS myvarchar
LANGUAGE internal IMMUTABLE PARALLEL SAFE STRICT AS 'varcharin';
CREATE FUNCTION myvarcharout(myvarchar) RETURNS cstring
LANGUAGE internal IMMUTABLE PARALLEL SAFE STRICT AS 'varcharout';
CREATE TYPE myvarchar (
input = myvarcharin,
output = myvarcharout,
alignment = integer,
storage = main
);
CREATE TABLE tbl1 (a myvarchar);

-- create publication and insert some data on publisher
create publication pub for table tbl1;
INSERT INTO tbl1 values ('a');

-- create subscription on subscriber
create subscription sub connection 'dbname=postgres port=5432' publication pub with (binary, copy_format = binary);

Then I got the following error in the publisher log.

walsender ERROR:  no binary output function available for type public.myvarchar
walsender STATEMENT:  COPY public.tbl1 (a) TO STDOUT  WITH (FORMAT binary)

Regards,
Shi Yu


"out of relcache_callback_list slots" after multiple calls to pg_logical_slot_get_binary_changes

2023-02-18 Thread shiy.f...@fujitsu.com
Hi hackers,

After multiple calls to the function pg_logical_slot_get_binary_changes() in a
single client backend (the output plugin of the slot is pgoutput), I got the
following error:

client backend FATAL:  out of relcache_callback_list slots
client backend CONTEXT:  slot "testslot", output plugin "pgoutput", in the startup callback
client backend STATEMENT:  SELECT data FROM pg_logical_slot_get_binary_changes('testslot', NULL, NULL, 'proto_version', '3', 'streaming', 'off', 'publication_names', 'pub');

I tried to look into it and found that it's because every time the function
(pg_logical_slot_get_binary_changes) is called, a relcache callback and syscache
callbacks are registered when pgoutput is initialized (see pgoutput_startup()
and init_rel_sync_cache()), but they are not unregistered when it shuts down.
So, after multiple calls to the function, MAX_RELCACHE_CALLBACKS is exceeded.
This is mentioned in the following comment:

/*
 * We can get here if the plugin was used in SQL interface as the
 * RelSchemaSyncCache is destroyed when the decoding finishes, but there
 * is no way to unregister the relcache invalidation callback.
 */
if (RelationSyncCache == NULL)
return;

Could we fix it by adding two new functions to unregister the relcache callback
and syscache callbacks? I tried to do so in the attached patch.

Regards,
Shi Yu


v1-0001-Unregister-syscache-callback-and-relcache-callbac.patch
Description:  v1-0001-Unregister-syscache-callback-and-relcache-callbac.patch


Missing default value of createrole_self_grant in document

2023-02-16 Thread shiy.f...@fujitsu.com
Hi hackers,

I noticed that the documentation of the GUC createrole_self_grant doesn't
mention its default value. The attached patch adds it.

Regards,
Shi Yu


v1-0001-Add-default-value-of-createrole_self_grant-in-doc.patch
Description:  v1-0001-Add-default-value-of-createrole_self_grant-in-doc.patch


RE: run pgindent on a regular basis / scripted manner

2023-02-16 Thread shiy.f...@fujitsu.com
On Thu, Feb 9, 2023 6:10 AM Andrew Dunstan  wrote:
> Thanks, I have committed this. Still looking at Robert's other request.
>

Hi,

Commit 068a243b7 allowed directories to be passed as non-option arguments to
pgindent, but the help text doesn't mention that. Should we update it? Attached
a small patch that does so.

Regards,
Shi Yu


v1-0001-Update-help-text-of-pgindent.patch
Description: v1-0001-Update-help-text-of-pgindent.patch


RE: Add LZ4 compression in pg_dump

2023-02-15 Thread shiy.f...@fujitsu.com
On Fri, Jan 27, 2023 2:04 AM gkokola...@pm.me  wrote:
> 
> --- Original Message ---
> On Thursday, January 26th, 2023 at 12:53 PM, Michael Paquier
>  wrote:
> 
> 
> >
> >
> > On Thu, Jan 26, 2023 at 11:24:47AM +, gkokola...@pm.me wrote:
> >
> > > I gave this a little bit of thought. I think that ReadHead should not
> > > emit a warning, or at least not this warning as it is slightly misleading.
> > > It implies that it will automatically turn off data restoration, which is
> > > false. Further ahead, the code will fail with a conflicting error message
> > > stating that the compression is not available.
> > >
> > > Instead, it would be cleaner both for the user and the maintainer to
> > > move the check in RestoreArchive and make it the sole responsible for
> > > this logic.
> >
> >
> > - pg_fatal("cannot restore from compressed archive (compression not
> supported in this installation)");
> > + pg_fatal("cannot restore data from compressed archive (compression not
> supported in this installation)");
> > Hmm. I don't mind changing this part as you suggest.
> >
> > -#ifndef HAVE_LIBZ
> > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> >
> > - pg_fatal("archive is compressed, but this installation does not support
> compression");
> > -#endif
> > However I think that we'd better keep the warning, as it can offer a
> > hint when using pg_restore -l not built with compression support if
> > looking at a dump that has been compressed.
> 
> Fair enough. Please find v27 attached.
> 

Hi,

I am interested in this feature and tried the patch. While reading the comments,
I noticed some minor things that could possibly be improved (in the v27-0003 patch).

1.
+   /*
+* Open a file for writing.
+*
+* 'mode' can be one of ''w', 'wb', 'a', and 'ab'. Requrires an already
+* initialized CompressFileHandle.
+*/
+   int (*open_write_func) (const char *path, const char *mode,
+                           CompressFileHandle *CFH);

There is a redundant single quote in front of 'w'.

2.
/*
 * Callback function for WriteDataToArchive. Writes one block of (compressed)
 * data to the archive.
 */
/*
 * Callback function for ReadDataFromArchive. To keep things simple, we
 * always read one compressed block at a time.
 */

Should the function names in the comments be updated?

WriteDataToArchive
->
writeData

ReadDataFromArchive
->
readData

3.
+   Assert(strcmp(mode, "r") == 0 || strcmp(mode, "rb") == 0);

Could we use PG_BINARY_R instead of "r" and "rb" here?

Regards,
Shi Yu



RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-02-14 Thread shiy.f...@fujitsu.com
On Sat, Feb 4, 2023 7:24 PM Amit Kapila  wrote:
> 
> On Thu, Feb 2, 2023 at 2:03 PM Önder Kalacı  wrote:
> >
> >>
> >> and if there's more
> >> than one candidate index pick any one. Additionally, we can allow
> >> disabling the use of an index scan for this particular case. If we are
> >> too worried about API change for allowing users to specify the index
> >> then we can do that later or as a separate patch.
> >>
> >
> > On v23, I dropped the planner support for picking the index. Instead, it 
> > simply
> > iterates over the indexes and picks the first one that is suitable.
> >
> > I'm currently thinking on how to enable users to override this decision.
> > One option I'm leaning towards is to add a syntax like the following:
> >
> > ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
> >
> > Though, that should probably be a seperate patch. I'm going to work
> > on that, but still wanted to share v23 given picking the index sounds
> > complementary, not strictly required at this point.
> >
> 
> I agree that it could be a separate patch. However, do you think we
> need some way to disable picking the index scan? This is to avoid
> cases where sequence scan could be better or do we think there won't
> exist such a case?
> 

I think such a case exists. I tried the following cases based on the v23 patch.

# Step 1.
Create publication, subscription and tables.
-- on publisher
create table tbl (a int);
alter table tbl replica identity full;
create publication pub for table tbl;

-- on subscriber
create table tbl (a int);
create index idx_a on tbl(a);
create subscription sub connection 'dbname=postgres port=5432' publication pub;

# Step 2.
Setup synchronous replication.

# Step 3.
Execute SQL query on publisher.

-- case 1 (All values are duplicated)
truncate tbl;
insert into tbl select 1 from generate_series(0,1)i;
update tbl set a=a+1;

-- case 2
truncate tbl;
insert into tbl select i%3 from generate_series(0,1)i;
update tbl set a=a+1;

-- case 3
truncate tbl;
insert into tbl select i%5 from generate_series(0,1)i;
update tbl set a=a+1;

-- case 4
truncate tbl;
insert into tbl select i%10 from generate_series(0,1)i;
update tbl set a=a+1;

-- case 5
truncate tbl;
insert into tbl select i%100 from generate_series(0,1)i;
update tbl set a=a+1;

-- case 6
truncate tbl;
insert into tbl select i%1000 from generate_series(0,1)i;
update tbl set a=a+1;

-- case 7 (No duplicate value)
truncate tbl;
insert into tbl select i from generate_series(0,1)i;
update tbl set a=a+1;

# Result
The time spent executing the update (the average of 3 runs; the unit is
milliseconds):

+--------+---------+---------+
|        | patched | master  |
+--------+---------+---------+
| case 1 | 3933.68 | 1298.32 |
| case 2 | 1803.46 | 1294.42 |
| case 3 | 1380.82 | 1299.90 |
| case 4 | 1042.60 | 1300.20 |
| case 5 |  691.69 | 1297.51 |
| case 6 |  578.50 | 1300.69 |
| case 7 |  566.45 | 1302.17 |
+--------+---------+---------+

In cases 1~3, there is overhead after applying the patch. In the other cases,
the patch improved the performance. The more duplicate values there are, the
greater the overhead after applying the patch.

Regards,
Shi Yu



RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-02-14 Thread shiy.f...@fujitsu.com
On Mon, Feb 13, 2023 7:01 PM shiy.f...@fujitsu.com  
wrote:
> 
> On Thu, Feb 2, 2023 4:34 PM Önder Kalacı  wrote:
> >
> >>
> >> and if there's more
> >> than one candidate index pick any one. Additionally, we can allow
> >> disabling the use of an index scan for this particular case. If we are
> >> too worried about API change for allowing users to specify the index
> >> then we can do that later or as a separate patch.
> >>
> >
> > On v23, I dropped the planner support for picking the index. Instead, it 
> > simply
> > iterates over the indexes and picks the first one that is suitable.
> >
> > I'm currently thinking on how to enable users to override this decision.
> > One option I'm leaning towards is to add a syntax like the following:
> >
> > ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
> >
> > Though, that should probably be a seperate patch. I'm going to work
> > on that, but still wanted to share v23 given picking the index sounds
> > complementary, not strictly required at this point.
> >
> 
> Thanks for your patch. Here are some comments.
> 

Hi,

Here are some comments on the test cases.

1. in test case "SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX"
+# now, ingest more data and create index on column y which has higher cardinality
+# so that the future commands use the index on column y
+$node_publisher->safe_psql('postgres',
+   "INSERT INTO test_replica_id_full SELECT 50, i FROM generate_series(0,3100)i;");
+$node_subscriber->safe_psql('postgres',
+   "CREATE INDEX test_replica_id_full_idy ON test_replica_id_full(y)");

We don't pick the cheapest index in the current patch, so should we modify this
part of the test?

BTW, the following comment in FindLogicalRepUsableIndex() needs to be changed,
too.

+* We are looking for one more opportunity for using an index. If
+* there are any indexes defined on the local relation, try to pick
+* the cheapest index.


2. Is there any reason why we need the test case "SUBSCRIPTION USES INDEX WITH
DROPPED COLUMNS"? Has there been a problem related to dropped columns before?

3. in test case "SUBSCRIPTION USES INDEX ON PARTITIONED TABLES"
+# deletes rows and moves between partitions
+$node_publisher->safe_psql('postgres',
+   "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
+$node_publisher->safe_psql('postgres',
+   "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");

"moves between partitions" in the comment seems wrong.

4. in test case "SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS"
+# update 2 rows
+$node_publisher->safe_psql('postgres',
+   "UPDATE people SET firstname = 'Nan' WHERE firstname = 'first_name_1';");
+$node_publisher->safe_psql('postgres',
+   "UPDATE people SET firstname = 'Nan' WHERE firstname = 'first_name_2' AND lastname = 'last_name_2';");
+
+# make sure the index is not used on the subscriber
+$result = $node_subscriber->safe_psql('postgres',
+   "select idx_scan from pg_stat_all_indexes where indexrelname = 'people_names'");
+is($result, qq(0), 'ensure subscriber tap_sub_rep_full updates two rows via seq. scan with index on expressions');
+

I think it would be better to call wait_for_catchup() before the check, because
we want to check that the index is NOT used. Otherwise the check may pass simply
because the rows have not yet been updated on the subscriber.

5. in test case "SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN"
+# show that index is not used even when enable_indexscan=false
+$result = $node_subscriber->safe_psql('postgres',
+   "select idx_scan from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx'");
+is($result, qq(0), 'ensure subscriber has not used index with enable_indexscan=false');

Should we remove the word "even" in the comment?

6. 
In each test case we re-create publications, subscriptions, and tables. Could we
create only one publication and one subscription at the beginning, and use them
in all test cases? I think this can save some time running the test file.

Regards,
Shi Yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-02-13 Thread shiy.f...@fujitsu.com
On Thu, Feb 2, 2023 4:34 PM Önder Kalacı  wrote:
>
>>
>> and if there's more
>> than one candidate index pick any one. Additionally, we can allow
>> disabling the use of an index scan for this particular case. If we are
>> too worried about API change for allowing users to specify the index
>> then we can do that later or as a separate patch.
>>
>
> On v23, I dropped the planner support for picking the index. Instead, it 
> simply
> iterates over the indexes and picks the first one that is suitable.
>
> I'm currently thinking on how to enable users to override this decision.
> One option I'm leaning towards is to add a syntax like the following:
>
> ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
>
> Though, that should probably be a seperate patch. I'm going to work
> on that, but still wanted to share v23 given picking the index sounds
> complementary, not strictly required at this point.
>

Thanks for your patch. Here are some comments.

1.
I noticed that get_usable_indexoid() is called in apply_handle_update_internal()
and apply_handle_delete_internal() to get the usable index. Could usableIndexOid
be a parameter of these two functions? We already have the LogicalRepRelMapEntry
when calling them, so we could get usableIndexOid without calling
get_usable_indexoid(). Otherwise, for partitioned tables,
logicalrep_partition_open() is called in get_usable_indexoid(), and searching
for the entry via hash_search() will increase the cost.

2.
+* This attribute is an expression, and
+* SuitableIndexPathsForRepIdentFull() was called earlier when the
+* index for subscriber was selected. There, the indexes
+* comprising *only* expressions have already been eliminated.

It looks like the comment needs to be updated:
SuitableIndexPathsForRepIdentFull
->
FindUsableIndexForReplicaIdentityFull

3.

/* Build scankey for every attribute in the index. */
-   for (attoff = 0; attoff < IndexRelationGetNumberOfKeyAttributes(idxrel); attoff++)
+   for (index_attoff = 0; index_attoff < IndexRelationGetNumberOfKeyAttributes(idxrel);
+        index_attoff++)
{

Should the comment be changed, since we now skip the attributes that are
expressions?

4.
+   Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
+          RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));

Maybe we can call the new function idxIsRelationIdentityOrPK()?
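For example (a sketch, assuming the new helper takes the relation and the index
OID):

    Assert(!idxIsRelationIdentityOrPK(rel, RelationGetRelid(idxrel)));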

Regards,
Shi Yu


RE: run pgindent on a regular basis / scripted manner

2023-02-09 Thread shiy.f...@fujitsu.com
On Thu, Feb 9, 2023 6:10 AM Andrew Dunstan  wrote:
> Thanks, I have committed this. Still looking at Robert's other request.
>

Hi,

I tried the new option --commit and found that it seems to try to indent files
that were deleted in the specified commit, and it reports an error:

cannot open file "src/backend/access/brin/brin.c": No such file or directory

It looks like we should filter out such files.

Regards,
Shi Yu


RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-02-06 Thread shiy.f...@fujitsu.com

On Thu, Feb 2, 2023 11:48 AM shveta malik  wrote:
> 
> On Wed, Feb 1, 2023 at 5:42 PM Melih Mutlu 
> wrote:
> >
> > Hi Shveta,
> >
> > shveta malik , 1 Şub 2023 Çar, 15:01 tarihinde
> şunu yazdı:
> >>
> >> On Wed, Feb 1, 2023 at 5:05 PM Melih Mutlu 
> wrote:
> >> 2) I found a crash in the previous patch (v9), but have not tested it
> >> on the latest yet. Crash happens when all the replication slots are
> >> consumed and we are trying to create new. I tweaked the settings like
> >> below so that it can be reproduced easily:
> >> max_sync_workers_per_subscription=3
> >> max_replication_slots = 2
> >> and then ran the test case shared by you. I think there is some memory
> >> corruption happening. (I did test in debug mode, have not tried in
> >> release mode). I tried to put some traces to identify the root-cause.
> >> I observed that worker_1 keeps on moving from 1 table to another table
> >> correctly, but at some point, it gets corrupted i.e. origin-name
> >> obtained for it is wrong and it tries to advance that and since that
> >> origin does not exist, it  asserts and then something else crashes.
> >> From log: (new trace lines added by me are prefixed by shveta, also
> >> tweaked code to have my comment 1 fixed to have clarity on worker-id).
> >>
> >> form below traces, it is clear that worker_1 was moving from one
> >> relation to another, always getting correct origin 'pg_16688_1', but
> >> at the end it got 'pg_16688_49' which does not exist. Second part of
> >> trace shows who updated 'pg_16688_49', it was done by worker_49
> which
> >> even did not get chance to create this origin due to max_rep_slot
> >> reached.
> >
> >
> > Thanks for investigating this error. I think it's the same error as the one 
> > Shi
> reported earlier. [1]
> > I couldn't reproduce it yet but will apply your tweaks and try again.
> > Looking into this.
> >
> [1] https://www.postgresql.org/message-id/OSZPR01MB631013C833C98E826B3CFCB9FDC69%40OSZPR01MB6310.jpnprd01.prod.outlook.com
> >
> 
> Hi Melih,
> I think I am able to identify the root cause. It is not memory
> corruption, but the way origin-names are stored in system-catalog
> mapped to a particular relation-id before even those are created.
> 
> After adding few more logs:
> 
> [4227] LOG:  shveta- LogicalRepSyncTableStart- worker_49 constructed
> originname :pg_16684_49, relid:16540
> [4227] LOG:  shveta- LogicalRepSyncTableStart- worker_49
> updated-origin in system catalog:pg_16684_49,
> slot:pg_16684_sync_49_7195149685251088378, relid:16540
> [4227] LOG:  shveta- LogicalRepSyncTableStart- worker_49
> get-origin-id2 originid:0, originname:pg_16684_49
> [4227] ERROR:  could not create replication slot
> "pg_16684_sync_49_7195149685251088378": ERROR:  all replication slots
> are in use
> HINT:  Free one or increase max_replication_slots.
> 
> 
> [4428] LOG:  shveta- LogicalRepSyncTableStart- worker_148 constructed
> originname :pg_16684_49, relid:16540
> [4428] LOG:  could not drop replication slot
> "pg_16684_sync_49_7195149685251088378" on publisher: ERROR:
> replication slot "pg_16684_sync_49_7195149  685251088378" does not
> exist
> [4428] LOG:  shveta- LogicalRepSyncTableStart- worker_148 drop-origin
> originname:pg_16684_49
> [4428] LOG:  shveta- LogicalRepSyncTableStart- worker_148
> updated-origin:pg_16684_49,
> slot:pg_16684_sync_148_7195149685251088378, relid:16540
> 
> So from above, worker_49 came and picked up relid:16540, constructed
> origin-name and slot-name and updated in system-catalog and then it
> tried to actually create that slot and origin but since max-slot count
> was reached, it failed and exited, but did not do cleanup from system
> catalog for that relid.
> 
> Then worker_148 came and also picked up table 16540 since it was not
> completed/started by previous worker, but this time it found that
> origin and slot entry present in system-catalog against this relid, so
> it picked the same names and started processing, but since those do
> not exist, it asserted while advancing the origin.
> 
> The assert is only reproduced when an already running worker (say
> worker_1) who has 'created=true' set, gets to sync the relid for which
> a previously failed worker has tried and updated origin-name w/o
> creating it. In such a case worker_1 (with created=true) will try to
> reuse the origin and thus will try to advance it and will end up
> asserting. That is why you might not see the assertion always. The
> condition 'created' is set to true for that worker and it goes to
> reuse the origin updated by the previous worker.
> 
> So to fix this, I think either we update origin and slot entries in
> the system catalog after the creation has passed or we clean-up the
> system catalog in case of failure. What do you think?
> 

I think the first way seems better.

I reproduced the problem I reported before with the latest patches (v7-0001,
v10-0002), and looked into this problem. It is caused by a similar reason. Here
is some analysis 

RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-01 Thread shiy.f...@fujitsu.com
On Mon, Jan 30, 2023 6:05 PM Takamichi Osumi (Fujitsu) 
 wrote:
> 
> On Saturday, January 28, 2023 1:28 PM I wrote:
> > Attached the updated patch v24.
> Hi,
> 
> 
> I've conducted the rebase affected by the commit(1e8b61735c)
> by renaming the GUC to logical_replication_mode accordingly,
> because it's utilized in the TAP test of this time-delayed LR feature.
> There is no other change for this version.
> 
> Kindly have a look at the attached v25.
> 

Thanks for your patch. Here are some comments.

1.
+   /*
+* The min_apply_delay parameter is ignored until all tablesync workers
+* have reached READY state. This is because if we allowed the delay
+* during the catchup phase, then once we reached the limit of tablesync
+* workers it would impose a delay for each subsequent worker. That would
+* cause initial table synchronization completion to take a long time.
+*/
+   if (!AllTablesyncsReady())
+   return;

I saw that the new parameter becomes effective after all tables are in the
ready state, because the apply worker can't set the state to catchup during the
delay. But can we call process_syncing_tables() in the while-loop of
maybe_apply_delay()? Then the tablesync could finish without the delay. If we
can't do that, it might be better to add a comment about it.
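Roughly something like the following inside the delay loop (just a sketch of
the idea; maybe_apply_delay() and the delay bookkeeping come from the patch,
and the variable names here are hypothetical):

    while (true)
    {
        long    diffms = TimestampDifferenceMilliseconds(GetCurrentTimestamp(),
                                                         delay_until);

        if (diffms <= 0)
            break;

        /* Let tablesync workers be processed during the delay. */
        process_syncing_tables(remote_final_lsn);

        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                         Min(diffms, 1000L),
                         WAIT_EVENT_RECOVERY_APPLY_DELAY);
        ResetLatch(MyLatch);
        CHECK_FOR_INTERRUPTS();
    }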

2.
+# Make sure the apply worker knows to wait for more than 500ms
+check_apply_delay_log($node_subscriber, $offset, "0.5");

I think the last parameter should be 500.
Besides, I am not sure that checking the log is a stable test. Is it possible
that there's no such log on a slow machine? I modified the code to sleep 1s at
the beginning of apply_dispatch(), and then the newly added test failed because
the server log didn't match.

Regards,
Shi yu




RE: Logical replication timeout problem

2023-01-29 Thread shiy.f...@fujitsu.com
On Sun, Jan 29, 2023 3:41 PM wangw.f...@fujitsu.com  
wrote:
> 
> I tested a mix transaction of transactional and non-transactional messages on
> the current HEAD and reproduced the timeout problem. I think this result is 
> OK.
> Because when decoding a transaction, non-transactional changes are processed
> directly and the function WalSndKeepaliveIfNecessary is called, while
> transactional changes are cached and processed after decoding. After decoding,
> only transactional changes will be processed (in the function
> ReorderBufferProcessTXN), so the timeout problem will still be reproduced.
> 
> After applying the v8 patch, the test mentioned above didn't reproduce the
> timeout problem (Attach this test script 'test_with_nontransactional.sh').
> 
> Attach the new patch.
> 

Thanks for updating the patch. Here is a comment.

In update_progress_txn_cb_wrapper(), it looks like we need to reset
changes_count to 0 after calling OutputPluginUpdateProgress(); otherwise, once
the counter reaches 100 changes, OutputPluginUpdateProgress() will be called
for every subsequent change.

Regards,
Shi yu



RE: Update comments in multixact.c

2023-01-18 Thread shiy.f...@fujitsu.com
On Wed, Jan 18, 2023 6:04 AM Peter Geoghegan  wrote:
> 
> On Tue, Jan 17, 2023 at 1:33 AM shiy.f...@fujitsu.com
>  wrote:
> > I noticed that commit 5212d447fa updated some comments in multixact.c
> because
> > SLRU truncation for multixacts is performed during VACUUM, instead of
> > checkpoint. Should the following comments which mentioned checkpointer be
> > changed, too?
> 
> Yes, I think so.

Thanks for your reply.

Attached is a patch that fixes them.

Regards,
Shi yu


v1-0001-Update-comments-in-multixact.c.patch
Description: v1-0001-Update-comments-in-multixact.c.patch


Update comments in multixact.c

2023-01-17 Thread shiy.f...@fujitsu.com
Hi,

I noticed that commit 5212d447fa updated some comments in multixact.c because
SLRU truncation for multixacts is performed during VACUUM instead of
checkpoint. Should the following comments, which mention the checkpointer, be
changed too?

1.
* we compute it (using nextMXact if none are valid).  Each backend is
* required not to attempt to access any SLRU data for MultiXactIds older
* than its own OldestVisibleMXactId[] setting; this is necessary because
* the checkpointer could truncate away such data at any instant.

2.
 * We set the OldestVisibleMXactId for a given transaction the first time
 * it's going to inspect any MultiXactId.  Once we have set this, we are
 * guaranteed that the checkpointer won't truncate off SLRU data for
 * MultiXactIds at or after our OldestVisibleMXactId.

Regards,
Shi yu




RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-01-16 Thread shiy.f...@fujitsu.com
On Wed, Jan 11, 2023 4:31 PM Melih Mutlu  wrote:
> 
> Hi hackers,
> 
> Rebased the patch to resolve conflicts.
> 

Thanks for your patch. Here are some comments.

0001 patch

1. walsender.c
+   /* Create a tuple to send consisten WAL location */

"consisten" should be "consistent" I think.

2. logical.c
+   if (need_full_snapshot)
+   {
+   LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+
+   SpinLockAcquire(&slot->mutex);
+   slot->effective_catalog_xmin = xmin_horizon;
+   slot->data.catalog_xmin = xmin_horizon;
+   slot->effective_xmin = xmin_horizon;
+   SpinLockRelease(&slot->mutex);
+
+   xmin_horizon = GetOldestSafeDecodingTransactionId(!need_full_snapshot);
+   ReplicationSlotsComputeRequiredXmin(true);
+
+   LWLockRelease(ProcArrayLock);
+   }

It seems that we should first get the safe decoding xid, then inform the slot
machinery about the new limit, right? Otherwise the limit will be
InvalidTransactionId and that seems inconsistent with the comment.

3. doc/src/sgml/protocol.sgml
+   is used in the currenct transaction. This command is currently only 
supported
+   for logical replication.
+   slots.

We don't need to put "slots" in a new line.


0002 patch

1.
In pg_subscription_rel.h, I think the type of "srrelslotname" can be changed to
NameData, see "subslotname" in pg_subscription.h.

2.
+* Find the logical replication sync worker if exists store
+* the slot number for dropping associated replication slots
+* later.

Should we add a comma after "if exists"?

3.
+   PG_FINALLY();
+   {
+   pfree(cmd.data);
+   }
+   PG_END_TRY();
+   \
+   return tablelist;
+}

Do we need the backslash?

4.
+   /*
+* Advance to the LSN got from walrcv_create_slot. This is WAL
+* logged for the purpose of recovery. Locks are to prevent the
+* replication origin from vanishing while advancing.

"walrcv_create_slot" should be changed to
"walrcv_create_slot/walrcv_slot_snapshot" I think.

5.
+   /* Replication drop might still exist. Try to drop */
+   replorigin_drop_by_name(originname, true, false);

Should "Replication drop" be "Replication origin"?

6.
I saw an assertion failure in the following case. Could you please look into it?
The backtrace is attached.

-- pub
CREATE TABLE tbl1 (a int, b text);
CREATE TABLE tbl2 (a int primary key, b text);
create publication pub for table tbl1, tbl2;
insert into tbl1 values (1, 'a');
insert into tbl1 values (1, 'a');

-- sub
CREATE TABLE tbl1 (a int primary key, b text);
CREATE TABLE tbl2 (a int primary key, b text);
create subscription sub connection 'dbname=postgres port=5432' publication pub;

Subscriber log:
2023-01-17 14:47:10.054 CST [1980841] LOG:  logical replication apply worker for subscription "sub" has started
2023-01-17 14:47:10.060 CST [1980843] LOG:  logical replication table synchronization worker for subscription "sub", table "tbl1" has started
2023-01-17 14:47:10.070 CST [1980845] LOG:  logical replication table synchronization worker for subscription "sub", table "tbl2" has started
2023-01-17 14:47:10.073 CST [1980843] ERROR:  duplicate key value violates unique constraint "tbl1_pkey"
2023-01-17 14:47:10.073 CST [1980843] DETAIL:  Key (a)=(1) already exists.
2023-01-17 14:47:10.073 CST [1980843] CONTEXT:  COPY tbl1, line 2
2023-01-17 14:47:10.074 CST [1980821] LOG:  background worker "logical replication worker" (PID 1980843) exited with exit code 1
2023-01-17 14:47:10.083 CST [1980845] LOG:  logical replication table synchronization worker for subscription "sub", table "tbl2" has finished
2023-01-17 14:47:10.083 CST [1980845] LOG:  logical replication table synchronization worker for subscription "sub" has moved to sync table "tbl1".
TRAP: failed Assert("node != InvalidRepOriginId"), File: "origin.c", Line: 892, PID: 1980845

Regards,
Shi yu
#0  0x7f38bb78baff in raise () from /lib64/libc.so.6
#1  0x7f38bb75eea5 in abort () from /lib64/libc.so.6
#2  0x00b44501 in ExceptionalCondition (conditionName=0xd0deac "node != InvalidRepOriginId", fileName=0xd0da4d "origin.c", lineNumber=892) at assert.c:66
#3  0x008f93f5 in replorigin_advance (node=0, remote_commit=22144256, local_commit=0, go_backward=true, wal_log=true) at origin.c:892
#4  0x0090cf2f in LogicalRepSyncTableStart (origin_startpos=0x7ffcbe54e808) at tablesync.c:1641
#5  0x00913ee4 in start_table_sync (origin_startpos=0x7ffcbe54e808, myslotname=0x7ffcbe54e790) at worker.c:4419
#6  0x009140be in run_tablesync_worker (options=0x7ffcbe54e7c0, slotname=0x0, originname=0x7ffcbe54e810 "pg_16398_3", originname_size=64, origin_startpos=0x7ffcbe54e808) at worker.c:4497
#7

RE: Fix pg_publication_tables to exclude generated columns

2023-01-11 Thread shiy.f...@fujitsu.com
On Wed, Jan 11, 2023 2:40 PM Amit Kapila  wrote:
> 
> On Wed, Jan 11, 2023 at 10:07 AM Tom Lane  wrote:
> >
> > Amit Kapila  writes:
> > >> On Mon, Jan 9, 2023 11:06 PM Tom Lane  wrote:
> > >>> We could just not fix it in the back branches.  I'd argue that this is
> > >>> as much a definition change as a bug fix, so it doesn't really feel
> > >>> like something to back-patch anyway.
> >
> > > So, if we don't backpatch then it could lead to an error when it
> > > shouldn't have which is clearly a bug. I think we should backpatch
> > > this unless Tom or others are against it.
> >
> > This isn't a hill that I'm ready to die on ... but do we have any field
> > complaints about this?  If not, I still lean against a back-patch.
> > I think there's a significant risk of breaking case A while fixing
> > case B when we change this behavior, and that's something that's
> > better done only in a major release.
> >
> 
> Fair enough, but note that there is a somewhat related problem for
> dropped columns [1] as well. While reviewing that it occurred to me
> that generated columns also have a similar problem which leads to this
> thread (it would have been better if there is a mention of the same in
> the initial email). Now, as symptoms are similar, I think we shouldn't
> back-patch that as well, otherwise, it will appear to be partially
> fixed. What do you think?
> 
> [1] - https://www.postgresql.org/message-
> id/OSZPR01MB631087C65BA81E1FEE5A60D2FDF59%40OSZPR01MB6310.jpnpr
> d01.prod.outlook.com
> 

I agree with fixing them only on HEAD.

I merged this patch and the one in [1], as they address similar problems.
Please see the attached patch.

I removed the changes in tablesync.c that simplified the query in
fetch_remote_table_info(), because they only work for publishers of v16. Those
changes rely on pg_get_publication_tables() returning all columns when no
column list is specified, but publishers of v15 return NULL in that case.

[1] 
https://www.postgresql.org/message-id/OSZPR01MB631087C65BA81E1FEE5A60D2FDF59%40OSZPR01MB6310.jpnprd01.prod.outlook.com


Regards,
Shi yu


v2-0001-Ignore-dropped-columns-and-generated-columns-when.patch
Description:  v2-0001-Ignore-dropped-columns-and-generated-columns-when.patch


v2-0002-Tests-for-checking-column-list-restriction.patch
Description: v2-0002-Tests-for-checking-column-list-restriction.patch


RE: Allow logical replication to copy tables in binary format

2023-01-11 Thread shiy.f...@fujitsu.com
On Mon, Nov 14, 2022 8:08 PM Melih Mutlu  wrote:
>
> Attached patch with updated version of this patch.

Thanks for your patch.

I tried to do a performance test for this patch, and the result looks good to
me. (The steps are similar to what Melih shared [1].)

The time to synchronize about 1GB of data in binary (the average of the middle
eight runs was taken):
HEAD: 16.854 s
Patched: 6.805 s
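
For reference, the subscription in my test was created with the binary option
enabled, roughly like below (per the steps in [1]; connection string
abbreviated):

CREATE SUBSCRIPTION sub CONNECTION '...' PUBLICATION pub WITH (binary = true);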

Besides, here are some comments.

1.
+# Binary enabled subscription should fail
+$node_subscriber_binary->wait_for_log("ERROR:  insufficient data left in message");

Should it be changed to "ERROR: ( [A-Z0-9]+:)? ", like in other subscription tests?

2.
+# Binary disabled subscription should succeed
+$node_publisher->wait_for_catchup('tap_sub');

If we want to wait for table synchronization to finish, should we call
wait_for_subscription_sync()?

3.
I also think it might be better to support binary copy only for publishers of
v16 or later. Do you plan to implement that in the patch?
  
[1] 
https://www.postgresql.org/message-id/CAGPVpCQEKDVKQPf6OFQ-9WiRYB1YRejm--YJTuwgzuvj1LEJ2A%40mail.gmail.com

Regards,
Shi yu



RE: Fix pg_publication_tables to exclude generated columns

2023-01-09 Thread shiy.f...@fujitsu.com
On Mon, Jan 9, 2023 11:06 PM Tom Lane  wrote:
> 
> Amit Kapila  writes:
> > On Mon, Jan 9, 2023 at 5:29 PM shiy.f...@fujitsu.com
> >  wrote:
> >> I think one way to fix it is to modify pg_publication_tables query to 
> >> exclude
> >> generated columns. But in this way, we need to bump catalog version when
> fixing
> >> it in back-branch. Another way is to modify function
> >> pg_get_publication_tables()'s return value to contain all supported columns
> if
> >> no column list is specified, and we don't need to change system view.
> 
> > That sounds like a reasonable approach to fix the issue.
> 
> We could just not fix it in the back branches.  I'd argue that this is
> as much a definition change as a bug fix, so it doesn't really feel
> like something to back-patch anyway.
> 

If this is not fixed in the back-branches, in some cases we will get an error
when creating/refreshing a subscription, because pg_publication_tables is
queried in the column-list check.

e.g.

-- publisher
CREATE TABLE test_mix_4 (a int PRIMARY KEY, b int, c int, d int GENERATED ALWAYS AS (a + 1) STORED);
CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 (a, b, c);
CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4;

-- subscriber
CREATE TABLE test_mix_4 (a int PRIMARY KEY, b int, c int, d int);

postgres=# CREATE SUBSCRIPTION sub1 CONNECTION 'port=5432' PUBLICATION pub_mix_7, pub_mix_8;
ERROR:  cannot use different column lists for table "public.test_mix_4" in different publications

I think it might be better to fix it in the back-branches. And if we fix it by
modifying pg_get_publication_tables(), we don't need to bump the catalog
version in the back-branches, which seems acceptable.
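
To illustrate, with that approach both publications would report the same
column set for the table, something like the following (a sketch of the
expected result, not actual output):

SELECT pubname, attnames FROM pg_publication_tables WHERE tablename = 'test_mix_4';
-- expected: both pub_mix_7 and pub_mix_8 show {a,b,c};
-- the generated column d is excluded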

Regards,
Shi yu




Fix pg_publication_tables to exclude generated columns

2023-01-09 Thread shiy.f...@fujitsu.com
Hi hackers,

I noticed that there is a problem with the system view pg_publication_tables
when looking into [1]. The column "attnames" contains generated columns when no
column list is specified, but generated columns shouldn't be included because
they are not replicated (see send_relation_and_attrs()).
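
A minimal illustration (hypothetical table and publication names):

CREATE TABLE t (a int, b int GENERATED ALWAYS AS (a + 1) STORED);
CREATE PUBLICATION p FOR TABLE t;
SELECT attnames FROM pg_publication_tables WHERE pubname = 'p';
-- currently shows {a,b}, although b is never replicated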

I think one way to fix it is to modify the pg_publication_tables query to
exclude generated columns. But in this way, we need to bump the catalog version
when fixing it in the back-branches. Another way is to modify
pg_get_publication_tables() to return all supported columns when no column list
is specified; then we don't need to change the system view.

Attached is the patch for HEAD; the system view changes can be ignored for
PG15.

[1] 
https://www.postgresql.org/message-id/OSZPR01MB631087C65BA81E1FEE5A60D2FDF59%40OSZPR01MB6310.jpnprd01.prod.outlook.com

Regards,
Shi yu




v1-0002-Test-for-column-list-check-in-tablesync.patch
Description: v1-0002-Test-for-column-list-check-in-tablesync.patch


v1-0001-Modify-pg_get_publication_tables-to-show-all-colu.patch
Description:  v1-0001-Modify-pg_get_publication_tables-to-show-all-colu.patch


RE: Force streaming every change in logical decoding

2023-01-05 Thread shiy.f...@fujitsu.com
On Thu, Dec 22, 2022 3:17 PM Masahiko Sawada  wrote:
> >  I think
> > instead of adding new tests with this patch, it may be better to
> > change some of the existing tests related to streaming to use this
> > parameter as that will clearly show one of the purposes of this patch.
> 
> +1. I think test_decoding/sql/stream.sql and spill.sql are good
> candidates and we change logical replication TAP tests in a separate
> patch.
> 

Hi,

I tried to reduce the data size in the test_decoding tests by using the newly
added GUC "logical_decoding_mode", and found that the new GUC is not suitable
for some cases.

For example, in the following cases in stream.sql (and there are some similar
cases in twophase_stream.sql), the changes in the subtransaction exceed
logical_decoding_work_mem, but they won't be streamed because the
subtransaction is rolled back. (The transaction is still marked as streamed,
though.) After the subtransaction, there is a small number of inserts; since
logical_decoding_work_mem is not exceeded, they are streamed only when the
COMMIT record is decoded. If we use logical_decoding_mode=immediate, the small
number of inserts in the toplevel transaction will be streamed immediately.
This is different from before, so I'm not sure we can use logical_decoding_mode
in this case.

```
-- streaming test with sub-transaction
BEGIN;
savepoint s1;
SELECT 'msg5' FROM pg_logical_emit_message(true, 'test', repeat('a', 50));
INSERT INTO stream_test SELECT repeat('a', 2000) || g.i FROM generate_series(1, 35) g(i);
TRUNCATE table stream_test;
rollback to s1;
INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
COMMIT;
```

Besides, some cases in spill.sql can't use the new GUC because they want to
serialize only while processing a specific toplevel transaction or
subtransaction. For example, in the case below, the changes in the subxact
exceed logical_decoding_work_mem and are serialized, while the insert after the
subxact is not serialized. If logical_decoding_mode were used, both of them
would be serialized, which is not what we want to test.

```
-- spilling subxact, non-spilling main xact
BEGIN;
SAVEPOINT s;
INSERT INTO spill_test SELECT 'serialize-subbig-topsmall--1:'||g.i FROM generate_series(1, 5000) g(i);
RELEASE SAVEPOINT s;
INSERT INTO spill_test SELECT 'serialize-subbig-topsmall--2:'||g.i FROM generate_series(5001, 5001) g(i);
COMMIT;
```

Although the remaining cases in spill.sql could use the new GUC, that would
require setting and resetting logical_decoding_mode many times. Besides, the
tests in this file were added before logical_decoding_work_mem was introduced,
so I am not sure we should modify them to use the new GUC.

Also, it looks like the toast tests in stream.sql can't be changed either.

For the above reasons, it seems that currently few tests can be modified to use
logical_decoding_mode. If we later find some tests that can be changed to use
it, we can do that in a separate thread. I will close the CF entry.

Regards,
Shi yu


Ignore dropped columns when checking the column list in logical replication

2023-01-03 Thread shiy.f...@fujitsu.com
Hi hackers,

I saw a problem related to column list.

There's a restriction that different column lists for the same table can't be
used in the publications of a single subscription. But we can get unexpected
errors in some cases because dropped columns are not ignored.

For example:
-- publisher
create table tbl1 (a int primary key, b int, c int);
create publication pub1 for table tbl1(a,b);
create publication pub2 for table tbl1;
alter table tbl1 drop column c;

-- subscriber
create table tbl1 (a int primary key, b int, c int);
create subscription sub connection 'dbname=postgres port=5432' publication pub1, pub2;

-- publisher
insert into tbl1 values (1,2);

The publisher and subscriber will report the error:
ERROR:  cannot use different column lists for table "public.tbl1" in different publications

This is caused by:
a. walsender (in pgoutput_column_list_init())
/*
 * If column list includes all the columns of the table,
 * set it to NULL.
 */
if (bms_num_members(cols) == RelationGetNumberOfAttributes(relation))
{
    bms_free(cols);
    cols = NULL;
}

The value returned by RelationGetNumberOfAttributes() includes dropped columns.

b. table synchronization (in fetch_remote_table_info())
appendStringInfo(&cmd,
                 "SELECT DISTINCT"
                 "  (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)"
                 "   THEN NULL ELSE gpt.attrs END)"
                 "  FROM pg_publication p,"
                 "  LATERAL pg_get_publication_tables(p.pubname) gpt,"
                 "  pg_class c"
                 " WHERE gpt.relid = %u AND c.oid = gpt.relid"
                 "   AND p.pubname IN ( %s )",
                 lrel->remoteid,
                 pub_names.data);

If the result of the above SQL contains more than one tuple, an error will be
reported (cannot use different column lists for table ...). In this SQL, attrs
is NULL if `array_length(gpt.attrs, 1) = c.relnatts`, but `c.relnatts` also
counts dropped columns, while what we want is the number of live columns.
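
For illustration, the count we want can be obtained with something like the
following (a sketch of the idea, not the exact fix):

SELECT count(*) FROM pg_attribute
WHERE attrelid = 'public.tbl1'::regclass
  AND attnum > 0
  AND NOT attisdropped;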

I tried to fix them in the attached patch.

Regards,
Shi yu


v1-0001-Ignore-dropped-columns-when-checking-the-column-l.patch
Description:  v1-0001-Ignore-dropped-columns-when-checking-the-column-l.patch


RE: Force streaming every change in logical decoding

2022-12-23 Thread shiy.f...@fujitsu.com
On Fri, Dec 23, 2022 1:50 PM Amit Kapila 
> 
> On Thu, Dec 22, 2022 at 6:18 PM shiy.f...@fujitsu.com
>  wrote:
> >
> >
> > Besides, I tried to reduce data size in streaming subscription tap tests by 
> > this
> > new GUC (see 0002 patch). But I didn't covert all streaming tap tests
> because I
> > think we also need to cover the case that there are lots of changes. So, 
> > 015*
> is
> > not modified. And 017* is not modified because streaming transactions and
> > non-streaming transactions are tested alternately in this test.
> >
> 
> I think we can remove the newly added test from the patch and instead
> combine the 0001 and 0002 patches. I think we should leave the
> 022_twophase_cascade as it is because it can impact code coverage,
> especially the below part of the test:
> # 2PC PREPARE with a nested ROLLBACK TO SAVEPOINT
> $node_A->safe_psql(
> 'postgres', "
> BEGIN;
> INSERT INTO test_tab VALUES (, 'foobar');
> SAVEPOINT sp_inner;
> INSERT INTO test_tab SELECT i, md5(i::text) FROM
> generate_series(3, 5000) s(i);
> 
> Here, we will stream first time after the subtransaction, so can
> impact the below part of the code in ReorderBufferStreamTXN:
> if (txn->snapshot_now == NULL)
> {
> ...
> dlist_foreach(subxact_i, >subtxns)
> {
> ReorderBufferTXN *subtxn;
> 
> subtxn = dlist_container(ReorderBufferTXN, node, subxact_i.cur);
> ReorderBufferTransferSnapToParent(txn, subtxn);
> }
> ...
> 

OK, I removed the modification in 022_twophase_cascade.pl and combined the two
patches.

Please see the attached patch.
I also addressed Kuroda-san's comments [1].

[1] 
https://www.postgresql.org/message-id/TYAPR01MB5866CD99CF86EAC84119BC91F5E99%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Regards,
Shi yu


v6-0001-Add-logical_decoding_mode-GUC.patch
Description: v6-0001-Add-logical_decoding_mode-GUC.patch


RE: Force streaming every change in logical decoding

2022-12-22 Thread shiy.f...@fujitsu.com
On Thu, Dec 22, 2022 5:24 PM Amit Kapila  wrote:
> 
> On Thu, Dec 22, 2022 at 1:15 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Thu, 22 Dec 2022 12:35:46 +0530, Amit Kapila
>  wrote in
> > > I have addressed these comments in the attached. Additionally, I have
> > > modified the docs and commit messages to make those clear. I think
> > > instead of adding new tests with this patch, it may be better to
> > > change some of the existing tests related to streaming to use this
> > > parameter as that will clearly show one of the purposes of this patch.
> >
> > Being late but I'm warried that we might sometime regret that the lack
> > of the explicit default. Don't we really need it?
> >
> 
> For this, I like your proposal for "buffered" as an explicit default value.
> 
> > +Allows streaming or serializing changes immediately in logical
> decoding.
> > +The allowed values of logical_decoding_mode
> are the
> > +empty string and immediate. When set to
> > +immediate, stream each change if
> > +streaming option is enabled, otherwise, 
> > serialize
> > +each change.  When set to an empty string, which is the default,
> > +decoding will stream or serialize changes when
> > +logical_decoding_work_mem is reached.
> >
> > With (really) fresh eyes, I took a bit long time to understand what
> > the "streaming" option is. Couldn't we augment the description by a
> > bit?
> >
> 
> Okay, how about modifying it to: "When set to
> immediate, stream each change if
> streaming option (see optional parameters set by
> CREATE SUBSCRIPTION) is enabled, otherwise, serialize each change.
> 

I updated the patch to use "buffered" as the explicit default value, and
included Amit's suggested changes to the documentation.
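
For reference, usage with the patch applied looks like this (a sketch):

SET logical_decoding_mode = 'immediate'; -- stream or serialize each change
SET logical_decoding_mode = 'buffered';  -- default: obey logical_decoding_work_mem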

Besides, I tried to reduce the data size in the streaming subscription TAP
tests with this new GUC (see the 0002 patch). But I didn't convert all
streaming TAP tests, because I think we also need to cover the case where there
are lots of changes. So, 015* is not modified. And 017* is not modified because
streaming transactions and non-streaming transactions are tested alternately in
that test.

I collected the time to run these tests before and after applying the patch set
on my machine. In a debug build it saves about 5.3 s, and in a release build
about 1.8 s. The time of each test is attached.

Regards,
Shi yu


v5-0001-Add-logical_decoding_mode-GUC.patch
Description: v5-0001-Add-logical_decoding_mode-GUC.patch


v5-0002-Converting-streaming-tap-tests-by-using-logical_d.patch
Description:  v5-0002-Converting-streaming-tap-tests-by-using-logical_d.patch
The unit is milliseconds.

- debug
+-+---+-+
| test_no |  HEAD | patched |
+-+---+-+
|   016   |  3425 |   3118  |
|   018   |  4873 |   3159  |
|   019   |  3241 |   3116  |
|   022   |  8273 |   7373  |
|   023   |  7156 |   4823  |
| |   | |
|   sum   | 26968 |  21589  |
+-+---+-+


- release
+-+---+-+
| test_no |  HEAD | patched |
+-+---+-+
|   016   |  1776 |   1649  |
|   018   |  2248 |   1689  |
|   019   |  1653 |   1648  |
|   022   |  4858 |   4462  |
|   023   |  3992 |   3292  |
| |   | |
|   sum   | 14527 |  12740  |
+-+---+-+


RE: Force streaming every change in logical decoding

2022-12-21 Thread shiy.f...@fujitsu.com
On Wed, Dec 21, 2022 4:05 PM Peter Smith  wrote:
> 
> Here are some review comments for patch v2.
> 
> Since the GUC is still under design maybe these comments can be
> ignored for now, but I guess similar comments will apply in future
> anyhow (just with some name changes).
> 

Thanks for your comments.

> ==
> 
> 3. More docs.
> 
> It might be helpful if this developer option is referenced also on the
> page "31.10.1 Logical Replication > Configuration Settings >
> Publishers" [1]
> 

The newly added GUC is a developer option, and it seems that most developer
options are not referenced on other pages. So, I am not sure we need to add
this to other docs.

The other comments are fixed in [1]. (Some of them are ignored because of the
new design.)

[1] 
https://www.postgresql.org/message-id/OSZPR01MB6310AAE12BC281158880380DFDEB9%40OSZPR01MB6310.jpnprd01.prod.outlook.com

Regards,
Shi yu


RE: Force streaming every change in logical decoding

2022-12-21 Thread shiy.f...@fujitsu.com
On Wed, Dec 21, 2022 4:54 PM Amit Kapila  wrote:
> 
> On Wed, Dec 21, 2022 at 1:55 PM Peter Smith 
> wrote:
> >
> > On Wed, Dec 21, 2022 at 6:22 PM Masahiko Sawada
>  wrote:
> > >
> > > On Tue, Dec 20, 2022 at 7:49 PM Amit Kapila 
> wrote:
> > > >
> > > > On Tue, Dec 20, 2022 at 2:46 PM Hayato Kuroda (Fujitsu)
> > > >  wrote:
> > > > >
> > > > > Dear hackers,
> > > > >
> > > > > > We have discussed three different ways to provide GUC for these
> > > > > > features. (1) Have separate GUCs like force_server_stream_mode,
> > > > > > force_server_serialize_mode, force_client_serialize_mode (we can
> use
> > > > > > different names for these) for each of these; (2) Have two sets of
> > > > > > GUCs for server and client. We can have logical_decoding_mode with
> > > > > > values as 'stream' and 'serialize' for the server and then
> > > > > > logical_apply_serialize = true/false for the client. (3) Have one 
> > > > > > GUC
> > > > > > like logical_replication_mode with values as 'server_stream',
> > > > > > 'server_serialize', 'client_serialize'.
> > > > >
> > > > > I also agreed for adding new GUC parameters (and I have already done
> partially
> > > > > in parallel apply[1]), and basically options 2 made sense for me. But 
> > > > > is
> it OK
> > > > > that we can choose "serialize" mode even if subscribers require
> streaming?
> > > > >
> > > > > Currently the reorder buffer transactions are serialized on publisher
> only when
> > > > > the there are no streamable transaction. So what happen if the
> > > > > logical_decoding_mode = "serialize" but streaming option streaming is
> on? If we
> > > > > break the first one and serialize changes on publisher anyway, it may
> be not
> > > > > suitable for testing the normal operation.
> > > > >
> > > >
> > > > I think the change will be streamed as soon as the next change is
> > > > processed even if we serialize based on this option. See
> > > > ReorderBufferProcessPartialChange. However, I see your point that
> when
> > > > the streaming option is given, the value 'serialize' for this GUC may
> > > > not make much sense.
> > > >
> > > > > Therefore, I came up with the variant of (2): logical_decoding_mode
> can be
> > > > > "normal" or "immediate".
> > > > >
> > > > > "normal" is a default value, which is same as current HEAD. Changes
> are streamed
> > > > > or serialized when the buffered size exceeds
> logical_decoding_work_mem.
> > > > >
> > > > > When users set to "immediate", the walsenders starts to stream or
> serialize all
> > > > > changes. The choice is depends on the subscription option.
> > > > >
> > > >
> > > > The other possibility to achieve what you are saying is that we allow
> > > > a minimum value of logical_decoding_work_mem as 0 which would
> mean
> > > > stream or serialize each change depending on whether the streaming
> > > > option is enabled. I think we normally don't allow a minimum value
> > > > below a certain threshold for other *_work_mem parameters (like
> > > > maintenance_work_mem, work_mem), so we have followed the same
> here.
> > > > And, I think it makes sense from the user's perspective because below
> > > > a certain threshold it will just add overhead by either writing small
> > > > changes to the disk or by sending those over the network. However, it
> > > > can be quite useful for testing/debugging. So, not sure, if we should
> > > > restrict setting logical_decoding_work_mem below a certain threshold.
> > > > What do you think?
> > >
> > > I agree with (2), having separate GUCs for publisher side and
> > > subscriber side. Also, on the publisher side, Amit's idea, controlling
> > > the logical decoding behavior by changing logical_decoding_work_mem,
> > > seems like a good idea.
> > >
> > > But I'm not sure it's a good idea if we lower the minimum value of
> > > logical_decoding_work_mem to 0. I agree it's helpful for testing and
> > > debugging but setting logical_decoding_work_mem = 0 doesn't benefit
> > > users at all, rather brings risks.
> > >
> > > I prefer the idea Kuroda-san previously proposed; setting
> > > logical_decoding_mode = 'immediate' means setting
> > > logical_decoding_work_mem = 0. We might not need to have it as an
> enum
> > > parameter since it has only two values, though.
> >
> > Did you mean one GUC (logical_decoding_mode) will cause a side-effect
> > implicit value change on another GUC value
> > (logical_decoding_work_mem)?
> >
> 
> I don't think that is required. The only value that can be allowed for
> logical_decoding_mode will be "immediate", something like we do for
> recovery_target. The default will be "". The "immediate" value will
> mean that stream each change if the "streaming" option is enabled
> ("on" of "parallel") or if "streaming" is not enabled then that would
> mean serializing each change.
> 

I agreed and updated the patch as Amit suggested.
Please see the attached patch.

Regards,
Shi yu


v3-0001-Allow-streaming-or-serializing-each-change-in-log.patch
Description:  

RE: Force streaming every change in logical decoding

2022-12-14 Thread shiy.f...@fujitsu.com
On Sat, Dec 10, 2022 2:03 PM Dilip Kumar  wrote:
> 
> On Tue, Dec 6, 2022 at 11:53 AM shiy.f...@fujitsu.com
>  wrote:
> >
> > Hi hackers,
> >
> > In logical decoding, when logical_decoding_work_mem is exceeded, the
> changes are
> > sent to output plugin in streaming mode. But there is a restriction that the
> > minimum value of logical_decoding_work_mem is 64kB. I tried to add a GUC
> to
> > allow sending every change to output plugin without waiting until
> > logical_decoding_work_mem is exceeded.
> >
> > This helps to test streaming mode. For example, to test "Avoid streaming the
> > transaction which are skipped" [1], it needs many
> XLOG_XACT_INVALIDATIONS
> > messages. With the new option, it can be tested with fewer changes and in
> less
> > time. Also, this new option helps to test more scenarios for "Perform
> streaming
> > logical transactions by background workers" [2].
> 
> Some comments on the patch
> 

Thanks for your comments.

> 1. Can you add one test case using this option
> 

I added a simple test to confirm the new option works.

> 2. +  xreflabel="force_stream_mode">
> +  force_stream_mode
> (boolean)
> +  
> +   force_stream_mode configuration
> parameter
> +  
> +  
> 
> This GUC name "force_stream_mode" somehow appears like we are forcing
> this streaming mode irrespective of whether the
> subscriber has requested for this mode or not.  But actually it is not
> that, it is just streaming each change if
> it is enabled.  So we might need to think on the name (at least we
> should avoid using *mode* in the name IMHO).
> 

I think a similar GUC is force_parallel_mode: if the query is parallel unsafe
or max_worker_processes is exceeded, force_parallel_mode will not work. This is
similar to what we do in this patch. So, maybe it's OK to use "mode". I didn't
change it in the new version of the patch. What do you think?

> 3.
> -while (rb->size >= logical_decoding_work_mem * 1024L)
> +while ((!force_stream && rb->size >= logical_decoding_work_mem *
> 1024L) ||
> +   (force_stream && rb->size > 0))
>  {
> 
> It seems like if force_stream is on then indirectly it is enabling
> force serialization as well.  Because once we enter into the loop
> based on "force_stream" then it will either stream or serialize but I
> guess we do not want to force serialize based on this parameter.
> 

Agreed, I refactored the code and modified this point.

Please see the attached patch. I also fixed Peter's comments [1]. The GUC name
and design are still under discussion, so I didn't modify them.

By the way, I noticed that the comment for ReorderBufferCheckMemoryLimit() on
HEAD is missing something. I fixed it in this patch, too.

[1] 
https://www.postgresql.org/message-id/CAHut%2BPtOjZ_e-KLf26i1XLH2ffPEZGOmGSKy0wDjwyB_uvzxBQ%40mail.gmail.com

Regards,
Shi yu


v2-0001-Allow-streaming-every-change-without-waiting-till.patch
Description:  v2-0001-Allow-streaming-every-change-without-waiting-till.patch


RE: Avoid streaming the transaction which are skipped (in corner cases)

2022-12-06 Thread shiy.f...@fujitsu.com
On Wed, Dec 7, 2022 12:01 PM Dilip Kumar  wrote:
> 
> On Wed, Dec 7, 2022 at 9:28 AM Amit Kapila 
> wrote:
> >
> > On Tue, Dec 6, 2022 at 11:55 AM shiy.f...@fujitsu.com
> >  wrote:
> > >
> > > On Mon, Dec 5, 2022 6:57 PM Amit Kapila 
> wrote:
> > > >
> > > > On Mon, Dec 5, 2022 at 3:41 PM Dilip Kumar 
> wrote:
> > > > >
> > > > > I think we need something like this[1] so that we can better control
> > > > > the streaming.
> > > > >
> > > >
> > > > +1. The additional advantage would be that we can generate parallel
> > > > apply and new streaming tests with much lesser data. Shi-San, can you
> > > > please start a new thread for the GUC patch proposed by you as
> > > > indicated by Dilip?
> > > >
> > >
> > > OK, I started a new thread for it. [1]
> > >
> >
> > Thanks. I think it is better to go ahead with this patch and once we
> > decide what is the right thing to do in terms of GUC then we can try
> > to add additional tests for this. Anyway, it is not that the code
> > added by this patch is not getting covered by existing tests. What do
> > you think?
> 
> That makes sense to me.
> 

+1

Regards,
Shi yu


RE: Avoid streaming the transaction which are skipped (in corner cases)

2022-12-05 Thread shiy.f...@fujitsu.com
On Mon, Dec 5, 2022 6:57 PM Amit Kapila  wrote:
> 
> On Mon, Dec 5, 2022 at 3:41 PM Dilip Kumar  wrote:
> >
> > I think we need something like this[1] so that we can better control
> > the streaming.
> >
> 
> +1. The additional advantage would be that we can generate parallel
> apply and new streaming tests with much lesser data. Shi-San, can you
> please start a new thread for the GUC patch proposed by you as
> indicated by Dilip?
> 

OK, I started a new thread for it. [1]

[1] 
https://www.postgresql.org/message-id/OSZPR01MB63104E7449DBE41932DB19F1FD1B9%40OSZPR01MB6310.jpnprd01.prod.outlook.com

Regards,
Shi yu


Force streaming every change in logical decoding

2022-12-05 Thread shiy.f...@fujitsu.com
Hi hackers,

In logical decoding, when logical_decoding_work_mem is exceeded, the changes
are sent to the output plugin in streaming mode. But there is a restriction
that the minimum value of logical_decoding_work_mem is 64kB. I tried to add a
GUC that allows sending every change to the output plugin without waiting until
logical_decoding_work_mem is exceeded.

This helps to test streaming mode. For example, to test "Avoid streaming the
transaction which are skipped" [1], it needs many XLOG_XACT_INVALIDATIONS
messages. With the new option, it can be tested with fewer changes and in less
time. Also, this new option helps to test more scenarios for "Perform streaming
logical transactions by background workers" [2].
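
With the v1 patch, the GUC can be enabled like below (the name and design are
open for discussion; this assumes it is settable per session):

SET force_stream_mode = on;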

[1] 
https://www.postgresql.org/message-id/CAFiTN-tHK=7lzfrps8fbt2ksrojgqbzywcgxst2bm9-rjja...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/CAA4eK1%2BwyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw%40mail.gmail.com

Regards,
Shi yu


v1-0001-Allow-streaming-every-change-without-waiting-till.patch
Description:  v1-0001-Allow-streaming-every-change-without-waiting-till.patch


RE: Avoid streaming the transaction which are skipped (in corner cases)

2022-11-28 Thread shiy.f...@fujitsu.com
On Sun, Nov 27, 2022 1:33 PM Dilip Kumar  wrote:
> 
> On Sat, Nov 26, 2022 at 12:15 PM Amit Kapila 
> wrote:
> >
> > On Fri, Nov 25, 2022 at 5:38 PM Amit Kapila 
> wrote:
> > >
> > > On Fri, Nov 25, 2022 at 1:35 PM Dilip Kumar 
> wrote:
> > > >
> > > > During DecodeCommit() for skipping a transaction we use ReadRecPtr to
> > > > check whether to skip this transaction or not.  Whereas in
> > > > ReorderBufferCanStartStreaming() we use EndRecPtr to check whether
> to
> > > > stream or not. Generally it will not create a problem but if the
> > > > commit record itself is adding some changes to the transaction(e.g.
> > > > snapshot) and if the "start_decoding_at" is in between ReadRecPtr and
> > > > EndRecPtr then streaming will decide to stream the transaction where
> > > > as DecodeCommit will decide to skip it.  And for handling this case in
> > > > ReorderBufferForget() we call stream_abort().
> > > >
> > >
> > > The other cases are probably where we don't have FilterByOrigin or
> > > dbid check, for example,
> XLOG_HEAP2_NEW_CID/XLOG_XACT_INVALIDATIONS.
> > > We anyway actually don't send anything for such cases except empty
> > > start/stop messages. Can we add some flag to txn which says that there
> > > is at least one change like DML that we want to stream?
> > >
> >
> > We can probably think of using txn_flags for this purpose.
> 
> In the attached patch I have used txn_flags to identify whether it has
> any streamable change or not and the transaction will not be selected
> for streaming unless it has at least one streamable change.
> 

Thanks for your patch.

I saw that the patch added a check when selecting the largest transaction, but
in addition to ReorderBufferCheckMemoryLimit(), the transaction can also be
streamed in ReorderBufferProcessPartialChange(). Should we add the check in
this function, too?

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 9a58c4bfb9..108737b02f 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -768,7 +768,8 @@ ReorderBufferProcessPartialChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	 */
 	if (ReorderBufferCanStartStreaming(rb) &&
 		!(rbtxn_has_partial_change(toptxn)) &&
-		rbtxn_is_serialized(txn))
+		rbtxn_is_serialized(txn) &&
+		rbtxn_has_streamable_change(txn))
 		ReorderBufferStreamTXN(rb, toptxn);
 }

Regards,
Shi yu


RE: Fix some newly modified tab-complete changes

2022-11-16 Thread shiy.f...@fujitsu.com
On Thu, Nov 10, 2022 12:54 PM Michael Paquier  wrote:
> 
> On Tue, Oct 18, 2022 at 05:17:32PM +1100, Peter Smith wrote:
> > I re-tested and confirm that the patch does indeed fix the quirk I'd
> > previously reported.
> >
> > But, looking at the patch code, I don't know if it is the best way to
> > fix the problem or not. Someone with more experience of the
> > tab-complete module can judge that.
> 
> It seems to me that the patch as proposed has more problems than
> that.  I have spotted a few issues at quick glance, there may be
> more.
> 
> For example, take this one:
> +   else if (TailMatches("GRANT") ||
> +TailMatches("REVOKE", "GRANT", "OPTION", "FOR"))
> COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_roles,
> 
> "REVOKE GRANT OPTION FOR" completes with a list of role names, which
> is incorrect.
> 
> FWIW, I am not much a fan of the approach taken by the patch to
> duplicate the full list of keywords to append after REVOKE or GRANT,
> at the only difference of "GRANT OPTION FOR".  This may be readable if
> unified with a single list, with extra items appended for GRANT and
> REVOKE?
> 
> Note that REVOKE has a "ADMIN OPTION FOR" clause, which is not
> completed to.

Thanks a lot for looking into this patch.

I have fixed the problems you saw, and improved the patch as you suggested.

Besides, I noticed that the tab completion for "ALTER DEFAULT PRIVILEGES ...
GRANT/REVOKE ..." was missing "CREATE". This is fixed in the 0001 patch.

And commit e3ce2de09 supported GRANT ... WITH INHERIT ..., but there was no tab
completion for it. This is added in the 0002 patch.
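
For reference, the statements these completions target look like the following
(role names are hypothetical):

ALTER DEFAULT PRIVILEGES GRANT CREATE ON SCHEMAS TO app_role;
GRANT admin_role TO app_role WITH INHERIT TRUE;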

Please see the attached patches.

Regards,
Shi yu


v5-0002-Add-tab-completion-for-GRANT-WITH-INHERIT.patch
Description: v5-0002-Add-tab-completion-for-GRANT-WITH-INHERIT.patch


v5-0001-Fix-tab-completion-for-GRANT-REVOKE.patch
Description: v5-0001-Fix-tab-completion-for-GRANT-REVOKE.patch


RE: Segfault on logical replication to partitioned table with foreign children

2022-10-30 Thread shiy.f...@fujitsu.com
On Sun, Oct 30, 2022 9:39 PM Tom Lane  wrote:
> 
> What I'm wondering about is whether we can refactor this code
> to avoid so many usually-useless catalog lookups.  Pulling the
> namespace name, in particular, is expensive and we generally
> are not going to need the result.  In the child-rel case it'd
> be much better to pass the opened relation and let the error-check
> subroutine work from that.  Maybe we should just do it like that
> at the start of the recursion, too?  Or pass the relid and let
> the subroutine look up the names only in the error case.
> 
> A completely different line of thought is that this doesn't seem
> like a terribly bulletproof fix, since children could get added to
> a partitioned table after we look.  Maybe it'd be better to check
> the relkind at the last moment before we do something that depends
> on a child table being a relation.
> 

I agree. So maybe we can add this check in apply_handle_tuple_routing().

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 5250ae7f54..e941b68e4b 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2176,6 +2176,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	Assert(partrelinfo != NULL);
 	partrel = partrelinfo->ri_RelationDesc;
 
+	/* Check for supported relkind. */
+	CheckSubscriptionRelkind(partrel->rd_rel->relkind,
+							 relmapentry->remoterel.nspname,
+							 relmapentry->remoterel.relname);
+
 	/*
 	 * To perform any of the operations below, the tuple must match the
 	 * partition's rowtype. Convert if needed or just copy, using a dedicated


Regards,
Shi yu




RE: Perform streaming logical transactions by background workers and parallel apply

2022-10-28 Thread shiy.f...@fujitsu.com
On Tue, Oct 25, 2022 2:56 PM Wang, Wei/王 威  wrote:
> 
> On Tues, Oct 25, 2022 at 14:28 PM Peter Smith 
> wrote:
> > FYI - After a recent push, the v40-0001 patch no longer applies on the
> > latest HEAD.
> >
> > [postgres@CentOS7-x64 oss_postgres_misc]$ git apply
> > ../patches_misc/v40-0001-Perform-streaming-logical-transactions-by-
> > parall.patch
> > error: patch failed: src/backend/replication/logical/launcher.c:54
> > error: src/backend/replication/logical/launcher.c: patch does not apply
> 
> Thanks for your reminder.
> 
> I just rebased the patch set for review.
> The new patch set will be shared later when the comments in this thread are
> addressed.
> 

I tried to write a draft patch to force streaming every change instead of
waiting until logical_decoding_work_mem is exceeded, which could help to test
parallel streaming. The patch is attached; it is based on the v41-0001 patch.

With this patch, I saw a problem: the subscription option "origin" doesn't work
when using parallel streaming. That's because when the parallel apply worker
writes WAL for the changes, replorigin_session_origin is InvalidRepOriginId. In
the current patch, an origin can be active in only one process at a time.

To fix it, maybe we need to remove this restriction, like what we did in an
older version of the patch.
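
For reference, the failing combination is roughly the following (connection
string abbreviated):

CREATE SUBSCRIPTION sub CONNECTION '...' PUBLICATION pub
    WITH (streaming = parallel, origin = none);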

Regards
Shi yu


0001-Allow-streaming-every-change-instead-of-waiting-till_patch
Description:  0001-Allow-streaming-every-change-instead-of-waiting-till_patch


RE: Perform streaming logical transactions by background workers and parallel apply

2022-10-26 Thread shiy.f...@fujitsu.com
On Wed, Oct 26, 2022 7:19 PM Amit Kapila  wrote:
> 
> On Tue, Oct 25, 2022 at 8:38 AM Masahiko Sawada
>  wrote:
> >
> > On Fri, Oct 21, 2022 at 6:32 PM houzj.f...@fujitsu.com
> >  wrote:
> >
> > I've started to review this patch. I tested v40-0001 patch and have
> > one question:
> >
> > IIUC even when most of the changes in the transaction are filtered out
> > in pgoutput (eg., by relation filter or row filter), the walsender
> > sends STREAM_START. This means that the subscriber could end up
> > launching parallel apply workers also for almost empty (and streamed)
> > transactions. For example, I created three subscriptions each of which
> > subscribes to a different table. When I loaded a large amount of data
> > into one table, all three (leader) apply workers received START_STREAM
> > and launched their parallel apply workers.
> >
> 
> The apply workers will be launched just the first time then we
> maintain a pool so that we don't need to restart them.
> 
> > However, two of them
> > finished without applying any data. I think this behaviour looks
> > problematic since it wastes workers and rather decreases the apply
> > performance if the changes are not large. Is it worth considering a
> > way to delay launching a parallel apply worker until we find out the
> > amount of changes is actually large?
> >
> 
> I think even if changes are less there may not be much difference
> because we have observed that the performance improvement comes from
> not writing to file.
> 
> > For example, the leader worker
> > writes the streamed changes to files as usual and launches a parallel
> > worker if the amount of changes exceeds a threshold or the leader
> > receives the second segment. After that, the leader worker switches to
> > send the streamed changes to parallel workers via shm_mq instead of
> > files.
> >
> 
> I think writing to file won't be a good idea as that can hamper the
> performance benefit in some cases and not sure if it is worth.
> 

I tried to test some cases where only a small part of the transaction, or an
empty transaction, is sent to the subscriber, to see if parallel streaming
brings any performance degradation.

The test was performed ten times, and the average was taken.
The results are as follows. The details and the script of the test are
attached.

10% of rows are sent
--------------------
HEAD     24.4595
patched  18.4545

5% of rows are sent
--------------------
HEAD     21.244
patched  17.9655

0% of rows are sent
--------------------
HEAD     18.0605
patched  17.893


It shows that when only 5% or 10% of the rows are sent to the subscriber, using
parallel apply takes less time than HEAD, and even if all rows are filtered out
there is no performance degradation.
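
(The attached script has the exact setup; conceptually, the filtering can be
achieved with a row filter along these lines, as a hypothetical sketch:)

CREATE PUBLICATION pub FOR TABLE tbl WHERE (a % 10 = 0);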


Regards
Shi yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-19 Thread shiy.f...@fujitsu.com
On Wed, Oct 19, 2022 12:05 AM Önder Kalacı   wrote:
> 
> Attached v19.
> 

Thanks for updating the patch. Here are some comments on v19.

1.
In execReplication.c:

+   TypeCacheEntry **eq = NULL; /* only used when the index is not unique */

Maybe the comment here should be changed. Now it is used when the index is not
a primary key or replica identity index.

2.
+# wait until the index is created
+$node_subscriber->poll_query_until(
+   'postgres', q{select count(*)=1 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates one row via index";

The message doesn't seem right. Should it be changed to "Timed out while
waiting for creating index test_replica_id_full_idx"?

3.
+# now, ingest more data and create index on column y which has higher 
cardinality
+# then create an index on column y so that future commands uses the index on 
column
+$node_publisher->safe_psql('postgres',
+   "INSERT INTO test_replica_id_full SELECT 50, i FROM 
generate_series(0,3100)i;");

The comment says "create (an) index on column y" twice; maybe it can be changed
to:

now, ingest more data and create index on column y which has higher cardinality,
so that future commands will use the index on column y

4.
+# deletes 200 rows
+$node_publisher->safe_psql('postgres',
+   "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+   'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates 200 rows via index";

It would be better to call wait_for_catchup() after the DELETE. (And in some
other places in this file.)
Besides, the "updates" in the message should be "deletes".

5.
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+   'postgres', q{select sum(idx_scan)=10 from pg_stat_all_indexes where indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates partitioned table";

Maybe we should say "updates partitioned table with index" in this message.

Regards,
Shi yu


RE: create subscription - improved warning message

2022-10-18 Thread shiy.f...@fujitsu.com
On Tue, Oct 18, 2022 5:44 PM Peter Smith  wrote:
> 
> On Mon, Oct 17, 2022 at 7:11 PM shiy.f...@fujitsu.com
>  wrote:
> >
> ...
> >
> > Thanks for your patch. Here are some comments.
> >
> > In Example 2, the returned slot_name should be "myslot".
> >
> +test_pub=# SELECT * FROM pg_create_logical_replication_slot('myslot', 'pgoutput');
> > + slot_name |lsn
> > +---+---
> > + sub1  | 0/19059A0
> > +(1 row)
> >
> 
> Oops. Sorry for my cut/paste error. Fixed in patch v6.
> 
> 
> > Besides, I am thinking is it possible to slightly simplify the example. For
> > example, merge example 1 and 2, keep the steps of example 2 and in the
> step of
> > creating slot, mention what should we do if slot_name is not specified when
> > creating subscription.
> >
> 
> Sure, it might be a bit shorter to combine the examples, but I thought
> it’s just simpler not to do it that way because the combined example
> will then need additional bullets/notes to say – “if there is no
> slot_name do this…” and “if there is a slot_name do that…”. It’s
> really only the activation part which is identical for both.
> 

Thanks for updating the patch.

+test_sub=# CREATE SUBSCRIPTION sub1
+test_sub-# CONNECTION 'host=localhost dbname=test_pub'
+test_sub-# PUBLICATION pub1
+test_sub-# WITH (slot_name=NONE, enabled=false, create_slot=false);
+WARNING:  subscription was created, but is not connected
+HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
+CREATE SUBSCRIPTION

In example 3, there is actually no such warning message when creating the
subscription, because "connect=false" is not specified.

I have tested the examples in the patch and didn't see any problem other than
the one above.

Regards,
Shi yu


RE: create subscription - improved warning message

2022-10-17 Thread shiy.f...@fujitsu.com
On Mon, Oct 17, 2022 9:47 AM Peter Smith  wrote:
> 
> On Sun, Oct 16, 2022 at 12:14 AM Amit Kapila 
> wrote:
> >
> > On Fri, Oct 14, 2022 at 8:22 AM Peter Smith 
> wrote:
> > >
> > > On Thu, Oct 13, 2022 at 9:07 AM Peter Smith 
> wrote:
> > > >
> ...
> > > PSA a patch for adding examples of how to activate a subscription that
> > > was originally created in a disconnected mode.
> > >
> > > The new docs are added as part of the "Logical Replication -
> > > Subscription" section 31.2.
> > >
> > > The CREATE SUBSCRIPTION reference page was also updated to include
> > > links to the new docs.
> > >
> >
> > You have used 'pgoutput' as plugin name in the examples. Shall we
> > mention in some way that this is a default plugin used for built-in
> > logical replication and it is required to use only this one to enable
> > logical replication?
> >
> 
> Updated as sugggested.
> 
> PSA v5.
> 

Thanks for your patch. Here are some comments.

In Example 2, the returned slot_name should be "myslot".

+test_pub=# SELECT * FROM pg_create_logical_replication_slot('myslot', 'pgoutput');
+ slot_name |lsn
+---+---
+ sub1  | 0/19059A0
+(1 row)

Besides, I am wondering whether it is possible to slightly simplify the
examples. For example, merge examples 1 and 2: keep the steps of example 2, and
in the slot-creation step mention what should be done if slot_name was not
specified when creating the subscription.
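
Roughly, the merged example could look like this (a sketch):

-- subscriber: create the subscription without connecting
CREATE SUBSCRIPTION sub CONNECTION '...' PUBLICATION pub WITH (connect = false);

-- publisher: create the slot manually; if slot_name was not specified,
-- the slot name must match the subscription name
SELECT pg_create_logical_replication_slot('sub', 'pgoutput');

-- subscriber: activate the subscription
ALTER SUBSCRIPTION sub ENABLE;
ALTER SUBSCRIPTION sub REFRESH PUBLICATION;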


Regards,
Shi yu




RE: [RFC] building postgres with meson - v13

2022-10-13 Thread shiy.f...@fujitsu.com
On Fri, Oct 14, 2022 12:40 AM Andres Freund  wrote:
> 
> Hi,
> 
> On 2022-10-13 09:24:51 +, shiy.f...@fujitsu.com wrote:
> > I noticed that `pg_config --configure` didn't show the options given when
> > building with meson.
> 
> Yes, that was noted somewhere on this thread.
> 
> 
> > Maybe it would be better if pg_config can output this information, to be
> > consistent with the output when building with `./configure` and `make`.
> >
> > The output when building with `./configure` and `make`:
> > $ pg_config --configure
> >  '--prefix=/home/postgres/install/' '--cache' 'gcc.cache' '--enable-dtrace' 
> > '--
> with-icu' '--enable-cassert'
> 
> It'd be a fair amount of work, both initially and to maintain it, to generate
> something compatible. I can see some benefit in showing some feature
> influencing output in --configure, but compatible output doesn't seem worth it
> to me.
> 

I agree that there are benefits to showing that, as it helps to confirm the
build options. That can also be confirmed from the compile log, but the log may
not always be available.

And it's ok for me that the output is not exactly the same as before.

Regards,
Shi yu




RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-13 Thread shiy.f...@fujitsu.com
On Wed, Aug 24, 2022 12:25 AM Önder Kalacı  wrote:
> Hi,
> 
> Thanks for the review!
> 

Thanks for your reply.

> 
> >
> > 1.
> > In FilterOutNotSuitablePathsForReplIdentFull(), is
> > "nonPartialIndexPathList" a
> > good name for the list? Indexes on only expressions are also be filtered.
> >
> > +static List *
> > +FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)
> > +{
> > +   ListCell   *lc;
> > +   List *nonPartialIndexPathList = NIL;
> >
> >
> Yes, true. We only started filtering the non-partial ones first. Now
> changed to *suitableIndexList*, does that look right?
> 

That looks ok to me.

> 
> 
> > 3.
> > It looks we should change the comment for FindReplTupleInLocalRel() in this
> > patch.
> >
> > /*
> >  * Try to find a tuple received from the publication side (in
> > 'remoteslot') in
> >  * the corresponding local relation using either replica identity index,
> >  * primary key or if needed, sequential scan.
> >  *
> >  * Local tuple, if found, is returned in '*localslot'.
> >  */
> > static bool
> > FindReplTupleInLocalRel(EState *estate, Relation localrel,
> >
> >
> I made a small change, just adding "index". Do you expect a larger change?
> 
> 

I think that's sufficient.

> 
> 
> > 5.
> > +   if (!AttributeNumberIsValid(mainattno))
> > +   {
> > +   /*
> > +* There are two cases to consider. First, if the
> > index is a primary or
> > +* unique key, we cannot have any indexes with
> > expressions. So, at this
> > +* point we are sure that the index we deal is not
> > these.
> > +*/
> > +   Assert(RelationGetReplicaIndex(rel) !=
> > RelationGetRelid(idxrel) &&
> > +  RelationGetPrimaryKeyIndex(rel) !=
> > RelationGetRelid(idxrel));
> > +
> > +   /*
> > +* For a non-primary/unique index with an
> > expression, we are sure that
> > +* the expression cannot be used for replication
> > index search. The
> > +* reason is that we create relevant index paths
> > by providing column
> > +* equalities. And, the planner does not pick
> > expression indexes via
> > +* column equality restrictions in the query.
> > +*/
> > +   continue;
> > +   }
> >
> > Is it possible that it is a usable index with an expression? I think
> > indexes
> > with an expression has been filtered in
> > FilterOutNotSuitablePathsForReplIdentFull(). If it can't be a usable index
> > with
> > an expression, maybe we shouldn't use "continue" here.
> >
> 
> 
> 
> Ok, I think there are some confusing comments in the code, which I updated.
> Also, added one more explicit Assert to make the code a little more
> readable.
> 
> We can support indexes involving expressions but not indexes that are only
> consisting of expressions. FilterOutNotSuitablePathsForReplIdentFull()
> filters out the latter, see IndexOnlyOnExpression().
> 
> So, for example, if we have an index as below, we are skipping the
> expression while building the index scan keys:
> 
> CREATE INDEX people_names ON people (firstname, lastname, (id || '_' ||
> sub_id));
> 
> We can consider removing `continue`, but that'd mean we should also adjust
> the following code-block to handle indexprs. To me, that seems like an edge
> case to implement at this point, given such an index is probably not
> common. Do you think should I try to use the indexprs as well while
> building the scan key?
> 
> I'm mostly trying to keep the complexity small. If you suggest this
> limitation should be lifted, I can give it a shot. I think the limitation I
> leave here is with a single sentence: *The index on the subscriber can only
> use simple column references.  *
> 

Thanks for your explanation. I get it and think it's OK.
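
So, to restate the distinction with the people table above (an illustrative
sketch, not taken from the patch):

-- filtered out: the index consists only of an expression
CREATE INDEX people_lower_first ON people ((lower(firstname)));
-- usable: it contains plain column references; the expression part is
-- simply skipped when the scan keys are built
CREATE INDEX people_names ON people (firstname, lastname, (id || '_' || sub_id));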

> > 6.
> > In the following case, I got a result which is different from HEAD, could
> > you
> > please look into it?
> >
> > -- publisher
> > CREATE TABLE test_replica_id_full (x int);
> > ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
> > CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
> >
> > -- subscriber
> > CREATE TABLE test_replica_id_full (x int, y int);
> > CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x,y);
> > CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres
> > port=5432' PUBLICATION tap_pub_rep_full;
> >
> > -- publisher
> > INSERT INTO test_replica_id_full VALUES (1);
> > UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;
> >
> > The data in subscriber:
> > on HEAD:
> > postgres=# select * from test_replica_id_full ;
> >  x | y
> > ---+---
> >  2 |
> > (1 row)
> >
> > After applying the patch:
> > postgres=# select * from test_replica_id_full ;
> >  x | y
> > ---+---
> >  1 |
> > (1 row)
> >

RE: [RFC] building postgres with meson - v13

2022-10-13 Thread shiy.f...@fujitsu.com
Hi,

I noticed that `pg_config --configure` didn't show the options given when
building with meson. 

For example, 
meson setup build -Dcache=gcc.cache -Ddtrace=enabled -Dicu=enabled 
-Dcassert=true -Dprefix=/home/postgres/install_meson/
meson compile -C build 
meson install -C build

$ pg_config --configure

The options I specified (like dtrace) are not shown, although I found that they
actually take effect during compilation.
When specifying `-Ddtrace=enabled`, there is a log line like the one below; with
`-Ddtrace=disabled`, there is no such log.

[120/1834] /usr/bin/dtrace -C -h -s 
../src/include/utils/../../backend/utils/probes.d -o 
src/include/utils/probes.h.tmp

Maybe it would be better if pg_config could output this information, to be
consistent with the output when building with `./configure` and `make`.

The output when building with `./configure` and `make`:
$ pg_config --configure
 '--prefix=/home/postgres/install/' '--cache' 'gcc.cache' '--enable-dtrace' 
'--with-icu' '--enable-cassert'


Regards,
Shi yu




RE: Fix some newly modified tab-complete changes

2022-10-10 Thread shiy.f...@fujitsu.com
On Mon, Oct 10, 2022 2:12 PM shiy.f...@fujitsu.com  
wrote:
> 
> On Tue, Oct 4, 2022 4:17 PM Peter Smith  wrote:
> >
> > But, while testing I noticed another different quirk
> >
> > It seems that neither the GRANT nor the REVOKE auto-complete
> > recognises the optional PRIVILEGE keyword
> >
> > e.g. GRANT ALL  --> ON  (but not PRIVILEGE)
> > e.g. GRANT ALL PRIV --> ???
> >
> > e.g. REVOKE ALL  --> ON (but not PRIVILEGE)..
> > e.g. REVOKE ALL PRIV --> ???
> >
> 
> I tried to add tab-completion for it. Please see the attached updated patch.
> 

Sorry for attaching a wrong patch. Here is the right one.

Regards,
Shi yu


v4-0001-Fix-tab-completion-for-GRANT-REVOKE.patch
Description: v4-0001-Fix-tab-completion-for-GRANT-REVOKE.patch


RE: Fix some newly modified tab-complete changes

2022-10-10 Thread shiy.f...@fujitsu.com
On Tue, Oct 4, 2022 4:17 PM Peter Smith  wrote:
> 
> On Thu, Sep 29, 2022 at 12:50 PM shiy.f...@fujitsu.com
>  wrote:
> >
> > On Wed, Sep 28, 2022 1:49 PM Kyotaro Horiguchi
>  wrote:
> > >
> > > At Wed, 28 Sep 2022 14:14:01 +1000, Peter Smith
> > >  wrote in
> ...
> > > >
> > > > 2. tab complete for GRANT
> > > >
> > > > test_pub=# grant 
> > > > ALLEXECUTE
> > > > pg_execute_server_program  pg_read_server_files   postgres
> > > >   TRIGGER
> > > > ALTER SYSTEM   GRANT  pg_monitor
> > > >   pg_signal_backend  REFERENCES
> > > > TRUNCATE
> > > > CONNECTINSERT pg_read_all_data
> > > >   pg_stat_scan_tablesSELECT UPDATE
> > > > CREATE pg_checkpoint
> > > > pg_read_all_settings   pg_write_all_data  SET
> > > >   USAGE
> > > > DELETE pg_database_owner
> > > > pg_read_all_stats  pg_write_server_files  TEMPORARY
> > > >
> > > > 2a.
> > > > grant "GRANT" ??
> > >
> > Yeah, for the moment I thought that might be a kind of admin option, but
> > there's no such privilege. REVOKE gets the same suggestion.
> > >
> >
> > Maybe that's for "REVOKE GRANT OPTION FOR". But it is used by both
> GRANT and
> > REVOKE. I think it's a separate problem, I have tried to fix it in the 
> > attached
> > 0002 patch.
> >
> 
> I checked your v2-0002 patch and AFAICT it does fix properly the
> previously reported GRANT/REVOKE problem.
> 

Thanks for reviewing and testing it.

> ~
> 
> But, while testing I noticed another different quirk
> 
> It seems that neither the GRANT nor the REVOKE auto-complete
> recognises the optional PRIVILEGE keyword
> 
> e.g. GRANT ALL  --> ON  (but not PRIVILEGE)
> e.g. GRANT ALL PRIV --> ???
> 
> e.g. REVOKE ALL  --> ON (but not PRIVILEGE)..
> e.g. REVOKE ALL PRIV --> ???
> 

I tried to add tab-completion for it. Please see the attached updated patch.
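
For reference, the completion should now lead to valid grammar like the
following (the object and role names are made up):

GRANT ALL PRIVILEGES ON TABLE tbl TO alice;
REVOKE ALL PRIVILEGES ON TABLE tbl FROM alice;
REVOKE GRANT OPTION FOR SELECT ON TABLE tbl FROM alice;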

Regards,
Shi yu


v3-0001-Fix-tab-completion-for-GRANT-REVOKE.patch
Description: v3-0001-Fix-tab-completion-for-GRANT-REVOKE.patch


RE: Fix some newly modified tab-complete changes

2022-09-28 Thread shiy.f...@fujitsu.com
On Wed, Sep 28, 2022 1:49 PM Kyotaro Horiguchi  wrote:
> 
> At Wed, 28 Sep 2022 14:14:01 +1000, Peter Smith
>  wrote in
> > On Tue, Sep 27, 2022 at 8:28 PM shiy.f...@fujitsu.com
> >  wrote:
> > >
> > > Hi hackers,
> > >
> > > I saw a problem when using tab-complete for "GRANT", "TABLES IN
> SCHEMA" should
> > > be "ALL TABLES IN SCHEMA" in the following case.
> > >
> > > postgres=# grant all on
> > > ALL FUNCTIONS IN SCHEMA   DATABASE  FUNCTION
> PARAMETER SCHEMATABLESPACE
> > > ALL PROCEDURES IN SCHEMA  DOMAINinformation_schema.
> PROCEDURE SEQUENCE  tbl
> > > ALL ROUTINES IN SCHEMAFOREIGN DATA WRAPPER  LANGUAGE
> public.   TABLE TYPE
> > > ALL SEQUENCES IN SCHEMA   FOREIGN SERVERLARGE OBJECT
> ROUTINE   TABLES IN SCHEMA
> > >
> > > I found that it is related to the recent commit 790bf615dd, and maybe it's
> > > better to fix it. I also noticed that some comments should be modified
> according
> > > to this new syntax. Attach a patch to fix them.
> > >
> >
> > Thanks for the patch! Below are my review comments.
> >
> > The patch looks good to me but I did find some other tab-completion
> > anomalies. IIUC these are unrelated to your work, but since I found
> > them while testing your patch I am reporting them here.
> 
> Looks fine to me, too.
> 

Thanks for reviewing it.

> > Perhaps you want to fix them in the same patch, or just raise them
> > again separately?
> >
> > ==
> >
> > 1. tab complete for CREATE PUBLICATION
> >
> > I don't think this is any new bug, but I found that it is possible to do 
> > this...
> >
> > test_pub=# create publication p for ALL TABLES IN SCHEMA 
> > information_schema  pg_catalog  pg_toastpublic
> >
> > or, even this...
> >
> > test_pub=# create publication p for XXX TABLES IN SCHEMA 
> > information_schema  pg_catalog  pg_toastpublic
> 
> Completion is responding to "IN SCHEMA" in these cases.  However, I
> don't reach this state only by completion because it doesn't suggest
> "IN SCHEMA" after "TABLES" nor "ALL TABLES".  I don't see a reason to
> change that behavior unless that fix doesn't cause any additional
> complexity.
> 

+1

> > ==
> >
> > 2. tab complete for GRANT
> >
> > test_pub=# grant 
> > ALLEXECUTE
> > pg_execute_server_program  pg_read_server_files   postgres
> >   TRIGGER
> > ALTER SYSTEM   GRANT  pg_monitor
> >   pg_signal_backend  REFERENCES
> > TRUNCATE
> > CONNECTINSERT pg_read_all_data
> >   pg_stat_scan_tablesSELECT UPDATE
> > CREATE pg_checkpoint
> > pg_read_all_settings   pg_write_all_data  SET
> >   USAGE
> > DELETE pg_database_owner
> > pg_read_all_stats  pg_write_server_files  TEMPORARY
> >
> > 2a.
> > grant "GRANT" ??
> 
> Yeah, for the moment I thought that might be a kind of admin option, but
> there's no such privilege. REVOKE gets the same suggestion.
> 

Maybe that's for "REVOKE GRANT OPTION FOR". But it is used by both GRANT and
REVOKE. I think it's a separate problem; I have tried to fix it in the attached
0002 patch.

> > 2b.
> > grant "TEMPORARY" but not "TEMP" ??
> 
> TEMP is an alternative spelling so that's fine.
> 

Agreed.

> 
> I found the following suggestion.
> 
> CREATE PUBLICATION p FOR TABLES  -> ["IN SCHEMA", "WITH ("]
> 
> I believe "WITH (" doesn't come there.
> 

Fixed.
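
For context, after "FOR TABLES" the only valid continuation is "IN SCHEMA";
a sketch with made-up names:

-- valid:
CREATE PUBLICATION p FOR TABLES IN SCHEMA public;
-- invalid grammar, so "WITH (" should not be suggested right after TABLES:
CREATE PUBLICATION p FOR TABLES WITH (publish = 'insert');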

Attached is the updated patch.

Regards,
Shi yu


v2-0001-Fix-some-newly-modified-tab-complete-changes-and-.patch
Description:  v2-0001-Fix-some-newly-modified-tab-complete-changes-and-.patch


v2-0002-Fix-tab-completion-for-GRANT-REVOKE.patch
Description: v2-0002-Fix-tab-completion-for-GRANT-REVOKE.patch


Fix some newly modified tab-complete changes

2022-09-27 Thread shiy.f...@fujitsu.com
Hi hackers,

I saw a problem when using tab-completion for "GRANT": "TABLES IN SCHEMA" should
be "ALL TABLES IN SCHEMA" in the following case.

postgres=# grant all on
ALL FUNCTIONS IN SCHEMA   DATABASE  FUNCTION  
PARAMETER SCHEMATABLESPACE
ALL PROCEDURES IN SCHEMA  DOMAINinformation_schema.   
PROCEDURE SEQUENCE  tbl
ALL ROUTINES IN SCHEMAFOREIGN DATA WRAPPER  LANGUAGE  
public.   TABLE TYPE
ALL SEQUENCES IN SCHEMA   FOREIGN SERVERLARGE OBJECT  
ROUTINE   TABLES IN SCHEMA

I found that it is related to the recent commit 790bf615dd, and maybe it's
better to fix it. I also noticed that some comments should be modified according
to this new syntax. Attached is a patch to fix them.
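
That is, the completion should produce valid grammar like the following
(the schema and role names are made up):

GRANT ALL ON ALL TABLES IN SCHEMA public TO alice;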

Regards,
Shi yu


0001-Fix-some-newly-modified-tab-complete-changes-and-com.patch
Description:  0001-Fix-some-newly-modified-tab-complete-changes-and-com.patch


RE: Perform streaming logical transactions by background workers and parallel apply

2022-09-19 Thread shiy.f...@fujitsu.com
On Mon, Sept 19, 2022 11:26 AM Wang, Wei/王 威  wrote:
> 
> 
> Improved as suggested.
> 

Thanks for updating the patch. Here are some comments on the 0001 patch.

1.
+   case TRANS_LEADER_SERIALIZE:
 
-   oldctx = MemoryContextSwitchTo(ApplyContext);
+   /*
+* Notify handle methods we're processing a remote 
in-progress
+* transaction.
+*/
+   in_streamed_transaction = true;
 
-   MyLogicalRepWorker->stream_fileset = palloc(sizeof(FileSet));
-   FileSetInit(MyLogicalRepWorker->stream_fileset);
+   /*
+* Since no parallel apply worker is used for the first 
stream
+* start, serialize all the changes of the transaction.
+*
+* Start a transaction on stream start, this 
transaction will be


It seems that the following comment can be removed after using switch case.
+* Since no parallel apply worker is used for the first 
stream
+* start, serialize all the changes of the transaction.

2.
+   switch (apply_action)
+   {
+   case TRANS_LEADER_SERIALIZE:
+   if (!in_streamed_transaction)
+   ereport(ERROR,
+   
(errcode(ERRCODE_PROTOCOL_VIOLATION),
+errmsg_internal("STREAM STOP 
message without STREAM START")));

In apply_handle_stream_stop(), I think we can move this check to the beginning
of this function, to be consistent with other functions.

3. I think some of the changes in the 0005 patch can be merged into the 0001
patch; the 0005 patch can then contain only the changes for the new column
'apply_leader_pid'.

4.
+ * ParallelApplyWorkersList. After successfully, launching a new worker it's
+ * information is added to the ParallelApplyWorkersList. Once the worker

Should `it's` be `its` ?

Regards,
Shi yu


RE: Handle infinite recursion in logical replication setup

2022-09-07 Thread shiy.f...@fujitsu.com
On Wed, Sep 7, 2022 12:23 PM vignesh C  wrote:
> 
> 
> Thanks for the comments, the attached v47 patch has the changes for the
> same.
> 

Thanks for updating the patch.

Here is a comment.

+   for (i = 0; i < subrel_count; i++)
+   {
+   Oid relid = subrel_local_oids[i];
+   char   *schemaname = 
get_namespace_name(get_rel_namespace(relid));
+   char   *tablename = get_rel_name(relid);
+
+   appendStringInfo(, "AND NOT (N.nspname = '%s' AND C.relname 
= '%s')\n",
+schemaname, tablename);
+   }

Would it be better to add "pfree(schemaname)" and "pfree(tablename)" after
calling appendStringInfo()?

Regards,
Shi yu


RE: Handle infinite recursion in logical replication setup

2022-09-05 Thread shiy.f...@fujitsu.com
On Tue, Sep 6, 2022 11:14 AM vignesh C  wrote:
> 
> Thanks for the comments, the attached patch has the changes for the same.
> 

Thanks for updating the patch. Here are some comments.

1.
+   if (subrel_count)
+   {
+   /*
+* In case of ALTER SUBSCRIPTION ... REFRESH, subrel_local_oids
+* contains the list of relation oids that are already present 
on the
+* subscriber. This check should be skipped for these tables.
+*/
+   appendStringInfo(, "AND C.oid NOT IN (\n"
+"SELECT C.oid \n"
+"FROM pg_class C\n"
+" JOIN pg_namespace N ON 
N.oid = C.relnamespace\n"
+"WHERE ");
+   get_skip_tables_str(subrel_local_oids, subrel_count, );
+   appendStringInfo(, ")");
+   }

I think we can skip the tables without using a subquery. See the SQL below.
Would it be better?

SELECT DISTINCT P.pubname AS pubname
FROM pg_publication P,
 LATERAL pg_get_publication_tables(P.pubname) GPT
 LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
 pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND P.pubname IN ('p1', 'p3')
AND NOT (N.nspname = 'public' AND C.relname = 'tbl')
AND NOT (N.nspname = 'public' AND C.relname = 't1');


2.
+   When using a subscription parameter combination of
+   copy_data=true and origin=NONE, the

Could we use "copy_data = true" and "origin = NONE" (add two spaces around the
equals sign)? I think it looks clearer that way. And it is consistent with other
places in this file.

Regards,
Shi yu


RE: Column Filtering in Logical Replication

2022-09-04 Thread shiy.f...@fujitsu.com
On Mon, Sep 5, 2022 8:28 AM Peter Smith  wrote:
> 
> I have rebased the remaining patch (v6-0001 is the same as v5-0002)
> 

Thanks for updating the patch. Here are some comments.

1.
+ the  will be successful but later
+ the WalSender on the publisher, or the subscriber may throw an error. In
+ this scenario, the user needs to recreate the subscription after adjusting

Should "WalSender" be changed to "walsender"? I saw "walsender" is used in other
places in the documentation.

2.
+test_pub=# CREATE TABLE t1(id int, a text, b text, c text, d text, e text, 
PRIMARY KEY(id));
+CREATE TABLE
+test_pub=#

+test_pub=# CREATE PUBLICATION p1 FOR TABLE t1 (id, b, a, d);
+CREATE PUBLICATION
+test_pub=#

I think the redundant "test_pub=#" can be removed.


Besides, I tested the examples in the patch and found no problems.

Regards,
Shi yu


RE: Handle infinite recursion in logical replication setup

2022-08-31 Thread shiy.f...@fujitsu.com
On Wed, Aug 31, 2022 1:06 AM vignesh C  wrote:
> 
> The attached v43 patch has the changes for the same.
> 

Thanks for updating the patch.

Here is a comment on the 0001 patch.

+   if (!isnewtable)
+   {
+   pfree(nspname);
+   pfree(relname);
+   continue;
+   }

If it is a new table (in which case a warning is logged), should we also call
pfree()?

Regards,
Shi yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-08-22 Thread shiy.f...@fujitsu.com
On Sat, Aug 20, 2022 7:02 PM Önder Kalacı  wrote:
> Hi,
> 
> I'm a little late to catch up with your comments, but here are my replies:

Thanks for your patch. Here are some comments.

1.
In FilterOutNotSuitablePathsForReplIdentFull(), is "nonPartialIndexPathList" a
good name for the list? Indexes on only expressions are also filtered.

+static List *
+FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)
+{
+   ListCell   *lc;
+   List *nonPartialIndexPathList = NIL;

2.
+typedef struct LogicalRepPartMapEntry
+{
+   Oid partoid;/* LogicalRepPartMap's 
key */
+   LogicalRepRelMapEntry relmapentry;
+   Oid usableIndexOid; /* which index to use? (Invalid 
when no index
+* used) */
+} LogicalRepPartMapEntry;

For partition tables, is it possible to use relmapentry->usableIndexOid to mark
which index to use? That way, we wouldn't need to add "usableIndexOid" to
LogicalRepPartMapEntry.

3.
It looks like we should change the comment for FindReplTupleInLocalRel() in this
patch.

/*
 * Try to find a tuple received from the publication side (in 'remoteslot') in
 * the corresponding local relation using either replica identity index,
 * primary key or if needed, sequential scan.
 *
 * Local tuple, if found, is returned in '*localslot'.
 */
static bool
FindReplTupleInLocalRel(EState *estate, Relation localrel,

4.
@@ -2030,16 +2017,19 @@ apply_handle_delete_internal(ApplyExecutionData *edata,
 {
EState *estate = edata->estate;
Relationlocalrel = relinfo->ri_RelationDesc;
-   LogicalRepRelation *remoterel = >targetRel->remoterel;
+   LogicalRepRelMapEntry *targetRel = edata->targetRel;
+   LogicalRepRelation *remoterel = >remoterel;
EPQStateepqstate;
TupleTableSlot *localslot;

Do we need this change? I didn't see any place that uses the variable targetRel
afterwards.

5.
+   if (!AttributeNumberIsValid(mainattno))
+   {
+   /*
+* There are two cases to consider. First, if the index 
is a primary or
+* unique key, we cannot have any indexes with 
expressions. So, at this
+* point we are sure that the index we deal is not 
these.
+*/
+   Assert(RelationGetReplicaIndex(rel) != 
RelationGetRelid(idxrel) &&
+  RelationGetPrimaryKeyIndex(rel) != 
RelationGetRelid(idxrel));
+
+   /*
+* For a non-primary/unique index with an expression, 
we are sure that
+* the expression cannot be used for replication index 
search. The
+* reason is that we create relevant index paths by 
providing column
+* equalities. And, the planner does not pick 
expression indexes via
+* column equality restrictions in the query.
+*/
+   continue;
+   }

Is it possible that it is a usable index with an expression? I think indexes
with an expression have been filtered in 
FilterOutNotSuitablePathsForReplIdentFull(). If it can't be a usable index with
an expression, maybe we shouldn't use "continue" here.

6.
In the following case, I got a result which is different from HEAD; could you
please look into it?

-- publisher
CREATE TABLE test_replica_id_full (x int); 
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL; 
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;

-- subscriber
CREATE TABLE test_replica_id_full (x int, y int); 
CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x,y); 
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' 
PUBLICATION tap_pub_rep_full;

-- publisher
INSERT INTO test_replica_id_full VALUES (1);
UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;

The data in subscriber:
on HEAD:
postgres=# select * from test_replica_id_full ;
 x | y
---+---
 2 |
(1 row)

After applying the patch:
postgres=# select * from test_replica_id_full ;
 x | y
---+---
 1 |
(1 row)

Regards,
Shi yu


RE: Perform streaming logical transactions by background workers and parallel apply

2022-08-17 Thread shiy.f...@fujitsu.com
On Wed, Aug 17, 2022 2:28 PM Wang, Wei/王 威  wrote:
> 
> On Tues, August 16, 2022 15:33 PM I wrote:
> > Attach the new patches.
> 
> I found that cfbot has a failure.
> After investigation, I think it is because the worker's exit state is not set
> correctly. So I made some slight modifications.
> 
> Attach the new patches.
> 

Thanks for updating the patch. Here are some comments.

0003 patch
==
1. src/backend/replication/logical/applybgworker.c
+   ereport(ERROR,
+   
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+errmsg("cannot replicate target relation 
\"%s.%s\" using "
+   "subscription parameter 
streaming=parallel",
+   rel->remoterel.nspname, 
rel->remoterel.relname),
+errdetail("The unique column on subscriber is 
not the unique "
+  "column on publisher or 
there is at least one "
+  "non-immutable function."),
+errhint("Please change to use subscription 
parameter "
+"streaming=on.")));

Should we use "%s" instead of "streaming=parallel" and "streaming=on"?

2. src/backend/replication/logical/applybgworker.c
+* If any worker is handling the streaming transaction, this check 
needs to
+* be performed not only using the apply background worker, but also in 
the
+* main apply worker. This is because without these restrictions, main

this check needs to be performed not only using the apply background worker, but
also in the main apply worker.
->
this check not only needs to be performed by apply background worker, but also
by the main apply worker

3. src/backend/replication/logical/relation.c
+   if (ukey)
+   {
+   i = -1;
+   while ((i = bms_next_member(ukey, i)) >= 0)
+   {
+   attnum = AttrNumberGetAttrOffset(i + 
FirstLowInvalidHeapAttributeNumber);
+
+   if (entry->attrmap->attnums[attnum] < 0 ||
+   !bms_is_member(entry->attrmap->attnums[attnum], 
entry->remoterel.attunique))
+   {
+   entry->parallel_apply = PARALLEL_APPLY_UNSAFE;
+   return;
+   }
+   }
+
+   bms_free(ukey);

It looks like we need to call bms_free() before returning, right?

4. src/backend/replication/logical/relation.c
+   /* We don't need info for dropped or generated attributes */
+   if (att->attisdropped || att->attgenerated)
+   continue;

Would it be better to change the comment to:
We don't check dropped or generated attributes

5. src/test/subscription/t/032_streaming_apply.pl
+$node_publisher->wait_for_catchup($appname);
+
+# Then we check the foreign key on partition table.
+$node_publisher->wait_for_catchup($appname);

Here, wait_for_catchup() is called twice; we can remove the second one.

6. src/backend/replication/logical/applybgworker.c
+   /* If any workers (or the postmaster) have died, we have 
failed. */
+   if (status == APPLY_BGWORKER_EXIT)
+   ereport(ERROR,
+   
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+errmsg("background worker %u failed to 
apply transaction %u",
+   
entry->wstate->shared->worker_id,
+   
entry->wstate->shared->stream_xid)));

Should we change the error message to "apply background worker %u failed to
apply transaction %u", to be consistent with the error message in
apply_bgworker_wait_for()?

0004 patch
==
1.
I saw that the commit message says:
If the subscriber exits with an error, this flag will be set true, and
whenever the transaction is applied successfully, this flag is reset false.

"subretry" is set to false if a transaction is applied successfully, it looks
similar to what clear_subscription_skip_lsn() does, so maybe we should remove
the following change in apply_handle_stream_abort()? Or only call
set_subscription_retry() when rollbacking the toplevel transaction.

@@ -1671,6 +1688,9 @@ apply_handle_stream_abort(StringInfo s)
 */
serialize_stream_abort(xid, subxid);
}
+
+   /* Reset the retry flag. */
+   set_subscription_retry(false);
}
 
reset_apply_error_context_info();

2. src/backend/replication/logical/worker.c
+   /* Reset subretry */
+   values[Anum_pg_subscription_subretry - 1] = BoolGetDatum(retry);
+   

RE: Perform streaming logical transactions by background workers and parallel apply

2022-08-05 Thread shiy.f...@fujitsu.com
On Thu, Aug 4, 2022 2:36 PM Wang, Wei/王 威  wrote:
> 
> I also did some other improvements based on the suggestions posted in this
> thread. Attach the new patches.
> 

Thanks for updating the patch. Here are some comments on the v20-0001 patch.

1.
+typedef struct ApplyBgworkerShared
+{
+   slock_t mutex;
+
+   /* Status of apply background worker. */
+   ApplyBgworkerStatus status;
+
+   /* proto version of publisher. */
+   uint32  proto_version;
+
+   TransactionId   stream_xid;
+
+   /* id of apply background worker */
+   uint32  worker_id;
+} ApplyBgworkerShared;

Would it be better to modify the comment of "proto_version" to "Logical protocol
version"?

2. commnent of handle_streamed_transaction()

+ * Exception: When the main apply worker is applying streaming transactions in
+ * parallel mode (e.g. when addressing LOGICAL_REP_MSG_RELATION or
+ * LOGICAL_REP_MSG_TYPE changes), then return false.

This comment doesn't look very clear; could we change it to:

Exception: In SUBSTREAM_PARALLEL mode, if the message type is
LOGICAL_REP_MSG_RELATION or LOGICAL_REP_MSG_TYPE, return false even if this is
the main apply worker.

3.
+/*
+ * There are three fields in message: start_lsn, end_lsn and send_time. Because
+ * we have updated these statistics in apply worker, we could ignore these
+ * fields in apply background worker. (see function LogicalRepApplyLoop)
+ */
+#define SIZE_STATS_MESSAGE (3 * sizeof(uint64))

updated these statistics in apply worker
->
updated these statistics in main apply worker

4.
+static void
+LogicalApplyBgwLoop(shm_mq_handle *mqh, volatile ApplyBgworkerShared *shared)
+{
+   shm_mq_result shmq_res;
+   PGPROC *registrant;
+   ErrorContextCallback errcallback;

I think we can define "shmq_res" in the for loop.

5.
+   /*
+* We use first byte of message for additional communication 
between
+* main Logical replication worker and apply background 
workers, so if
+* it differs from 'w', then process it first.
+*/

between main Logical replication worker and apply background workers
->
between main apply worker and apply background workers

Regards,
Shi yu



RE: Introduce wait_for_subscription_sync for TAP tests

2022-08-04 Thread shiy.f...@fujitsu.com
On Thu, Aug 4, 2022 5:49 PM shiy.f...@fujitsu.com  wrote:
> 
> On Thu, Aug 4, 2022 2:28 PM Masahiko Sawada 
> wrote:
> >
> > On Thu, Aug 4, 2022 at 10:37 AM Amit Kapila 
> > wrote:
> > >
> > > On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila
> 
> > wrote:
> > > >
> > > > Pushed this one and now I'll look at your other patch.
> > > >
> > >
> > > I have pushed the second patch as well after making minor changes in
> > > the comments. Alvaro [1] and Tom [2] suggest to back-patch this and
> > > they sound reasonable to me. Will you be able to produce back branch
> > > patches?
> >
> > Yes. I've attached patches for backbranches. The updates are
> > straightforward on v11 - v15. However, on v10, we don't use
> > wait_for_catchup() in some logical replication test cases. The commit
> > bbd3363e128dae refactored the tests to use wait_for_catchup but it's
> > not backpatched. So in the patch for v10, I didn't change the code
> > that was changed by the commit. Also, since wait_for_catchup requires
> > to specify $target_lsn, unlike the one in v11 or later, I changed
> > wait_for_subscription_sync() accordingly.
> >
> 
> Thanks for your patches.
> 
> In the patches for pg11 ~ pg14, it looks like we need to add a "=pod" before the
> current change in PostgresNode.pm. Right?
> 

By the way, I notice that in 002_types.pl (on master branch), it seems the "as
well" in the following comment should be removed. Is it worth being fixed?

$node_subscriber->safe_psql('postgres',
"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' 
PUBLICATION tap_pub WITH (slot_name = tap_sub_slot)"
);

# Wait for initial sync to finish as well
$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');

Regards,
Shi yu



RE: Introduce wait_for_subscription_sync for TAP tests

2022-08-04 Thread shiy.f...@fujitsu.com
On Thu, Aug 4, 2022 2:28 PM Masahiko Sawada  wrote:
> 
> On Thu, Aug 4, 2022 at 10:37 AM Amit Kapila 
> wrote:
> >
> > On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila 
> wrote:
> > >
> > > Pushed this one and now I'll look at your other patch.
> > >
> >
> > I have pushed the second patch as well after making minor changes in
> > the comments. Alvaro [1] and Tom [2] suggest to back-patch this and
> > they sound reasonable to me. Will you be able to produce back branch
> > patches?
> 
> Yes. I've attached patches for backbranches. The updates are
> straightforward on v11 - v15. However, on v10, we don't use
> wait_for_catchup() in some logical replication test cases. The commit
> bbd3363e128dae refactored the tests to use wait_for_catchup but it's
> not backpatched. So in the patch for v10, I didn't change the code
> that was changed by the commit. Also, since wait_for_catchup requires
> to specify $target_lsn, unlike the one in v11 or later, I changed
> wait_for_subscription_sync() accordingly.
> 

Thanks for your patches.

In the patches for pg11 ~ pg14, it looks like we need to add a "=pod" before the
current change in PostgresNode.pm. Right?

Regards,
Shi yu


RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-08-03 Thread shiy.f...@fujitsu.com
On Wed, Aug 3, 2022 12:06 PM Masahiko Sawada  wrote:
> 
> I've attached updated patches that incorporated the above comments as
> well as the comments from Shi yu. Please review them.
> 

Thanks for updating the patch.

I noticed that in SnapBuildXidSetCatalogChanges(), "i" is initialized in the if
branch in the REL10 patch, which is different from the REL11 patch. Maybe we can
modify the REL11 patch to be consistent with the REL10 patch.

The rest of the patch looks good to me.

Regards,
Shi yu


RE: Handle infinite recursion in logical replication setup

2022-08-02 Thread shiy.f...@fujitsu.com
On Fri, Jul 29, 2022 1:22 PM vignesh C  wrote:
> 
> On Fri, Jul 29, 2022 at 8:31 AM Peter Smith 
> wrote:
> >
> 
> Thanks for the comments, the attached v41 patch has the changes for the
> same.
> 

Thanks for updating the patch.

A comment for 0002 patch.

In the example in section 31.11.4 (Generic steps for adding a new primary to an
existing set of primaries), I think there should be a Step-6, corresponding to
the steps mentioned before. And part of Step-5 should actually be part of
Step-6.

Regards,
Shi yu


RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-08-02 Thread shiy.f...@fujitsu.com
On Mon, Aug 1, 2022 10:31 PM Amit Kapila  wrote:
> 
> On Mon, Aug 1, 2022 at 7:46 AM Masahiko Sawada
>  wrote:
> >
> > On Fri, Jul 29, 2022 at 3:45 PM Amit Kapila 
> wrote:
> > >
> >
> > I've attached updated patches for all branches. Please review them.
> >
> 
> Thanks, the patches look mostly good to me. I have made minor edits by
> removing 'likely' from a few places as those don't seem to be adding
> much value, changed comments at a few places, and was getting
> compilation in error in v11/10 (snapbuild.c:2111:3: error: ‘for’ loop
> initial declarations are only allowed in C99 mode) which I have fixed.
> See attached, unless there are major comments/suggestions, I am
> planning to push this day after tomorrow (by Wednesday) after another
> pass.
> 

Thanks for updating the patch.

Here are some minor comments:

1.
patches for REL10 ~ REL13:
+ * Mark the transaction as containing catalog changes. In addition, if the
+ * given xid is in the list of the initial running xacts, we mark the
+ * its subtransactions as well. See comments for NInitialRunningXacts and
+ * InitialRunningXacts for additional info.

"mark the its subtransactions"
->
"mark its subtransactions"

2.
patches for REL10 ~ REL15:
In the comment in catalog_change_snapshot.spec, maybe we can use "RUNNING_XACTS"
instead of "RUNNING_XACT" "XACT_RUNNING", same as the patch for master branch.

Regards,
Shi yu



RE: Handle infinite recursion in logical replication setup

2022-07-31 Thread shiy.f...@fujitsu.com
On Fri, Jul 29, 2022 1:22 PM vignesh C  wrote:
> 
> 
> Thanks for the comments, the attached v41 patch has the changes for the
> same.
> 

Thanks for updating the patch.

I wonder, in the case where the publisher runs PG15 (or earlier) and the
subscriber runs PG16, whether we should have this check (i.e., checking if the
publication tables were also subscribing from other publishers). In this case,
even if origin = none is specified, it doesn't work because the publisher
doesn't filter by origin. So maybe we don't need the check for the initial
sync. Thoughts?
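
For example, something like the following (a sketch; the connection string and
names are made up) would not achieve the filtering against such a publisher:

CREATE SUBSCRIPTION sub_new CONNECTION 'host=pg15_pub dbname=postgres'
    PUBLICATION pub WITH (origin = none);
-- a PG15-or-earlier publisher does not filter by origin, so remotely
-- originated changes would still be sent despite origin = none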

Regards,
Shi yu


RE: Introduce wait_for_subscription_sync for TAP tests

2022-07-27 Thread shiy.f...@fujitsu.com
On Tue, Jul 26, 2022 3:42 PM Masahiko Sawada  wrote:
> 
> I've attached an updated patch as well as a patch to remove duplicated
> waits in 007_ddl.pl.
> 

Thanks for your patch. Here are some comments.

1.
I think some comments need to be changed in the patch.
For example:
# Also wait for initial table sync to finish
# Wait for initial sync to finish as well

Words like "Also" and "as well" can be removed now, we originally used them
because we wait for catchup and "also" wait for initial sync.

2.
In the following places, we can remove the separate wait_for_catchup() call and
let wait_for_subscription_sync() do it (by passing the publisher node).

2.1.
030_origin.pl:
@@ -128,8 +120,7 @@ $node_B->safe_psql(
 
 $node_C->wait_for_catchup($appname_B2);
 
-$node_B->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
+$node_B->wait_for_subscription_sync;
 
2.2.
031_column_list.pl:
@@ -385,7 +373,7 @@ $node_subscriber->safe_psql(
ALTER SUBSCRIPTION sub1 SET PUBLICATION pub2, pub3
 ));
 
-wait_for_subscription_sync($node_subscriber);
+$node_subscriber->wait_for_subscription_sync;
 
 $node_publisher->wait_for_catchup('sub1');

2.3.
100_bugs.pl:
@@ -281,8 +276,7 @@ $node_subscriber->safe_psql('postgres',
 $node_publisher->wait_for_catchup('tap_sub');
 
 # Also wait for initial table sync to finish
-$node_subscriber->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
+$node_subscriber->wait_for_subscription_sync;
 
 is( $node_subscriber->safe_psql(
'postgres', "SELECT * FROM tab_replidentity_index"),

Regards,
Shi yu


RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-26 Thread shiy.f...@fujitsu.com
On Tue, Jul 26, 2022 3:52 PM Masahiko Sawada  wrote:
> 
> On Tue, Jul 26, 2022 at 2:18 PM Amit Kapila 
> wrote:
> >
> > On Tue, Jul 26, 2022 at 7:00 AM Masahiko Sawada
>  wrote:
> > >
> > > On Mon, Jul 25, 2022 at 7:57 PM shiy.f...@fujitsu.com
> > >  wrote:
> > > >
> > > >
> > > > case 3
> > > > -
> > > > There are 64 catalog modifying transactions.
> > > > Decode 100k/500k/1M transactions.
> > > >
> > > > 100k500k1M
> > > > master  0.0600  0.1666  0.4876
> > > > patched 0.0620  0.1653  0.4795
> > > >
> > > >
> > > > Summary of the tests:
> > > > After applying this patch, there is a overhead of about 3% in the case
> decoding
> > > > 100k transactions with 64 catalog modifying transactions. This is an
> extreme
> > > > case, so maybe it's okay.
> > >
> > > Yes. If we're worried about the overhead and doing bsearch() is the
> > > cause, probably we can try simplehash instead of the array.
> > >
> >
> > I am not sure if we need to go that far for this extremely corner
> > case. Let's first try your below idea.
> >
> > > An improvement idea is that we pass the parsed->xinfo down to
> > > SnapBuildXidHasCatalogChanges(), and then return from that function
> > > before doing bearch() if the parsed->xinfo doesn't have
> > > XACT_XINFO_HAS_INVALS. That would save calling bsearch() for
> > > non-catalog-modifying transactions. Is it worth trying?
> > >
> >
> > I think this is worth trying and this might reduce some of the
> > overhead as well in the case presented by Shi-San.
> 
> Okay, I've attached an updated patch that does the above idea. Could
> you please do the performance tests again to see if the idea can help
> reduce the overhead, Shi yu?
> 

Thanks for your improvement. I have tested the case which showed overhead before
(decoding 100k transactions with 64 catalog modifying transactions) with the v9
patch; the result is as follows.

master  0.0607
patched 0.0613

There's almost no difference compared with master (less than 1%), which looks
good to me.

Regards,
Shi yu


RE: Handle infinite recursion in logical replication setup

2022-07-26 Thread shiy.f...@fujitsu.com
On Sun, Jul 24, 2022 1:28 AM vignesh C  wrote:
> 
> Added a  note for the same and referred it to the conflicts section.
> 
> Thanks for the comments, the attached v38 patch has the changes for the
> same.
> 

Thanks for updating the patch. A comment on the test in the 0001 patch.

+# Alter subscription ... refresh publication should fail when a new table is
+# subscribing data from a different publication should fail
+($result, $stdout, $stderr) = $node_A->psql(
+   'postgres', "
+ALTER SUBSCRIPTION tap_sub_A2 REFRESH PUBLICATION");
+like(
+   $stderr,
+   qr/ERROR: ( [A-Z0-9]+:)? could not replicate table "public.tab_new"/,
+   "Create subscription with origin and copy_data having replicated table 
in publisher"
+);

The comment says "should fail" twice, the latter one can be removed.

Besides, "Create subscription with origin and copy_data" should be changed to
"Alter subscription with origin and copy_data" I think.

Regards,
Shi yu


RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-25 Thread shiy.f...@fujitsu.com
Hi,

I did some performance tests for the master branch patch (based on the v6 patch)
to see if the bsearch() added by this patch would cause any overhead.

I ran each test three times and took the average.

The results are as follows; a bar chart is attached.

case 1
-
No catalog modifying transaction.
Decode 800k pgbench transactions. (8 clients, 100k transactions per client)

master  7.5417
patched 7.4107

case 2
-
There's one catalog modifying transaction.
Decode 100k/500k/1M transactions.

100k500k1M
master  0.0576  0.1491  0.4346
patched 0.0586  0.1500  0.4344

case 3
-
There are 64 catalog modifying transactions.
Decode 100k/500k/1M transactions.

100k500k1M
master  0.0600  0.1666  0.4876
patched 0.0620  0.1653  0.4795

(Because the result of case 3 shows that there is an overhead of about 3% when
decoding 100k transactions with 64 catalog modifying transactions, I tested the
next run of 100k xacts with or without catalog modifying transactions, to see
if it affects subsequent decoding.)

case 4.1
-
After the test steps in case 3 (64 catalog modifying transactions, decode 100k
transactions), run 100k xacts and then decode.

master  0.3699
patched 0.3701

case 4.2
-
After the test steps in case 3 (64 catalog modifying transactions, decode 100k
transactions), run 64 DDLs(without checkpoint) and 100k xacts, then decode.

master  0.3687
patched 0.3696

Summary of the tests:
After applying this patch, there is an overhead of about 3% in the case of
decoding 100k transactions with 64 catalog modifying transactions. This is an
extreme case, so maybe it's okay. And cases 4.1 and 4.2 show that the patch has
no effect on subsequent decoding. In other cases, there are no significant
differences.

For your information, here are the parameters specified in postgresql.conf in
the test.

shared_buffers = 8GB
checkpoint_timeout = 30min
max_wal_size = 20GB
min_wal_size = 10GB
autovacuum = off
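
(For clarity, a "catalog modifying transaction" above means a transaction that
changes the system catalogs. An illustrative example of the kind of transaction
counted here, not the exact test script:)

BEGIN;
CREATE TABLE ddl_tmp (a int);  -- DDL touches pg_class and other catalogs
DROP TABLE ddl_tmp;
COMMIT;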

Regards,
Shi yu


RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-17 Thread shiy.f...@fujitsu.com
On Fri, Jul 15, 2022 10:39 PM Masahiko Sawada  wrote:
> 
> This patch should have the fix for the issue that Shi yu reported. Shi
> yu, could you please test it again with this patch?
> 

Thanks for updating the patch!
I have tested and confirmed that the problem I found has been fixed.

Regards,
Shi yu


RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-15 Thread shiy.f...@fujitsu.com
On Mon, Jul 11, 2022 9:54 PM Masahiko Sawada  wrote:
> 
> I've attached an updated patch, please review it.
> 

Thanks for your patch. Here are some comments for the REL14-v1 patch.

1.
+   Sizesz = sizeof(TransactionId) * nxacts;;

There is a redundant semicolon at the end.

2.
+   workspace = MemoryContextAlloc(rb->context, 
rb->n_initial_running_xacts);

Should it be:
+   workspace = MemoryContextAlloc(rb->context, sizeof(TransactionId) * 
rb->n_initial_running_xacts);

3.
+   /* bound check if there is at least one transaction to be removed */
+   if (NormalTransactionIdPrecedes(rb->initial_running_xacts[0],
+   
running->oldestRunningXid))
+   return;
+

Here, I think it should return if rb->initial_running_xacts[0] is older than
oldestRunningXid, right? Should it be changed to:

+   if (!NormalTransactionIdPrecedes(rb->initial_running_xacts[0],
+   
running->oldestRunningXid))
+   return;

4.
+   if ((parsed->xinfo & XACT_XINFO_HAS_INVALS) != 0)

Maybe we can change it like the following, to be consistent with other places in
this file. It's also fine if you don't change it.

+   if (parsed->xinfo & XACT_XINFO_HAS_INVALS)


Regards,
Shi yu


RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-13 Thread shiy.f...@fujitsu.com
On Tue, Jul 12, 2022 5:23 PM Masahiko Sawada  wrote:
> 
> On Tue, Jul 12, 2022 at 5:58 PM shiy.f...@fujitsu.com
>  wrote:
> >
> > It happened when executing the following code because it tried to free a
> NULL
> > pointer (catchange_xip).
> >
> > /* be tidy */
> > if (ondisk)
> > pfree(ondisk);
> > +   if (catchange_xip)
> > +   pfree(catchange_xip);
> >  }
> >
> > It seems to be related to configure option. I could reproduce it when using
> > `./configure --enable-debug`.
> > But I couldn't reproduce with `./configure --enable-debug CFLAGS="-Og -
> ggdb"`.
> 
> Hmm, I could not reproduce this problem even if I use ./configure
> --enable-debug. And it's weird that we checked if catchange_xip is not
> null but we did pfree for it:
> 
> #1  pfree (pointer=0x0) at mcxt.c:1177
> #2  0x0078186b in SnapBuildSerialize (builder=0x1fd5e78,
> lsn=25719712) at snapbuild.c:1792
> 
> Is it reproducible in your environment?

Thanks for your reply! Yes, it is reproducible. And I also reproduced it on the
v4 patch you posted [1].

[1] 
https://www.postgresql.org/message-id/CAD21AoAyNPrOFg%2BQGh%2B%3D4205TU0%3DyrE%2BQyMgzStkH85uBZXptQ%40mail.gmail.com

> If so, could you test it again
> with the following changes?
> 
> diff --git a/src/backend/replication/logical/snapbuild.c
> b/src/backend/replication/logical/snapbuild.c
> index d015c06ced..a6e76e3781 100644
> --- a/src/backend/replication/logical/snapbuild.c
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -1788,7 +1788,7 @@ out:
> /* be tidy */
> if (ondisk)
> pfree(ondisk);
> -   if (catchange_xip)
> +   if (catchange_xip != NULL)
> pfree(catchange_xip);
>  }
> 

I tried this and could still reproduce the problem.

Besides, I tried the suggestion from Amit [2]; it could be fixed by checking
the value of catchange_xcnt instead of catchange_xip before the pfree().

[2] 
https://www.postgresql.org/message-id/CAA4eK1%2BXPdm8G%3DEhUJA12Pi1YvQAfcz2%3DkTd9a4BjVx4%3Dgk-MA%40mail.gmail.com

diff --git a/src/backend/replication/logical/snapbuild.c 
b/src/backend/replication/logical/snapbuild.c
index c482e906b0..68b9c4ef7d 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1573,7 +1573,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
Sizeneeded_length;
SnapBuildOnDisk *ondisk = NULL;
TransactionId   *catchange_xip = NULL;
-   size_t  catchange_xcnt;
+   size_t  catchange_xcnt = 0;
char   *ondisk_c;
int fd;
chartmppath[MAXPGPATH];
@@ -1788,7 +1788,7 @@ out:
/* be tidy */
if (ondisk)
pfree(ondisk);
-   if (catchange_xip)
+   if (catchange_xcnt != 0)
pfree(catchange_xip);
 }


Regards,
Shi yu


RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-12 Thread shiy.f...@fujitsu.com
On Tue, Jul 12, 2022 8:49 AM Masahiko Sawada  wrote:
> 
> I've attached an updated patch.
> 

Hi,

I hit a segmentation fault in the test_decoding test after applying the patch
for the master branch. The backtrace is attached.

It happened when executing the following code because it tried to free a NULL
pointer (catchange_xip).

/* be tidy */
if (ondisk)
pfree(ondisk);
+   if (catchange_xip)
+   pfree(catchange_xip);
 }

It seems to be related to the configure options. I could reproduce it when using
`./configure --enable-debug`.
But I couldn't reproduce it with `./configure --enable-debug CFLAGS="-Og -ggdb"`.

Regards,
Shi yu
#0  0x00910333 in GetMemoryChunkContext (pointer=0x0) at 
../../../../src/include/utils/memutils.h:129
#1  pfree (pointer=0x0) at mcxt.c:1177
#2  0x0078186b in SnapBuildSerialize (builder=0x1fd5e78, lsn=25719712) 
at snapbuild.c:1792
#3  0x00782797 in SnapBuildProcessRunningXacts (builder=0x1fd5e78, 
lsn=25719712, running=0x27dfe50) at snapbuild.c:1199
#4  0x00774273 in standby_decode (ctx=0x1fc3e08, buf=0x7ffd8c95b5d0) at 
decode.c:346
#5  0x00773ab3 in LogicalDecodingProcessRecord 
(ctx=ctx@entry=0x1fc3e08, record=0x1fc41a0) at decode.c:119
#6  0x0077815d in pg_logical_slot_get_changes_guts (fcinfo=0x1fb5d88, 
confirm=, binary=) at logicalfuncs.c:271
#7  0x0067b63d in ExecMakeTableFunctionResult (setexpr=0x1fb42e0, 
econtext=0x1fb41b0, argContext=, expectedDesc=0x1fd7da8, 
randomAccess=false)
at execSRF.c:234
#8  0x0068b76f in FunctionNext (node=node@entry=0x1fb3fa0) at 
nodeFunctionscan.c:94
#9  0x0067be37 in ExecScanFetch (recheckMtd=0x68b450 , 
accessMtd=0x68b470 , node=0x1fb3fa0) at execScan.c:133
#10 ExecScan (node=0x1fb3fa0, accessMtd=0x68b470 , 
recheckMtd=0x68b450 ) at execScan.c:199
#11 0x0067344b in ExecProcNode (node=0x1fb3fa0) at 
../../../src/include/executor/executor.h:259
#12 ExecutePlan (execute_once=, dest=0x1fc02e8, 
direction=, numberTuples=0, sendTuples=, 
operation=CMD_SELECT,
use_parallel_mode=, planstate=0x1fb3fa0, estate=0x1fb3d78) 
at execMain.c:1636
#13 standard_ExecutorRun (queryDesc=0x1f9e178, direction=, 
count=0, execute_once=) at execMain.c:363
#14 0x007dda9e in PortalRunSelect (portal=0x1f51338, forward=, count=0, dest=) at pquery.c:924
#15 0x007decf7 in PortalRun (portal=portal@entry=0x1f51338, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
run_once=run_once@entry=true, dest=dest@entry=0x1fc02e8, 
altdest=altdest@entry=0x1fc02e8, qc=0x7ffd8c95bbf0) at pquery.c:768
#16 0x007db4ff in exec_simple_query (
query_string=0x1ee29a8 "SELECT data FROM 
pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', 
'1', 'include-xids', '0');")
at postgres.c:1243
#17 0x007dc742 in PostgresMain (dbname=, 
username=) at postgres.c:4482
#18 0x0076132b in BackendRun (port=, port=) at postmaster.c:4503
#19 BackendStartup (port=) at postmaster.c:4231
#20 ServerLoop () at postmaster.c:1805
#21 0x007621fb in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x1edbf20) at postmaster.c:1477
#22 0x004f0f69 in main (argc=3, argv=0x1edbf20) at main.c:202


RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-11 Thread shiy.f...@fujitsu.com
On Tue, Jul 12, 2022 8:49 AM Masahiko Sawada  wrote:
> 
> I've attached an updated patch.
> 
> While trying this idea, I noticed there is no API to get the length of
> dlist, as we discussed offlist. Alternative idea was to use List
> (T_XidList) but I'm not sure it's a great idea since deleting an xid
> from the list is O(N), we need to implement list_delete_xid, and we
> need to make sure allocating list node in the reorder buffer context.
> So in the patch, I added a variable, catchange_ntxns, to keep track of
> the length of the dlist. Please review it.
> 

Thanks for your patch. Here are some comments on the master patch.

1.
In catalog_change_snapshot.spec, should we use "RUNNING_XACTS record" instead of
"RUNNING_XACT record" / "XACT_RUNNING record" in the comment?

2.
+* Since catchange.xip is sorted, we find the lower bound of
+* xids that sill are interesting.

Typo?
"sill" -> "still"

3.
+* This array is set once when restoring the snapshot, xids are removed
+* from the array when decoding xl_running_xacts record, and then 
eventually
+* becomes an empty.

+   /* catchange list becomes an empty */
+   pfree(builder->catchange.xip);
+   builder->catchange.xip = NULL;

Should "becomes an empty" be modified to "becomes empty"?

4.
+ * changes that are smaller than ->xmin. Those won't ever get checked via
+ * the ->committed array and ->catchange, respectively. The committed xids will

Should we change 
"the ->committed array and ->catchange"
to
"the ->committed or ->catchange array"
?

Regards,
Shi yu



RE: Perform streaming logical transactions by background workers and parallel apply

2022-07-06 Thread shiy.f...@fujitsu.com
On Tue, Jun 28, 2022 11:22 AM Wang, Wei/王 威  wrote:
> 
> I also improved patches as suggested by Peter-san in [1] and [2].
> Thanks for Shi Yu to improve the patches by addressing the comments in [2].
> 
> Attach the new patches.
> 

Thanks for updating the patch.

Here are some comments.

0001 patch
==
1.
+   /* Check If there are free worker slot(s) */
+   LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);

I think "Check If" should be "Check if".

0003 patch
==
1.
Should we call apply_bgworker_relation_check() in apply_handle_truncate()?

0004 patch
==
1.
@@ -3932,6 +3958,9 @@ start_apply(XLogRecPtr origin_startpos)
}
PG_CATCH();
{
+   /* Set the flag that we will retry later. */
+   set_subscription_retry(true);
+
if (MySubscription->disableonerr)
DisableSubscriptionAndExit();
		else

I think we need to emit the error and recover from the error state before
setting the retry flag, like what we do in DisableSubscriptionAndExit().
Otherwise, if an error is detected when setting the retry flag, we won't get the
error message reported by the apply worker.

Regards,
Shi yu


RE: Handle infinite recursion in logical replication setup

2022-07-04 Thread shiy.f...@fujitsu.com
On Mon, Jul 4, 2022 4:17 PM shiy.f...@fujitsu.com  wrote:
> 
> On Sun, Jul 3, 2022 11:00 PM vignesh C  wrote:
> >
> > Thanks for the comments, the attached v27 patch has the changes for the
> > same.
> >
> 
> Thanks for updating the patch.
> 
> A comment on 0003 patch:
> 
> I think we should call ExecClearTuple() before getting next tuple, so it 
> should
> be called if the table is in ready state. How about modifying it to:
> 

By the way, I have tested the pg_dump/psql changes in this patch with older
versions (server: pg10 ~ pg15, pg_dump/psql: pg16), and they worked OK.

Regards,
Shi yu


RE: Handle infinite recursion in logical replication setup

2022-07-04 Thread shiy.f...@fujitsu.com
On Sun, Jul 3, 2022 11:00 PM vignesh C  wrote:
> 
> Thanks for the comments, the attached v27 patch has the changes for the
> same.
> 

Thanks for updating the patch.

A comment on the 0003 patch:
+   /*
+* No need to throw an error for the tables that are in ready 
state,
+* as the walsender will send the changes from WAL in case of 
tables
+* in ready state.
+*/
+   if (isreadytable)
+   continue;
+
...
+   ereport(ERROR,
+   
errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+   errmsg("table: \"%s.%s\" might have replicated 
data in the publisher",
+  nspname, relname),
+   errdetail("CREATE/ALTER SUBSCRIPTION with 
origin = local and copy_data = on is not allowed when the publisher might have 
replicated data."),
+   errhint("Use CREATE/ALTER SUBSCRIPTION with 
copy_data = off/force."));
+
+   ExecClearTuple(slot);

I think we should call ExecClearTuple() before getting the next tuple, so it
should also be called when the table is in the ready state. How about modifying
it to:
if (!isreadytable)
ereport(ERROR,

errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("table: \"%s.%s\" might have 
replicated data in the publisher",
   nspname, relname),
errdetail("CREATE/ALTER SUBSCRIPTION 
with origin = local and copy_data = on is not allowed when the publisher might 
have replicated data."),
errhint("Use CREATE/ALTER SUBSCRIPTION 
with copy_data = off/force."));

ExecClearTuple(slot);
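
In other words, the intended behavior of the 0003 patch would look like this
(a sketch; the connection string and object names are made up):

CREATE SUBSCRIPTION sub_b CONNECTION 'dbname=postgres port=5433'
    PUBLICATION pub_a WITH (origin = local, copy_data = on);    -- should ERROR
CREATE SUBSCRIPTION sub_b CONNECTION 'dbname=postgres port=5433'
    PUBLICATION pub_a WITH (origin = local, copy_data = force); -- overrides the check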

Regards,
Shi yu



RE: Handle infinite recursion in logical replication setup

2022-06-29 Thread shiy.f...@fujitsu.com
On Tue, Jun 28, 2022 2:18 PM vignesh C  wrote:
> 
> Thanks for the comments, the attached  v25 patch has the changes for the
> same.
> 

Thanks for updating the patch. Here are some comments.

0002 patch:
==
1.
+# Test the CREATE SUBSCRIPTION 'origin' parameter and its interaction with
+# 'copy_data' parameter.

It seems we should move "and its interaction with 'copy_data' parameter" to
the 0003 patch.

0003 patch
==
1.
When using ALTER SUBSCRIPTION ... REFRESH, the subscription will throw an error
if any table might have replicated data on the publisher, even if the table was
already subscribed before the refresh (in which case no initial copy is done
when refreshing). It looks like previously subscribed tables don't need this
check. Would it be better to only check the tables which need to do the initial
copy?

2.
+   errmsg("table:%s.%s might have replicated data 
in the publisher",
+  nspname, relname),

I think the table name needs to be enclosed in double quotes, to be consistent
with other messages.

Regards,
Shi yu


RE: tablesync copy ignores publication actions

2022-06-22 Thread shiy.f...@fujitsu.com
On Wed, Jun 22, 2022 4:49 PM Peter Smith  wrote:
> 
> On Wed, Jun 22, 2022 at 2:18 PM Amit Kapila 
> wrote:
> >
> > On Thu, Jun 16, 2022 at 6:07 AM Peter Smith 
> wrote:
> >
> > >
> > > Thank you for your review comments. Those reported mistakes are fixed
> > > in the attached patch v3.
> > >
> >
> > This patch looks mostly good to me except for a few minor comments
> > which are mentioned below. It is not very clear in which branch(es) we
> > should commit this patch? As per my understanding, this is a
> > pre-existing behavior but we want to document it because (a) It was
> > not already documented, and (b) we followed it for row filters in
> > PG-15 it seems that should be explained. So, we have the following
> > options (a) commit it only for PG-15, (b) commit for PG-15 and
> > backpatch the relevant sections, or (c) commit it when branch opens
> > for PG-16. What do you or others think?
> 
> Even though this is a very old docs omission, AFAIK nobody ever raised
> it as a problem before. It only became more important because of the
> PG15 row-filters. So I think option (a) is ok.
> 

I also think option (a) is ok.

> 
> PSA patch v4 to address all the above review comments.
> 

Thanks for updating the patch. It looks good to me.

Besides, I tested the examples in the patch and found no problems.
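
For anyone following the thread, the documented behavior is easy to see with
something like this (an illustrative sketch; names are made up):

CREATE PUBLICATION p_ins FOR TABLE t1 WITH (publish = 'insert');
-- a subscription to p_ins still copies ALL existing rows of t1 during the
-- initial tablesync; the publish parameter only restricts which changes are
-- streamed after the copy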

Regards,
Shi yu


RE: Handle infinite recursion in logical replication setup

2022-06-22 Thread shiy.f...@fujitsu.com
On Mon, Jun 20, 2022 7:55 PM vignesh C  wrote:
> 
> Thanks for the comment, the v22 patch attached has the changes for the
> same.

Thanks for updating the patch; here are some comments on the 0003 patch.

1. 032_origin.pl
+###
+# Join 3rd node (node_C) to the existing 2 nodes(node_A & node_B) bidirectional
+# replication setup when the existing nodes (node_A & node_B) and the new node
+# (node_C) does not have any data.
+###

"does not have any data" should be "do not have any data" I think.

2.
The comment for 032_origin.pl:

# Test the CREATE SUBSCRIPTION 'origin' parameter.

After applying this patch, this file tests not only the 'origin' parameter but
also the 'copy_data' parameter, so should we modify this comment?

Besides, should we change the file name in this patch? It looks more like test
cases for bidirectional logical replication.

3. subscriptioncmds.c
/* Set default values for the boolean supported options. */
...
if (IsSet(supported_opts, SUBOPT_CREATE_SLOT))
opts->create_slot = true;
if (IsSet(supported_opts, SUBOPT_COPY_DATA))
-   opts->copy_data = true;
+   opts->copy_data = COPY_DATA_ON;
if (IsSet(supported_opts, SUBOPT_REFRESH))
opts->refresh = true;
if (IsSet(supported_opts, SUBOPT_BINARY))

"copy_data" option is not Boolean now, which is inconsistent with the comment.
So maybe we can change the comment here? ("the boolean supported options" ->
"the supported options")

Regards,
Shi yu


RE: Replica Identity check of partition table on subscriber

2022-06-21 Thread shiy.f...@fujitsu.com
On Tuesday, June 21, 2022 4:49 PM Amit Kapila  wrote:
> 
> On Tue, Jun 21, 2022 at 12:50 PM Amit Langote 
> wrote:
> >
> > On Tue, Jun 21, 2022 at 3:35 PM houzj.f...@fujitsu.com
> >  wrote:
> >
> > Attached a patch containing the above to consider as an alternative.
> >
> 
> Thanks, the patch looks good to me. I'll push this after doing some testing.

The patch looks good to me as well.

I also verified that the patch applies cleanly on the back-branches, and I
confirmed that the bug exists on those branches before this patch and is fixed
after applying it. The regression tests also passed with and without the
RELCACHE_FORCE_RELEASE option on my machine.
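For reference, RELCACHE_FORCE_RELEASE is enabled by uncommenting its
definition in src/include/pg_config_manual.h and rebuilding:

    /* in src/include/pg_config_manual.h */
    #define RELCACHE_FORCE_RELEASE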

Regards,
Shi yu


RE: Replica Identity check of partition table on subscriber

2022-06-20 Thread shiy.f...@fujitsu.com
On Mon, Jun 20, 2022 1:33 PM Amit Kapila  wrote:
> 
> On Fri, Jun 17, 2022 at 11:22 AM shiy.f...@fujitsu.com
>  wrote:
> >
> > On Fri Jun 17, 2022 11:06 AM shiy.f...@fujitsu.com 
> wrote:
> > >
> > > Attached the new version of patch set. I also moved the partitioned table
> > > check
> > > in logicalrep_rel_mark_updatable() to check_relation_updatable() as
> > > discussed
> > > [2].
> > >
> >
> > Attached back-branch patches of the first patch.
> >
> 
> One minor comment:
> + /*
> + * If it is a partitioned table, we don't check it, we will check its
> + * partition later.
> + */
> 
> Can we change the above comment to: "For partitioned tables, we only
> need to care if the target partition is updatable (aka has PK or RI
> defined for it)."?
> 

Thanks for your comment. Modified in the attached patches. 

Regards,
Shi yu


v11-0001-Fix-partition-table-s-RI-checking-on-the-subscri.patch
Description:  v11-0001-Fix-partition-table-s-RI-checking-on-the-subscri.patch


v11-pg13-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patch
Description:  v11-pg13-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patch


v11-pg14-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patch
Description:  v11-pg14-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patch


RE: Handle infinite recursion in logical replication setup

2022-06-19 Thread shiy.f...@fujitsu.com
On Thu, Jun 16, 2022 6:18 PM vignesh C  wrote:
> 
> Thanks for the comments, the attached v21 patch has the changes for the
> same.
> 

Thanks for updating the patch. Here are some comments.

0002 patch
==
1.
+  publisher to only send changes that originated locally. Setting
+  origin to any means that that
+  the publisher sends any changes regardless of their origin. The
+  default is any.

It seems there's a redundant "that" at the end of the second line.

2.
+
+  
+   suborigin text
+  
+  
+   Possible origin values are local or
+   any. The default is any.
+   If local, the subscription will request the publisher
+   to only send changes that originated locally. If any
+   the publisher sends any changes regardless of their origin.
+  
+ .

A comma can be added after "If any".

3.
@@ -4589,6 +4598,8 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
if (strcmp(subinfo->subdisableonerr, "t") == 0)
appendPQExpBufferStr(query, ", disable_on_error = true");
 
+   appendPQExpBuffer(query, ", origin = %s", subinfo->suborigin);
+
if (strcmp(subinfo->subsynccommit, "off") != 0)
appendPQExpBuffer(query, ", synchronous_commit = %s", fmtId(subinfo->subsynccommit));

Do we need to append anything if it's the default value ("any")? I noticed
that some other parameters are appended only when they are not the default
value.
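For example, the guard could look like this (just a sketch; "any" is
hard-coded here for illustration):

    if (strcmp(subinfo->suborigin, "any") != 0)
        appendPQExpBuffer(query, ", origin = %s", subinfo->suborigin);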

0003 patch
==
1. In create_subscription.sgml:
  (You cannot combine setting connect
  to false with
  setting create_slot, enabled,
  or copy_data to true.)

In this description of the "connect" parameter in the CREATE SUBSCRIPTION
document, maybe it would be better to change "copy_data to true" to
"copy_data to true/force".

2.
+   appendStringInfoString(,
+      "SELECT DISTINCT N.nspname AS schemaname,\n"
+      "                C.relname AS tablename,\n"
+      "                PS.srrelid as replicated\n"
+      "FROM pg_publication P,\n"
+      "     LATERAL pg_get_publication_tables(P.pubname) GPT\n"
+      "     LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
+      "     pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
+      "WHERE C.oid = GPT.relid AND P.pubname IN (");

"PS.srrelid as replicated" can be modified to "PS.srrelid AS replicated".

Besides, I think we can filter out the tables that are not subscribed in this
SQL statement, so that the later processing can be simplified.

Something like:
SELECT DISTINCT N.nspname AS schemaname,
C.relname AS tablename
FROM pg_publication P,
 LATERAL pg_get_publication_tables(P.pubname) GPT
 LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
 pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.oid = GPT.relid AND P.pubname IN ('pa') AND PS.srrelid IS NOT NULL;

0004 patch
==
1. Generic steps for adding a new node to an existing set of nodes

+Step-2: Lock the required tables of the new node in EXCLUSIVE mode until
+the setup is complete. (This lock is necessary to prevent any modifications

+Step-4. Lock the required tables of the existing nodes except the first node
+in EXCLUSIVE mode until the setup is complete. (This lock is necessary to

Should "in EXCLUSIVE mode" be modified to "in EXCLUSIVE
mode"?

2. Generic steps for adding a new node to an existing set of nodes

+data. There is no need to lock the required tables on
+node1 because any data changes made will be synchronized
+while creating the subscription with copy_data = force).

I think it would be better to say "on the first node" here instead of
"node1", because node1 has not been mentioned earlier in this section.

Regards,
Shi yu




RE: Replica Identity check of partition table on subscriber

2022-06-16 Thread shiy.f...@fujitsu.com
On Fri Jun 17, 2022 11:06 AM shiy.f...@fujitsu.com  
wrote:
> 
> Attached the new version of patch set. I also moved the partitioned table
> check
> in logicalrep_rel_mark_updatable() to check_relation_updatable() as
> discussed
> [2].
> 

Attached back-branch patches of the first patch.

Regards,
Shi yu


v10-pg13-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patch
Description:  v10-pg13-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patch


v10-pg14-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patch
Description:  v10-pg14-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patch


v10-0001-Fix-partition-table-s-RI-checking-on-the-subscri.patch
Description:  v10-0001-Fix-partition-table-s-RI-checking-on-the-subscri.patch


RE: Replica Identity check of partition table on subscriber

2022-06-16 Thread shiy.f...@fujitsu.com
On Thu, Jun 16, 2022 2:13 PM Amit Langote  wrote:
> 
> Hi,
> 
> On Thu, Jun 16, 2022 at 2:07 PM shiy.f...@fujitsu.com
>  wrote:
> > On Wed, Jun 15, 2022 8:30 PM Amit Kapila 
> wrote:
> > > I have pushed the first bug-fix patch today.
> >
> > Attached the remaining patches which are rebased.
> 
> Thanks.
> 
> Comments on v9-0001:

Thanks for your comments.

> 
> + * Don't throw any error here just mark the relation entry as not updatable,
> + * as replica identity is only for updates and deletes but inserts can be
> + * replicated even without it.
> 
> I know you're simply copying the old comment, but I think we can
> rewrite it to be slightly more useful:
> 
> We just mark the relation entry as not updatable here if the local
> replica identity is found to be insufficient and leave it to
> check_relation_updatable() to throw the actual error if needed.
> 

Modified as you suggested in another mail [1].

> +   /* Check that replica identity matches. */
> +   logicalrep_rel_mark_updatable(entry);
> 
> Maybe the comment (there are 2 instances) should say:
> 
> Set if the table's replica identity is enough to apply update/delete.
> 

Modified as suggested.

> Finally,
> 
> +# Alter REPLICA IDENTITY on subscriber.
> +# No REPLICA IDENTITY in the partitioned table on subscriber, but what we
> check
> +# is the partition, so it works fine.
> 
> For consistency with other recently added comments, I'd suggest the
> following wording:
> 
> Test that replication works correctly as long as the leaf partition
> has the necessary REPLICA IDENTITY, even though the actual target
> partitioned table does not.
> 

Modified as suggested.

> On v9-0002:
> 
> +   /* cleanup the invalid attrmap */
> 
> It seems that "invalid" here really means no-longer-useful, so we
> should use that phrase as a nearby comment does:
> 
> Release the no-longer-useful attrmap, if any.
> 

Modified as suggested.
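The resulting cleanup looks roughly like this (a sketch; free_attrmap() is
the existing helper for releasing an AttrMap):

    /* Release the no-longer-useful attrmap, if any. */
    if (entry->attrmap)
    {
        free_attrmap(entry->attrmap);
        entry->attrmap = NULL;
    }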

Attached the new version of patch set. I also moved the partitioned table check
in logicalrep_rel_mark_updatable() to check_relation_updatable() as discussed
[2].

[1] 
https://www.postgresql.org/message-id/CA%2BHiwqG3Xi%3DwH4rBHm61ku-j0gm%2B-rc5VmDHxf%3DTeFkUsHtooA%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CA%2BHiwqHfN789ekiYVE%2B0xsLswMosMrWBwv4cPvYgWREWejw7HA%40mail.gmail.com

Regards,
Shi yu



v10-0001-Fix-partition-table-s-RI-checking-on-the-subscri.patch
Description:  v10-0001-Fix-partition-table-s-RI-checking-on-the-subscri.patch


v10-0002-Fix-memory-leak-about-attrmap.patch
Description: v10-0002-Fix-memory-leak-about-attrmap.patch


RE: Replica Identity check of partition table on subscriber

2022-06-15 Thread shiy.f...@fujitsu.com
On Wed, Jun 15, 2022 8:30 PM Amit Kapila  wrote:
> 
> I have pushed the first bug-fix patch today.
> 

Thanks.

Attached the remaining patches which are rebased.

Regards,
Shi yu



v9-0002-fix-memory-leak-about-attrmap.patch
Description: v9-0002-fix-memory-leak-about-attrmap.patch


v9-0001-Check-partition-table-replica-identity-on-subscri.patch
Description:  v9-0001-Check-partition-table-replica-identity-on-subscri.patch


RE: tablesync copy ignores publication actions

2022-06-15 Thread shiy.f...@fujitsu.com
On Tue, Jun 14, 2022 3:36 PM Peter Smith  wrote:
> 
> PSA v2 of the patch, based on all feedback received.
> 
> ~~~
> 
> Main differences from v1:
> 
> * Rewording and more explanatory text.
> 
> * The examples were moved to the "Subscription" [1] page and also
> extended to show some normal replication and row filter examples, from
> [Amit].
> 
> * Added some text to CREATE PUBLICATION 'publish' param [2], from
> [Euler][Amit].
> 
> * Added some text to CREATE SUBSCRIPTION Notes [3], from [Shi-san].
> 
> * Added some text to the "Publication page" [4] to say the 'publish'
> is only for DML operations.
> 
> * I changed the note in "Row Filter - Initial Data Synchronization"
> [5] to be a warning because I felt users could be surprised to see
> data exposed by the initial copy, which a DML operation would not
> expose.
> 

Thanks for updating the patch. Two comments:

1.
+ it means the copied table t3 contains all rows even when
+ they do not patch the row filter of publication pub3b.

Typo. I think "they do not patch the row filter" should be "they do not match
the row filter", right?

2.
@@ -500,7 +704,6 @@
   
  
 
-
   
 
   

It seems we should remove this change.

Regards,
Shi yu


  1   2   >