RE: logical replication empty transactions

2022-03-01 Thread shiy.f...@fujitsu.com
Hi,

Here are some comments on the v21 patch.

1.
+   WalSndKeepalive(false, 0);

Maybe we can use InvalidXLogRecPtr here, instead of 0.

2.
+   pq_sendint64(&output_message, writePtr ? writePtr : sentPtr);

Similarly, should we use XLogRecPtrIsInvalid()?
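
For illustration, a minimal sketch of both suggestions (assuming writePtr and
sentPtr keep the meaning they have in the patch):

	/* Sketch only: prefer the invalid-LSN helpers over a bare 0. */
	WalSndKeepalive(false, InvalidXLogRecPtr);

	pq_sendint64(&output_message,
				 XLogRecPtrIsInvalid(writePtr) ? sentPtr : writePtr);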

3.
@@ -1183,6 +1269,20 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Assert(false);
}
 
+   if (in_streaming)
+   {
+   /* If streaming, send STREAM START if we haven't yet */
+   if (txndata && !txndata->sent_stream_start)
+   pgoutput_send_stream_start(ctx, txn);
+   }
+   else
+   {
+   /* If not streaming, send BEGIN if we haven't yet */
+   if (txndata && !txndata->sent_begin_txn)
+   pgoutput_send_begin(ctx, txn);
+   }
+
+
/* Avoid leaking memory by using and resetting our own context */
old = MemoryContextSwitchTo(data->context);


I am not sure it is suitable to send BEGIN or STREAM START here, because the
row filter has not been checked yet. That means empty transactions caused by
the row filter are not skipped.
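
To make the suggestion concrete, a rough sketch (assuming pgoutput_row_filter()
keeps the signature it has in the row filter patch, and that relentry is the
RelationSyncEntry for this change):

	/* Run the row filter first ... */
	if (!pgoutput_row_filter(relation, old_slot, &new_slot, relentry))
		return;

	/* ... and only then send BEGIN or STREAM START, so that transactions
	 * whose changes are all filtered out stay "empty" and can be skipped. */
	if (in_streaming)
	{
		if (txndata && !txndata->sent_stream_start)
			pgoutput_send_stream_start(ctx, txn);
	}
	else
	{
		if (txndata && !txndata->sent_begin_txn)
			pgoutput_send_begin(ctx, txn);
	}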

4.
@@ -1617,9 +1829,21 @@ pgoutput_stream_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn)
 {
+   PGOutputTxnData *txndata = txn->output_plugin_private;
+   boolsent_begin_txn = txndata->sent_begin_txn;
+
Assert(rbtxn_is_streamed(txn));
 
-   OutputPluginUpdateProgress(ctx);
+   pfree(txndata);
+   txn->output_plugin_private = NULL;
+
+   if (!sent_begin_txn)
+   {
+		elog(DEBUG1, "Skipping replication of an empty transaction in stream prepare");
+   return;
+   }
+
+   OutputPluginUpdateProgress(ctx, false);
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_stream_prepare(ctx->out, txn, prepare_lsn);
OutputPluginWrite(ctx, true);

I noticed that the patch skips a streamed prepared transaction; this would cause
an error on the subscriber side when the transaction is committed on the
publisher side, so I think we'd better not do that.

For example:
(set logical_decoding_work_mem = 64kB, max_prepared_transactions = 10 in
postgresql.conf)

-- publisher
create table test (a int, b text, primary key(a));
create table test2 (a int, b text, primary key(a));
create publication pub for table test;

-- subscriber 
create table test (a int, b text, primary key(a));
create table test2 (a int, b text, primary key(a));
create subscription sub connection 'dbname=postgres port=5432' publication pub with (two_phase=on, streaming=on);

-- publisher
begin;
INSERT INTO test2 SELECT i, md5(i::text) FROM generate_series(1, 1000) s(i);
prepare transaction 't';
commit prepared 't';

The error message in subscriber log:
ERROR:  prepared transaction with identifier "pg_gid_16391_722" does not exist


Regards,
Shi yu


RE: Failed transaction statistics to measure the logical replication progress

2022-03-02 Thread shiy.f...@fujitsu.com
Hi,

A comment on the v26 patch.

The following documentation for the pg_stat_subscription_stats view only says
"showing statistics about errors". Should we add something about transactions
here?

 
  
   pg_stat_subscription_stats
   One row per subscription, showing statistics about errors.
   See pg_stat_subscription_stats for details.


I noticed that the v24 patch has some changes to the description of this view.
Maybe we can modify it to "showing statistics about errors and transactions".

Regards,
Shi yu


RE: row filtering for logical replication

2022-03-02 Thread shiy.f...@fujitsu.com
On Thu, Mar 3, 2022 10:40 AM Amit Kapila  wrote:
> 
> On Wed, Mar 2, 2022 at 5:42 PM Euler Taveira  wrote:
> >
> > On Wed, Mar 2, 2022, at 8:45 AM, Tomas Vondra wrote:
> >
> > While working on the column filtering patch, which touches about the
> > same places, I noticed two minor gaps in testing:
> >
> > 1) The regression tests do perform multiple ALTER PUBLICATION commands,
> > tweaking the row filter. But there are no checks the row filter was
> > actually modified / stored in the catalog. It might be just thrown away
> > and no one would notice.
> >
> > The test that row filter was modified is available in a previous section. The
> > one that you modified (0001) is testing the supported objects.
> >
> 
> Right. But if Tomas thinks it is good to add for these ones as well
> then I don't mind.
> 
> > 153 ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl3 WHERE (e > 1000 AND e < 2000);
> > 154 \dRp+ testpub5
> > 155 ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl2;
> > 156 \dRp+ testpub5
> > 157 -- remove testpub_rf_tbl1 and add testpub_rf_tbl3 again (another WHERE expression)
> > 158 ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300 AND e < 500);
> > 159 \dRp+ testpub5
> >
> > IIRC this test was written before adding the row filter information into the
> > psql. We could add \d+ testpub_rf_tbl3 before and after the modification.
> >
> 
> 
> Agreed. We can use \d instead of \d+ as row filter is available with \d.
> 
> > 2) There are no pg_dump tests.
> >
> > WFM.
> >
> 
> This is a miss. I feel we can add a few more.
> 

I agree that we can add some tests. Attached is a patch which fixes these two points.

Regards,
Shi yu 


0001-Add-some-tests-for-row-filter-in-logical-replication.patch
Description:  0001-Add-some-tests-for-row-filter-in-logical-replication.patch


RE: Optionally automatically disable logical replication subscriptions on error

2022-03-06 Thread shiy.f...@fujitsu.com
On Wed, Mar 2, 2022 5:39 PM osumi.takami...@fujitsu.com 
 wrote:
> 
> Attached an updated patch v26.
> 

Thanks for your patch. A comment on the document.

@@ -7771,6 +7771,16 @@ SCRAM-SHA-256$:&l
 
  
   
+   subdisableonerr bool
+  
+  
+   If true, the subscription will be disabled if one of its workers
+   detects an error
+  
+ 
+
+ 
+  
subconninfo text
   
   

The documentation for the "subdisableonerr" option is placed after "The following
parameters control what happens during subscription creation:". I think it
should be placed after "The following parameters control the subscription's
replication behavior after it has been created:", right?

Regards,
Shi yu


RE: logical replication empty transactions

2022-03-07 Thread shiy.f...@fujitsu.com
On Fri, Mar 4, 2022 9:41 AM Ajin Cherian  wrote:
> 
> I have split the patch into two. I have kept the logic of skipping
> streaming changes in the second patch.
> I will work on the second patch once we can figure out a solution for
> the COMMIT PREPARED after restart problem.
> 

Thanks for updating the patch.

A comment on v23-0001 patch.

@@ -1429,6 +1520,19 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (in_streaming)
xid = txn->xid;
 
+   /*
+* Output BEGIN if we haven't yet.
+* Avoid for non-transactional messages.
+*/
+   if (in_streaming || transactional)
+   {
+		PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+
+   /* Send BEGIN if we haven't yet */
+   if (txndata && !txndata->sent_begin_txn)
+   pgoutput_send_begin(ctx, txn);
+   }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_message(ctx->out,
 xid,

I think we don't need to send BEGIN if in_streaming is true, right? The first
patch doesn't skip streamed transactions, so should we modify
+   if (in_streaming || transactional)
to
+   if (!in_streaming && transactional)
?
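
That is, something like the following sketch (only the condition changes; the
body is as in the patch):

	/* Send BEGIN only for transactional messages outside streaming mode. */
	if (!in_streaming && transactional)
	{
		PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;

		if (txndata && !txndata->sent_begin_txn)
			pgoutput_send_begin(ctx, txn);
	}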

Regards,
Shi yu



RE: Skipping logical replication transactions on subscriber side

2022-03-14 Thread shiy.f...@fujitsu.com
On Fri, Mar 11, 2022 4:20 PM Masahiko Sawada  wrote:
> 
> I've attached an updated version patch. This patch can be applied on
> top of the latest disable_on_error patch[1].
> 

Thanks for your patch. Here are some comments for the v13 patch.

1. doc/src/sgml/ref/alter_subscription.sgml
+  Specifies the transaction's finish LSN of the remote transaction whose changes

Could it be simplified to "Specifies the finish LSN of the remote transaction
whose ..."?

2.
I hit a failed assertion; the backtrace is attached. This is caused by the
following code in maybe_start_skipping_changes().

+		/*
+		 * It's a rare case; a past subskiplsn was left because the server
+		 * crashed after preparing the transaction and before clearing the
+		 * subskiplsn. We clear it without a warning message so as not confuse
+		 * the user.
+		 */
+		if (unlikely(MySubscription->skiplsn < lsn))
+		{
+			clear_subscription_skip_lsn(MySubscription->skiplsn, InvalidXLogRecPtr, 0,
+										false);
+			Assert(!IsTransactionState());
+		}

We want to clear subskiplsn in the case mentioned in the comment. But if the
next transaction is a streaming transaction and this function is called by
apply_spooled_messages(), we are inside a transaction here. So, I think this
assertion is not suitable for streaming transactions. Thoughts?
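
One possible shape, purely as a sketch; the "from_spooled" flag is invented
here to mark the apply_spooled_messages() code path, and the patch may prefer
another way to detect it:

	if (unlikely(MySubscription->skiplsn < lsn))
	{
		clear_subscription_skip_lsn(MySubscription->skiplsn,
									InvalidXLogRecPtr, 0, false);
		/* Streamed changes are already applied inside a transaction. */
		Assert(from_spooled || !IsTransactionState());
	}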

3.
+	XLogRecPtr	subskiplsn;		/* All changes which committed at this LSN are
+								 * skipped */

To be consistent, should the comment be changed to "All changes which finished
at this LSN are skipped"?

4.
+  After logical replication worker successfully skips the transaction or commits
+  non-empty transaction, the LSN (stored in
+  pg_subscription.subskiplsn)
+  is cleared.

Besides "commits non-empty transaction", subskiplsn would also be cleared in
some two-phase commit cases I think. Like prepare/commit/rollback a transaction,
even if it is an empty transaction. So, should we change it for these cases?

5.
+ * Clear subskiplsn of pg_subscription catalog with origin state update.

Should "with origin state update" modified to "with origin state updated"?

Regards,
Shi yu
#0  0x7fe3d1db170f in raise () from /lib64/libc.so.6
#1  0x7fe3d1d9bb25 in abort () from /lib64/libc.so.6
#2  0x00e0a657 in ExceptionalCondition (conditionName=conditionName@entry=0xfe9265 "!IsTransactionState()", errorType=errorType@entry=0xeb6917 "FailedAssertion", fileName=fileName@entry=0xfdc44c "worker.c", lineNumber=lineNumber@entry=3846) at assert.c:69
#3  0x00ae5a52 in maybe_start_skipping_changes (lsn=lsn@entry=22983032) at worker.c:3846
#4  0x00aedd35 in apply_spooled_messages (xid=xid@entry=722, lsn=22983032) at worker.c:1385
#5  0x00aee3b0 in apply_handle_stream_commit (s=s@entry=0x7ffd1354ee00) at worker.c:1486
#6  0x00aecf83 in apply_dispatch (s=s@entry=0x7ffd1354ee00) at worker.c:2520
#7  0x00aed41c in LogicalRepApplyLoop (last_received=22983080, last_received@entry=0) at worker.c:2751
#8  0x00aedb25 in start_apply (origin_startpos=0) at worker.c:3514
#9  0x00aef656 in ApplyWorkerMain (main_arg=) at worker.c:3770
#10 0x00a72881 in StartBackgroundWorker () at bgworker.c:858
#11 0x00a8e159 in do_start_bgworker (rw=rw@entry=0x27e99f0) at postmaster.c:5916
#12 0x00a8e30a in maybe_start_bgworkers () at postmaster.c:6141
#13 0x00a8f833 in sigusr1_handler (postgres_signal_arg=) at postmaster.c:5305
#14 
#15 0x7fe3d1e6d4ab in select () from /lib64/libc.so.6
#16 0x00a90d68 in ServerLoop () at postmaster.c:1765
#17 0x00a933c7 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x27bdaf0) at postmaster.c:1473
#18 0x009203c4 in main (argc=3, argv=0x27bdaf0) at main.c:202


RE: Column Filtering in Logical Replication

2022-03-14 Thread shiy.f...@fujitsu.com
On Mon, Mar 14, 2022 5:08 AM Tomas Vondra  wrote:
> 
> On 3/12/22 05:30, Amit Kapila wrote:
> >> ...
> >
> > Okay, please find attached. I have done basic testing of this, if we
> > agree with this approach then this will require some more testing.
> >
> 
> Thanks, the proposed changes seem like a clear improvement, so I've
> added them, with some minor tweaks (mostly to comments).
> 
> I've also included the memory context rename (entry_changes to the
> change proposed by Wang Wei, using a single SQL command in tablesync.
> 
> And I've renamed the per-entry memory context to entry_cxt, and used it
> for the column list.
> 

Thanks for your patch.
Here are some comments for column filter main patch (0003 patch).

1. doc/src/sgml/catalogs.sgml
@@ -6263,6 +6263,19 @@ SCRAM-SHA-256$:&l
Reference to schema
   
  
+
+ 
+  
+   prattrs int2vector
+   (references pg_attribute.attnum)
+  
+  
+   This is an array of values that indicates which table columns are
+   part of the publication.  For example, a value of 1 3
+   would mean that the first and the third table columns are published.
+   A null value indicates that all columns are published.
+  
+ 
 

   

This change was added to the pg_publication_namespace catalog. I think it should
be added to the pg_publication_rel catalog, right?

2. src/backend/replication/pgoutput/pgoutput.c
@@ -188,6 +202,7 @@ static EState *create_estate_for_relation(Relation rel);
 static void pgoutput_row_filter_init(PGOutputData *data,
									  List *publications,
									  RelationSyncEntry *entry);
+
 static bool pgoutput_row_filter_exec_expr(ExprState *state,
										  ExprContext *econtext);
 static bool pgoutput_row_filter(Relation relation, TupleTableSlot *old_slot,

Should we remove this change?

3. src/backend/commands/publicationcmds.c
+/*
+ * Check if all columns referenced in the column list are part of the
+ * REPLICA IDENTITY index or not.
+ *
+ * Returns true if any invalid column is found.
+ */

The comment for pub_collist_contains_invalid_column() seems wrong.  Should it be
"Check if all REPLICA IDENTITY columns are covered by the column list or not"?

4.
The patch doesn't allow DELETE and UPDATE operations if the target table uses
REPLICA IDENTITY FULL and it is published with a column list specified, even if
the column list includes all columns of the table.

For example:
create table tbl (a int, b int, c int);
create publication pub for table tbl (a, b, c);
alter table tbl replica identity full;

postgres=# delete from tbl;
ERROR:  cannot delete from table "tbl"
DETAIL:  Column list used by the publication does not cover the replica 
identity.

Should we allow this case? It doesn't seem to cause any harm.

5. 
Maybe we need some changes to tab-complete.c.

Regards,
Shi yu


RE: Skipping logical replication transactions on subscriber side

2022-03-16 Thread shiy.f...@fujitsu.com
On Wed, Mar 16, 2022 4:23 PM Masahiko Sawada  wrote:
> 
> I've attached an updated version patch.
> 

Thanks for updating the patch. Here are some comments for the v15 patch.

1. src/backend/replication/logical/worker.c

+ * to skip applying the changes when starting to apply changes.  The subskiplsn is
+ * cleared after successfully skipping the transaction or applying non-empty
+ * transaction. The latter prevents the mistakenly specified subskiplsn from

Should "applying non-empty transaction" be modified to "finishing a
transaction"? To be consistent with the description in the
alter_subscription.sgml.

2. src/test/subscription/t/029_on_error.pl

+# Test of logical replication subscription self-disabling feature.

Should we add something about "skip logical replication transactions" in this
comment?

Regards,
Shi yu


RE: Table refer leak in logical replication

2021-04-05 Thread shiy.f...@fujitsu.com
> BTW, it seems better to add a testcase for this ?

I think the test for it can be added in
src/test/subscription/t/003_constraints.pl, which is what my patch does.

Regards,
Shi yu


tests_for_table_refer_leak.diff
Description: tests_for_table_refer_leak.diff


RE: psql - add SHOW_ALL_RESULTS option

2021-04-07 Thread shiy.f...@fujitsu.com
Hi

I met a problem after commit 3a51306722.

While executing a SQL statement with psql, I can't interrupt it by pressing
ctrl+c.

For example:
postgres=# insert into test select generate_series(1,1000);
^C^CINSERT 0 1000

I pressed ctrl+c before the INSERT finished, but psql continued the INSERT.

Is this the expected result? I think it would be better to allow users to
interrupt by pressing ctrl+c.

Regards,
Shi yu




RE: psql - add SHOW_ALL_RESULTS option

2021-04-07 Thread shiy.f...@fujitsu.com
> Attached a patch which attempts to fix this by moving the cancellation
> cancelling request after processing results.

Thank you for the fix. I tested it, and the problem is solved after applying
your patch.
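
For reference, my understanding of the direction of the fix, as a rough sketch
(ResetCancelConn() is the existing psql helper; the ProcessQueryResults name is
hypothetical here):

	/* Keep Ctrl+C handling armed while the results are being consumed... */
	OK = ProcessQueryResults(pset.db);

	/* ...and only disarm cancellation once all results are processed. */
	ResetCancelConn();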

Regards,
Shi yu




RE: Could you help testing logical replication?

2021-04-12 Thread shiy.f...@fujitsu.com
> Then I get timeout error occurs and the subscriber worker keep re-launching
> over and over (you did not mention see such errors?)
I tested again and got the errors, too. I didn't check the log after the
timeout in the previous test.

Regards,
Tang






RE: Could you help testing logical replication?

2021-04-12 Thread shiy.f...@fujitsu.com
Sorry for sending the wrong mail. Please ignore it.

> -Original Message-
> From: Shi, Yu/侍 雨
> Sent: Monday, April 12, 2021 6:51 PM
> To: Tang, Haiying/唐 海英 
> Cc: pgsql-hackers@lists.postgresql.org
> Subject: RE: Could you help testing logical replication?
> 
> > Then I get timeout error occurs and the subscriber worker keep
> > re-launching over and over (you did not mention see such errors?)
> I tested again and got the errors, too. I didn't check the log after the
> timeout in the previous test.
> 
> Regards,
> Tang
> 





Recall: Could you help testing logical replication?

2021-04-12 Thread shiy.f...@fujitsu.com
Shi, Yu/侍 雨 would like to recall the message "Could you help testing logical replication?".

Recall: Could you help testing logical replication?

2021-04-12 Thread shiy.f...@fujitsu.com
Shi, Yu/侍 雨 would like to recall the message "Could you help testing logical replication?".

RE: Data is copied twice when specifying both child and parent table in publication

2021-10-21 Thread shiy.f...@fujitsu.com
On Tuesday, October 19, 2021 10:47 AM houzj.f...@fujitsu.com 
 wrote:
> 
> On Monday, October 18, 2021 5:03 PM Amit Langote
>  wrote:
> > I can imagine that the behavior seen here may look surprising, but not
> > sure if I would call it a bug as such.  I do remember thinking about
> > this case and the current behavior is how I may have coded it to be.
> >
> > Looking at this command in Hou-san's email:
> >
> >   create publication pub for table tbl1, tbl1_part1 with
> > (publish_via_partition_root=on);
> >
> > It's adding both the root partitioned table and the leaf partition
> > *explicitly*, and it's not clear to me if the latter's inclusion in
> > the publication should be assumed because the former is found to have
> > been added to the publication, that is, as far as the latter's
> > visibility to the subscriber is concerned.  It's not a stretch to
> > imagine that a user may write the command this way to account for a
> > subscriber node on which tbl1 and tbl1_part1 are unrelated tables.
> >
> > I don't think we assume anything on the publisher side regarding the
> > state/configuration of tables on the subscriber side, at least with
> > publication commands where tables are added to a publication
> > explicitly, so it is up to the user to make sure that the tables are
> > not added duplicatively.  One may however argue that the way we've
> > decided to handle FOR ALL TABLES does assume something about
> > partitions where it skips advertising them to subscribers when
> > publish_via_partition_root flag is set to true, but that is exactly to
> > avoid the duplication of data that goes to a subscriber.
> 
> Hi,
> 
> Thanks for the explanation.
> 
> I think one reason that I consider this behavior a bug is that: If we add
> both the root partitioned table and the leaf partition explicitly to the
> publication (and set publish_via_partition_root = on), the behavior of the
> apply worker is inconsistent with the behavior of table sync worker.
> 
> In this case, all changes in the leaf the partition will be applied using the
> identity and schema of the partitioned(root) table. But for the table sync, it
> will execute table sync for both the leaf and the root table which cause
> duplication of data.
> 
> Wouldn't it be better to make the behavior consistent here ?
> 

I agree with this point. 

About this case,

> >   create publication pub for table tbl1, tbl1_part1 with
> > (publish_via_partition_root=on);

As a user, although a partitioned table includes its partitions, publishing both
the partitioned table and one of its partitions is allowed. So, I think we
should take this case into consideration. Copying the initial data once, via the
parent table, seems reasonable.

Regards
Shi yu


RE: "out of relcache_callback_list slots" after multiple calls to pg_logical_slot_get_binary_changes

2023-02-22 Thread shiy.f...@fujitsu.com
On Wed, Feb 22, 2023 2:20 PM Michael Paquier  wrote:
> 
> On Wed, Feb 22, 2023 at 12:07:06PM +0900, Kyotaro Horiguchi wrote:
> > At Wed, 22 Feb 2023 12:29:59 +1100, Peter Smith 
> wrote in
> >> If you are going to do that, then won't just copying the
> >> CacheRegisterSyscacheCallback(PUBLICATIONOID...  into function
> >> init_rel_sync_cache() be effectively the same as doing that?
> >
> > I'm not sure if it has anything to do with the relation sync cache.
> > On the other hand, moving all the content of init_rel_sync_cache() up
> > to pgoutput_startup() doesn't seem like a good idea.. Another option,
> > as you see, was to separate callback registration code.
> 
> Both are kept separate in the code, so keeping this separation makes
> sense to me.
> 
> +   /* Register callbacks if we didn't do that. */
> +   if (!callback_registered)
> +   CacheRegisterSyscacheCallback(PUBLICATIONOID,
> + publication_invalidation_cb,
> + (Datum) 0);
> 
> /* Initialize relation schema cache. */
> init_rel_sync_cache(CacheMemoryContext);
> +   callback_registered = true;
> [...]
> +   /* Register callbacks if we didn't do that. */
> +   if (!callback_registered)
> 
> I am a bit confused by the use of one single flag called
> callback_registered to track both the publication callback and the
> relation callbacks.  Wouldn't it be cleaner to use two flags?  I don't
> think that we'll have soon a second code path calling
> init_rel_sync_cache(), but if we do then the callback load could again
> be messed up.
> 

Thanks for your reply. Using two flags makes sense to me.
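
For instance, a minimal sketch of what I have in mind (the flag names are my
own, not necessarily the patch's):

	static bool publication_callback_registered = false;
	static bool relation_callbacks_registered = false;

	/* Register the publication callback only once. */
	if (!publication_callback_registered)
	{
		CacheRegisterSyscacheCallback(PUBLICATIONOID,
									  publication_invalidation_cb,
									  (Datum) 0);
		publication_callback_registered = true;
	}

	/* The relation sync cache callbacks get their own flag, so a second
	 * caller of init_rel_sync_cache() cannot mess up the publication one. */
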
Attached is the updated patch.

Regards,
Shi Yu


v3-0001-Avoid-duplicate-registration-of-callbacks-in-pgou.patch
Description:  v3-0001-Avoid-duplicate-registration-of-callbacks-in-pgou.patch


RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-23 Thread shiy.f...@fujitsu.com
On Wed, Feb 22, 2023 9:48 PM Kuroda, Hayato/黒田 隼人  
wrote:
> 
> Thank you for reviewing! PSA new version.
> 

Thanks for your patch. Here is a comment.

+		elog(DEBUG2, "time-delayed replication for txid %u, delay_time = %d ms, remaining wait time: %ld ms",
+			 ctx->write_xid, (int) ctx->min_send_delay,
+			 remaining_wait_time_ms);

I tried it and saw that the xid here looks wrong; what it got is the xid of the
previous transaction. It seems `ctx->write_xid` has not been updated yet, so we
can't use it.
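
As a sketch of what I mean (assuming the ReorderBufferTXN is available as txn
at this call site):

	/* Use the transaction's own xid; ctx->write_xid may still hold the
	 * previous transaction's id at this point. */
	elog(DEBUG2, "time-delayed replication for txid %u, delay_time = %d ms, remaining wait time: %ld ms",
		 txn->xid, (int) ctx->min_send_delay, remaining_wait_time_ms);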

Regards,
Shi Yu


RE: Allow logical replication to copy tables in binary format

2023-02-23 Thread shiy.f...@fujitsu.com
On Thu, Feb 23, 2023 12:40 PM Kuroda, Hayato/黒田 隼人  
wrote:
> 
> > > I'm not sure the combination of "copy_format = binary" and "copy_data = false"
> > > should be accepted or not. How do you think?
> >
> > It seems quite useless indeed to specify the format of a copy that won't
> happen.
> 
> I understood that the conbination of "copy_format = binary" and "copy_data = false"
> should be rejected in parse_subscription_options() and AlterSubscription(). Is it
> right?
> I'm expecting that is done in next version.
> 

The copy_data option only takes effect once, in the CREATE SUBSCRIPTION or ALTER
SUBSCRIPTION REFRESH PUBLICATION command, but the copy_format option can take
effect multiple times if the subscription is refreshed multiple times. Even if
the subscription is created with copy_data=false, copy_format can take effect
when executing ALTER SUBSCRIPTION REFRESH PUBLICATION. So, I am not sure we want
to reject this usage.

Besides, here are my comments on the v9 patch.
1.
src/bin/pg_dump/pg_dump.c
	if (fout->remoteVersion >= 16)
-		appendPQExpBufferStr(query, " s.suborigin\n");
+	{
+		appendPQExpBufferStr(query, " s.suborigin,\n");
+		appendPQExpBufferStr(query, " s.subcopyformat\n");
+	}
	else
-		appendPQExpBuffer(query, " '%s' AS suborigin\n", LOGICALREP_ORIGIN_ANY);
+	{
+		appendPQExpBuffer(query, " '%s' AS suborigin,\n", LOGICALREP_ORIGIN_ANY);
+		appendPQExpBuffer(query, " '%c' AS subcopyformat\n", LOGICALREP_COPY_AS_TEXT);
+	}

src/bin/psql/describe.c
	if (pset.sversion >= 16)
+	{
		appendPQExpBuffer(&buf,
						  ", suborigin AS \"%s\"\n",
						  gettext_noop("Origin"));
+		/* Copy format is only supported in v16 and higher */
+		appendPQExpBuffer(&buf,
+						  ", subcopyformat AS \"%s\"\n",
+						  gettext_noop("Copy Format"));
+	}

I think we can call appendPQExpBuffer() only once for the two options that are
supported in v16. For example:

	if (pset.sversion >= 16)
	{
		appendPQExpBuffer(&buf,
						  ", suborigin AS \"%s\"\n"
						  ", subcopyformat AS \"%s\"\n",
						  gettext_noop("Origin"),
						  gettext_noop("Copy Format"));
	}

2.
src/bin/psql/tab-complete.c
@@ -1926,7 +1926,7 @@ psql_completion(const char *text, int start, int end)
 	/* ALTER SUBSCRIPTION <name> SET ( */
 	else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) && TailMatches("SET", "("))
 		COMPLETE_WITH("binary", "disable_on_error", "origin", "slot_name",
-					  "streaming", "synchronous_commit");
+					  "streaming", "synchronous_commit", "copy_format");
 	/* ALTER SUBSCRIPTION <name> SKIP ( */
 	else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) && TailMatches("SKIP", "("))
 		COMPLETE_WITH("lsn");
@@ -3269,7 +3269,8 @@ psql_completion(const char *text, int start, int end)
 	else if (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", "("))
 		COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
 					  "disable_on_error", "enabled", "origin", "slot_name",
-					  "streaming", "synchronous_commit", "two_phase");
+					  "streaming", "synchronous_commit", "two_phase",
+					  "copy_format");


The options should be listed in alphabetical order. See commit d547f7cf5ef.
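
For example, a sketch of the first hunk with the list kept sorted:

	/* ALTER SUBSCRIPTION <name> SET ( */
	else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) && TailMatches("SET", "("))
		COMPLETE_WITH("binary", "copy_format", "disable_on_error", "origin",
					  "slot_name", "streaming", "synchronous_commit");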

Regards,
Shi Yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-02 Thread shiy.f...@fujitsu.com
On Wed, Mar 1, 2023 9:22 PM Önder Kalacı  wrote:
> 
> Hi Andres, Amit, Shi Yu, all
> 
> Andres Freund , 28 Şub 2023 Sal, 21:39 tarihinde 
> şunu yazdı:
> Hi,
> 
> On 2023-02-25 16:00:05 +0530, Amit Kapila wrote:
> > On Tue, Feb 21, 2023 at 7:55 PM Önder Kalacı  
> > wrote:
> > >> I think this overhead seems to be mostly due to the need to perform
> > >> tuples_equal multiple times for duplicate values.
> 
> I think more work needs to be done to determine the source of the
> overhead. It's not clear to me why there'd be an increase in tuples_equal()
> calls in the tests upthread.
> 
> You are right, looking closely, in fact, we most of the time do much less 
> tuples_equal() with index scan.
> 
> I've done some profiling with perf, and created flame graphs for the apply 
> worker, with the
> test described above: -- case 1 (All values are duplicated). I used the 
> following commands:
> - perf record -F 99 -p 122555 -g -- sleep 60
> -  perf script | ./stackcollapse-perf.pl > out.perf-folded
> -  ./flamegraph.pl out.perf-folded > perf_[index|seq]_scan.svg
> 
> I attached both flame graphs. I do not see anything specific regarding what 
> the patch does, but
> instead the difference mostly seems to come down to index scan vs sequential 
> scan related
> functions. As I continue to investigate, I thought it might be useful to 
> share the flame graphs
> so that more experienced hackers could comment on the difference.   
> 
> Regarding my own end-to-end tests: In some runs, the sequential scan is 
> indeed faster for case-1. But, 
> when I execute update tbl set a=a+1; for 50 consecutive times, and measure 
> end to end performance, I see
> much better results for index scan, only case-1 is on-par as mostly I'd 
> expect.
> 
> Case-1, running the update 50 times and waiting all changes applied
> • index scan: 2minutes 36 seconds
> • sequential scan: 2minutes 30 seconds
> Case-2, running the update 50 times and waiting all changes applied
> • index scan: 1 minutes, 2 seconds
> • sequential scan: 2minutes 30 seconds
> Case-7, running the update 50 times and waiting all changes applied
> • index scan: 6 seconds
> • sequential scan: 2minutes 26seconds
> 
> 
> > # Result
> The time executing update (the average of 3 runs is taken, the unit is
> milliseconds):
> 
> Shi Yu, could it be possible for you to re-run the tests with some more runs, 
> and share the average?
> I suspect maybe your test results have a very small pool size, and some runs 
> are making
> the average slightly problematic.
>  
> In my tests, I shared the total time, which is probably also fine.
>

Thanks for your reply. I re-tested (based on
v25_0001_use_index_on_subs_when_pub_rep_ident_full.patch) and took the average
of 100 runs. The results are as follows; the unit is milliseconds.

case1
sequential scan: 1348.57
index scan: 3785.15

case2
sequential scan: 1350.26
index scan: 1754.01

case3
sequential scan: 1350.13
index scan: 1340.97

There was still some degradation in the first two cases, and there are some gaps
between our test results. Some information about my test is as follows.

a. Some parameters specified in postgresql.conf.
shared_buffers = 8GB
checkpoint_timeout = 30min
max_wal_size = 20GB
min_wal_size = 10GB
autovacuum = off

b. Executed SQL.
I executed TRUNCATE and INSERT before each UPDATE. I am not sure if you did the
same, or just executed 50 consecutive UPDATEs. If the latter, there would be
lots of old tuples, and this might have a bigger impact on sequential scans. I
tried that case (50 consecutive UPDATEs) and also saw that the overhead is
smaller than before.


Besides, I looked into the regression of this patch with `gprof`. Some results
are below. I think that with a single buffer lock, a sequential scan can scan
multiple tuples (see heapgettup()), while an index scan can only fetch one tuple
at a time. So in case 1, which has lots of duplicate values and therefore more
tuples to scan, the index scan takes longer.
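
To illustrate the point, a much simplified sketch (not the actual heapam code):

	/* Sequential scan: one buffer lock pair is amortized over every tuple
	 * on the page (cf. heapgettup()). */
	LockBuffer(buf, BUFFER_LOCK_SHARE);
	for (lineoff = FirstOffsetNumber; lineoff <= lines; lineoff++)
		;						/* examine each tuple on the page */
	LockBuffer(buf, BUFFER_LOCK_UNLOCK);

	/* Index scan: each heapam_index_fetch_tuple() call locks the buffer
	 * again for a single tuple, hence the ~50 million LockBuffer calls in
	 * the patched profile below. */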

- results of `gprof`
case1:
master
  %   cumulative   self              self     total
 time    seconds   seconds    calls  ms/call  ms/call  name
 1.37       0.66      0.01   654312     0.00     0.00  LWLockAttemptLock
 0.00       0.73      0.00   573358     0.00     0.00  LockBuffer
 0.00       0.73      0.00    10014     0.00     0.06  heap_getnextslot

patched
  %   cumulative   self               self     total
 time    seconds   seconds     calls  ms/call  ms/call  name
 9.70       1.27      0.36  50531459     0.00     0.00  LWLockAttemptLock
 3.23       2.42      0.12 100259200     0.00     0.00  LockBuffer
 6.20       1.50      0.23  50015101     0.00     0.00  heapam_index_fetch_tuple
 4.04       2.02      0.15  50015101     0.00     0.00  index_fetch_heap
 1.35       3.21      0.05     10119     0.00     0.00  index_getnext_slot

case7:
master
  %   cumulative   self              self     total
 time    seconds

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-06 Thread shiy.f...@fujitsu.com
On Monay, Mar 6, 2023 7:19 PM Önder Kalacı  wrote:
> 
> Yeah, seems easier to follow to me as well. Reflected it in the comment as 
> well.  
> 

Thanks for updating the patch. Here are some comments on v33-0001 patch.

1.
+   if (RelationReplicaIdentityFullIndexScanEnabled(localrel) &&
+   remoterel->replident == REPLICA_IDENTITY_FULL)

RelationReplicaIdentityFullIndexScanEnabled() is introduced in the 0002 patch,
so the call to it should be moved to the 0002 patch, I think.

2.
+#include "optimizer/cost.h"

Do we need this in the latest patch? I tried removing it, and it looks like it
can be removed from src/backend/replication/logical/relation.c.

3.
+# now, create a unique index and set the replica
+$node_publisher->safe_psql('postgres',
+	"CREATE UNIQUE INDEX test_replica_id_full_unique ON test_replica_id_full(x);");
+$node_subscriber->safe_psql('postgres',
+	"CREATE UNIQUE INDEX test_replica_id_full_unique ON test_replica_id_full(x);");
+

Should the comment be "now, create a unique index and set the replica identity"?

4.
+$node_publisher->safe_psql('postgres',
+	"ALTER TABLE test_replica_id_full REPLICA IDENTITY USING INDEX test_replica_id_full_unique;");
+$node_subscriber->safe_psql('postgres',
+	"ALTER TABLE test_replica_id_full REPLICA IDENTITY USING INDEX test_replica_id_full_unique;");
+
+# wait for the synchronization to finish
+$node_subscriber->wait_for_subscription_sync;

There are no new tables that need to be synchronized here, so should we remove
the call to wait_for_subscription_sync?

Regards,
Shi Yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-07 Thread shiy.f...@fujitsu.com
On Tue, Mar 7, 2023 9:47 PM Önder Kalacı  wrote:
> 
> I'm attaching v35. 
> 

I noticed that if the index column only exists on the subscriber side, that
index can also be chosen. This seems a bit odd because the index column isn't
sent from the publisher.

e.g.
-- pub
CREATE TABLE test_replica_id_full (x int, y int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
-- sub
CREATE TABLE test_replica_id_full (x int, y int, z int);
CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(z);
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;

I didn't see the behavior change in any case after applying the patch, which
looks good. Besides, I tested the performance for such a case.

Steps:
1. create tables, index, publication, and subscription
-- pub
create table tbl (a int);
alter table tbl replica identity full;
create publication pub for table tbl;
-- sub
create table tbl (a int, b int);
create index idx_b on tbl(b);
create subscription sub connection 'dbname=postgres port=5432' publication pub;
2. setup synchronous replication
3. execute SQL:
truncate tbl;
insert into tbl select i from generate_series(0,1)i;
update tbl set a=a+1;

The time of UPDATE (take the average of 10 runs):
master: 1356.06 ms
patched: 3968.14 ms

For the case where all values of the extra column on the subscriber are NULL, an
index scan can't do better than a sequential scan. This is not a real scenario,
and I think it only degrades when there are many NULL values in the index
column, so this is probably not a case to worry about. I just share this case so
that we can discuss whether we should pick an index which only contains extra
columns that exist on the subscriber.

Regards,
Shi Yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-12 Thread shiy.f...@fujitsu.com
On Fri, Mar 10, 2023 8:17 PM Amit Kapila  wrote:
> 
> On Fri, Mar 10, 2023 at 5:16 PM Önder Kalacı  wrote:
> >
> >>
> >> wip_for_optimize_index_column_match
> >> +static bool
> >> +IndexContainsAnyRemoteColumn(IndexInfo  *indexInfo,
> >> + LogicalRepRelation  *remoterel)
> >> +{
> >> + for (int i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
> >> + {
> >>
> >> Wouldn't it be better to just check if the first column is not part of
> >> the remote column then we can skip that index?
> >
> >
> > Reading [1], I think I can follow what you suggest. So, basically,
> > if the leftmost column is not filtered, we have the following:
> >
> >> but the entire index would have to be scanned, so in most cases the planner
> >> would prefer a sequential table scan over using the index.
> >
> >
> > So, in our case, we could follow a similar approach. If the leftmost column of
> > the index is not sent over the wire from the pub, we can prefer the sequential scan.
> >
> > Is my understanding of your suggestion accurate?
> >
> 
> Yes. I request an opinion from Shi-San who has reported the problem.
> 

I also agree with this.
And I think we can mention this in the comments if we do so.
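
If we do so, the check could look like the following sketch (the attrmap
handling follows the wip patch; a negative attnums entry means the column is
not sent from the publisher):

	/* Skip the index when its leftmost key column is not sent from the
	 * publisher; scanning the entire index would not beat a seq scan. */
	AttrNumber	keycol = indexInfo->ii_IndexAttrNumbers[0];

	if (attrmap->attnums[AttrNumberGetAttrOffset(keycol)] < 0)
		return false;			/* caller falls back to sequential scan */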

Regards,
Shi Yu


RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-13 Thread shiy.f...@fujitsu.com
On Sun, Mar 12, 2023 4:00 AM Önder Kalacı  wrote:
> 
> Attaching a patch that could possibly solve the problem. 
> 

Thanks for your patch. I tried it and it worked well.
Here are some minor comments.

1.
@@ -243,6 +243,17 @@ tuples_equal(TupleTableSlot *slot1, TupleTableSlot *slot2,
Form_pg_attribute att;
TypeCacheEntry *typentry;
 
+
+	Form_pg_attribute attr = TupleDescAttr(slot1->tts_tupleDescriptor, attrnum);
+

I think we can use "att" instead of a new variable. They have the same value.

2. 
+# The bug was that when when the REPLICA IDENTITY FULL is used with dropped

There is an extra "when".

Regards,
Shi Yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-13 Thread shiy.f...@fujitsu.com
On Mon, Mar 13, 2023 10:16 PM Önder Kalacı  wrote:
> 
> Attaching v47.
> 

Thanks for updating the patch. Here are some comments.

1.
in RemoteRelContainsLeftMostColumnOnIdx():

+   if (indexInfo->ii_NumIndexAttrs < 1)
+   return false;

Did you see any case where this condition is true? I think there is at least one
column in the index. If so, we can use an Assert().

+   if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol))
+   return false;

Similarly, I think `attrmap->maplen` is the number of columns and is always
greater than keycol. If you agree, we can check it with an Assert(). Besides, it
seems we don't need AttrNumberGetAttrOffset().
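
Concretely, something like this (sketch only):

	/* There is always at least one column in the index. */
	Assert(indexInfo->ii_NumIndexAttrs >= 1);

	/* maplen is the number of columns, so it always exceeds the key column. */
	Assert(attrmap->maplen > keycol);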

2.
+# make sure that the subscriber has the correct data after the update UPDATE

"update UPDATE" seems to be a typo.

3.
+# now, drop the index with the expression, and re-create index on column lastname

The comment says "re-create index on column lastname", but it seems we don't do
that. Should it be modified to something like:
# now, drop the index with the expression, we will use sequential scan

Besides these, the patch LGTM.

Regards,
Shi Yu


RE: Allow logical replication to copy tables in binary format

2023-03-15 Thread shiy.f...@fujitsu.com
On Thu, Mar 16, 2023 2:26 AM Melih Mutlu  wrote:
> 
> Right, it needs to be ordered. Fixed.
> 

Hi,

Thanks for updating the patch. I tested some cases, like TOAST data and
combinations of row filters and column lists, and they all work well.

Here is a comment:

+# Ensure the COPY command is executed in binary format on the publisher
+$node_publisher->wait_for_log(qr/LOG: ( [a-z0-9]+:)? COPY (.+)? TO STDOUT WITH \(FORMAT binary\)/);

The test failed with `log_error_verbosity = verbose` because it couldn't match
the following log:
2023-03-16 09:45:50.096 CST [2499415] pg_16398_sync_16391_7210954376230900539 LOG:  0: statement: COPY public.test_arrays (a, b, c) TO STDOUT WITH (FORMAT binary)

I think we should make it pass, see commit 19408aae7f.
Should it be changed to:

$node_publisher->wait_for_log(qr/LOG: ( [A-Z0-9]+:)? statement: COPY (.+)? TO STDOUT WITH \(FORMAT binary\)/);

Besides, for the same reason, this line also needs to be modified.
+$node_publisher->wait_for_log(qr/LOG: ( [a-z0-9]+:)? COPY (.+)? TO STDOUT\n/);

Regards,
Shi Yu


RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-16 Thread shiy.f...@fujitsu.com
On Thu, Mar 16, 2023 5:23 PM Amit Kapila  wrote:
> 
> On Mon, Mar 13, 2023 at 6:26 PM Önder Kalacı 
> wrote:
> >
> > Attaching v2
> >
> 
> Can we change the comment to: "Ignore dropped and generated columns as
> the publisher doesn't send those."? After your change, att =
> TupleDescAttr(slot1->tts_tupleDescriptor, attrnum); is done twice in
> the same function.
> 
> In test cases, let's change the comment to: "The bug was that when the
> REPLICA IDENTITY FULL is used with dropped or generated columns, we
> fail to apply updates and deletes.". Also, I think we don't need to
> provide the email link as anyway commit message will have a link to
> the discussion.
> 
> Did you check this in the back branches?
> 

I tried to reproduce this bug on the back branches.

Generated columns were introduced in PG12, and I reproduced the generated column
problem in PG12~PG15.

For the dropped column problem, I reproduced it in PG10~PG15. (Logical
replication was introduced in PG10.)

So I think we should backpatch the fix for generated columns to PG12, and
backpatch the fix for dropped columns to PG10.

Regards,
Shi Yu


RE: Allow logical replication to copy tables in binary format

2023-03-16 Thread shiy.f...@fujitsu.com
On Thu, Mar 16, 2023 9:20 PM Melih Mutlu   wrote:
> 
> Hi,
> 
> Please see the attached v16.
> 

Thanks for updating the patch.

+# Cannot sync due to type mismatch
+$node_subscriber->wait_for_log(qr/ERROR: ( [A-Z0-9]+:)? incorrect binary data format/);

+# Ensure the COPY command is executed in text format on the publisher
+$node_publisher->wait_for_log(qr/LOG: ( [A-Z0-9]+:)? statement: COPY (.+)? TO STDOUT\n/);

It looks like you forgot to pass `offset` into wait_for_log().

Regards,
Shi Yu


RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-17 Thread shiy.f...@fujitsu.com
On Friday, March 17, 2023 3:38 PM Önder Kalacı  wrote:
> 
> Hi Amit, all
>  
> You can first submit the fix for dropped columns with patches till
> v10. Once that is committed, then you can send the patches for
> generated columns.
> 
> Alright, attaching 2 patches for dropped columns, the names of the files 
> shows which 
> versions the patch can be applied to:
> v2-0001-Ignore-dropped-columns-HEAD-REL_15-REL_14-REL_13.patch
> v2-0001-Ignore-dropped-columns-REL_12-REL_11.patch 
> 
> And, then on top of that, you can apply the patch for generated columns on 
> all applicable
> versions (HEAD, 15, 14, 13 and 12). It applies cleanly.  The name of the file
> is: v2-0001-Ignore-generated-columns.patch
> 
> 
> But unfortunately I couldn't test the patch with PG 12 and below. I'm getting 
> some
> unrelated compile errors and Postgrees CI is not available on these versions 
> . I'll try
> to fix that, but I thought it would still be good to share the patches as you 
> might
> already have the environment to run the tests. 
> 

Thanks for updating the patch.

I couldn't apply v2-0001-Ignore-dropped-columns-HEAD-REL_15-REL_14-REL_13.patch
cleanly on v13 and v14. It looks like the patch needs some changes for these
versions.

```
Checking patch src/backend/executor/execReplication.c...
Hunk #1 succeeded at 243 (offset -46 lines).
Hunk #2 succeeded at 263 (offset -46 lines).
Checking patch src/test/subscription/t/100_bugs.pl...
error: while searching for:
$node_publisher->stop('fast');
$node_subscriber->stop('fast');

done_testing();

error: patch failed: src/test/subscription/t/100_bugs.pl:373
Applied patch src/backend/executor/execReplication.c cleanly.
Applying patch src/test/subscription/t/100_bugs.pl with 1 reject...
Rejected hunk #1.
```

Besides, I tried v2-0001-Ignore-dropped-columns-REL_12-REL_11.patch in v12. The
test failed and here's some information.

```
Can't locate object method "new" via package "PostgreSQL::Test::Cluster" (perhaps you forgot to load "PostgreSQL::Test::Cluster"?) at t/100_bugs.pl line 74.
# Looks like your test exited with 2 just after 1.
# Looks like your test exited with 2 just after 1.
```

+my $node_publisher_d_cols = PostgreSQL::Test::Cluster->new('node_publisher_d_cols');

It seems this usage is not supported in v12, and we should use get_new_node()
like the other test cases do.

Regards,
Shi Yu


RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-20 Thread shiy.f...@fujitsu.com
On Fri, Mar 17, 2023 11:29 PM Önder Kalacı  wrote:
> 
> Thanks for sharing. Fixed
> 
> 
> This time I was able to run all the tests with all the patches applied.
> 
> Again, the generated column fix also has some minor differences
> per version. So, overall we have 6 patches with very minor 
> differences :) 

Thanks for updating the patches. It seems you forgot to attach the
dropped-columns patches for HEAD and pg15; I think they are the same as v2.

On HEAD, we can re-use clusters from other test cases, which can save some time
(see fccaf259f22f4a).

In the patches for pg12 and pg11, I am not sure why the test is not added at the
end of the file 100_bugs.pl. I think it would be better to be consistent with
the other versions.

The attached patches modify these two points. Besides, I made some minor changes
and ran pgindent and pgperltidy. These are the patches for dropped columns,
because I think they should be submitted first; we can discuss the fix for
generated columns later.

Regards,
Shi Yu


v4-pg12-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch
Description:  v4-pg12-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch


v4-pg13-pg14-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch
Description:  v4-pg13-pg14-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch


v4-pg15-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch
Description:  v4-pg15-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch


v4-HEAD-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FULL.patch
Description:  v4-HEAD-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FULL.patch


v4-pg11-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch
Description:  v4-pg11-0001-Ignore-dropped-columns-when-REPLICA-IDENTITY-FUL.patch


RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-21 Thread shiy.f...@fujitsu.com
On Tue, Mar 21, 2023 4:51 PM Önder Kalacı  wrote:
> 
> Hi Amit, Shi Yu
> 
> Now attaching the similar patches for generated columns.
> 

Thanks for your patches. Here are some comments.

1.
 $node_publisher->safe_psql(
'postgres', qq(
ALTER TABLE dropped_cols DROP COLUMN b_drop;
+   ALTER TABLE generated_cols DROP COLUMN b_gen;
 ));
 $node_subscriber->safe_psql(
'postgres', qq(
ALTER TABLE dropped_cols DROP COLUMN b_drop;
+   ALTER TABLE generated_cols DROP COLUMN b_gen;
 ));

I think we want to test generated columns, so we don't need to drop columns.
Otherwise the generated column problem can't be detected.

2. 
# The bug was that when the REPLICA IDENTITY FULL is used with dropped columns,
# we fail to apply updates and deletes

Maybe we should mention generated columns in the comment of the test.

3.
I ran pgindent and it modified some lines. Maybe we can improve the patch as
follows.

@@ -292,8 +292,8 @@ tuples_equal(TupleTableSlot *slot1, TupleTableSlot *slot2,
 		att = TupleDescAttr(slot1->tts_tupleDescriptor, attrnum);
 
 		/*
-		 * Ignore dropped and generated columns as the publisher
-		 * doesn't send those
+		 * Ignore dropped and generated columns as the publisher doesn't send
+		 * those
 		 */
 		if (att->attisdropped || att->attgenerated)
 			continue;

Regards,
Shi Yu


RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-21 Thread shiy.f...@fujitsu.com
On Tuesday, March 21, 2023 8:03 PM Önder Kalacı  wrote:
> 
> Attached patches again.
> 

Thanks for updating the patch.

@@ -408,15 +412,18 @@ $node_subscriber->wait_for_subscription_sync;
 $node_publisher->safe_psql(
'postgres', qq(
ALTER TABLE dropped_cols DROP COLUMN b_drop;
+   ALTER TABLE generated_cols DROP COLUMN c_drop;
 ));
 $node_subscriber->safe_psql(
'postgres', qq(
ALTER TABLE dropped_cols DROP COLUMN b_drop;
+   ALTER TABLE generated_cols DROP COLUMN c_drop;
 ));

Is there any reason why we drop a column here? The dropped column case has
already been tested on table dropped_cols, and the generated column problem can
be detected without dropping columns on my machine.

Regards,
Shi Yu


RE: Allow logical replication to copy tables in binary format

2023-03-21 Thread shiy.f...@fujitsu.com
On Wed Mar 22, 2023 7:29 AM Peter Smith  wrote:
> 
> Thanks for all the patch updates. Patch v19 LGTM.
> 

+1

Regards,
Shi Yu


RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-22 Thread shiy.f...@fujitsu.com
On Wed, Mar 22, 2023 2:53 PM Önder Kalacı  wrote:
> 
> We don't really need to, if you check the first patch, we don't have DROP for 
> generated case. I mostly
> wanted to make the test a little more interesting, but it also seems to be a 
> little confusing.
> 
> Now attaching v2 where we do not drop the columns. I don't have strong 
> preference on
> which patch to proceed with, mostly wanted to attach this version to progress 
> faster (in case
> you/Amit considers this one better).
> 

Thanks for updating the patches.
The v2 patch LGTM.

Regards,
Shi Yu



RE: Force streaming every change in logical decoding

2022-12-14 Thread shiy.f...@fujitsu.com
On Sat, Dec 10, 2022 2:03 PM Dilip Kumar  wrote:
> 
> On Tue, Dec 6, 2022 at 11:53 AM shiy.f...@fujitsu.com
>  wrote:
> >
> > Hi hackers,
> >
> > In logical decoding, when logical_decoding_work_mem is exceeded, the
> changes are
> > sent to output plugin in streaming mode. But there is a restriction that the
> > minimum value of logical_decoding_work_mem is 64kB. I tried to add a GUC
> to
> > allow sending every change to output plugin without waiting until
> > logical_decoding_work_mem is exceeded.
> >
> > This helps to test streaming mode. For example, to test "Avoid streaming the
> > transaction which are skipped" [1], it needs many
> XLOG_XACT_INVALIDATIONS
> > messages. With the new option, it can be tested with fewer changes and in
> less
> > time. Also, this new option helps to test more scenarios for "Perform
> streaming
> > logical transactions by background workers" [2].
> 
> Some comments on the patch
> 

Thanks for your comments.

> 1. Can you add one test case using this option
> 

I added a simple test to confirm the new option works.

> 2. +  xreflabel="force_stream_mode">
> +  force_stream_mode (boolean)
> +   force_stream_mode configuration parameter
> 
> This GUC name "force_stream_mode" somehow appears like we are forcing
> this streaming mode irrespective of whether the
> subscriber has requested for this mode or not.  But actually it is not
> that, it is just streaming each change if
> it is enabled.  So we might need to think on the name (at least we
> should avoid using *mode* in the name IMHO).
> 

I think a similar GUC is force_parallel_mode: if the query is parallel unsafe or
max_worker_processes is exceeded, force_parallel_mode does not take effect. This
is similar to what we do in this patch. So, maybe it's OK to use "mode". I
didn't change it in the new version of the patch. What do you think?

> 3.
> -    while (rb->size >= logical_decoding_work_mem * 1024L)
> +    while ((!force_stream && rb->size >= logical_decoding_work_mem * 1024L) ||
> +           (force_stream && rb->size > 0))
>      {
>  {
> 
> It seems like if force_stream is on then indirectly it is enabling
> force serialization as well.  Because once we enter into the loop
> based on "force_stream" then it will either stream or serialize but I
> guess we do not want to force serialize based on this parameter.
> 

Agreed. I refactored the code and modified this point.
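
The shape I ended up with is roughly the following sketch (the helper names
follow the existing reorderbuffer code and may differ slightly in the patch):

	/* Enter the loop for force_stream as well, but leave the choice between
	 * streaming and serializing to the existing logic, so the new GUC never
	 * forces serialization as a side effect. */
	while (rb->size >= logical_decoding_work_mem * 1024L ||
		   (force_stream && rb->size > 0))
	{
		ReorderBufferTXN *txn;

		if (ReorderBufferCanStartStreaming(rb))
			txn = ReorderBufferLargestTopTXN(rb);	/* will be streamed */
		else
			txn = ReorderBufferLargestTXN(rb);		/* will be serialized */

		/* ... stream or serialize txn as before ... */
	}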

Please see the attached patch. I also fixed Peter's comments [1]. The GUC name
and design are still under discussion, so I didn't modify them.

By the way, I noticed that the comment for ReorderBufferCheckMemoryLimit() on
HEAD missed something. I fixed that in this patch, too.

[1] 
https://www.postgresql.org/message-id/CAHut%2BPtOjZ_e-KLf26i1XLH2ffPEZGOmGSKy0wDjwyB_uvzxBQ%40mail.gmail.com

Regards,
Shi yu


v2-0001-Allow-streaming-every-change-without-waiting-till.patch
Description:  v2-0001-Allow-streaming-every-change-without-waiting-till.patch


RE: Force streaming every change in logical decoding

2022-12-21 Thread shiy.f...@fujitsu.com
On Wed, Dec 21, 2022 4:54 PM Amit Kapila  wrote:
> 
> On Wed, Dec 21, 2022 at 1:55 PM Peter Smith 
> wrote:
> >
> > On Wed, Dec 21, 2022 at 6:22 PM Masahiko Sawada
>  wrote:
> > >
> > > On Tue, Dec 20, 2022 at 7:49 PM Amit Kapila 
> wrote:
> > > >
> > > > On Tue, Dec 20, 2022 at 2:46 PM Hayato Kuroda (Fujitsu)
> > > >  wrote:
> > > > >
> > > > > Dear hackers,
> > > > >
> > > > > > We have discussed three different ways to provide GUC for these
> > > > > > features. (1) Have separate GUCs like force_server_stream_mode,
> > > > > > force_server_serialize_mode, force_client_serialize_mode (we can
> use
> > > > > > different names for these) for each of these; (2) Have two sets of
> > > > > > GUCs for server and client. We can have logical_decoding_mode with
> > > > > > values as 'stream' and 'serialize' for the server and then
> > > > > > logical_apply_serialize = true/false for the client. (3) Have one 
> > > > > > GUC
> > > > > > like logical_replication_mode with values as 'server_stream',
> > > > > > 'server_serialize', 'client_serialize'.
> > > > >
> > > > > I also agreed for adding new GUC parameters (and I have already done
> partially
> > > > > in parallel apply[1]), and basically options 2 made sense for me. But 
> > > > > is
> it OK
> > > > > that we can choose "serialize" mode even if subscribers require
> streaming?
> > > > >
> > > > > Currently the reorder buffer transactions are serialized on publisher
> only when
> > > > > the there are no streamable transaction. So what happen if the
> > > > > logical_decoding_mode = "serialize" but streaming option streaming is
> on? If we
> > > > > break the first one and serialize changes on publisher anyway, it may
> be not
> > > > > suitable for testing the normal operation.
> > > > >
> > > >
> > > > I think the change will be streamed as soon as the next change is
> > > > processed even if we serialize based on this option. See
> > > > ReorderBufferProcessPartialChange. However, I see your point that
> when
> > > > the streaming option is given, the value 'serialize' for this GUC may
> > > > not make much sense.
> > > >
> > > > > Therefore, I came up with the variant of (2): logical_decoding_mode
> can be
> > > > > "normal" or "immediate".
> > > > >
> > > > > "normal" is a default value, which is same as current HEAD. Changes
> are streamed
> > > > > or serialized when the buffered size exceeds
> logical_decoding_work_mem.
> > > > >
> > > > > When users set to "immediate", the walsenders starts to stream or
> serialize all
> > > > > changes. The choice is depends on the subscription option.
> > > > >
> > > >
> > > > The other possibility to achieve what you are saying is that we allow
> > > > a minimum value of logical_decoding_work_mem as 0 which would
> mean
> > > > stream or serialize each change depending on whether the streaming
> > > > option is enabled. I think we normally don't allow a minimum value
> > > > below a certain threshold for other *_work_mem parameters (like
> > > > maintenance_work_mem, work_mem), so we have followed the same
> here.
> > > > And, I think it makes sense from the user's perspective because below
> > > > a certain threshold it will just add overhead by either writing small
> > > > changes to the disk or by sending those over the network. However, it
> > > > can be quite useful for testing/debugging. So, not sure, if we should
> > > > restrict setting logical_decoding_work_mem below a certain threshold.
> > > > What do you think?
> > >
> > > I agree with (2), having separate GUCs for publisher side and
> > > subscriber side. Also, on the publisher side, Amit's idea, controlling
> > > the logical decoding behavior by changing logical_decoding_work_mem,
> > > seems like a good idea.
> > >
> > > But I'm not sure it's a good idea if we lower the minimum value of
> > > logical_decoding_work_mem to 0. I agree it's helpful for testing and
> > > debugging but setting logical_decoding_work_mem = 0 doesn't benefit
> > > users at all, rather brings risks.
> > >
> > > I prefer the idea Kuroda-san previously proposed; setting
> > > logical_decoding_mode = 'immediate' means setting
> > > logical_decoding_work_mem = 0. We might not need to have it as an
> enum
> > > parameter since it has only two values, though.
> >
> > Did you mean one GUC (logical_decoding_mode) will cause a side-effect
> > implicit value change on another GUC value
> > (logical_decoding_work_mem)?
> >
> 
> I don't think that is required. The only value that can be allowed for
> logical_decoding_mode will be "immediate", something like we do for
> recovery_target. The default will be "". The "immediate" value will
> mean that stream each change if the "streaming" option is enabled
> ("on" of "parallel") or if "streaming" is not enabled then that would
> mean serializing each change.
> 

Agreed. I updated the patch as Amit suggested.
Please see the attached patch.

Regards,
Shi yu


v3-0001-Allow-streaming-or-serializing-each-change-in-log.patch
Description:  v3-0

RE: Force streaming every change in logical decoding

2022-12-21 Thread shiy.f...@fujitsu.com
On Wed, Dec 21, 2022 4:05 PM Peter Smith  wrote:
> 
> Here are some review comments for patch v2.
> 
> Since the GUC is still under design maybe these comments can be
> ignored for now, but I guess similar comments will apply in future
> anyhow (just with some name changes).
> 

Thanks for your comments.

> ==
> 
> 3. More docs.
> 
> It might be helpful if this developer option is referenced also on the
> page "31.10.1 Logical Replication > Configuration Settings >
> Publishers" [1]
> 

The newly added GUC is a developer option, and it seems that most developer
options are not referenced on other pages. So, I am not sure we need to add this
to other docs.

The other comments are fixed in [1]. (Some of them no longer apply because of the
new design.)

[1] 
https://www.postgresql.org/message-id/OSZPR01MB6310AAE12BC281158880380DFDEB9%40OSZPR01MB6310.jpnprd01.prod.outlook.com

Regards,
Shi yu


RE: Force streaming every change in logical decoding

2022-12-22 Thread shiy.f...@fujitsu.com
On Thu, Dec 22, 2022 5:24 PM Amit Kapila  wrote:
> 
> On Thu, Dec 22, 2022 at 1:15 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Thu, 22 Dec 2022 12:35:46 +0530, Amit Kapila
>  wrote in
> > > I have addressed these comments in the attached. Additionally, I have
> > > modified the docs and commit messages to make those clear. I think
> > > instead of adding new tests with this patch, it may be better to
> > > change some of the existing tests related to streaming to use this
> > > parameter as that will clearly show one of the purposes of this patch.
> >
> > Being late but I'm warried that we might sometime regret that the lack
> > of the explicit default. Don't we really need it?
> >
> 
> For this, I like your proposal for "buffered" as an explicit default value.
> 
> > +Allows streaming or serializing changes immediately in logical
> decoding.
> > +The allowed values of logical_decoding_mode
> are the
> > +empty string and immediate. When set to
> > +immediate, stream each change if
> > +streaming option is enabled, otherwise, 
> > serialize
> > +each change.  When set to an empty string, which is the default,
> > +decoding will stream or serialize changes when
> > +logical_decoding_work_mem is reached.
> >
> > With (really) fresh eyes, I took a bit long time to understand what
> > the "streaming" option is. Couldn't we augment the description by a
> > bit?
> >
> 
> Okay, how about modifying it to: "When set to
> immediate, stream each change if
> streaming option (see optional parameters set by
> CREATE SUBSCRIPTION) is enabled, otherwise, serialize each change.
> 

I updated the patch to use "buffered" as the explicit default value, and included
Amit's suggested changes to the documentation.

Besides, I tried to reduce the data size in the streaming subscription TAP tests by
using this new GUC (see the 0002 patch; a usage sketch follows below). But I didn't
convert all streaming TAP tests, because I think we also need to cover the case where
there are lots of changes. So, 015* is not modified. And 017* is not modified because
streaming transactions and non-streaming transactions are tested alternately in that
test.
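
For reference, a minimal sketch of how a converted test might set the GUC (the node
name and placement follow the usual TAP test conventions, so treat this as an
assumption):

```perl
# Force each change to be streamed/serialized immediately, so the test
# needs far less data to exercise the streaming code paths.
$node_publisher->append_conf('postgresql.conf',
	'logical_decoding_mode = immediate');
$node_publisher->reload;
```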

I collected the time needed to run these tests before and after applying the patch set
on my machine. In the debug build, it saves about 5.3 s; in the release build,
it saves about 1.8 s. The time of each test is attached.

Regards,
Shi yu


v5-0001-Add-logical_decoding_mode-GUC.patch
Description: v5-0001-Add-logical_decoding_mode-GUC.patch


v5-0002-Converting-streaming-tap-tests-by-using-logical_d.patch
Description:  v5-0002-Converting-streaming-tap-tests-by-using-logical_d.patch
The unit is milliseconds.

- debug
+---------+-------+---------+
| test_no |  HEAD | patched |
+---------+-------+---------+
|   016   |  3425 |   3118  |
|   018   |  4873 |   3159  |
|   019   |  3241 |   3116  |
|   022   |  8273 |   7373  |
|   023   |  7156 |   4823  |
|         |       |         |
|   sum   | 26968 |  21589  |
+---------+-------+---------+


- release
+---------+-------+---------+
| test_no |  HEAD | patched |
+---------+-------+---------+
|   016   |  1776 |   1649  |
|   018   |  2248 |   1689  |
|   019   |  1653 |   1648  |
|   022   |  4858 |   4462  |
|   023   |  3992 |   3292  |
|         |       |         |
|   sum   | 14527 |  12740  |
+---------+-------+---------+


RE: Force streaming every change in logical decoding

2022-12-23 Thread shiy.f...@fujitsu.com
On Fri, Dec 23, 2022 1:50 PM Amit Kapila 
> 
> On Thu, Dec 22, 2022 at 6:18 PM shiy.f...@fujitsu.com
>  wrote:
> >
> >
> > Besides, I tried to reduce data size in streaming subscription tap tests by 
> > this
> > new GUC (see 0002 patch). But I didn't covert all streaming tap tests
> because I
> > think we also need to cover the case that there are lots of changes. So, 
> > 015*
> is
> > not modified. And 017* is not modified because streaming transactions and
> > non-streaming transactions are tested alternately in this test.
> >
> 
> I think we can remove the newly added test from the patch and instead
> combine the 0001 and 0002 patches. I think we should leave the
> 022_twophase_cascade as it is because it can impact code coverage,
> especially the below part of the test:
> # 2PC PREPARE with a nested ROLLBACK TO SAVEPOINT
> $node_A->safe_psql(
> 'postgres', "
> BEGIN;
> INSERT INTO test_tab VALUES (, 'foobar');
> SAVEPOINT sp_inner;
> INSERT INTO test_tab SELECT i, md5(i::text) FROM
> generate_series(3, 5000) s(i);
> 
> Here, we will stream first time after the subtransaction, so can
> impact the below part of the code in ReorderBufferStreamTXN:
> if (txn->snapshot_now == NULL)
> {
> ...
> dlist_foreach(subxact_i, &txn->subtxns)
> {
> ReorderBufferTXN *subtxn;
> 
> subtxn = dlist_container(ReorderBufferTXN, node, subxact_i.cur);
> ReorderBufferTransferSnapToParent(txn, subtxn);
> }
> ...
> 

OK, I removed the modification in 022_twophase_cascade.pl and combined the two
patches.

Please see the attached patch.
I also fixed Kuroda-san's comments[1].

[1] 
https://www.postgresql.org/message-id/TYAPR01MB5866CD99CF86EAC84119BC91F5E99%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Regards,
Shi yu


v6-0001-Add-logical_decoding_mode-GUC.patch
Description: v6-0001-Add-logical_decoding_mode-GUC.patch


Ignore dropped columns when checking the column list in logical replication

2023-01-03 Thread shiy.f...@fujitsu.com
Hi hackers,

I saw a problem related to column lists.

There's a restriction that different column lists for the same table can't be used
in the publications of a single subscription. But we will get unexpected errors in
some cases because dropped columns are not ignored.

For example:
-- publisher
create table tbl1 (a int primary key, b int, c int);
create publication pub1 for table tbl1(a,b);
create publication pub2 for table tbl1;
alter table tbl1 drop column c;

-- subscriber
create table tbl1 (a int primary key, b int, c int);
create subscription sub connection 'dbname=postgres port=5432' publication 
pub1, pub2;

-- publisher
insert into tbl1 values (1,2);

The publisher and subscriber will report the error:
ERROR:  cannot use different column lists for table "public.tbl1" in different 
publications

This is caused by:
a. walsender (in pgoutput_column_list_init())
		/*
		 * If column list includes all the columns of the table,
		 * set it to NULL.
		 */
		if (bms_num_members(cols) == RelationGetNumberOfAttributes(relation))
		{
			bms_free(cols);
			cols = NULL;
		}

The value returned by RelationGetNumberOfAttributes() includes dropped columns.
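
A minimal sketch of one possible walsender-side fix, which counts only live
(non-dropped) attributes instead of using RelationGetNumberOfAttributes(); the
variable names follow the snippet above:

```c
int			nliveatts = 0;
TupleDesc	desc = RelationGetDescr(relation);

for (int i = 0; i < desc->natts; i++)
{
	/* skip dropped columns when counting the table's attributes */
	if (TupleDescAttr(desc, i)->attisdropped)
		continue;
	nliveatts++;
}

/* If the column list covers all live columns, treat it as no list. */
if (bms_num_members(cols) == nliveatts)
{
	bms_free(cols);
	cols = NULL;
}
```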

b. table synchronization (in fetch_remote_table_info())
	appendStringInfo(&cmd,
					 "SELECT DISTINCT"
					 "  (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)"
					 "   THEN NULL ELSE gpt.attrs END)"
					 "  FROM pg_publication p,"
					 "  LATERAL pg_get_publication_tables(p.pubname) gpt,"
					 "  pg_class c"
					 " WHERE gpt.relid = %u AND c.oid = gpt.relid"
					 "   AND p.pubname IN ( %s )",
					 lrel->remoteid,
					 pub_names.data);

If the result of the above SQL contains more than one tuple, an error will be
reported (cannot use different column lists for table ...). In this SQL, attrs is
NULL if `array_length(gpt.attrs, 1) = c.relnatts`, but `c.relnatts` includes
dropped columns; what we want is the count of live columns.
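
A sketch of the corresponding SQL-side fix, comparing against a count of live
columns instead of c.relnatts (the exact phrasing is an assumption):

```sql
(CASE WHEN (array_length(gpt.attrs, 1)
            = (SELECT count(1) FROM pg_catalog.pg_attribute a
               WHERE a.attrelid = gpt.relid
                 AND a.attnum > 0
                 AND NOT a.attisdropped))
      THEN NULL ELSE gpt.attrs END)
```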

I tried to fix them in the attached patch.

Regards,
Shi yu


v1-0001-Ignore-dropped-columns-when-checking-the-column-l.patch
Description:  v1-0001-Ignore-dropped-columns-when-checking-the-column-l.patch


RE: Force streaming every change in logical decoding

2023-01-05 Thread shiy.f...@fujitsu.com
On Thu, Dec 22, 2022 3:17 PM Masahiko Sawada  wrote:
> >  I think
> > instead of adding new tests with this patch, it may be better to
> > change some of the existing tests related to streaming to use this
> > parameter as that will clearly show one of the purposes of this patch.
> 
> +1. I think test_decoding/sql/stream.sql and spill.sql are good
> candidates and we change logical replication TAP tests in a separate
> patch.
> 

Hi,

I tried to reduce the data size in test_decoding tests by using the new added
GUC "logical_decoding_mode", and found that the new GUC is not suitable for
some cases.

For example, in the following cases in stream.sql (and there are some similar
cases in twophase_stream.sql), the changes in the sub-transaction exceed
logical_decoding_work_mem, but they won't be streamed because the sub-transaction
is rolled back. (But the transaction is marked as streamed.) After the
sub-transaction, there is a small number of inserts; as logical_decoding_work_mem
is not exceeded, they will be streamed when decoding COMMIT. If we use
logical_decoding_mode=immediate, the small number of inserts in the toplevel
transaction will be streamed immediately. This is different from before, so I'm
not sure we can use logical_decoding_mode in this case.

```
-- streaming test with sub-transaction
BEGIN;
savepoint s1;
SELECT 'msg5' FROM pg_logical_emit_message(true, 'test', repeat('a', 50));
INSERT INTO stream_test SELECT repeat('a', 2000) || g.i FROM generate_series(1, 
35) g(i);
TRUNCATE table stream_test;
rollback to s1;
INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 
20) g(i);
COMMIT;
```

Besides, some cases in spill.sql can't use the new GUC because they want to
serialize only when processing a specific toplevel transaction or sub-transaction.
For example, in the case below, the changes in the subxact exceed
logical_decoding_work_mem and are serialized, while the insert after the subxact is
not serialized. If logical_decoding_mode is used, both of them will be
serialized. This is not what we want to test.

```
-- spilling subxact, non-spilling main xact
BEGIN;
SAVEPOINT s;
INSERT INTO spill_test SELECT 'serialize-subbig-topsmall--1:'||g.i FROM 
generate_series(1, 5000) g(i);
RELEASE SAVEPOINT s;
INSERT INTO spill_test SELECT 'serialize-subbig-topsmall--2:'||g.i FROM 
generate_series(5001, 5001) g(i);
COMMIT;
```

Although the remaining cases in spill.sql could use the new GUC, it would require
setting and resetting logical_decoding_mode many times. Besides, the tests in this
file were added before logical_decoding_work_mem was introduced, so I am not sure if
we should modify these cases to use the new GUC.

Also, it looks like the tests for toast in stream.sql can't be changed either.

Due to the above reasons, it seems that currently few tests can be modified to
use logical_decoding_mode. If later we find some tests that can be changed to use
it, we can do so in a separate thread. I will close the CF entry.

Regards,
Shi yu


Fix pg_publication_tables to exclude generated columns

2023-01-09 Thread shiy.f...@fujitsu.com
Hi hackers,

I noticed that there is a problem with the system view pg_publication_tables when
looking into [1]. The column "attnames" contains generated columns when no
column list is specified, but generated columns shouldn't be included because
they are not replicated (see send_relation_and_attrs()).

I think one way to fix it is to modify the pg_publication_tables query to exclude
generated columns. But in this way, we would need to bump the catalog version when
fixing it in the back-branches. Another way is to modify function
pg_get_publication_tables()'s return value to contain all supported columns when
no column list is specified; then we don't need to change the system view.

Attached is the patch for HEAD; when back-patching, the changes to the system view
can be ignored in PG15.

[1] 
https://www.postgresql.org/message-id/OSZPR01MB631087C65BA81E1FEE5A60D2FDF59%40OSZPR01MB6310.jpnprd01.prod.outlook.com

Regards,
Shi yu




v1-0002-Test-for-column-list-check-in-tablesync.patch
Description: v1-0002-Test-for-column-list-check-in-tablesync.patch


v1-0001-Modify-pg_get_publication_tables-to-show-all-colu.patch
Description:  v1-0001-Modify-pg_get_publication_tables-to-show-all-colu.patch


RE: Fix pg_publication_tables to exclude generated columns

2023-01-09 Thread shiy.f...@fujitsu.com
On Mon, Jan 9, 2023 11:06 PM Tom Lane  wrote:
> 
> Amit Kapila  writes:
> > On Mon, Jan 9, 2023 at 5:29 PM shiy.f...@fujitsu.com
> >  wrote:
> >> I think one way to fix it is to modify pg_publication_tables query to 
> >> exclude
> >> generated columns. But in this way, we need to bump catalog version when
> fixing
> >> it in back-branch. Another way is to modify function
> >> pg_get_publication_tables()'s return value to contain all supported columns
> if
> >> no column list is specified, and we don't need to change system view.
> 
> > That sounds like a reasonable approach to fix the issue.
> 
> We could just not fix it in the back branches.  I'd argue that this is
> as much a definition change as a bug fix, so it doesn't really feel
> like something to back-patch anyway.
> 

If this is not fixed in the back-branches, in some cases we will get an error when
creating/refreshing a subscription, because we query pg_publication_tables in the
column list check.

e.g.

-- publisher
CREATE TABLE test_mix_4 (a int PRIMARY KEY, b int, c int, d int GENERATED 
ALWAYS AS (a + 1) STORED);
CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 (a, b, c);
CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4;

-- subscriber
CREATE TABLE test_mix_4 (a int PRIMARY KEY, b int, c int, d int);

postgres=# CREATE SUBSCRIPTION sub1 CONNECTION 'port=5432' PUBLICATION 
pub_mix_7, pub_mix_8;
ERROR:  cannot use different column lists for table "public.test_mix_4" in 
different publications

I think it might be better to fix it in the back-branches. And if we fix it by
modifying pg_get_publication_tables(), we don't need to bump the catalog version in
the back-branches, which seems acceptable to me.

Regards,
Shi yu




RE: Allow logical replication to copy tables in binary format

2023-01-11 Thread shiy.f...@fujitsu.com
On Mon, Nov 14, 2022 8:08 PM Melih Mutlu  wrote:
>
> Attached patch with updated version of this patch.

Thanks for your patch.

I tried to do a performance test for this patch, and the result looks good to me.
(The steps are similar to what Melih shared [1].)

The time to synchronize about 1GB of data in binary format (the average of the
middle eight runs was taken):
HEAD: 16.854 s
Patched: 6.805 s

Besides, here are some comments.

1.
+# Binary enabled subscription should fail
+$node_subscriber_binary->wait_for_log("ERROR:  insufficient data left in message");

Should it be changed to "ERROR: ( [A-Z0-9]+:)? ", like in other subscription tests?

2.
+# Binary disabled subscription should succeed
+$node_publisher->wait_for_catchup('tap_sub');

If we want to wait for table synchronization to finish, should we call
wait_for_subscription_sync()?
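
For example, something like this (node and subscription names assumed from the
quoted test):

```perl
# Wait until table synchronization of the binary-disabled subscription finishes.
$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
```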

3.
I also think it might be better to support binary copy only for publishers of
v16 or later. Do you plan to implement that in this patch?
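
As a rough illustration of that idea, the COPY command construction could check the
publisher version; this is only a sketch, and the exact spot in tablesync.c and the
version cutoff are assumptions:

```c
/* Request binary COPY only when the publisher is new enough to support it. */
if (MySubscription->binary &&
	walrcv_server_version(LogRepWorkerWalRcvConn) >= 160000)
	appendStringInfoString(&cmd, " WITH (FORMAT binary)");
```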
  
[1] 
https://www.postgresql.org/message-id/CAGPVpCQEKDVKQPf6OFQ-9WiRYB1YRejm--YJTuwgzuvj1LEJ2A%40mail.gmail.com

Regards,
Shi yu



RE: Fix pg_publication_tables to exclude generated columns

2023-01-11 Thread shiy.f...@fujitsu.com
On Wed, Jan 11, 2023 2:40 PM Amit Kapila  wrote:
> 
> On Wed, Jan 11, 2023 at 10:07 AM Tom Lane  wrote:
> >
> > Amit Kapila  writes:
> > >> On Mon, Jan 9, 2023 11:06 PM Tom Lane  wrote:
> > >>> We could just not fix it in the back branches.  I'd argue that this is
> > >>> as much a definition change as a bug fix, so it doesn't really feel
> > >>> like something to back-patch anyway.
> >
> > > So, if we don't backpatch then it could lead to an error when it
> > > shouldn't have which is clearly a bug. I think we should backpatch
> > > this unless Tom or others are against it.
> >
> > This isn't a hill that I'm ready to die on ... but do we have any field
> > complaints about this?  If not, I still lean against a back-patch.
> > I think there's a significant risk of breaking case A while fixing
> > case B when we change this behavior, and that's something that's
> > better done only in a major release.
> >
> 
> Fair enough, but note that there is a somewhat related problem for
> dropped columns [1] as well. While reviewing that it occurred to me
> that generated columns also have a similar problem which leads to this
> thread (it would have been better if there is a mention of the same in
> the initial email). Now, as symptoms are similar, I think we shouldn't
> back-patch that as well, otherwise, it will appear to be partially
> fixed. What do you think?
> 
> [1] - https://www.postgresql.org/message-
> id/OSZPR01MB631087C65BA81E1FEE5A60D2FDF59%40OSZPR01MB6310.jpnpr
> d01.prod.outlook.com
> 

I agree to only fix them on HEAD.

I merged this patch and the one in [1] as they are similar problems. Please
see the attached patch.

I removed the changes in tablesync.c which simplified the query in
fetch_remote_table_info(), because they only work for publishers of v16 or later.
Those changes rely on pg_get_publication_tables() returning all columns when no
column list is specified, but publishers of v15 return NULL in that case.

[1] 
https://www.postgresql.org/message-id/OSZPR01MB631087C65BA81E1FEE5A60D2FDF59%40OSZPR01MB6310.jpnprd01.prod.outlook.com


Regards,
Shi yu


v2-0001-Ignore-dropped-columns-and-generated-columns-when.patch
Description:  v2-0001-Ignore-dropped-columns-and-generated-columns-when.patch


v2-0002-Tests-for-checking-column-list-restriction.patch
Description: v2-0002-Tests-for-checking-column-list-restriction.patch


RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-01-16 Thread shiy.f...@fujitsu.com
On Wed, Jan 11, 2023 4:31 PM Melih Mutlu  wrote:
> 
> Hi hackers,
> 
> Rebased the patch to resolve conflicts.
> 

Thanks for your patch. Here are some comments.

0001 patch

1. walsender.c
+   /* Create a tuple to send consisten WAL location */

"consisten" should be "consistent" I think.

2. logical.c
+	if (need_full_snapshot)
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+
+		SpinLockAcquire(&slot->mutex);
+		slot->effective_catalog_xmin = xmin_horizon;
+		slot->data.catalog_xmin = xmin_horizon;
+		slot->effective_xmin = xmin_horizon;
+		SpinLockRelease(&slot->mutex);
+
+		xmin_horizon = GetOldestSafeDecodingTransactionId(!need_full_snapshot);
+		ReplicationSlotsComputeRequiredXmin(true);
+
+		LWLockRelease(ProcArrayLock);
+	}

It seems that we should first get the safe decoding xid, and then inform the slot
machinery about the new limit, right? Otherwise the limit will be
InvalidTransactionId, which seems inconsistent with the comment.
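
Concretely, the reordering might look like this (a sketch that only rearranges the
quoted lines):

```c
if (need_full_snapshot)
{
	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);

	/* First compute the safe decoding xid ... */
	xmin_horizon = GetOldestSafeDecodingTransactionId(!need_full_snapshot);

	/* ... then publish it through the slot ... */
	SpinLockAcquire(&slot->mutex);
	slot->effective_catalog_xmin = xmin_horizon;
	slot->data.catalog_xmin = xmin_horizon;
	slot->effective_xmin = xmin_horizon;
	SpinLockRelease(&slot->mutex);

	/* ... and inform the slot machinery about the new limit. */
	ReplicationSlotsComputeRequiredXmin(true);

	LWLockRelease(ProcArrayLock);
}
```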

3. doc/src/sgml/protocol.sgml
+   is used in the currenct transaction. This command is currently only 
supported
+   for logical replication.
+   slots.

We don't need to put "slots" on a new line.


0002 patch

1.
In pg_subscription_rel.h, I think the type of "srrelslotname" can be changed to
NameData; see "subslotname" in pg_subscription.h.

2.
+	 * Find the logical replication sync worker if exists store
+	 * the slot number for dropping associated replication slots
+	 * later.

Should we add a comma after "if exists"?

3.
+   PG_FINALLY();
+   {
+   pfree(cmd.data);
+   }
+   PG_END_TRY();
+   \
+   return tablelist;
+}

Do we need the backslash?

4.
+   /*
+* Advance to the LSN got from walrcv_create_slot. This is WAL
+* logged for the purpose of recovery. Locks are to prevent the
+* replication origin from vanishing while advancing.

"walrcv_create_slot" should be changed to
"walrcv_create_slot/walrcv_slot_snapshot" I think.

5.
+   /* Replication drop might still exist. Try to drop */
+   replorigin_drop_by_name(originname, true, false);

Should "Replication drop" be "Replication origin"?

6.
I saw an assertion failure in the following case; could you please look into it?
The backtrace is attached.

-- pub
CREATE TABLE tbl1 (a int, b text);
CREATE TABLE tbl2 (a int primary key, b text);
create publication pub for table tbl1, tbl2;
insert into tbl1 values (1, 'a');
insert into tbl1 values (1, 'a');

-- sub
CREATE TABLE tbl1 (a int primary key, b text);
CREATE TABLE tbl2 (a int primary key, b text);
create subscription sub connection 'dbname=postgres port=5432' publication pub;

Subscriber log:
2023-01-17 14:47:10.054 CST [1980841] LOG:  logical replication apply worker for subscription "sub" has started
2023-01-17 14:47:10.060 CST [1980843] LOG:  logical replication table synchronization worker for subscription "sub", table "tbl1" has started
2023-01-17 14:47:10.070 CST [1980845] LOG:  logical replication table synchronization worker for subscription "sub", table "tbl2" has started
2023-01-17 14:47:10.073 CST [1980843] ERROR:  duplicate key value violates unique constraint "tbl1_pkey"
2023-01-17 14:47:10.073 CST [1980843] DETAIL:  Key (a)=(1) already exists.
2023-01-17 14:47:10.073 CST [1980843] CONTEXT:  COPY tbl1, line 2
2023-01-17 14:47:10.074 CST [1980821] LOG:  background worker "logical replication worker" (PID 1980843) exited with exit code 1
2023-01-17 14:47:10.083 CST [1980845] LOG:  logical replication table synchronization worker for subscription "sub", table "tbl2" has finished
2023-01-17 14:47:10.083 CST [1980845] LOG:  logical replication table synchronization worker for subscription "sub" has moved to sync table "tbl1".
TRAP: failed Assert("node != InvalidRepOriginId"), File: "origin.c", Line: 892, PID: 1980845

Regards,
Shi yu
#0  0x7f38bb78baff in raise () from /lib64/libc.so.6
#1  0x7f38bb75eea5 in abort () from /lib64/libc.so.6
#2  0x00b44501 in ExceptionalCondition (conditionName=0xd0deac "node != InvalidRepOriginId", fileName=0xd0da4d "origin.c", lineNumber=892) at assert.c:66
#3  0x008f93f5 in replorigin_advance (node=0, remote_commit=22144256, local_commit=0, go_backward=true, wal_log=true) at origin.c:892
#4  0x0090cf2f in LogicalRepSyncTableStart (origin_startpos=0x7ffcbe54e808) at tablesync.c:1641
#5  0x00913ee4 in start_table_sync (origin_startpos=0x7ffcbe54e808, myslotname=0x7ffcbe54e790) at worker.c:4419
#6  0x009140be in run_tablesync_worker (options=0x7ffcbe54e7c0, slotname=0x0, originname=0x7ffcbe54e810 "pg_16398_3", originname_size=64, origin_startpos=0x7ffcbe54e808) at worker.c:4497
#7  0x00

Update comments in multixact.c

2023-01-17 Thread shiy.f...@fujitsu.com
Hi,

I noticed that commit 5212d447fa updated some comments in multixact.c because
SLRU truncation for multixacts is performed during VACUUM instead of during
checkpoints. Should the following comments, which mention the checkpointer, be
changed too?

1.
* we compute it (using nextMXact if none are valid).  Each backend is
* required not to attempt to access any SLRU data for MultiXactIds older
* than its own OldestVisibleMXactId[] setting; this is necessary because
* the checkpointer could truncate away such data at any instant.

2.
 * We set the OldestVisibleMXactId for a given transaction the first time
 * it's going to inspect any MultiXactId.  Once we have set this, we are
 * guaranteed that the checkpointer won't truncate off SLRU data for
 * MultiXactIds at or after our OldestVisibleMXactId.
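
For instance, the first comment might be reworded roughly like this (the wording is
only a suggestion):

```c
 * we compute it (using nextMXact if none are valid).  Each backend is
 * required not to attempt to access any SLRU data for MultiXactIds older
 * than its own OldestVisibleMXactId[] setting; this is necessary because
 * a concurrent VACUUM could truncate away such data at any instant.
```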

Regards,
Shi yu




RE: Update comments in multixact.c

2023-01-18 Thread shiy.f...@fujitsu.com
On Wed, Jan 18, 2023 6:04 AM Peter Geoghegan  wrote:
> 
> On Tue, Jan 17, 2023 at 1:33 AM shiy.f...@fujitsu.com
>  wrote:
> > I noticed that commit 5212d447fa updated some comments in multixact.c
> because
> > SLRU truncation for multixacts is performed during VACUUM, instead of
> > checkpoint. Should the following comments which mentioned checkpointer be
> > changed, too?
> 
> Yes, I think so.

Thanks for your reply.

Attached is a patch which fixes them.

Regards,
Shi yu


v1-0001-Update-comments-in-multixact.c.patch
Description: v1-0001-Update-comments-in-multixact.c.patch


RE: Data is copied twice when specifying both child and parent table in publication

2023-03-31 Thread shiy.f...@fujitsu.com
On Fri, Mar 31, 2023 2:16 AM Jacob Champion  wrote:
> 
> On Wed, Mar 29, 2023 at 2:00 AM Amit Kapila 
> wrote:
> > Pushed.
> 
> While rebasing my logical-roots patch over the top of this, I ran into
> another situation where mixed viaroot settings can duplicate data. The
> key idea is to subscribe to two publications with mixed settings, as
> before, and add a partition root that's already been replicated with
> viaroot=false to the other publication with viaroot=true.
> 
>   pub=# CREATE TABLE part (a int) PARTITION BY RANGE (a);
>   pub=# CREATE PUBLICATION pub_all FOR ALL TABLES;
>   pub=# CREATE PUBLICATION pub_other FOR TABLE other WITH
> (publish_via_partition_root);
>   -- populate with data, then switch to subscription side
>   sub=# CREATE SUBSCRIPTION sub CONNECTION ... PUBLICATION pub_all,
> pub_other;
>   -- switch back to publication
>   pub=# ALTER PUBLICATION pub_other ADD TABLE part;
>   -- and back to subscription
>   sub=# ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
>   -- data is now duplicated
> 
> (Standalone reproduction attached.)
> 
> This is similar to what happens if you alter the
> publish_via_partition_root setting for an existing publication, but
> I'd argue it's easier to hit by accident. Is this part of the same
> class of bugs, or is it different (or even expected) behavior?
> 

I noticed that a similar problem has been discussed in this thread, see [1] [2]
[3] [4]. It seems complicated to fix if we want the code to automatically skip
tables that have been synchronized previously, and that may be overkill in some
cases (e.g. the target table on the subscriber is not a partitioned table, and the
user wants to synchronize all data in the partitioned table from the publisher).
Besides, it seems not to be a common case. So I'm not sure we should fix it. Maybe
we can just add some documentation for it as Peter mentioned.

[1] 
https://www.postgresql.org/message-id/CAJcOf-eQR_%3Dq0f4ZVHd342QdLvBd_995peSr4xCU05hrS3TeTg%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/OS0PR01MB5716C756312959F293A822C794869%40OS0PR01MB5716.jpnprd01.prod.outlook.com
 (the second issue in it)
[3] 
https://www.postgresql.org/message-id/CA%2BHiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ%40mail.gmail.com
[4] 
https://www.postgresql.org/message-id/CAA4eK1%2BNWreG%3D2sKiMz8vFzTsFhEHCjgQMyAu6zj3sdLmcheYg%40mail.gmail.com

Regards,
Shi Yu


RE: logical replication empty transactions

2022-03-23 Thread shiy.f...@fujitsu.com
On Thursday, March 24, 2022 11:19 AM Hou, Zhijie/侯 志杰  
wrote:
> 
> Attach the new version patch which include the following changes:
> 
> - Fix a typo
> - Change the requestreply flag of the newly added WalSndKeepalive to false,
>   because the subscriber can judge whether it's necessary to post a reply
> based
>   on the received LSN.
> - Add a testcase to make sure there is no data in subscriber side when the
>   transaction is skipped.
> - Change the name of flag skipped_empty_xact to skipped_xact which seems
> more
>   understandable.
> - Merge Amit's suggested changes.
> 

Hi,

This patch skips sending BEGIN/COMMIT messages for empty transactions and saves
network bandwidth. So I tried to do a test to see how it affects bandwidth.

This test refers to the previous test by Peter[1]. I temporarily modified the
code in worker.c to log the length of the data received by the subscriber (after
calling walrcv_receive()). At the conclusion of the test run, the logs are
processed to extract the numbers.

[1] 
https://www.postgresql.org/message-id/CAHut%2BPuyqcDJO0X2BxY%2B9ycF%2Bew3x77FiCbTJQGnLDbNmMASZQ%40mail.gmail.com

The number of transactions is fixed (1000), and I tested different mixes of
empty and non-empty transactions sent - 0%, 25%, 50%, 75%, 100%. The patch sends a
keepalive message when skipping an empty transaction in synchronous replication
mode, so I tested both synchronous replication and asynchronous replication.

The results are as follows, and attach the bar chart.

Sync replication - size of sending data

        0%      25%     50%     75%     100%
HEAD    335211  281655  223661  170271  115108
patched 335217  256617  173878  98095   18108

Async replication - size of sending data

        0%      25%     50%     75%     100%
HEAD    339379  285835  236343  184227  115000
patched 335077  260953  180022  11      18126


The details of the test is also attached.

Summary of result:
In both synchronous replication mode and asynchronous replication mode, the more
empty transactions there are, the more obvious the improvement is. Even when there
are no empty transactions, I can't see any overhead.

Regards,
Shi yu
-- create table 
CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), 
d bigint DEFAULT 999);
CREATE TABLE test_tab_nopub (a int primary key, b text, c timestamptz DEFAULT 
now(), d bigint DEFAULT 999);

test_empty_not_published.sql:
BEGIN;
INSERT INTO test_tab_nopub VALUES(1, 'foo');
UPDATE test_tab_nopub SET b = 'bar' WHERE a = 1;
DELETE FROM test_tab_nopub WHERE a = 1;
COMMIT;

test_empty_published.sql:
BEGIN;
INSERT INTO test_tab VALUES(1, 'foo');
UPDATE test_tab SET b = 'bar' WHERE a = 1;
DELETE FROM test_tab WHERE a = 1;
COMMIT;

-- create publication
create publication pub for table test_tab;

-- create subscription
CREATE SUBSCRIPTION sub CONNECTION 'host=localhost port=5432 dbname=postgres' 
PUBLICATION pub;"


-- empty transaction: 0%
pgbench -n -p 5432 -s 100 -t 1000 -c 1 -f test_empty_published.sql postgres
-- empty transaction: 25%
pgbench -n -p 5432 -s 100 -t 1000 -c 1 -f test_empty_not_published.sql@5 -f 
test_empty_published.sql@15 postgres
-- empty transaction: 50%
pgbench -n -p 5432 -s 100 -t 1000 -c 1 -f test_empty_not_published.sql@10 -f 
test_empty_published.sql@10 postgres
-- empty transaction: 75%
pgbench -n -p 5432 -s 100 -t 1000 -c 1 -f test_empty_not_published.sql@15 -f 
test_empty_published.sql@5 postgres
-- empty transaction: 100%
pgbench -n -p 5432 -s 100 -t 1000 -c 1 -f test_empty_not_published.sql postgres


RE: logical replication empty transactions

2022-03-29 Thread shiy.f...@fujitsu.com
On Tue, Mar 29, 2022 5:15 PM Hou, Zhijie/侯 志杰  wrote:
> 
> Thanks for the comment.
> Attach the new version patch with this change.
> 

Hi,

I did a performance test for this patch to see if it affects performance when
publishing empty transactions, based on the v32 patch.

In this test, I used synchronous logical replication and published a table with no
operations on it. The test uses pgbench; each run takes 15 minutes, and I take the
median of 3 runs. The database is dropped and recreated after each run.

The results are as follows, and the bar chart is attached. The details of the test
are also attached.

TPS - publishing empty transactions (scale factor 1)

        4 threads   8 threads   16 threads
HEAD    4818.2837   4353.6243   3888.5995
patched 5111.2936   4555.1629   4024.4286


TPS - publishing empty transactions (scale factor 100)

        4 threads   8 threads   16 threads
HEAD    9066.6465   16118.0453  21485.1207
patched 9357.3361   16638.6409  24503.6829

There is an improvement of more than 3% after applying this patch, and in the
best case, it improves by 14%, which looks good to me.

Regards,
Shi yu
Some parameters specified in postgresql.conf
=
shared_buffers = 8GB
checkpoint_timeout = 30min
max_wal_size = 20GB
min_wal_size = 10GB
autovacuum = off

Steps
=
-- in publisher
-- using scale factor 100
pgbench -i postgres -p 5432 -s 100
-- using default scale factor, 1
pgbench -i postgres -p 5432

-- create table in publisher and subscriber
CREATE TABLE tbl_empty (a int);

-- create publication
create publication pub for table tbl_empty;

-- create subscription
CREATE SUBSCRIPTION sub CONNECTION 'host=localhost port=5432 dbname=postgres' 
PUBLICATION pub;"

Then set up synchronous replication and restart the publisher server.

Run pgbench and record the TPS.

pgbench -b tpcb-like -c 4 -j 4 -T 900 -n postgres


About the test machine
=
The processor spec of the test machine is Intel® Xeon® Silver 4210 CPU 
@2.20GHz with 10 cores/20 threads/2 sockets.


RE: Logical replication timeout problem

2022-03-30 Thread shiy.f...@fujitsu.com
On Wed, Mar 30, 2022 3:54 PM wangw.f...@fujitsu.com  
wrote:
> 
> Rebase the patch because the commit d5a9d86d in current HEAD.
> 

Thanks for your patch. I tried this patch and confirmed that there is no timeout
problem after applying it, while I could still reproduce the problem on HEAD.

Regards,
Shi yu


RE: Skipping schema changes in publication

2022-04-14 Thread shiy.f...@fujitsu.com
On Tue, Apr 12, 2022 2:23 PM vignesh C  wrote:
> 
> The patch does not apply on top of HEAD because of the recent commit,
> attached patch is rebased on top of HEAD.
> 

Thanks for your patch. Here are some comments on the 0001 patch.

1. doc/src/sgml/catalogs.sgml
@@ -6438,6 +6438,15 @@ SCRAM-SHA-256$:&l
A null value indicates that all columns are published.
   
  
+
+
+  
+   pnskip bool
+  
+  
+   True if the schema is skip schema
+  
+ 
 

   

This change is added to pg_publication_rel; I think it should be added to
pg_publication_namespace, right?

2.
postgres=# alter publication p1 add skip all tables in schema s1,s2;
ERROR:  schema "s1" is already member of publication "p1"

This error message seems odd to me; can we improve it? Something like:
schema "s1" is already skipped in publication "p1"

3.
create table tbl (a int primary key);
create schema s1;
create schema s2;
create table s1.tbl (a int);
create publication p1 for all tables skip all tables in schema s1,s2;

postgres=# \dRp+
                              Publication p1
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root
----------+------------+---------+---------+---------+-----------+----------
 postgres | t          | t       | t       | t       | t         | f
Skip tables from schemas:
    "s1"
    "s2"

postgres=# select * from pg_publication_tables;
 pubname | schemaname | tablename
---------+------------+-----------
 p1      | public     | tbl
 p1      | s1         | tbl
(2 rows)

There shouldn't be a record of s1.tbl, since all tables in schema s1 are 
skipped.

I found that it is caused by the following code:

src/backend/catalog/pg_publication.c
+	foreach(cell, pubschemalist)
+	{
+		PublicationSchInfo *pubsch = (PublicationSchInfo *) lfirst(cell);
+
+		skipschemaidlist = lappend_oid(result, pubsch->oid);
+	}

The first argument to lappend_oid() seems wrong; should it be:

skipschemaidlist = lappend_oid(skipschemaidlist, pubsch->oid);


4. src/backend/commands/publicationcmds.c

/*
 * Convert the PublicationObjSpecType list into schema oid list and
 * PublicationTable list.
 */
static void
ObjectsInPublicationToOids(List *pubobjspec_list, ParseState *pstate,
   List **rels, List **schemas)

Should we modify the comment of ObjectsInPublicationToOids()?
"schema oid list" should be "PublicationSchInfo list".

Regards,
Shi yu



RE: Data is copied twice when specifying both child and parent table in publication

2022-04-19 Thread shiy.f...@fujitsu.com
On Tue, Apr 19, 2022 3:05 PM houzj.f...@fujitsu.com  
wrote:
>
> > -Original Message-
> > From: Wang, Wei/王 威 
> On Thursday, April 7, 2022 11:08 AM
> >
> > On Thur, Mar 10, 2021 at 10:08 AM houzj.f...@fujitsu.com wrote:
> > > Hi,
> > >
> > > When reviewing some logical replication related features. I noticed 
> > > another
> > > possible problem if the subscriber subscribes multiple publications which
> > > publish parent and child table.
> > >
> > > For example:
> > >
> > > pub
> > > create table t (a int, b int, c int) partition by range (a);
> > > create table t_1 partition of t for values from (1) to (10);
> > >
> > > create publication pub1 for table t
> > >   with (PUBLISH_VIA_PARTITION_ROOT);
> > > create publication pub2 for table t_1
> > >   with (PUBLISH_VIA_PARTITION_ROOT);
> > >
> > > sub
> > >  prepare table t and t_1
> > > CREATE SUBSCRIPTION sub CONNECTION 'port=1 dbname=postgres'
> > > PUBLICATION pub1, pub2;
> > >
> > > select * from pg_subscription_rel ;
> > >  srsubid | srrelid | srsubstate | srsublsn
> > > -+-++---
> > >16391 |   16385(t) | r  | 0/150D100
> > >16391 |   16388(t_1) | r  | 0/150D138
> > >
> > > If subscribe two publications one of them publish parent table with
> > > (pubviaroot=true) and another publish child table. Both the parent table 
> > > and
> > > child table will exist in pg_subscription_rel which also means we will do
> > > initial copy for both tables.
> > >
> > > But after initial copy, we only publish change with the schema of the 
> > > parent
> > > table(t). It looks a bit inconsistent.
> > >
> > > Based on the document of PUBLISH_VIA_PARTITION_ROOT option. I think
> > the
> > > expected behavior could be we only store the top most parent(table t) in
> > > pg_subscription_rel and do initial copy for it if pubviaroot is on. I 
> > > haven't
> > > thought about how to fix this and will investigate this later.
> > Hi,
> > I try to fix this bug. Attach the patch.
> >
> > The current HEAD get table list for one publication by invoking function
> > pg_get_publication_tables. If multiple publications are subscribed, then 
> > this
> > function is invoked multiple times. So option PUBLISH_VIA_PARTITION_ROOT
> > works
> > independently on every publication, I think it does not work correctly on
> > different publications of the same subscription.
> >
> > So I fix this bug by the following two steps:
> > First step,
> > I get oids of subscribed tables by publication list. Then for tables with 
> > the
> > same topmost root table, I filter them base on the option
> > PUBLISH_VIA_PARTITION_ROOT(see new function filter_partitions_oids).
> > After filtering, I get the final oid list.
> > Second step,
> > I get the required informations(nspname and relname) base on the oid list of
> > first step.
> 
> Thanks for updating the patch.
> I confirmed that the bug is fixed by this patch.
> 
> One suggestion is that can we simplify the code by moving the logic of 
> checking
> the ancestor into the SQL ?. For example, we could filter the outpout of
> pg_publication_tables by adding A WHERE clause which checks whether the table
> is a partition and if its ancestor is also in the output. I think we can also
> filter the needless partition in this approach.
> 

I agreed with you, and I tried to fix this problem in a simpler way. What we want
is to exclude partitions whose ancestors also need to be replicated, so how about
implementing that by using the following SQL when getting the table list from the
publisher?

SELECT DISTINCT ns.nspname, c.relname
FROM pg_catalog.pg_publication_tables t
JOIN pg_catalog.pg_namespace ns ON ns.nspname = t.schemaname
JOIN pg_catalog.pg_class c
  ON c.relname = t.tablename AND c.relnamespace = ns.oid
WHERE t.pubname IN ('p0', 'p2')
  AND (c.relispartition IS FALSE
       OR NOT EXISTS
         (SELECT 1 FROM pg_partition_ancestors(c.oid)
          WHERE relid IN
            (SELECT DISTINCT (schemaname || '.' || tablename)::regclass::oid
             FROM pg_catalog.pg_publication_tables t
             WHERE t.pubname IN ('p0', 'p2'))
            AND relid != c.oid));

Please find attached a patch which uses this approach; I also merged the test in
Wang's patch into it.

Regards,
Shi yu


v2-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch
Description:  v2-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch


RE: Data is copied twice when specifying both child and parent table in publication

2022-04-27 Thread shiy.f...@fujitsu.com
On Sun, Apr 24, 2022 2:16 PM Wang, Wei/王 威  wrote:
> 
> Attach the new patches.[suggestions by Amit-San]
> The patch for HEAD:
> 1. Add a new function to get tables info by a publications array.
> The patch for REL14:
> 1. Use an alias to make the statement understandable. BTW, I adjusted the
> alignment.
> 2. Improve the test cast about the column list and row filter to cover this 
> bug.
> 

Thanks for your patches.

Here's a comment on the patch for REL14.

+	appendStringInfo(&cmd, "SELECT DISTINCT ns.nspname, c.relname\n"
+					 " FROM pg_catalog.pg_publication_tables t\n"
+					 "  JOIN pg_catalog.pg_namespace ns\n"
+					 " ON ns.nspname = t.schemaname\n"
+					 "  JOIN pg_catalog.pg_class c\n"
+					 " ON c.relname = t.tablename AND c.relnamespace = ns.oid\n"
+					 " WHERE t.pubname IN (%s)\n"
+					 " AND (c.relispartition IS FALSE\n"
+					 "  OR NOT EXISTS\n"
+					 "( SELECT 1 FROM pg_partition_ancestors(c.oid) as relid\n"
+					 "  WHERE relid IN\n"
+					 "(SELECT DISTINCT (schemaname || '.' || tablename)::regclass::oid\n"
+					 " FROM pg_catalog.pg_publication_tables t\n"
+					 " WHERE t.pubname IN (%s))\n"
+					 "  AND relid != c.oid))\n",
+					 pub_names.data, pub_names.data);

I think we can use an alias like 'pa' for pg_partition_ancestors, and modify 
the SQL as follows. 

+	appendStringInfo(&cmd, "SELECT DISTINCT ns.nspname, c.relname\n"
+					 " FROM pg_catalog.pg_publication_tables t\n"
+					 "  JOIN pg_catalog.pg_namespace ns\n"
+					 " ON ns.nspname = t.schemaname\n"
+					 "  JOIN pg_catalog.pg_class c\n"
+					 " ON c.relname = t.tablename AND c.relnamespace = ns.oid\n"
+					 " WHERE t.pubname IN (%s)\n"
+					 " AND (c.relispartition IS FALSE\n"
+					 "  OR NOT EXISTS\n"
+					 "( SELECT 1 FROM pg_partition_ancestors(c.oid) pa\n"
+					 "  WHERE pa.relid IN\n"
+					 "(SELECT DISTINCT (t.schemaname || '.' || t.tablename)::regclass::oid\n"
+					 " FROM pg_catalog.pg_publication_tables t\n"
+					 " WHERE t.pubname IN (%s))\n"
+					 "  AND pa.relid != c.oid))\n",
+					 pub_names.data, pub_names.data);

Regards,
Shi yu



RE: Fix some newly modified tab-complete changes

2022-11-16 Thread shiy.f...@fujitsu.com
On Thu, Nov 10, 2022 12:54 PM Michael Paquier  wrote:
> 
> On Tue, Oct 18, 2022 at 05:17:32PM +1100, Peter Smith wrote:
> > I re-tested and confirm that the patch does indeed fix the quirk I'd
> > previously reported.
> >
> > But, looking at the patch code, I don't know if it is the best way to
> > fix the problem or not. Someone with more experience of the
> > tab-complete module can judge that.
> 
> It seems to me that the patch as proposed has more problems than
> that.  I have spotted a few issues at quick glance, there may be
> more.
> 
> For example, take this one:
> +   else if (TailMatches("GRANT") ||
> +TailMatches("REVOKE", "GRANT", "OPTION", "FOR"))
> COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_roles,
> 
> "REVOKE GRANT OPTION FOR" completes with a list of role names, which
> is incorrect.
> 
> FWIW, I am not much a fan of the approach taken by the patch to
> duplicate the full list of keywords to append after REVOKE or GRANT,
> at the only difference of "GRANT OPTION FOR".  This may be readable if
> unified with a single list, with extra items appended for GRANT and
> REVOKE?
> 
> Note that REVOKE has a "ADMIN OPTION FOR" clause, which is not
> completed to.

Thanks a lot for looking into this patch.

I have fixed the problems you saw, and improved the patch as you suggested.

Besides, I noticed that the tab completion for "ALTER DEFAULT PRIVILEGES ...
GRANT/REVOKE ..." missed "CREATE". This is fixed in the 0001 patch.

And commit e3ce2de09 supported GRANT ... WITH INHERIT ..., but there's no tab
completion for it. This is added in the 0002 patch.

Please see the attached patches.

Regards,
Shi yu


v5-0002-Add-tab-completion-for-GRANT-WITH-INHERIT.patch
Description: v5-0002-Add-tab-completion-for-GRANT-WITH-INHERIT.patch


v5-0001-Fix-tab-completion-for-GRANT-REVOKE.patch
Description: v5-0001-Fix-tab-completion-for-GRANT-REVOKE.patch


RE: Avoid streaming the transaction which are skipped (in corner cases)

2022-11-28 Thread shiy.f...@fujitsu.com
On Sun, Nov 27, 2022 1:33 PM Dilip Kumar  wrote:
> 
> On Sat, Nov 26, 2022 at 12:15 PM Amit Kapila 
> wrote:
> >
> > On Fri, Nov 25, 2022 at 5:38 PM Amit Kapila 
> wrote:
> > >
> > > On Fri, Nov 25, 2022 at 1:35 PM Dilip Kumar 
> wrote:
> > > >
> > > > During DecodeCommit() for skipping a transaction we use ReadRecPtr to
> > > > check whether to skip this transaction or not.  Whereas in
> > > > ReorderBufferCanStartStreaming() we use EndRecPtr to check whether
> to
> > > > stream or not. Generally it will not create a problem but if the
> > > > commit record itself is adding some changes to the transaction(e.g.
> > > > snapshot) and if the "start_decoding_at" is in between ReadRecPtr and
> > > > EndRecPtr then streaming will decide to stream the transaction where
> > > > as DecodeCommit will decide to skip it.  And for handling this case in
> > > > ReorderBufferForget() we call stream_abort().
> > > >
> > >
> > > The other cases are probably where we don't have FilterByOrigin or
> > > dbid check, for example,
> XLOG_HEAP2_NEW_CID/XLOG_XACT_INVALIDATIONS.
> > > We anyway actually don't send anything for such cases except empty
> > > start/stop messages. Can we add some flag to txn which says that there
> > > is at least one change like DML that we want to stream?
> > >
> >
> > We can probably think of using txn_flags for this purpose.
> 
> In the attached patch I have used txn_flags to identify whether it has
> any streamable change or not and the transaction will not be selected
> for streaming unless it has at least one streamable change.
> 

Thanks for your patch.

I saw that the patch added a check when selecting the largest transaction, but in
addition to ReorderBufferCheckMemoryLimit(), the transaction can also be
streamed in ReorderBufferProcessPartialChange(). Should we add the check in
this function, too?

diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index 9a58c4bfb9..108737b02f 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -768,7 +768,8 @@ ReorderBufferProcessPartialChange(ReorderBuffer *rb, 
ReorderBufferTXN *txn,
 */
if (ReorderBufferCanStartStreaming(rb) &&
!(rbtxn_has_partial_change(toptxn)) &&
-   rbtxn_is_serialized(txn))
+   rbtxn_is_serialized(txn) &&
+   rbtxn_has_streamable_change(txn))
ReorderBufferStreamTXN(rb, toptxn);
 }

Regards,
Shi yu


Force streaming every change in logical decoding

2022-12-05 Thread shiy.f...@fujitsu.com
Hi hackers,

In logical decoding, when logical_decoding_work_mem is exceeded, the changes are
sent to the output plugin in streaming mode. But there is a restriction that the
minimum value of logical_decoding_work_mem is 64kB. I tried to add a GUC to
allow sending every change to the output plugin without waiting until
logical_decoding_work_mem is exceeded.

This helps to test streaming mode. For example, to test "Avoid streaming the
transaction which are skipped" [1], it needs many XLOG_XACT_INVALIDATIONS
messages. With the new option, it can be tested with fewer changes and in less
time. Also, this new option helps to test more scenarios for "Perform streaming
logical transactions by background workers" [2].

[1] 
https://www.postgresql.org/message-id/CAFiTN-tHK=7lzfrps8fbt2ksrojgqbzywcgxst2bm9-rjja...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/CAA4eK1%2BwyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw%40mail.gmail.com

Regards,
Shi yu


v1-0001-Allow-streaming-every-change-without-waiting-till.patch
Description:  v1-0001-Allow-streaming-every-change-without-waiting-till.patch


RE: Avoid streaming the transaction which are skipped (in corner cases)

2022-12-05 Thread shiy.f...@fujitsu.com
On Mon, Dec 5, 2022 6:57 PM Amit Kapila  wrote:
> 
> On Mon, Dec 5, 2022 at 3:41 PM Dilip Kumar  wrote:
> >
> > I think we need something like this[1] so that we can better control
> > the streaming.
> >
> 
> +1. The additional advantage would be that we can generate parallel
> apply and new streaming tests with much lesser data. Shi-San, can you
> please start a new thread for the GUC patch proposed by you as
> indicated by Dilip?
> 

OK, I started a new thread for it. [1]

[1] 
https://www.postgresql.org/message-id/OSZPR01MB63104E7449DBE41932DB19F1FD1B9%40OSZPR01MB6310.jpnprd01.prod.outlook.com

Regards,
Shi yu


RE: Avoid streaming the transaction which are skipped (in corner cases)

2022-12-06 Thread shiy.f...@fujitsu.com
On Wed, Dec 7, 2022 12:01 PM Dilip Kumar  wrote:
> 
> On Wed, Dec 7, 2022 at 9:28 AM Amit Kapila 
> wrote:
> >
> > On Tue, Dec 6, 2022 at 11:55 AM shiy.f...@fujitsu.com
> >  wrote:
> > >
> > > On Mon, Dec 5, 2022 6:57 PM Amit Kapila 
> wrote:
> > > >
> > > > On Mon, Dec 5, 2022 at 3:41 PM Dilip Kumar 
> wrote:
> > > > >
> > > > > I think we need something like this[1] so that we can better control
> > > > > the streaming.
> > > > >
> > > >
> > > > +1. The additional advantage would be that we can generate parallel
> > > > apply and new streaming tests with much lesser data. Shi-San, can you
> > > > please start a new thread for the GUC patch proposed by you as
> > > > indicated by Dilip?
> > > >
> > >
> > > OK, I started a new thread for it. [1]
> > >
> >
> > Thanks. I think it is better to go ahead with this patch and once we
> > decide what is the right thing to do in terms of GUC then we can try
> > to add additional tests for this. Anyway, it is not that the code
> > added by this patch is not getting covered by existing tests. What do
> > you think?
> 
> That makes sense to me.
> 

+1

Regards,
Shi yu


RE: Truncate in synchronous logical replication failed

2021-04-22 Thread shiy.f...@fujitsu.com
> I don't think here we need to restart to get a stable list of indexes
> as we do in RelationGetIndexAttrBitmap. The reason is here we build
> the cache entry using a historic snapshot and all the later changes
> are absorbed while decoding WAL. I have updated that and modified few
> comments in the attached patch. Can you please test this in
> clobber_cache_always mode? I think just testing
> subscription/t/010_truncate.pl would be sufficient.

Thanks for your patch. I tested it; it passes 'make check-world' and works as
expected. By the way, I also tested in clobber_cache_always mode and it passed,
too. (I only tested subscription/t/010_truncate.pl.)
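
For anyone repeating this, one way to run just that test is roughly the following
(assuming a build with -DCLOBBER_CACHE_ALWAYS in CPPFLAGS):

```
cd src/test/subscription
make check PROVE_TESTS='t/010_truncate.pl'
```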

Regards,
Shi yu



RE: Logical replication timeout problem

2023-01-29 Thread shiy.f...@fujitsu.com
On Sun, Jan 29, 2023 3:41 PM wangw.f...@fujitsu.com  
wrote:
> 
> I tested a mix transaction of transactional and non-transactional messages on
> the current HEAD and reproduced the timeout problem. I think this result is 
> OK.
> Because when decoding a transaction, non-transactional changes are processed
> directly and the function WalSndKeepaliveIfNecessary is called, while
> transactional changes are cached and processed after decoding. After decoding,
> only transactional changes will be processed (in the function
> ReorderBufferProcessTXN), so the timeout problem will still be reproduced.
> 
> After applying the v8 patch, the test mentioned above didn't reproduce the
> timeout problem (Attach this test script 'test_with_nontransactional.sh').
> 
> Attach the new patch.
> 

Thanks for updating the patch. Here is a comment.

In update_progress_txn_cb_wrapper(), it looks like we need to reset
changes_count to 0 after calling OutputPluginUpdateProgress(); otherwise, once the
count reaches 100, OutputPluginUpdateProgress() will be called for every
subsequent change.
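
That is, something along these lines inside the wrapper (a sketch; the counter name
and the threshold of 100 are taken from the patch under review):

```c
if (++changes_count >= 100)
{
	OutputPluginUpdateProgress(ctx, false);
	changes_count = 0;	/* restart the count for the next batch of changes */
}
```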

Regards,
Shi yu



RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-01 Thread shiy.f...@fujitsu.com
On Mon, Jan 30, 2023 6:05 PM Takamichi Osumi (Fujitsu) 
 wrote:
> 
> On Saturday, January 28, 2023 1:28 PM I wrote:
> > Attached the updated patch v24.
> Hi,
> 
> 
> I've conducted the rebase affected by the commit(1e8b61735c)
> by renaming the GUC to logical_replication_mode accordingly,
> because it's utilized in the TAP test of this time-delayed LR feature.
> There is no other change for this version.
> 
> Kindly have a look at the attached v25.
> 

Thanks for your patch. Here are some comments.

1.
+	/*
+	 * The min_apply_delay parameter is ignored until all tablesync workers
+	 * have reached READY state. This is because if we allowed the delay
+	 * during the catchup phase, then once we reached the limit of tablesync
+	 * workers it would impose a delay for each subsequent worker. That would
+	 * cause initial table synchronization completion to take a long time.
+	 */
+	if (!AllTablesyncsReady())
+		return;

I saw that the new parameter becomes effective only after all tables are in the
ready state, because the apply worker can't set the state to catchup during the
delay. But can we call process_syncing_tables() in the while-loop of
maybe_apply_delay(), as sketched below? Then the tablesync could finish without the
delay. If we can't do so, it might be better to add some comments about it.
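
Roughly like this (a sketch; remaining_delay_ms() is a hypothetical helper standing
in for the patch's elapsed-time computation):

```c
long		diffms;

while ((diffms = remaining_delay_ms()) > 0)
{
	/* Let tablesync workers proceed instead of stalling behind the delay. */
	process_syncing_tables(last_received);

	(void) WaitLatch(MyLatch,
					 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
					 Min(diffms, 1000L),
					 WAIT_EVENT_RECOVERY_APPLY_DELAY);
	ResetLatch(MyLatch);
}
```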

2.
+# Make sure the apply worker knows to wait for more than 500ms
+check_apply_delay_log($node_subscriber, $offset, "0.5");

I think the last parameter should be 500.
Besides, I am not sure checking the log is a stable test. Is it possible that
there's no such log on a slow machine? I modified the code to sleep 1s at the
beginning of apply_dispatch(), and then the newly added test failed because the
server log did not match.

Regards,
Shi yu




RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-02-06 Thread shiy.f...@fujitsu.com

On Thu, Feb 2, 2023 11:48 AM shveta malik  wrote:
> 
> On Wed, Feb 1, 2023 at 5:42 PM Melih Mutlu 
> wrote:
> >
> > Hi Shveta,
> >
> > shveta malik , 1 Şub 2023 Çar, 15:01 tarihinde
> şunu yazdı:
> >>
> >> On Wed, Feb 1, 2023 at 5:05 PM Melih Mutlu 
> wrote:
> >> 2) I found a crash in the previous patch (v9), but have not tested it
> >> on the latest yet. Crash happens when all the replication slots are
> >> consumed and we are trying to create new. I tweaked the settings like
> >> below so that it can be reproduced easily:
> >> max_sync_workers_per_subscription=3
> >> max_replication_slots = 2
> >> and then ran the test case shared by you. I think there is some memory
> >> corruption happening. (I did test in debug mode, have not tried in
> >> release mode). I tried to put some traces to identify the root-cause.
> >> I observed that worker_1 keeps on moving from 1 table to another table
> >> correctly, but at some point, it gets corrupted i.e. origin-name
> >> obtained for it is wrong and it tries to advance that and since that
> >> origin does not exist, it  asserts and then something else crashes.
> >> From log: (new trace lines added by me are prefixed by shveta, also
> >> tweaked code to have my comment 1 fixed to have clarity on worker-id).
> >>
> >> form below traces, it is clear that worker_1 was moving from one
> >> relation to another, always getting correct origin 'pg_16688_1', but
> >> at the end it got 'pg_16688_49' which does not exist. Second part of
> >> trace shows who updated 'pg_16688_49', it was done by worker_49
> which
> >> even did not get chance to create this origin due to max_rep_slot
> >> reached.
> >
> >
> > Thanks for investigating this error. I think it's the same error as the one 
> > Shi
> reported earlier. [1]
> > I couldn't reproduce it yet but will apply your tweaks and try again.
> > Looking into this.
> >
> > [1] https://www.postgresql.org/message-
> id/OSZPR01MB631013C833C98E826B3CFCB9FDC69%40OSZPR01MB6310.jpn
> prd01.prod.outlook.com
> >
> 
> Hi Melih,
> I think I am able to identify the root cause. It is not memory
> corruption, but the way origin-names are stored in system-catalog
> mapped to a particular relation-id before even those are created.
> 
> After adding few more logs:
> 
> [4227] LOG:  shveta- LogicalRepSyncTableStart- worker_49 constructed
> originname :pg_16684_49, relid:16540
> [4227] LOG:  shveta- LogicalRepSyncTableStart- worker_49
> updated-origin in system catalog:pg_16684_49,
> slot:pg_16684_sync_49_7195149685251088378, relid:16540
> [4227] LOG:  shveta- LogicalRepSyncTableStart- worker_49
> get-origin-id2 originid:0, originname:pg_16684_49
> [4227] ERROR:  could not create replication slot
> "pg_16684_sync_49_7195149685251088378": ERROR:  all replication slots
> are in use
> HINT:  Free one or increase max_replication_slots.
> 
> 
> [4428] LOG:  shveta- LogicalRepSyncTableStart- worker_148 constructed
> originname :pg_16684_49, relid:16540
> [4428] LOG:  could not drop replication slot
> "pg_16684_sync_49_7195149685251088378" on publisher: ERROR:
> replication slot "pg_16684_sync_49_7195149  685251088378" does not
> exist
> [4428] LOG:  shveta- LogicalRepSyncTableStart- worker_148 drop-origin
> originname:pg_16684_49
> [4428] LOG:  shveta- LogicalRepSyncTableStart- worker_148
> updated-origin:pg_16684_49,
> slot:pg_16684_sync_148_7195149685251088378, relid:16540
> 
> So from above, worker_49 came and picked up relid:16540, constructed
> origin-name and slot-name and updated in system-catalog and then it
> tried to actually create that slot and origin but since max-slot count
> was reached, it failed and exited, but did not do cleanup from system
> catalog for that relid.
> 
> Then worker_148 came and also picked up table 16540 since it was not
> completed/started by previous worker, but this time it found that
> origin and slot entry present in system-catalog against this relid, so
> it picked the same names and started processing, but since those do
> not exist, it asserted while advancing the origin.
> 
> The assert is only reproduced when an already-running worker (say
> worker_1) that has 'created=true' set gets to sync the relid for which
> a previously failed worker updated the origin-name without creating it.
> In such a case worker_1 (with created=true) will try to reuse the origin,
> and thus will try to advance it, and will end up asserting. That is why
> you might not always see the assertion: it requires a worker with
> 'created' set to true to go on and reuse the origin updated by the
> previous worker.
> 
> So to fix this, I think we should either update the origin and slot
> entries in the system catalog only after their creation has succeeded, or
> clean up the system catalog in case of failure. What do you think?
> 

I think the first way seems better.

I reproduced the problem I reported before with the latest patch (v7-0001,
v10-0002) and looked into it. It is caused by a similar reason. Here is some
analysis for

RE: run pgindent on a regular basis / scripted manner

2023-02-09 Thread shiy.f...@fujitsu.com
On Thu, Feb 9, 2023 6:10 AM Andrew Dunstan  wrote:
> Thanks, I have committed this. Still looking at Robert's other request.
>

Hi,

I tried the new option --commit and found that it tries to indent files
that were deleted in the specified commit, and reports an error:

cannot open file "src/backend/access/brin/brin.c": No such file or directory

It looks like we should filter out such files.

Regards,
Shi Yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-02-13 Thread shiy.f...@fujitsu.com
On Thu, Feb 2, 2023 4:34 PM Önder Kalacı  wrote:
>
>>
>> and if there's more
>> than one candidate index pick any one. Additionally, we can allow
>> disabling the use of an index scan for this particular case. If we are
>> too worried about API change for allowing users to specify the index
>> then we can do that later or as a separate patch.
>>
>
> On v23, I dropped the planner support for picking the index. Instead, it 
> simply
> iterates over the indexes and picks the first one that is suitable.
>
> I'm currently thinking on how to enable users to override this decision.
> One option I'm leaning towards is to add a syntax like the following:
>
> ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
>
> Though, that should probably be a separate patch. I'm going to work
> on that, but still wanted to share v23 given picking the index sounds
> complementary, not strictly required at this point.
>

Thanks for your patch. Here are some comments.

1.
I noticed that get_usable_indexoid() is called in apply_handle_update_internal()
and apply_handle_delete_internal() to get the usable index. Could usableIndexOid
be a parameter of these two functions? We already have the
LogicalRepRelMapEntry when calling them, so we could obtain usableIndexOid
without calling get_usable_indexoid(). Otherwise, for partitioned tables,
logicalrep_partition_open() is called inside get_usable_indexoid(), and
searching for the entry via hash_search() adds cost.

2.
+* This attribute is an expression, and
+* SuitableIndexPathsForRepIdentFull() was called 
earlier when the
+* index for subscriber was selected. There, the indexes
+* comprising *only* expressions have already been 
eliminated.

The comment needs to be updated:
SuitableIndexPathsForRepIdentFull
->
FindUsableIndexForReplicaIdentityFull

3.

/* Build scankey for every attribute in the index. */
-   for (attoff = 0; attoff < 
IndexRelationGetNumberOfKeyAttributes(idxrel); attoff++)
+   for (index_attoff = 0; index_attoff < 
IndexRelationGetNumberOfKeyAttributes(idxrel);
+index_attoff++)
{

Should the comment be changed, since we now skip the attributes that are
expressions?

4.
+   Assert(RelationGetReplicaIndex(rel) != 
RelationGetRelid(idxrel) &&
+  RelationGetPrimaryKeyIndex(rel) != 
RelationGetRelid(idxrel));

Maybe we can call the new function idxIsRelationIdentityOrPK()?

Regards,
Shi Yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-02-14 Thread shiy.f...@fujitsu.com
On Mon, Feb 13, 2023 7:01 PM shiy.f...@fujitsu.com  
wrote:
> 
> On Thu, Feb 2, 2023 4:34 PM Önder Kalacı  wrote:
> >
> >>
> >> and if there's more
> >> than one candidate index pick any one. Additionally, we can allow
> >> disabling the use of an index scan for this particular case. If we are
> >> too worried about API change for allowing users to specify the index
> >> then we can do that later or as a separate patch.
> >>
> >
> > On v23, I dropped the planner support for picking the index. Instead, it 
> > simply
> > iterates over the indexes and picks the first one that is suitable.
> >
> > I'm currently thinking on how to enable users to override this decision.
> > One option I'm leaning towards is to add a syntax like the following:
> >
> > ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
> >
> > Though, that should probably be a separate patch. I'm going to work
> > on that, but still wanted to share v23 given picking the index sounds
> > complementary, not strictly required at this point.
> >
> 
> Thanks for your patch. Here are some comments.
> 

Hi,

Here are some comments on the test cases.

1. in test case "SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX"
+# now, ingest more data and create index on column y which has higher 
cardinality
+# so that the future commands use the index on column y
+$node_publisher->safe_psql('postgres',
+   "INSERT INTO test_replica_id_full SELECT 50, i FROM 
generate_series(0,3100)i;");
+$node_subscriber->safe_psql('postgres',
+   "CREATE INDEX test_replica_id_full_idy ON test_replica_id_full(y)");

We don't pick the cheapest index in the current patch, so should we modify this
part of the test?

BTW, the following comment in FindLogicalRepUsableIndex() need to be changed,
too.

+* We are looking for one more opportunity for using an index. 
If
+* there are any indexes defined on the local relation, try to 
pick
+* the cheapest index.


2. Are there any reasons why we need the test case "SUBSCRIPTION USES INDEX WITH
DROPPED COLUMNS"? Has there been a problem related to dropped columns before?

3. in test case "SUBSCRIPTION USES INDEX ON PARTITIONED TABLES"
+# deletes rows and moves between partitions
+$node_publisher->safe_psql('postgres',
+   "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
+$node_publisher->safe_psql('postgres',
+   "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");

"moves between partitions" in the comment seems wrong.

4. in test case "SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS"
+# update 2 rows
+$node_publisher->safe_psql('postgres',
+   "UPDATE people SET firstname = 'Nan' WHERE firstname = 
'first_name_1';");
+$node_publisher->safe_psql('postgres',
+   "UPDATE people SET firstname = 'Nan' WHERE firstname = 'first_name_2' 
AND lastname = 'last_name_2';");
+
+# make sure the index is not used on the subscriber
+$result = $node_subscriber->safe_psql('postgres',
+   "select idx_scan from pg_stat_all_indexes where indexrelname = 
'people_names'");
+is($result, qq(0), 'ensure subscriber tap_sub_rep_full updates two rows via 
seq. scan with index on expressions');
+

I think it would be better to call wait_for_catchup() before the check, because
we want to check that the index is NOT used. Otherwise the check may pass simply
because the rows have not yet been updated on the subscriber.

5. in test case "SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN"
+# show that index is not used even when enable_indexscan=false
+$result = $node_subscriber->safe_psql('postgres',
+   "select idx_scan from pg_stat_all_indexes where indexrelname = 
'test_replica_id_full_idx'");
+is($result, qq(0), 'ensure subscriber has not used index with 
enable_indexscan=false');

Should we remove the word "even" in the comment?

6. 
In each test case we re-create publications, subscriptions, and tables. Could we
create only one publication and one subscription at the beginning, and use them
in all test cases? I think this can save some time running the test file.

Regards,
Shi Yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-02-14 Thread shiy.f...@fujitsu.com
On Sat, Feb 4, 2023 7:24 PM Amit Kapila  wrote:
> 
> On Thu, Feb 2, 2023 at 2:03 PM Önder Kalacı  wrote:
> >
> >>
> >> and if there's more
> >> than one candidate index pick any one. Additionally, we can allow
> >> disabling the use of an index scan for this particular case. If we are
> >> too worried about API change for allowing users to specify the index
> >> then we can do that later or as a separate patch.
> >>
> >
> > On v23, I dropped the planner support for picking the index. Instead, it 
> > simply
> > iterates over the indexes and picks the first one that is suitable.
> >
> > I'm currently thinking on how to enable users to override this decision.
> > One option I'm leaning towards is to add a syntax like the following:
> >
> > ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
> >
> > Though, that should probably be a separate patch. I'm going to work
> > on that, but still wanted to share v23 given picking the index sounds
> > complementary, not strictly required at this point.
> >
> 
> I agree that it could be a separate patch. However, do you think we
> need some way to disable picking the index scan? This is to avoid
> cases where sequence scan could be better or do we think there won't
> exist such a case?
> 

I think such a case exists. I tried the following cases based on v23 patch.

# Step 1.
Create publication, subscription and tables.
-- on publisher
create table tbl (a int);
alter table tbl replica identity full;
create publication pub for table tbl;

-- on subscriber
create table tbl (a int);
create index idx_a on tbl(a);
create subscription sub connection 'dbname=postgres port=5432' publication pub;

# Step 2.
Setup synchronous replication.

# Step 3.
Execute SQL query on publisher.

-- case 1 (All values are duplicated)
truncate tbl;
insert into tbl select 1 from generate_series(0,1)i;
update tbl set a=a+1;

-- case 2
truncate tbl;
insert into tbl select i%3 from generate_series(0,1)i;
update tbl set a=a+1;

-- case 3
truncate tbl;
insert into tbl select i%5 from generate_series(0,1)i;
update tbl set a=a+1;

-- case 4
truncate tbl;
insert into tbl select i%10 from generate_series(0,1)i;
update tbl set a=a+1;

-- case 5
truncate tbl;
insert into tbl select i%100 from generate_series(0,1)i;
update tbl set a=a+1;

-- case 6
truncate tbl;
insert into tbl select i%1000 from generate_series(0,1)i;
update tbl set a=a+1;

-- case 7 (No duplicate value)
truncate tbl;
insert into tbl select i from generate_series(0,1)i;
update tbl set a=a+1;

# Result
The time spent executing the update (average of 3 runs, in milliseconds):

+--------+---------+---------+
|        | patched |  master |
+--------+---------+---------+
| case 1 | 3933.68 | 1298.32 |
| case 2 | 1803.46 | 1294.42 |
| case 3 | 1380.82 | 1299.90 |
| case 4 | 1042.60 | 1300.20 |
| case 5 |  691.69 | 1297.51 |
| case 6 |  578.50 | 1300.69 |
| case 7 |  566.45 | 1302.17 |
+--------+---------+---------+

In cases 1~3, there's an overhead after applying the patch. In the other cases,
the patch improved performance. The more duplicate values there are, the greater
the overhead after applying the patch.

Regards,
Shi Yu



RE: Add LZ4 compression in pg_dump

2023-02-15 Thread shiy.f...@fujitsu.com
On Fri, Jan 27, 2023 2:04 AM gkokola...@pm.me  wrote:
> 
> --- Original Message ---
> On Thursday, January 26th, 2023 at 12:53 PM, Michael Paquier
>  wrote:
> 
> 
> >
> >
> > On Thu, Jan 26, 2023 at 11:24:47AM +, gkokola...@pm.me wrote:
> >
> > > I gave this a little bit of thought. I think that ReadHead should not
> > > emit a warning, or at least not this warning as it is slightly misleading.
> > > It implies that it will automatically turn off data restoration, which is
> > > false. Further ahead, the code will fail with a conflicting error message
> > > stating that the compression is not available.
> > >
> > > Instead, it would be cleaner both for the user and the maintainer to
> > > move the check in RestoreArchive and make it the sole responsible for
> > > this logic.
> >
> >
> > - pg_fatal("cannot restore from compressed archive (compression not
> supported in this installation)");
> > + pg_fatal("cannot restore data from compressed archive (compression not
> supported in this installation)");
> > Hmm. I don't mind changing this part as you suggest.
> >
> > -#ifndef HAVE_LIBZ
> > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> >
> > - pg_fatal("archive is compressed, but this installation does not support
> compression");
> > -#endif
> > However I think that we'd better keep the warning, as it can offer a
> > hint when using pg_restore -l not built with compression support if
> > looking at a dump that has been compressed.
> 
> Fair enough. Please find v27 attached.
> 

Hi,

I am interested in this feature and tried the patch. While reading the comments,
I noticed some minor things that could possibly be improved (in v27-0003 patch).

1.
+   /*
+* Open a file for writing.
+*
+* 'mode' can be one of ''w', 'wb', 'a', and 'ab'. Requrires an already
+* initialized CompressFileHandle.
+*/
+   int (*open_write_func) (const char *path, const 
char *mode,
+   
CompressFileHandle *CFH);

There is a redundant single quote in front of 'w'.

2.
/*
 * Callback function for WriteDataToArchive. Writes one block of (compressed)
 * data to the archive.
 */
/*
 * Callback function for ReadDataFromArchive. To keep things simple, we
 * always read one compressed block at a time.
 */

Should the function names in the comments be updated?

WriteDataToArchive
->
writeData

ReadDataFromArchive
->
readData

3.
+   Assert(strcmp(mode, "r") == 0 || strcmp(mode, "rb") == 0);

Could we use PG_BINARY_R instead of "r" and "rb" here?

Regards,
Shi Yu



RE: run pgindent on a regular basis / scripted manner

2023-02-16 Thread shiy.f...@fujitsu.com
On Thu, Feb 9, 2023 6:10 AM Andrew Dunstan  wrote:
> Thanks, I have committed this. Still looking at Robert's other request.
>

Hi,

Commit 068a243b7 allowed directories to be passed as non-option arguments to
pgindent, but the help text doesn't mention that. Should we update it? Attached
is a small patch that does so.

Regards,
Shi Yu


v1-0001-Update-help-text-of-pgindent.patch
Description: v1-0001-Update-help-text-of-pgindent.patch


Missing default value of createrole_self_grant in document

2023-02-16 Thread shiy.f...@fujitsu.com
Hi hackers,

I noticed that the documentation of the GUC createrole_self_grant doesn't
mention its default value. The attached patch adds that.
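
For reference, I believe the shipped default is the empty string (meaning no
automatic SET/INHERIT grants), which can be checked like this:

-- sketch: show the compiled-in default on a server with no explicit setting
SHOW createrole_self_grant;
 createrole_self_grant
-----------------------
 
(1 row)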

Regards,
Shi Yu


v1-0001-Add-default-value-of-createrole_self_grant-in-doc.patch
Description:  v1-0001-Add-default-value-of-createrole_self_grant-in-doc.patch


"out of relcache_callback_list slots" after multiple calls to pg_logical_slot_get_binary_changes

2023-02-18 Thread shiy.f...@fujitsu.com
Hi hackers,

After multiple calls to the function pg_logical_slot_get_binary_changes() in a
single client backend (the output plugin of the slot being pgoutput), I got the
following error:

client backend FATAL:  out of relcache_callback_list slots
client backend CONTEXT:  slot "testslot", output plugin "pgoutput", in the 
startup callback
client backend STATEMENT:  SELECT data FROM 
pg_logical_slot_get_binary_changes('testslot', NULL, NULL, 'proto_version', 
'3', 'streaming', 'off', 'publication_names', 'pub');

I tried to look into it and found that it's because every time the function
(pg_logical_slot_get_binary_changes) is called, a relcache callback and syscache
callbacks are registered when pgoutput is initialized (see pgoutput_startup() and
init_rel_sync_cache()), but they are not unregistered when it shuts down. So,
after multiple calls to the function, MAX_RELCACHE_CALLBACKS is exceeded. This
is mentioned in the following comment.

/*
 * We can get here if the plugin was used in SQL interface as the
 * RelSchemaSyncCache is destroyed when the decoding finishes, but there
 * is no way to unregister the relcache invalidation callback.
 */
if (RelationSyncCache == NULL)
return;
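
A minimal way to hit it from psql, assuming a publication named "pub" already
exists and wal_level is logical (a sketch only):

SELECT pg_create_logical_replication_slot('testslot', 'pgoutput');

-- repeating this in the same backend registers the callbacks again on every
-- call, and eventually exhausts the relcache_callback_list slots
SELECT count(*) FROM pg_logical_slot_get_binary_changes('testslot',
    NULL, NULL, 'proto_version', '3', 'publication_names', 'pub');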

Could we fix it by adding two new functions to unregister the relcache callback
and syscache callbacks? I tried to do so in the attached patch.

Regards,
Shi Yu


v1-0001-Unregister-syscache-callback-and-relcache-callbac.patch
Description:  v1-0001-Unregister-syscache-callback-and-relcache-callbac.patch


RE: Allow logical replication to copy tables in binary format

2023-02-20 Thread shiy.f...@fujitsu.com
On Thu, Feb 16, 2023 8:48 PM Amit Kapila  wrote:
> 
> On Mon, Jan 30, 2023 at 4:19 PM Melih Mutlu 
> wrote:
> >
> > Logical replication between different types like int and smallint is 
> > already not
> working properly on HEAD too.
> > Yes, the scenario you shared looks like working. But you didn't create the
> subscription with binary=true. The patch did not change subscription with
> binary=false case. I believe what you should experiment is binary=true case
> which already fails in the apply phase on HEAD.
> >
> > Well, with this patch, it will begin to fail in the table copy phase. But I 
> > don't
> think this is a problem because logical replication in binary format is 
> already
> broken for replications between different data types.
> >
> 
> So, doesn't this mean that there is no separate failure mode during
> the initial copy? I am clarifying this to see if the patch really
> needs a separate copy_format option for initial sync?
> 

In the case where the data type doesn't have a binary output function, during
the apply phase the column is sent in text format (see logicalrep_write_tuple())
and it works fine. But with copy_format = binary, the walsender exits with an
error.

For example:
-- create table on publisher and subscriber
CREATE TYPE myvarchar;
CREATE FUNCTION myvarcharin(cstring, oid, integer) RETURNS myvarchar
LANGUAGE internal IMMUTABLE PARALLEL SAFE STRICT AS 'varcharin';
CREATE FUNCTION myvarcharout(myvarchar) RETURNS cstring
LANGUAGE internal IMMUTABLE PARALLEL SAFE STRICT AS 'varcharout';
CREATE TYPE myvarchar (
input = myvarcharin,
output = myvarcharout,
alignment = integer,
storage = main
);
CREATE TABLE tbl1 (a myvarchar);

-- create publication and insert some data on publisher
create publication pub for table tbl1;
INSERT INTO tbl1 values ('a');

-- create subscription on subscriber
create subscription sub connection 'dbname=postgres port=5432' publication pub 
with(binary, copy_format = binary);

Then I got the following error in the publisher log.

walsender ERROR:  no binary output function available for type public.myvarchar
walsender STATEMENT:  COPY public.tbl1 (a) TO STDOUT  WITH (FORMAT binary)

Regards,
Shi Yu


RE: "out of relcache_callback_list slots" after multiple calls to pg_logical_slot_get_binary_changes

2023-02-21 Thread shiy.f...@fujitsu.com
On Mon, Feb 20, 2023 11:31 PM Tom Lane  wrote:
> 
> Kyotaro Horiguchi  writes:
> > I'm pretty sure that everytime an output plugin is initialized on a
> > process, it installs the same set of syscache/relcache callbacks each
> > time.  Do you think we could simply stop duplicate registration of
> > those callbacks by using a static boolean?  It would be far simpler.
> 
> Yeah, I think that's the way it's done elsewhere.  Removing and
> re-registering your callback seems expensive, and it also destroys
> any reasoning that anyone might have made about the order in which
> different callbacks will get called.  (Admittedly, that's probably not
> important for invalidation callbacks, but it does matter for e.g.
> process exit callbacks.)
> 

Thanks for your reply. I agree that's expensive. Attached a new patch which adds
a static boolean to avoid duplicate registration.

Regards,
Shi Yu


v2-0001-Avoid-duplicate-registration-of-callbacks-in-pgou.patch
Description:  v2-0001-Avoid-duplicate-registration-of-callbacks-in-pgou.patch


Fix some newly modified tab-complete changes

2022-09-27 Thread shiy.f...@fujitsu.com
Hi hackers,

I saw a problem when using tab-completion for "GRANT": "TABLES IN SCHEMA" should
be "ALL TABLES IN SCHEMA" in the following case.

postgres=# grant all on
ALL FUNCTIONS IN SCHEMA   DATABASE  FUNCTION  
PARAMETER SCHEMATABLESPACE
ALL PROCEDURES IN SCHEMA  DOMAINinformation_schema.   
PROCEDURE SEQUENCE  tbl
ALL ROUTINES IN SCHEMAFOREIGN DATA WRAPPER  LANGUAGE  
public.   TABLE TYPE
ALL SEQUENCES IN SCHEMA   FOREIGN SERVERLARGE OBJECT  
ROUTINE   TABLES IN SCHEMA

I found that it is related to the recent commit 790bf615dd, and maybe it's
better to fix it. I also noticed that some comments should be modified to match
this new syntax. Attached a patch to fix them.
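
For reference, only the ALL-prefixed spelling is accepted by the grammar for
GRANT (object names below are made up):

-- valid: grants on every table in the schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO someuser;
-- "GRANT ... ON TABLES IN SCHEMA", as completed above, is a syntax error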

Regards,
Shi yu


0001-Fix-some-newly-modified-tab-complete-changes-and-com.patch
Description:  0001-Fix-some-newly-modified-tab-complete-changes-and-com.patch


RE: Fix some newly modified tab-complete changes

2022-09-28 Thread shiy.f...@fujitsu.com
On Wed, Sep 28, 2022 1:49 PM Kyotaro Horiguchi  wrote:
> 
> At Wed, 28 Sep 2022 14:14:01 +1000, Peter Smith
>  wrote in
> > On Tue, Sep 27, 2022 at 8:28 PM shiy.f...@fujitsu.com
> >  wrote:
> > >
> > > Hi hackers,
> > >
> > > I saw a problem when using tab-complete for "GRANT", "TABLES IN
> SCHEMA" should
> > > be "ALL TABLES IN SCHEMA" in the following case.
> > >
> > > postgres=# grant all on
> > > ALL FUNCTIONS IN SCHEMA   DATABASE  FUNCTION
> PARAMETER SCHEMATABLESPACE
> > > ALL PROCEDURES IN SCHEMA  DOMAINinformation_schema.
> PROCEDURE SEQUENCE  tbl
> > > ALL ROUTINES IN SCHEMAFOREIGN DATA WRAPPER  LANGUAGE
> public.   TABLE TYPE
> > > ALL SEQUENCES IN SCHEMA   FOREIGN SERVERLARGE OBJECT
> ROUTINE   TABLES IN SCHEMA
> > >
> > > I found that it is related to the recent commit 790bf615dd, and maybe it's
> > > better to fix it. I also noticed that some comments should be modified
> according
> > > to this new syntax. Attach a patch to fix them.
> > >
> >
> > Thanks for the patch! Below are my review comments.
> >
> > The patch looks good to me but I did find some other tab-completion
> > anomalies. IIUC these are unrelated to your work, but since I found
> > them while testing your patch I am reporting them here.
> 
> Looks fine to me, too.
> 

Thanks for reviewing it.

> > Perhaps you want to fix them in the same patch, or just raise them
> > again separately?
> >
> > ==
> >
> > 1. tab complete for CREATE PUBLICATION
> >
> > I don't think this is any new bug, but I found that it is possible to do 
> > this...
> >
> > test_pub=# create publication p for ALL TABLES IN SCHEMA 
> > information_schema  pg_catalog  pg_toastpublic
> >
> > or, even this...
> >
> > test_pub=# create publication p for XXX TABLES IN SCHEMA 
> > information_schema  pg_catalog  pg_toastpublic
> 
> Completion is responding to "IN SCHEMA" in these cases.  However, I
> don't reach this state only by completion becuase it doesn't suggest
> "IN SCHEMA" after "TABLES" nor "ALL TABLES".  I don't see a reason to
> change that behavior unless that fix doesn't cause any additional
> complexity.
> 

+1

> > ==
> >
> > 2. tab complete for GRANT
> >
> > test_pub=# grant 
> > ALLEXECUTE
> > pg_execute_server_program  pg_read_server_files   postgres
> >   TRIGGER
> > ALTER SYSTEM   GRANT  pg_monitor
> >   pg_signal_backend  REFERENCES
> > TRUNCATE
> > CONNECTINSERT pg_read_all_data
> >   pg_stat_scan_tablesSELECT UPDATE
> > CREATE pg_checkpoint
> > pg_read_all_settings   pg_write_all_data  SET
> >   USAGE
> > DELETE pg_database_owner
> > pg_read_all_stats  pg_write_server_files  TEMPORARY
> >
> > 2a.
> > grant "GRANT" ??
> 
> Yeah, for the moment I thought that might be a kind of admin option but
> there's no such privilege. REVOKE gets the same suggestion.
> 

Maybe that's for "REVOKE GRANT OPTION FOR", but it is used by both GRANT and
REVOKE. I think it's a separate problem; I have tried to fix it in the attached
0002 patch.
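
For context, the grant-option form that presumably put "GRANT" into the
completion list is only valid for REVOKE; e.g. (table and role names made up):

-- revoke only the ability to re-grant, not the privilege itself
REVOKE GRANT OPTION FOR SELECT ON mytable FROM someuser;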

> > 2b.
> > grant "TEMPORARY" but not "TEMP" ??
> 
> TEMP is an alternative spelling so that's fine.
> 

Agreed.

> 
> I found the following suggestion.
> 
> CREATE PUBLICATION p FOR TABLES  -> ["IN SCHEMA", "WITH ("]
> 
> I believe "WITH (" doesn't come there.
> 

Fixed.

Attach the updated patch.

Regards,
Shi yu


v2-0001-Fix-some-newly-modified-tab-complete-changes-and-.patch
Description:  v2-0001-Fix-some-newly-modified-tab-complete-changes-and-.patch


v2-0002-Fix-tab-completion-for-GRANT-REVOKE.patch
Description: v2-0002-Fix-tab-completion-for-GRANT-REVOKE.patch


RE: Fix some newly modified tab-complete changes

2022-10-09 Thread shiy.f...@fujitsu.com
On Tue, Oct 4, 2022 4:17 PM Peter Smith  wrote:
> 
> On Thu, Sep 29, 2022 at 12:50 PM shiy.f...@fujitsu.com
>  wrote:
> >
> > On Wed, Sep 28, 2022 1:49 PM Kyotaro Horiguchi
>  wrote:
> > >
> > > At Wed, 28 Sep 2022 14:14:01 +1000, Peter Smith
> > >  wrote in
> ...
> > > >
> > > > 2. tab complete for GRANT
> > > >
> > > > test_pub=# grant 
> > > > ALLEXECUTE
> > > > pg_execute_server_program  pg_read_server_files   postgres
> > > >   TRIGGER
> > > > ALTER SYSTEM   GRANT  pg_monitor
> > > >   pg_signal_backend  REFERENCES
> > > > TRUNCATE
> > > > CONNECTINSERT pg_read_all_data
> > > >   pg_stat_scan_tablesSELECT UPDATE
> > > > CREATE pg_checkpoint
> > > > pg_read_all_settings   pg_write_all_data  SET
> > > >   USAGE
> > > > DELETE pg_database_owner
> > > > pg_read_all_stats  pg_write_server_files  TEMPORARY
> > > >
> > > > 2a.
> > > > grant "GRANT" ??
> > >
> > > Yeah, for the mement I thought that might a kind of admin option but
> > > there's no such a privilege. REVOKE gets the same suggestion.
> > >
> >
> > Maybe that's for "REVOKE GRANT OPTION FOR". But it is used by both
> GRANT and
> > REVOKE. I think it's a separate problem, I have tried to fix it in the 
> > attached
> > 0002 patch.
> >
> 
> I checked your v2-0002 patch and AFAICT it does fix properly the
> previously reported GRANT/REVOKE problem.
> 

Thanks for reviewing and testing it.

> ~
> 
> But, while testing I noticed another different quirk
> 
> It seems that neither the GRANT nor the REVOKE auto-complete
> recognises the optional PRIVILEGE keyword
> 
> e.g. GRANT ALL  --> ON  (but not PRIVILEGE)
> e.g. GRANT ALL PRIV --> ???
> 
> e.g. REVOKE ALL  --> ON (but not PRIVILEGE)..
> e.g. REVOKE ALL PRIV --> ???
> 

I tried to add tab-completion for it. Please see the attached updated patch.

Regards,
Shi yu


v3-0001-Fix-tab-completion-for-GRANT-REVOKE.patch
Description: v3-0001-Fix-tab-completion-for-GRANT-REVOKE.patch


RE: Fix some newly modified tab-complete changes

2022-10-10 Thread shiy.f...@fujitsu.com
On Mon, Oct 10, 2022 2:12 PM shiy.f...@fujitsu.com  
wrote:
> 
> On Tue, Oct 4, 2022 4:17 PM Peter Smith  wrote:
> >
> > But, while testing I noticed another different quirk
> >
> > It seems that neither the GRANT nor the REVOKE auto-complete
> > recognises the optional PRIVILEGE keyword
> >
> > e.g. GRANT ALL  --> ON  (but not PRIVILEGE)
> > e.g. GRANT ALL PRIV --> ???
> >
> > e.g. REVOKE ALL  --> ON (but not PRIVILEGE)..
> > e.g. REVOKE ALL PRIV --> ???
> >
> 
> I tried to add tab-completion for it. Pleases see attached updated patch.
> 

Sorry for attaching the wrong patch. Here is the right one.

Regards,
Shi yu


v4-0001-Fix-tab-completion-for-GRANT-REVOKE.patch
Description: v4-0001-Fix-tab-completion-for-GRANT-REVOKE.patch


RE: [RFC] building postgres with meson - v13

2022-10-13 Thread shiy.f...@fujitsu.com
Hi,

I noticed that `pg_config --configure` didn't show the options given when
building with meson. 

For example, 
meson setup build -Dcache=gcc.cache -Ddtrace=enabled -Dicu=enabled 
-Dcassert=true -Dprefix=/home/postgres/install_meson/
meson compile -C build 
meson install -C build

$ pg_config --configure

The options I specified (like dtrace) are not shown, although I confirmed they
actually take effect during compilation: when specifying `-Ddtrace=enabled`,
there is a log line like the following, and with `-Ddtrace=disabled` there is no
such log.

[120/1834] /usr/bin/dtrace -C -h -s 
../src/include/utils/../../backend/utils/probes.d -o 
src/include/utils/probes.h.tmp

Maybe it would be better if pg_config could output this information, to be
consistent with the output when building with `./configure` and `make`.

The output when building with `./configure` and `make`:
$ pg_config --configure
 '--prefix=/home/postgres/install/' '--cache' 'gcc.cache' '--enable-dtrace' 
'--with-icu' '--enable-cassert'


Regards,
Shi yu




RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-13 Thread shiy.f...@fujitsu.com
On Wed, Aug 24, 2022 12:25 AM Önder Kalacı  wrote:
> Hi,
> 
> Thanks for the review!
> 

Thanks for your reply.

> 
> >
> > 1.
> > In FilterOutNotSuitablePathsForReplIdentFull(), is
> > "nonPartialIndexPathList" a
> > good name for the list? Indexes on only expressions are also be filtered.
> >
> > +static List *
> > +FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)
> > +{
> > +   ListCell   *lc;
> > +   List *nonPartialIndexPathList = NIL;
> >
> >
> Yes, true. We only started filtering the non-partial ones first. Now
> changed to *suitableIndexList*, does that look right?
> 

That looks ok to me.

> 
> 
> > 3.
> > It looks we should change the comment for FindReplTupleInLocalRel() in this
> > patch.
> >
> > /*
> >  * Try to find a tuple received from the publication side (in
> > 'remoteslot') in
> >  * the corresponding local relation using either replica identity index,
> >  * primary key or if needed, sequential scan.
> >  *
> >  * Local tuple, if found, is returned in '*localslot'.
> >  */
> > static bool
> > FindReplTupleInLocalRel(EState *estate, Relation localrel,
> >
> >
> I made a small change, just adding "index". Do you expect a larger change?
> 
> 

I think that's sufficient.

> 
> 
> > 5.
> > +   if (!AttributeNumberIsValid(mainattno))
> > +   {
> > +   /*
> > +* There are two cases to consider. First, if the
> > index is a primary or
> > +* unique key, we cannot have any indexes with
> > expressions. So, at this
> > +* point we are sure that the index we deal is not
> > these.
> > +*/
> > +   Assert(RelationGetReplicaIndex(rel) !=
> > RelationGetRelid(idxrel) &&
> > +  RelationGetPrimaryKeyIndex(rel) !=
> > RelationGetRelid(idxrel));
> > +
> > +   /*
> > +* For a non-primary/unique index with an
> > expression, we are sure that
> > +* the expression cannot be used for replication
> > index search. The
> > +* reason is that we create relevant index paths
> > by providing column
> > +* equalities. And, the planner does not pick
> > expression indexes via
> > +* column equality restrictions in the query.
> > +*/
> > +   continue;
> > +   }
> >
> > Is it possible that it is a usable index with an expression? I think
> > indexes
> > with an expression has been filtered in
> > FilterOutNotSuitablePathsForReplIdentFull(). If it can't be a usable index
> > with
> > an expression, maybe we shouldn't use "continue" here.
> >
> 
> 
> 
> Ok, I think there are some confusing comments in the code, which I updated.
> Also, added one more explicit Assert to make the code a little more
> readable.
> 
> We can support indexes involving expressions but not indexes that are only
> consisting of expressions. FilterOutNotSuitablePathsForReplIdentFull()
> filters out the latter, see IndexOnlyOnExpression().
> 
> So, for example, if we have an index as below, we are skipping the
> expression while building the index scan keys:
> 
> CREATE INDEX people_names ON people (firstname, lastname, (id || '_' ||
> sub_id));
> 
> We can consider removing `continue`, but that'd mean we should also adjust
> the following code-block to handle indexprs. To me, that seems like an edge
> case to implement at this point, given such an index is probably not
> common. Do you think should I try to use the indexprs as well while
> building the scan key?
> 
> I'm mostly trying to keep the complexity small. If you suggest this
> limitation should be lifted, I can give it a shot. I think the limitation I
> leave here is with a single sentence: *The index on the subscriber can only
> use simple column references.  *
> 

Thanks for your explanation. I get it and think it's OK.

> > 6.
> > In the following case, I got a result which is different from HEAD, could
> > you
> > please look into it?
> >
> > -- publisher
> > CREATE TABLE test_replica_id_full (x int);
> > ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
> > CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
> >
> > -- subscriber
> > CREATE TABLE test_replica_id_full (x int, y int);
> > CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x,y);
> > CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres
> > port=5432' PUBLICATION tap_pub_rep_full;
> >
> > -- publisher
> > INSERT INTO test_replica_id_full VALUES (1);
> > UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;
> >
> > The data in subscriber:
> > on HEAD:
> > postgres=# select * from test_replica_id_full ;
> >  x | y
> > ---+---
> >  2 |
> > (1 row)
> >
> > After applying the patch:
> > postgres=# select * from test_replica_id_full ;
> >  x | y
> > ---+---
> >  1 |
> > (1 row)
> >

RE: [RFC] building postgres with meson - v13

2022-10-13 Thread shiy.f...@fujitsu.com
On Fri, Oct 14, 2022 12:40 AM Andres Freund  wrote:
> 
> Hi,
> 
> On 2022-10-13 09:24:51 +, shiy.f...@fujitsu.com wrote:
> > I noticed that `pg_config --configure` didn't show the options given when
> > building with meson.
> 
> Yes, that was noted somewhere on this thread.
> 
> 
> > Maybe it would be better if pg_config can output this information, to be
> > consistent with the output when building with `./configure` and `make`.
> >
> > The output when building with `./configure` and `make`:
> > $ pg_config --configure
> >  '--prefix=/home/postgres/install/' '--cache' 'gcc.cache' '--enable-dtrace' 
> > '--
> with-icu' '--enable-cassert'
> 
> It'd be a fair amount of work, both initially and to maintain it, to generate
> something compatible. I can see some benefit in showing some feature
> influencing output in --configure, but compatible output doesn't seem worth it
> to me.
> 

I agree that there are some benefits to showing that, as it helps to confirm the
build options. Although they can be confirmed from the compile log, the log may
not be available all the time.

And it's ok for me that the output is not exactly the same as before.

Regards,
Shi yu




RE: create subscription - improved warning message

2022-10-17 Thread shiy.f...@fujitsu.com
On Mon, Oct 17, 2022 9:47 AM Peter Smith  wrote:
> 
> On Sun, Oct 16, 2022 at 12:14 AM Amit Kapila 
> wrote:
> >
> > On Fri, Oct 14, 2022 at 8:22 AM Peter Smith 
> wrote:
> > >
> > > On Thu, Oct 13, 2022 at 9:07 AM Peter Smith 
> wrote:
> > > >
> ...
> > > PSA a patch for adding examples of how to activate a subscription that
> > > was originally created in a disconnected mode.
> > >
> > > The new docs are added as part of the "Logical Replication -
> > > Subscription" section 31.2.
> > >
> > > The CREATE SUBSCRIPTION reference page was also updated to include
> > > links to the new docs.
> > >
> >
> > You have used 'pgoutput' as plugin name in the examples. Shall we
> > mention in some way that this is a default plugin used for built-in
> > logical replication and it is required to use only this one to enable
> > logical replication?
> >
> 
> Updated as sugggested.
> 
> PSA v5.
> 

Thanks for your patch. Here are some comments.

In Example 2, the returned slot_name should be "myslot".

+test_pub=# SELECT * FROM pg_create_logical_replication_slot('myslot', 'pgoutput');
+ slot_name |    lsn
+-----------+-----------
+ sub1      | 0/19059A0
+(1 row)
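
That is, the output should presumably read:

 slot_name |    lsn
-----------+-----------
 myslot    | 0/19059A0
(1 row)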

Besides, I am wondering whether it is possible to slightly simplify the example:
for instance, merge examples 1 and 2, keeping the steps of example 2, and in the
step of creating the slot, mention what should be done if slot_name was not
specified when creating the subscription.
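
Something along these lines, as a rough sketch (when slot_name is not specified
at CREATE SUBSCRIPTION time, the slot name defaults to the subscription name):

-- on the publisher: create the slot the subscription will use
test_pub=# SELECT * FROM pg_create_logical_replication_slot('myslot', 'pgoutput');

-- on the subscriber: point the subscription at the slot and activate it
test_sub=# ALTER SUBSCRIPTION sub1 SET (slot_name = 'myslot');
test_sub=# ALTER SUBSCRIPTION sub1 ENABLE;
test_sub=# ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION;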


Regards,
Shi yu




RE: create subscription - improved warning message

2022-10-18 Thread shiy.f...@fujitsu.com
On Tue, Oct 18, 2022 5:44 PM Peter Smith  wrote:
> 
> On Mon, Oct 17, 2022 at 7:11 PM shiy.f...@fujitsu.com
>  wrote:
> >
> ...
> >
> > Thanks for your patch. Here are some comments.
> >
> > In Example 2, the returned slot_name should be "myslot".
> >
> > +test_pub=# SELECT * FROM pg_create_logical_replication_slot('myslot',
> 'pgoutput');
> > + slot_name |lsn
> > +---+---
> > + sub1  | 0/19059A0
> > +(1 row)
> >
> 
> Oops. Sorry for my cut/paste error. Fixed in patch v6.
> 
> 
> > Besides, I am thinking is it possible to slightly simplify the example. For
> > example, merge example 1 and 2, keep the steps of example 2 and in the
> step of
> > creating slot, mention what should we do if slot_name is not specified when
> > creating subscription.
> >
> 
> Sure, it might be a bit shorter to combine the examples, but I thought
> it’s just simpler not to do it that way because the combined example
> will then need additional bullets/notes to say – “if there is no
> slot_name do this…” and “if there is a slot_name do that…”. It’s
> really only the activation part which is identical for both.
> 

Thanks for updating the patch.

+test_sub=# CREATE SUBSCRIPTION sub1
+test_sub-# CONNECTION 'host=localhost dbname=test_pub'
+test_sub-# PUBLICATION pub1
+test_sub-# WITH (slot_name=NONE, enabled=false, create_slot=false);
+WARNING:  subscription was created, but is not connected
+HINT:  To initiate replication, you must manually create the replication slot, 
enable the subscription, and refresh the subscription.
+CREATE SUBSCRIPTION

In example 3, there is actually no such warning message when creating the
subscription, because "connect=false" is not specified.
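
The warning only appears for a disconnected creation, i.e. something like:

test_sub=# CREATE SUBSCRIPTION sub1
test_sub-# CONNECTION 'host=localhost dbname=test_pub'
test_sub-# PUBLICATION pub1
test_sub-# WITH (connect=false);
WARNING:  subscription was created, but is not connected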

I have tested the examples in the patch and didn't see any problem other than
the one above.

Regards,
Shi yu


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-19 Thread shiy.f...@fujitsu.com
On Wed, Oct 19, 2022 12:05 AM Önder Kalacı   wrote:
> 
> Attached v19.
> 

Thanks for updating the patch. Here are some comments on v19.

1.
In execReplication.c:

+   TypeCacheEntry **eq = NULL; /* only used when the index is not unique */

Maybe the comment here should be changed. Now it is used when the index is not a
primary key or replica identity index.

2.
+# wait until the index is created
+$node_subscriber->poll_query_until(
+   'postgres', q{select count(*)=1 from pg_stat_all_indexes where 
indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full 
updates one row via index";

The message doesn't seem right; should it be changed to "Timed out while
waiting for creating index test_replica_id_full_idx"?

3.
+# now, ingest more data and create index on column y which has higher 
cardinality
+# then create an index on column y so that future commands uses the index on 
column
+$node_publisher->safe_psql('postgres',
+   "INSERT INTO test_replica_id_full SELECT 50, i FROM 
generate_series(0,3100)i;");

The comment says "create (an) index on column y" twice; maybe it can be changed
to:

now, ingest more data and create index on column y which has higher cardinality,
so that future commands will use the index on column y

4.
+# deletes 200 rows
+$node_publisher->safe_psql('postgres',
+   "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+   'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where 
indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full 
updates 200 rows via index";

It would be better to call wait_for_catchup() after the DELETE (and in some
other places in this file). Besides, the "updates" in the message should be
"deletes".

5.
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+   'postgres', q{select sum(idx_scan)=10 from pg_stat_all_indexes where 
indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full 
updates partitioned table";

Maybe we should say "updates partitioned table with index" in this message.

Regards,
Shi yu


RE: Perform streaming logical transactions by background workers and parallel apply

2022-10-26 Thread shiy.f...@fujitsu.com
On Wed, Oct 26, 2022 7:19 PM Amit Kapila  wrote:
> 
> On Tue, Oct 25, 2022 at 8:38 AM Masahiko Sawada
>  wrote:
> >
> > On Fri, Oct 21, 2022 at 6:32 PM houzj.f...@fujitsu.com
> >  wrote:
> >
> > I've started to review this patch. I tested v40-0001 patch and have
> > one question:
> >
> > IIUC even when most of the changes in the transaction are filtered out
> > in pgoutput (eg., by relation filter or row filter), the walsender
> > sends STREAM_START. This means that the subscriber could end up
> > launching parallel apply workers also for almost empty (and streamed)
> > transactions. For example, I created three subscriptions each of which
> > subscribes to a different table. When I loaded a large amount of data
> > into one table, all three (leader) apply workers received START_STREAM
> > and launched their parallel apply workers.
> >
> 
> The apply workers will be launched just the first time then we
> maintain a pool so that we don't need to restart them.
> 
> > However, two of them
> > finished without applying any data. I think this behaviour looks
> > problematic since it wastes workers and rather decreases the apply
> > performance if the changes are not large. Is it worth considering a
> > way to delay launching a parallel apply worker until we find out the
> > amount of changes is actually large?
> >
> 
> I think even if changes are less there may not be much difference
> because we have observed that the performance improvement comes from
> not writing to file.
> 
> > For example, the leader worker
> > writes the streamed changes to files as usual and launches a parallel
> > worker if the amount of changes exceeds a threshold or the leader
> > receives the second segment. After that, the leader worker switches to
> > send the streamed changes to parallel workers via shm_mq instead of
> > files.
> >
> 
> I think writing to file won't be a good idea as that can hamper the
> performance benefit in some cases and not sure if it is worth.
> 

I tried to test some cases where only a small part of the transaction, or an
empty transaction, is sent to the subscriber, to see whether using parallel
streaming brings any performance degradation.

The test was performed ten times and the average was taken. The results are as
follows; the details and the script of the test are attached.

10% of rows are sent
--------------------
HEAD    24.4595
patched 18.4545

5% of rows are sent
-------------------
HEAD    21.244
patched 17.9655

0% of rows are sent
-------------------
HEAD    18.0605
patched 17.893


It shows that when only 5% or 10% of rows are sent to the subscriber, using
parallel apply takes less time than HEAD, and even if all rows are filtered out
there's no performance degradation.


Regards
Shi yu


RE: Perform streaming logical transactions by background workers and parallel apply

2022-10-28 Thread shiy.f...@fujitsu.com
On Tue, Oct 25, 2022 2:56 PM Wang, Wei/王 威  wrote:
> 
> On Tues, Oct 25, 2022 at 14:28 PM Peter Smith 
> wrote:
> > FYI - After a recent push, the v40-0001 patch no longer applies on the
> > latest HEAD.
> >
> > [postgres@CentOS7-x64 oss_postgres_misc]$ git apply
> > ../patches_misc/v40-0001-Perform-streaming-logical-transactions-by-
> > parall.patch
> > error: patch failed: src/backend/replication/logical/launcher.c:54
> > error: src/backend/replication/logical/launcher.c: patch does not apply
> 
> Thanks for your reminder.
> 
> I just rebased the patch set for review.
> The new patch set will be shared later when the comments in this thread are
> addressed.
> 

I tried to write a draft patch to force streaming of every change instead of
waiting until logical_decoding_work_mem is exceeded, which could help to test
parallel streaming. Attached the patch; it is based on the v41-0001 patch.

With this patch, I saw a problem: the subscription option "origin" doesn't work
when using parallel streaming. That's because when the parallel apply worker
writes the WAL for the changes, replorigin_session_origin is InvalidRepOriginId.
In the current patch, an origin can be active in only one process at a time.
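
To illustrate the affected combination (option spellings as in the patch set
under discussion, so treat this as a sketch):

-- subscriber: parallel streaming plus filtering of remotely originated data
CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=5432'
    PUBLICATION pub WITH (streaming = parallel, origin = none);
-- changes applied by a parallel apply worker are written with an invalid
-- origin, so another node's origin = none subscription cannot filter them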

To fix it, maybe we need to remove this restriction, like what we did in an
older version of the patch.

Regards
Shi yu


0001-Allow-streaming-every-change-instead-of-waiting-till_patch
Description:  0001-Allow-streaming-every-change-instead-of-waiting-till_patch


RE: Segfault on logical replication to partitioned table with foreign children

2022-10-30 Thread shiy.f...@fujitsu.com
On Sun, Oct 30, 2022 9:39 PM Tom Lane  wrote:
> 
> What I'm wondering about is whether we can refactor this code
> to avoid so many usually-useless catalog lookups.  Pulling the
> namespace name, in particular, is expensive and we generally
> are not going to need the result.  In the child-rel case it'd
> be much better to pass the opened relation and let the error-check
> subroutine work from that.  Maybe we should just do it like that
> at the start of the recursion, too?  Or pass the relid and let
> the subroutine look up the names only in the error case.
> 
> A completely different line of thought is that this doesn't seem
> like a terribly bulletproof fix, since children could get added to
> a partitioned table after we look.  Maybe it'd be better to check
> the relkind at the last moment before we do something that depends
> on a child table being a relation.
> 

I agree. So maybe we can add this check in apply_handle_tuple_routing().

diff --git a/src/backend/replication/logical/worker.c 
b/src/backend/replication/logical/worker.c
index 5250ae7f54..e941b68e4b 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2176,6 +2176,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
Assert(partrelinfo != NULL);
partrel = partrelinfo->ri_RelationDesc;

+   /* Check for supported relkind. */
+   CheckSubscriptionRelkind(partrel->rd_rel->relkind,
+
relmapentry->remoterel.nspname, relmapentry->remoterel.relname);
+
/*
 * To perform any of the operations below, the tuple must match the
 * partition's rowtype. Convert if needed or just copy, using a 
dedicated


Regards,
Shi yu




RE: bogus: logical replication rows/cols combinations

2022-05-16 Thread shiy.f...@fujitsu.com
On Mon, May 16, 2022 8:34 PM houzj.f...@fujitsu.com  
wrote:
> 
> Attach the new version patch.
> 

Thanks for your patch. Here are some comments:

1. (0001 patch)
/*
 * Returns Oids of tables in a publication.
 */
Datum
pg_get_publication_tables(PG_FUNCTION_ARGS)

Should we modify the comment of pg_get_publication_tables() to "Returns
information of tables in a publication"?

2. (0002 patch)

+* Note that we don't support the case where column list is different 
for
+* the same table when combining publications. But we still need to 
check
+* all the given publication-table mappings and report an error if any
+* publications have different column list.
 *
 * Multiple publications might have multiple column lists for this
 * relation.

I think it would be better if we swap the order of these two paragraphs. 

Regards,
Shi yu


RE: Skipping schema changes in publication

2022-05-17 Thread shiy.f...@fujitsu.com
On Sat, May 14, 2022 9:33 PM vignesh C  wrote:
> 
> Thanks for the comments, the attached v5 patch has the changes for the
> same. Also I have made the changes for SKIP Table based on the new
> syntax, the changes for the same are available in
> v5-0002-Skip-publishing-the-tables-specified-in-EXCEPT-TA.patch.
>

Thanks for your patch. Here are some comments on v5-0001 patch.

+   Oid relid = lfirst_oid(lc);
+
+   prid = GetSysCacheOid2(PUBLICATIONRELMAP, 
Anum_pg_publication_rel_oid,
+  
ObjectIdGetDatum(relid),
+  
ObjectIdGetDatum(pubid));
+   if (!OidIsValid(prid))
+   ereport(ERROR,
+   (errcode(ERRCODE_UNDEFINED_OBJECT),
+errmsg("relation \"%s\" is not part of 
the publication",
+   
RelationGetRelationName(rel;

I think the relation in the error message should be the one whose OID is
"relid", instead of relation "rel".

Besides, I think it might be better not to report an error in this case: if
"prid" is invalid, just ignore this relation, because in RESET cases we want to
drop all tables in the publication and there is no specific table. (If you agree
with that, missing_ok should similarly be set to true when calling
PublicationDropSchemas().)

Regards,
Shi yu


RE: Handle infinite recursion in logical replication setup

2022-05-18 Thread shiy.f...@fujitsu.com
On Wed, May 4, 2022 2:47 PM vignesh C  wrote:
> 
> Thanks for the comments, the attached v13 patch has the changes for the same.
> 

Thanks for your patch. Here are some comments on v13-0001 patch.

1) 
@@ -152,6 +152,18 @@ CREATE SUBSCRIPTION subscription_name

 
+   
+local_only (boolean)
+
+ 
+  Specifies whether the subscription will request the publisher to send
+  locally originated changes at the publisher node, or send any
+  publisher node changes regardless of their origin. The default is
+  false.
+ 
+
+   
+

 slot_name (string)
 

I think this change should be put after "The following parameters control the
subscription's replication behavior after it has been created", thoughts?

2)
A new column "sublocalonly" is added to pg_subscription, so maybe we need add it
to pg_subscription document, too. (in doc/src/sgml/catalogs.sgml)

3)
/*
 * Currently we always forward.
 */
static bool
pgoutput_origin_filter(LogicalDecodingContext *ctx,
   RepOriginId origin_id)

Should we modify the comment of pgoutput_origin_filter()? It doesn't match the
code.

Regards,
Shi yu


RE: Handle infinite recursion in logical replication setup

2022-05-19 Thread shiy.f...@fujitsu.com
On Wed, May 4, 2022 2:47 PM vignesh C  wrote:
> 
> Thanks for the comments, the attached v13 patch has the changes for the same.
>

Here are some comments on v13-0002 patch.

1)
+* Throw an error so that the user can take care of the initial 
data
+* copying and then create subscription with copy_data off.

Should "with copy_data off" be changed to "with copy_data off or force"?

2)
case ALTER_SUBSCRIPTION_ADD_PUBLICATION:
case ALTER_SUBSCRIPTION_DROP_PUBLICATION:
...
/*
 * See ALTER_SUBSCRIPTION_REFRESH for 
details why this is
 * not allowed.
 */
if (sub->twophasestate == 
LOGICALREP_TWOPHASE_STATE_ENABLED && opts.copy_data)

I think we need some changes here, too. Should it be modified to:
if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && 
IS_COPY_DATA_ON_OR_FORCE(opts.copy_data))

Regards,
Shi yu


RE: Handle infinite recursion in logical replication setup

2022-05-23 Thread shiy.f...@fujitsu.com
On Tue, May 24, 2022 1:34 AM vignesh C  wrote:
> 
> Thanks for the comments, the attached v15 patch has the fixes for the same.
> 

Thanks for updating the patch. I have a comment on the document in 0002 patch.

@@ -300,6 +310,11 @@ CREATE SUBSCRIPTION subscription_namefalse.
  
+ 
+  There is some interaction between the only_local
+  option and copy_data option. Refer to the
+   for details.
+ 
 


This change is related to "only_local" option and "copy_data" option, so I think
it should be put together with "only_local", instead of "disable_on_error".

Besides, I tested some cross-version cases, pg_dump, and the describe command;
the results are all as expected.

Regards,
Shi yu


RE: Handle infinite recursion in logical replication setup

2022-05-27 Thread shiy.f...@fujitsu.com
On Wed, May 25, 2022 7:55 PM vignesh C  wrote:
> 
> The attached v16 patch has the changes for the same.
> 

Thanks for updating the patch.

Some comments for the document in 0002 patch.

1.
+   
+Lock the required tables in node1 and
+node2 till the setup is completed.
+   
+
+   
+Create a publication in node1:
+
+node1=# CREATE PUBLICATION pub_node1 FOR TABLE t1;
+CREATE PUBLICATION
+

If the table is locked at the very beginning, we will not be able to create the
publication (because the locks conflict). Maybe we should switch the order of
creating the publication and locking the tables here.
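
That is, roughly (a sketch; LOCK TABLE defaults to ACCESS EXCLUSIVE mode and
must be inside a transaction block, so the lock is held until COMMIT):

node1=# CREATE PUBLICATION pub_node1 FOR TABLE t1;
CREATE PUBLICATION
node1=# BEGIN;
node1=# LOCK TABLE t1;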

2.
In the case of "Adding a new node when data is present in the new node", we need
to truncate table t1 in node3, but the truncate operation would be blocked
because the table has be locked before. Maybe we need some changes for it.

Regards,
Shi yu



RE: Hash index build performance tweak from sorting

2022-05-30 Thread shiy.f...@fujitsu.com
On Tue, May 10, 2022 5:43 PM Simon Riggs  wrote:
> 
> On Sat, 30 Apr 2022 at 12:12, Amit Kapila 
> wrote:
> >
> > Few comments on the patch:
> > 1. I think it is better to use DatumGetUInt32 to fetch the hash key as
> > the nearby code is using.
> > 2. You may want to change the below comment in HSpool
> > /*
> > * We sort the hash keys based on the buckets they belong to. Below
> masks
> > * are used in _hash_hashkey2bucket to determine the bucket of given
> hash
> > * key.
> > */
> 
> Addressed in new patch, v2.
> 

I think your changes look reasonable.

Besides, I tried this patch with Simon's script, and index creation was about
7.5% faster with the patch on my machine, which looks good to me.

RESULT - index creation time
===
HEAD: 9513.466 ms
Patched: 8796.75 ms

I ran it 10 times and got the average, and here are the configurations used in
the test:
shared_buffers = 2GB
checkpoint_timeout = 30min
max_wal_size = 20GB
min_wal_size = 10GB
autovacuum = off

Regards,
Shi yu


RE: Handle infinite recursion in logical replication setup

2022-06-06 Thread shiy.f...@fujitsu.com
On Mon, Jun 6, 2022 1:14 PM vignesh C  wrote:
> 
> The attached v18 patch has the fixes for the same.
> 

Thanks for updating the patch, here are some comments.

0002 patch
==
1. 
+   
+origin (string)
+
+ 

It may be better if the type of the "origin" parameter were an enum, as it
cannot be an arbitrary string and has only two valid values.

2.
@@ -607,6 +626,11 @@ CreateSubscription(ParseState *pstate, 
CreateSubscriptionStmt *stmt,
 LOGICALREP_TWOPHASE_STATE_PENDING :
 LOGICALREP_TWOPHASE_STATE_DISABLED);
values[Anum_pg_subscription_subdisableonerr - 1] = 
BoolGetDatum(opts.disableonerr);
+   if (opts.origin)
+   values[Anum_pg_subscription_suborigin - 1] =
+   CStringGetTextDatum(opts.origin);
+   else
+   nulls[Anum_pg_subscription_suborigin - 1] = true;
values[Anum_pg_subscription_subconninfo - 1] =
CStringGetTextDatum(conninfo);
if (opts.slot_name)

Document of "CREATE SUBSCRIPTION" says, the default value of "origin" is "any", 
so why not set
suborigin to "any" when user doesn't specify this parameter?
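
That way the catalog would reflect the documented default; a sketch of the
expected behavior:

CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=5432' PUBLICATION pub;
SELECT suborigin FROM pg_subscription WHERE subname = 'sub';
 suborigin
-----------
 any
(1 row)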


0003 patch
==
1.
@@ -300,6 +310,11 @@ CREATE SUBSCRIPTION subscription_namefalse.
  
+ 
+  There is some interaction between the origin
+  parameter and copy_data parameter. Refer to the
+   for details.
+ 
 


I think this change should be put together with the "origin" parameter, instead
of "disable_on_error".


0004 patch
==
1.
+   
+Now the bidirectional logical replication setup is complete between
+node1, node2 and
+node2. Any subsequent changes in one node will
+replicate the changes to the other nodes.
+   

I think "node1, node2 and node2" should be "node1, node2 and node3".

Regards,
Shi yu


Replica Identity check of partition table on subscriber

2022-06-08 Thread shiy.f...@fujitsu.com
Hi hackers,

I saw a problem in logical replication: when the target table on the subscriber
is a partitioned table, we only check whether the replica identity of the
partitioned table is consistent with the publisher's, and don't check the
replica identity of the partition.

For example:
-- publisher --
create table tbl (a int not null, b int);
create unique INDEX ON tbl (a);
alter table tbl replica identity using INDEX tbl_a_idx;
create publication pub for table tbl;

-- subscriber --
-- table tbl (parent table) has RI index, while table child has no RI index.
create table tbl (a int not null, b int) partition by range (a);
create table child partition of tbl default;
create unique INDEX ON tbl (a);
alter table tbl replica identity using INDEX tbl_a_idx;
create subscription sub connection 'port=5432 dbname=postgres' publication pub;

-- publisher --
insert into tbl values (11,11);
update tbl set a=a+1;

It caused an assertion failure on subscriber:
TRAP: FailedAssertion("OidIsValid(idxoid) || (remoterel->replident == 
REPLICA_IDENTITY_FULL)", File: "worker.c", Line: 2088, PID: 1616523)

The backtrace is attached.

We got the assertion failure because idxoid is invalid, as table child has no
replica identity or primary key. We have a check in check_relation_updatable(),
but what it checks is table tbl (the parent table), which passes the check.

I think one approach to fix it is to check the target partition in this case,
instead of the partitioned table.

When trying to fix it, I saw some other problems with updating the partition map
cache.

a) In logicalrep_partmap_invalidate_cb(), the type of the entry in
LogicalRepPartMap should be LogicalRepPartMapEntry, instead of
LogicalRepRelMapEntry.

b) In logicalrep_partition_open(), it doesn't check whether the entry is valid.

c) When the publisher sends a new relation mapping, only the relation map cache
is updated, not the partition map cache. I think the latter should also be
updated, because it contains remote relation information, too.

Attached two patches that try to fix these problems.
0001 patch: fix the above three problems about partition map cache.
0002 patch: fix the assertion failure, check the Replica Identity of the
partition if the target table is a partitioned table.

Thanks to Hou Zhijie for helping me in the first patch.

I will add a test for it later if no one objects to this fix.

Regards,
Shi yu


v1-0001-Fix-partition-map-cache-issues.patch
v1-0002-Check-partition-table-replica-identity-on-subscri.patch

#0  0x7fd35e76970f in raise () from /lib64/libc.so.6
#1  0x7fd35e753b25 in abort () from /lib64/libc.so.6
#2  0x00fff9b4 in ExceptionalCondition (conditionName=0x1237a80 "OidIsValid(idxoid) || (remoterel->replident == REPLICA_IDENTITY_FULL)", errorType=0x1237208 "FailedAssertion", fileName=0x123737f "worker.c", lineNumber=2088) at assert.c:69
#3  0x00c354d4 in FindReplTupleInLocalRel (estate=0x1c92240, localrel=0x7fd352ce7bf8, remoterel=0x1ca5b28, remoteslot=0x1c962a0, localslot=0x7ffc8b70c360) at worker.c:2087
#4  0x00c35a0f in apply_handle_tuple_routing (edata=0x1c7b3b0, remoteslot=0x1c7b808, newtup=0x7ffc8b70c420, operation=CMD_UPDATE) at worker.c:2192
#5  0x00c34a1d in apply_handle_update (s=0x7ffc8b70c4f0) at worker.c:1855
#6  0x00c36cab in apply_dispatch (s=0x7ffc8b70c4f0) at worker.c:2481
#7  0x00c37834 in LogicalRepApplyLoop (last_received=22131584) at worker.c:2757
#8  0x00c39ee5 in start_apply (origin_startpos=0) at worker.c:3531
#9  0x00c3ae14 in ApplyWorkerMain (main_arg=0) at worker.c:3787
#10 0x00bbd903 in StartBackgroundWorker () at bgworker.c:858
#11 0x00bd175a in do_start_bgworker (rw=0x1bd4910) at postmaster.c:5815
#12 0x00bd1ea5 in maybe_start_bgworkers () at postmaster.c:6039
#13 0x00bcfdb7 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5204
#14 <signal handler called>
#15 0x7fd35e8254ab in select () from /lib64/libc.so.6
#16 0x00bc74fc in ServerLoop () at postmaster.c:1770
#17 0x00bc6a67 in PostmasterMain (argc=3, argv=0x1ba9b00) at postmaster.c:1478
#18 0x00a10b78 in main (argc=3, argv=0x1ba9b00) at main.c:202


RE: Replica Identity check of partition table on subscriber

2022-06-10 Thread shiy.f...@fujitsu.com
On Thu, June 9, 2022 7:02 PM Amit Kapila  wrote:
> 
> > I think one approach to fix it is to check the target partition in this 
> > case,
> > instead of the partitioned table.
> >
> 
> This approach sounds reasonable to me. One minor point:
> +/*
> + * Check that replica identity matches.
> + *
> + * We allow for stricter replica identity (fewer columns) on subscriber as
> + * that will not stop us from finding unique tuple. IE, if publisher has
> + * identity (id,timestamp) and subscriber just (id) this will not be a
> + * problem, but in the opposite scenario it will.
> + *
> + * Don't throw any error here just mark the relation entry as not updatable,
> + * as replica identity is only for updates and deletes but inserts can be
> + * replicated even without it.
> + */
> +static void
> +logicalrep_check_updatable(LogicalRepRelMapEntry *entry)
> 
> Can we name this function as logicalrep_rel_mark_updatable as we are
> doing that? If so, change the comments as well.
> 

OK. Modified.

> > When trying to fix it, I saw some other problems about updating partition
> map
> > cache.
> >
> > a) In logicalrep_partmap_invalidate_cb(), the type of the entry in
> > LogicalRepPartMap should be LogicalRepPartMapEntry, instead of
> > LogicalRepRelMapEntry.
> >
> > b) In logicalrep_partition_open(), it didn't check if the entry is valid.
> >
> > c) When the publisher send new relation mapping, only relation map cache
> will be
> > updated, and partition map cache wouldn't. I think it also should be updated
> > because it has remote relation information, too.
> >
> 
> Is there any test case that can show the problem due to your above
> observations?
> 

Please see the following case.

-- publisher
create table tbl (a int primary key, b int);
create publication pub for table tbl;

-- subscriber
create table tbl (a int primary key, b int, c int) partition by range (a);
create table child partition of tbl default;

-- publisher, make cache
insert into tbl values (1,1);
update tbl set a=a+1;
alter table tbl add column c int;
update tbl set c=1 where a=2;

-- subscriber
postgres=# select * from tbl;
 a | b | c
---+---+---
 2 | 1 |
(1 row)

The update of column c failed on the subscriber: the cached partition map entry
still holds the old remote relation information (without column c), so the new
column's value is not applied. After applying the first patch, it works fine.

I have added this case to the first patch, and also added a test case for the
second patch.

Attached the new patches.

Regards,
Shi yu


v2-0001-Fix-partition-map-cache-issues.patch
v2-0002-Check-partition-table-replica-identity-on-subscri.patch


RE: tablesync copy ignores publication actions

2022-06-13 Thread shiy.f...@fujitsu.com
On Wed, Jun 8, 2022 12:10 PM Amit Kapila  wrote:
> 
> On Tue, Jun 7, 2022 at 7:08 PM Euler Taveira  wrote:
> >
> > On Tue, Jun 7, 2022, at 1:10 AM, Peter Smith wrote:
> >
> > The logical replication tablesync ignores the publication 'publish'
> > operations during the initial data copy.
> >
> > This is current/known PG behaviour (e.g. as recently mentioned [1])
> > but it was not documented anywhere.
> >
> > initial data synchronization != replication. publish parameter is a 
> > replication
> > property; it is not an initial data synchronization property. Maybe we should
> > make it clear like you are suggesting.
> >
> 
> +1 to document it. We respect some other properties of publication
> like the publish_via_partition_root parameter, column lists, and row
> filters. So it is better to explain about 'publish' parameter which we
> ignore during the initial sync.
> 

I also agree to add it to the document.

> > You could expand the
> > suggested sentence to make it clear what publish parameter is or even add
> a
> > link to the CREATE PUBLICATION synopsis (that explains about publish
> > parameter).
> >
> 
> +1. I suggest that we should add some text about the behavior of
> initial sync in CREATE PUBLICATION doc (along with the 'publish'
> parameter) or otherwise, we can explain it where we are talking about
> publications [1].
> 

I noticed that the CREATE SUBSCRIPTION doc mentions that the row filter affects
the initial sync, along with the "copy_data" parameter.[1] Maybe we can add some
text about the "publish" parameter there.

[1] https://www.postgresql.org/docs/devel/sql-createsubscription.html

Regards,
Shi yu


RE: Replica Identity check of partition table on subscriber

2022-06-14 Thread shiy.f...@fujitsu.com
On Tue, Jun 14, 2022 2:18 PM Amit Langote  wrote:
> 
> On Mon, Jun 13, 2022 at 9:26 PM Amit Kapila 
> wrote:
> > On Mon, Jun 13, 2022 at 1:03 PM houzj.f...@fujitsu.com
> >  wrote:
> > > On Monday, June 13, 2022 1:53 PM Amit Kapila
>  wrote:
> > > I have separated out the bug-fix for the subscriber-side.
> > > And fix the typo and function name.
> > > Attach the new version patch set.
> > >
> >
> > The first patch looks good to me. I have slightly modified one of the
> > comments and the commit message. I think we need to backpatch this
> > through 13 where we introduced support to replicate into partitioned
> > tables (commit f1ac27bf). If you guys are fine, I'll push this once
> > the work for PG14.4 is done.
> 
> Both the code changes and test cases look good to me.  Just a couple
> of minor nitpicks with test changes:

Thanks for your comments.

> 
> +   CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
> +   ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
> +   ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});
> 
> Not sure if we follow it elsewhere, but should we maybe avoid using
> the internally generated index name as in the partition's case above?
> 

I saw that some existing tests also use internally generated index names (e.g.
replica_identity.sql, ddl.sql and 031_column_list.pl), so maybe it's better to
fix them all in a separate patch. I didn't change this here.
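
(If we do fix them, the tests could create each partition's index explicitly
with a chosen name before creating the index on the parent table, so that
nothing in the test depends on the server's name-generation rules.)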

> +# Test the case that target table on subscriber is a partitioned table and
> +# check that the changes are replicated correctly after changing the schema
> of
> +# table on subcriber.
> 
> The first sentence makes it sound like the tests that follow are the
> first ones in the file where the target table is partitioned, which is
> not true, so I think we should drop that part.  Also how about being
> more specific about the test intent, say:
> 
> Test that replication continues to work correctly after altering the
> partition of a partitioned target table.
> 

OK, modified.

Attached the new version of the patch set, and the patches for pg14 and pg13.

Regards,
Shi yu


v6-0001-Fix-cache-look-up-failures-while-applying-changes.patch
v6-0002-Reset-partition-map-cache-when-receiving-new-rela.patch
v6-0003-Check-partition-table-replica-identity-on-subscri.patch
v6-0004-fix-memory-leak-about-attrmap.patch
v6-pg13-0001-Fix-cache-look-up-failures-while-applying-chang_patch
v6-pg14-0001-Fix-cache-look-up-failures-while-applying-chang_patch


RE: Replica Identity check of partition table on subscriber

2022-06-14 Thread shiy.f...@fujitsu.com
On Tue, Jun 14, 2022 8:57 PM Amit Kapila  wrote:
> 
> > v4-0002 looks good btw, except the bitpick about test comment similar
> > to my earlier comment regarding v5-0001:
> >
> > +# Change the column order of table on publisher
> >
> > I think it might be better to say something specific to describe the
> > test intent, like:
> >
> > Test that replication into partitioned target table continues to work
> > correctly when the published table is altered
> >
> 
> Okay changed this and slightly modify the comments and commit message.
> I am just attaching the HEAD patches for the first two issues.
> 

Thanks for updating the patch.

Attached the new patch set after running pgindent, and the patches for pg14 and
pg13. (Only the first two patches of the patch set.)

Regards,
Shi yu


v8-pg14-0002-Fix-data-inconsistency-between-publisher-and-su_patch
v8-pg14-0001-Fix-cache-look-up-failures-while-applying-chang_patch
v8-pg13-0002-Fix-data-inconsistency-between-publisher-and-su_patch
v8-pg13-0001-Fix-cache-look-up-failures-while-applying-chang_patch
v8-0002-Fix-data-inconsistency-between-publisher-and-subs.patch
v8-0001-Fix-cache-look-up-failures-while-applying-changes.patch

