Re: Allow logical replication to copy tables in binary format
Hi, Amit Kapila , 23 Mar 2023 Per, 08:48 tarihinde şunu yazdı: > Pushed the 0001. It may be better to start a separate thread for 0002. > Great! Thanks. Best, -- Melih Mutlu Microsoft
RE: Allow logical replication to copy tables in binary format
Dear Amit, > Pushed the 0001. It may be better to start a separate thread for 0002. Good job! I have started new thread [1] for 0002. [1]: https://www.postgresql.org/message-id/TYAPR01MB58667AE04D291924671E2051F5879%40TYAPR01MB5866.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Allow logical replication to copy tables in binary format
On Wed, Mar 22, 2023 at 4:06 PM Hayato Kuroda (Fujitsu) wrote: > > > The patch looks mostly good to me. However, I have one > > question/comment as follows: > > > > - > > + > > binary (boolean) > > > > > > To allow references to the binary option, we add the varlistentry id > > here. It looks slightly odd to me to add id for just one entry, see > > commit 78ee60ed84bb3a1cf0b6bd9a715dcbcf252a90f5 where we have > > purposefully added ids to allow future references. Shall we add id to > > other options as well on this page? > > I have analyzed same points and made patch that could be applied atop > v19-0001. > Please check 0002 patch. > Pushed the 0001. It may be better to start a separate thread for 0002. -- With Regards, Amit Kapila.
RE: Allow logical replication to copy tables in binary format
Dear Amit, hackers, > The patch looks mostly good to me. However, I have one > question/comment as follows: > > - > + > binary (boolean) > > > To allow references to the binary option, we add the varlistentry id > here. It looks slightly odd to me to add id for just one entry, see > commit 78ee60ed84bb3a1cf0b6bd9a715dcbcf252a90f5 where we have > purposefully added ids to allow future references. Shall we add id to > other options as well on this page? I have analyzed same points and made patch that could be applied atop v19-0001. Please check 0002 patch. Best Regards, Hayato Kuroda FUJITSU LIMITED v19-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: v19-0001-Allow-logical-replication-to-copy-table-in-binary.patch v19-0002-Add-XML-ID-attributes-to-create_subscription.sgm.patch Description: v19-0002-Add-XML-ID-attributes-to-create_subscription.sgm.patch
Re: Allow logical replication to copy tables in binary format
On Wed, Mar 22, 2023 at 9:00 AM shiy.f...@fujitsu.com wrote: > > On Wed Mar 22, 2023 7:29 AM Peter Smith wrote: > > > > Thanks for all the patch updates. Patch v19 LGTM. > > > > +1 > The patch looks mostly good to me. However, I have one question/comment as follows: - + binary (boolean) To allow references to the binary option, we add the varlistentry id here. It looks slightly odd to me to add id for just one entry, see commit 78ee60ed84bb3a1cf0b6bd9a715dcbcf252a90f5 where we have purposefully added ids to allow future references. Shall we add id to other options as well on this page? -- With Regards, Amit Kapila.
RE: Allow logical replication to copy tables in binary format
On Wed Mar 22, 2023 7:29 AM Peter Smith wrote: > > Thanks for all the patch updates. Patch v19 LGTM. > +1 Regards, Shi Yu
Re: Allow logical replication to copy tables in binary format
Thanks for all the patch updates. Patch v19 LGTM. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Allow logical replication to copy tables in binary format
Hi, Peter Smith , 21 Mar 2023 Sal, 04:33 tarihinde şunu yazdı: > Here are my review comments for v18-0001 > > == > doc/src/sgml/logical-replication.sgml > > 1. > + target table. However, logical replication in binary format is more > + restrictive. See the binary option of > + CREATE > SUBSCRIPTION > + for details. > > > Because you've changed the linkend to be the binary option, IMO now > the part also needs to be modified. Otherwise, this page has > multiple "CREATE SUBSCRIPTION" links which jump to different places, > which just seems wrong to me. > Makes sense. I changed it as you suggested. > 3. > I think there can only be 0 or 1 list element in 'options'. > > So, why does the code here use lappend(options,...) instead of just > using list_make1(...)? > Changed it to list_make1. Amit Kapila , 21 Mar 2023 Sal, 12:27 tarihinde şunu yazdı: > > Explaining the issue explicitly with a comment seems better to me than > the trick of changing order of table creation just for some test cases. > > But I'm also ok with removing the use of disable_on_error if that's what > you agree on. > > > > Let's do that way for now. > Done. Thanks, -- Melih Mutlu Microsoft From 8d94a5f18a433efaeae907b5dc0225272b7442aa Mon Sep 17 00:00:00 2001 From: Melih Mutlu Date: Mon, 8 Aug 2022 14:14:44 +0300 Subject: [PATCH v19] Allow logical replication to copy table in binary If binary option is enabled in a subscription, then copy tables in binary format during table synchronization. Without this patch, tables are copied in text format even if the subscription is created with binary option enabled. This patch allows logical replication to perform in binary format starting from initial sync. Copying tables in binary format may reduce the time spent depending on column types. Binary copy for initial table synchronization is supported only when both publisher and subscriber are v16 or later. Discussion: https://postgr.es/m/CAGPVpCQvAziCLknEnygY0v1-KBtg%2BOm-9JHJYZOnNPKFJPompw%40mail.gmail.com --- doc/src/sgml/logical-replication.sgml | 4 +- doc/src/sgml/ref/alter_subscription.sgml| 5 + doc/src/sgml/ref/create_subscription.sgml | 28 ++- src/backend/replication/logical/tablesync.c | 17 +- src/test/subscription/t/014_binary.pl | 180 ++-- 5 files changed, 216 insertions(+), 18 deletions(-) diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml index 6b0e300adc..3836d13ad3 100644 --- a/doc/src/sgml/logical-replication.sgml +++ b/doc/src/sgml/logical-replication.sgml @@ -251,7 +251,9 @@ column of type bigint. The target table can also have additional columns not provided by the published table. Any such columns will be filled with the default value as specified in the definition of the - target table. + target table. However, logical replication in binary format is more + restrictive. See the binary + option of CREATE SUBSCRIPTION for details. diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index 964fcbb8ff..e92346edef 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -177,6 +177,11 @@ ALTER SUBSCRIPTION name RENAME TO < how copy_data = true can interact with the origin parameter. + + See the binary + option of CREATE SUBSCRIPTION for details + about copying pre-existing data in binary format. + diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml index 51c45f17c7..9d4b9d4e33 100644 --- a/doc/src/sgml/ref/create_subscription.sgml +++ b/doc/src/sgml/ref/create_subscription.sgml @@ -185,15 +185,25 @@ CREATE SUBSCRIPTION subscription_name - + binary (boolean) - Specifies whether the subscription will request the publisher to - send the data in binary format (as opposed to text). - The default is false. - Even when this option is enabled, only data types having - binary send and receive functions will be transferred in binary. + Specifies whether the subscription will request the publisher to send + the data in binary format (as opposed to text). The default is + false. Any initial table synchronization copy + (see copy_data) also uses the same format. Binary + format can be faster than the text format, but it is less portable + across machine architectures and PostgreSQL + versions. Binary format is very data type specific; for example, it + will not allow copying from a smallint column to an + integer column, even though that would work fine in text + format. Even when this option is enabled, only data types having binary + send and receive functions will be
Re: Allow logical replication to copy tables in binary format
On Tue, Mar 21, 2023 at 2:16 PM Melih Mutlu wrote: > > Amit Kapila , 21 Mar 2023 Sal, 09:03 tarihinde şunu > yazdı: >> >> On Tue, Mar 21, 2023 at 7:03 AM Peter Smith wrote: >> > >> > >> > == >> > src/test/subscription/t/014_binary.pl >> > >> > 4. >> > # - >> > # Test mismatched column types with/without binary mode >> > # - >> > >> > # Test syncing tables with mismatching column types >> > $node_publisher->safe_psql( >> > 'postgres', qq( >> > CREATE TABLE public.test_mismatching_types ( >> > a bigint PRIMARY KEY >> > ); >> > INSERT INTO public.test_mismatching_types (a) >> > VALUES (1), (2); >> > )); >> > >> > # Check the subscriber log from now on. >> > $offset = -s $node_subscriber->logfile; >> > >> > # Ensure the subscription is enabled. disable_on_error is still true, >> > # so the subscription can be disabled due to missing realtion until >> > # the test_mismatching_types table is created. >> > $node_subscriber->safe_psql( >> > 'postgres', qq( >> > CREATE TABLE public.test_mismatching_types ( >> > a int PRIMARY KEY >> > ); >> > ALTER SUBSCRIPTION tsub ENABLE; >> > ALTER SUBSCRIPTION tsub REFRESH PUBLICATION; >> > )); >> > >> > ~~ >> > >> > I found the "Ensure the subscription is enabled..." comment and the >> > necessity for enabling the subscription to be confusing. >> > >> > Can't some complications all be eliminated just by creating the table >> > on the subscribe side first? >> > >> >> Hmm, that would make this test inconsistent with other tests and >> probably difficult to understand and extend. I don't like to say this >> but I think introducing disable_on_error has introduced more >> complexities in the patch due to the requirement of enabling >> subscription again and again. I feel it would be better without using >> disable_on_error option in these tests. > > > While this change would make the test inconsistent, I think it also would > make more confusing. > I also think so. > Explaining the issue explicitly with a comment seems better to me than the > trick of changing order of table creation just for some test cases. > But I'm also ok with removing the use of disable_on_error if that's what you > agree on. > Let's do that way for now. -- With Regards, Amit Kapila.
Re: Allow logical replication to copy tables in binary format
Amit Kapila , 21 Mar 2023 Sal, 09:03 tarihinde şunu yazdı: > On Tue, Mar 21, 2023 at 7:03 AM Peter Smith wrote: > > > > > > == > > src/test/subscription/t/014_binary.pl > > > > 4. > > # - > > # Test mismatched column types with/without binary mode > > # - > > > > # Test syncing tables with mismatching column types > > $node_publisher->safe_psql( > > 'postgres', qq( > > CREATE TABLE public.test_mismatching_types ( > > a bigint PRIMARY KEY > > ); > > INSERT INTO public.test_mismatching_types (a) > > VALUES (1), (2); > > )); > > > > # Check the subscriber log from now on. > > $offset = -s $node_subscriber->logfile; > > > > # Ensure the subscription is enabled. disable_on_error is still true, > > # so the subscription can be disabled due to missing realtion until > > # the test_mismatching_types table is created. > > $node_subscriber->safe_psql( > > 'postgres', qq( > > CREATE TABLE public.test_mismatching_types ( > > a int PRIMARY KEY > > ); > > ALTER SUBSCRIPTION tsub ENABLE; > > ALTER SUBSCRIPTION tsub REFRESH PUBLICATION; > > )); > > > > ~~ > > > > I found the "Ensure the subscription is enabled..." comment and the > > necessity for enabling the subscription to be confusing. > > > > Can't some complications all be eliminated just by creating the table > > on the subscribe side first? > > > > Hmm, that would make this test inconsistent with other tests and > probably difficult to understand and extend. I don't like to say this > but I think introducing disable_on_error has introduced more > complexities in the patch due to the requirement of enabling > subscription again and again. I feel it would be better without using > disable_on_error option in these tests. > While this change would make the test inconsistent, I think it also would make more confusing. Explaining the issue explicitly with a comment seems better to me than the trick of changing order of table creation just for some test cases. But I'm also ok with removing the use of disable_on_error if that's what you agree on. Thanks, -- Melih Mutlu Microsoft
Re: Allow logical replication to copy tables in binary format
On Tue, Mar 21, 2023 at 7:03 AM Peter Smith wrote: > > > == > src/test/subscription/t/014_binary.pl > > 4. > # - > # Test mismatched column types with/without binary mode > # - > > # Test syncing tables with mismatching column types > $node_publisher->safe_psql( > 'postgres', qq( > CREATE TABLE public.test_mismatching_types ( > a bigint PRIMARY KEY > ); > INSERT INTO public.test_mismatching_types (a) > VALUES (1), (2); > )); > > # Check the subscriber log from now on. > $offset = -s $node_subscriber->logfile; > > # Ensure the subscription is enabled. disable_on_error is still true, > # so the subscription can be disabled due to missing realtion until > # the test_mismatching_types table is created. > $node_subscriber->safe_psql( > 'postgres', qq( > CREATE TABLE public.test_mismatching_types ( > a int PRIMARY KEY > ); > ALTER SUBSCRIPTION tsub ENABLE; > ALTER SUBSCRIPTION tsub REFRESH PUBLICATION; > )); > > ~~ > > I found the "Ensure the subscription is enabled..." comment and the > necessity for enabling the subscription to be confusing. > > Can't some complications all be eliminated just by creating the table > on the subscribe side first? > Hmm, that would make this test inconsistent with other tests and probably difficult to understand and extend. I don't like to say this but I think introducing disable_on_error has introduced more complexities in the patch due to the requirement of enabling subscription again and again. I feel it would be better without using disable_on_error option in these tests. One minor point: + format can be faster than the text format, but it is less portable + across machine architectures and PostgreSQL versions. As per email [1], it would be better to use tag here with PostgreSQL. [1] - https://www.postgresql.org/message-id/932629.1679322674%40sss.pgh.pa.us -- With Regards, Amit Kapila.
Re: Allow logical replication to copy tables in binary format
Here are my review comments for v18-0001 == doc/src/sgml/logical-replication.sgml 1. + target table. However, logical replication in binary format is more + restrictive. See the binary option of + CREATE SUBSCRIPTION + for details. Because you've changed the linkend to be the binary option, IMO now the part also needs to be modified. Otherwise, this page has multiple "CREATE SUBSCRIPTION" links which jump to different places, which just seems wrong to me. SUGGESTION (for the "See the" sentence) See the binary option of CREATE SUBSCRIPTION for details. == doc/src/sgml/ref/alter_subscription.sgml 2. + + See the binary option of + CREATE SUBSCRIPTION + for details about copying pre-existing data in binary format. + (Same as review comment #1 above) SUGGESTION See the binary option of CREATE SUBSCRIPTION for details about copying pre-existing data in binary format. == src/backend/replication/logical/tablesync.c 3. + /* + * Prior to v16, initial table synchronization will use text format even + * if the binary option is enabled for a subscription. + */ + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 16 && + MySubscription->binary) + { + appendStringInfoString(, " WITH (FORMAT binary)"); + options = lappend(options, + makeDefElem("format", + (Node *) makeString("binary"), -1)); + } I think there can only be 0 or 1 list element in 'options'. So, why does the code here use lappend(options,...) instead of just using list_make1(...)? == src/test/subscription/t/014_binary.pl 4. # - # Test mismatched column types with/without binary mode # - # Test syncing tables with mismatching column types $node_publisher->safe_psql( 'postgres', qq( CREATE TABLE public.test_mismatching_types ( a bigint PRIMARY KEY ); INSERT INTO public.test_mismatching_types (a) VALUES (1), (2); )); # Check the subscriber log from now on. $offset = -s $node_subscriber->logfile; # Ensure the subscription is enabled. disable_on_error is still true, # so the subscription can be disabled due to missing realtion until # the test_mismatching_types table is created. $node_subscriber->safe_psql( 'postgres', qq( CREATE TABLE public.test_mismatching_types ( a int PRIMARY KEY ); ALTER SUBSCRIPTION tsub ENABLE; ALTER SUBSCRIPTION tsub REFRESH PUBLICATION; )); ~~ I found the "Ensure the subscription is enabled..." comment and the necessity for enabling the subscription to be confusing. Can't some complications all be eliminated just by creating the table on the subscribe side first? For example, I rearranged that test (above fragment) like below and it still works OK for me: # - # Test mismatched column types with/without binary mode # - # Create the table on the subscriber side $node_subscriber->safe_psql( 'postgres', qq( CREATE TABLE public.test_mismatching_types ( a int PRIMARY KEY ))); # Check the subscriber log from now on. $offset = -s $node_subscriber->logfile; # Test syncing tables with mismatching column types $node_publisher->safe_psql( 'postgres', qq( CREATE TABLE public.test_mismatching_types ( a bigint PRIMARY KEY ); INSERT INTO public.test_mismatching_types (a) VALUES (1), (2); )); # Refresh the publication to trigger the tablesync $node_subscriber->safe_psql( 'postgres', qq( ALTER SUBSCRIPTION tsub REFRESH PUBLICATION; )); -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Allow logical replication to copy tables in binary format
Hi, Please see the attached patch. vignesh C , 18 Mar 2023 Cmt, 12:03 tarihinde şunu yazdı: > On Fri, 17 Mar 2023 at 17:55, Melih Mutlu wrote: > 1) Currently we refer the link to the beginning of create subscription > page, this can be changed to refer to binary option contents in create > subscription: > Done. > 2) Running pgperltidy shows the test script 014_binary.pl could be > slightly improved as in the attachment. > Couldn't apply the patch you attached but I ran pgperltidy myself and I guess the result should be similar. Peter Smith , 18 Mar 2023 Cmt, 12:41 tarihinde şunu yazdı: > Commit message > > 1. > Binary copy is supported for v16 or later. > Done as you suggested. Amit Kapila , 18 Mar 2023 Cmt, 13:11 tarihinde şunu yazdı: > On Sat, Mar 18, 2023 at 3:11 PM Peter Smith wrote: > > > > SUGGESTION > > If the publisher is v16 or later, then any initial table > > synchronization will use the same format as specified by the > > subscription binary mode. If the publisher is before v16, then any > > initial table synchronization will use text format regardless of the > > subscription binary mode. > > > > I agree that the previous comment should be updated but I would prefer > something along the lines: "Prior to v16, initial table > synchronization will use text format even if the binary option is > enabled for a subscription." > Done. Amit Kapila , 20 Mar 2023 Pzt, 05:13 tarihinde şunu yazdı: > On Mon, Mar 20, 2023 at 3:37 AM Peter Smith wrote: > > > > There are a couple of TAP tests where the copy binary is expected to > > fail. And when it fails, you do binary=false (changing the format back > > to 'text') so the test is then expected to be able to proceed. > > > > I don't know if this happens in practice, but IIUC in theory, if the > > timing is extremely bad, the tablesync could relaunch in binary mode > > multiple times (any fail multiple times?) before your binary=false > > change takes effect. > > > > So, I was wondering if it could help to use the subscription > > 'disable_on_error=true' parameter for those cases so that the > > tablesync won't needlessly attempt to relaunch until you have set > > binary=false and then re-enabled the subscription. > > > > +1. That would make tests more reliable. > Done. Hayato Kuroda (Fujitsu) , 20 Mar 2023 Pzt, 07:13 tarihinde şunu yazdı: > Dear Melih, > > Thank you for updating the patch. > I checked your added description about initial data sync and I think it's > OK. > > Few minor comments: > Fixed both comments. Thanks, -- Melih Mutlu Microsoft v18-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data
RE: Allow logical replication to copy tables in binary format
Dear Melih, Thank you for updating the patch. I checked your added description about initial data sync and I think it's OK. Few minor comments: 01. copy_table ``` + List *options = NIL; ``` I found a unnecessary blank just after "List". You can remove it and align definition. 02. copy_table ``` + options = lappend(options, makeDefElem("format", (Node *) makeString("binary"), -1)); ``` The line seems to exceed 80 characters. How do you think to change like following? ``` options = lappend(options, makeDefElem("format", (Node *) makeString("binary"), -1)); ``` Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Allow logical replication to copy tables in binary format
On Mon, Mar 20, 2023 at 3:37 AM Peter Smith wrote: > > There are a couple of TAP tests where the copy binary is expected to > fail. And when it fails, you do binary=false (changing the format back > to 'text') so the test is then expected to be able to proceed. > > I don't know if this happens in practice, but IIUC in theory, if the > timing is extremely bad, the tablesync could relaunch in binary mode > multiple times (any fail multiple times?) before your binary=false > change takes effect. > > So, I was wondering if it could help to use the subscription > 'disable_on_error=true' parameter for those cases so that the > tablesync won't needlessly attempt to relaunch until you have set > binary=false and then re-enabled the subscription. > +1. That would make tests more reliable. -- With Regards, Amit Kapila.
Re: Allow logical replication to copy tables in binary format
There are a couple of TAP tests where the copy binary is expected to fail. And when it fails, you do binary=false (changing the format back to 'text') so the test is then expected to be able to proceed. I don't know if this happens in practice, but IIUC in theory, if the timing is extremely bad, the tablesync could relaunch in binary mode multiple times (any fail multiple times?) before your binary=false change takes effect. So, I was wondering if it could help to use the subscription 'disable_on_error=true' parameter for those cases so that the tablesync won't needlessly attempt to relaunch until you have set binary=false and then re-enabled the subscription. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Allow logical replication to copy tables in binary format
On Sat, Mar 18, 2023 at 3:11 PM Peter Smith wrote: > > On Fri, Mar 17, 2023 at 11:25 PM Melih Mutlu wrote: > == > src/backend/replication/logical/tablesync.c > > 2. > @@ -1168,6 +1170,15 @@ copy_table(Relation rel) > > appendStringInfoString(, ") TO STDOUT"); > } > + > + /* The binary option for replication is supported since v16 */ > + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 16 && > + MySubscription->binary) > + { > + appendStringInfoString(, " WITH (FORMAT binary)"); > + options = lappend(options, makeDefElem("format", (Node *) > makeString("binary"), -1)); > + } > > > Logical replication binary mode was introduced in v14, so the old > comment ("The binary option for replication is supported since v14") > was correct. Unfortunately, after changing the code check to 16000, I > think the new comment ("The binary option for replication is supported > since v16") became incorrect, and so it needs some rewording. Maybe it > should say something like below: > > SUGGESTION > If the publisher is v16 or later, then any initial table > synchronization will use the same format as specified by the > subscription binary mode. If the publisher is before v16, then any > initial table synchronization will use text format regardless of the > subscription binary mode. > I agree that the previous comment should be updated but I would prefer something along the lines: "Prior to v16, initial table synchronization will use text format even if the binary option is enabled for a subscription." -- With Regards, Amit Kapila.
Re: Allow logical replication to copy tables in binary format
On Fri, Mar 17, 2023 at 11:25 PM Melih Mutlu wrote: > > Hi, > > Sharing v17. > > Amit Kapila , 17 Mar 2023 Cum, 03:02 tarihinde şunu > yazdı: >> >> I think to reduce the risk of breakage, let's change the check to >> >=v16. Also, accordingly, update the doc and commit message. > > > Done. > Here are my review comments for v17-0001 == Commit message 1. Binary copy is supported for v16 or later. ~ As written that's very general and not quite correct. E.g. COPY ... WITH (FORMAT binary) has been available for a long time. IMO that commit message sentence ought to be more specific. SUGGESTION Binary copy for logical replication table synchronization is supported only when both publisher and subscriber are v16 or later. == src/backend/replication/logical/tablesync.c 2. @@ -1168,6 +1170,15 @@ copy_table(Relation rel) appendStringInfoString(, ") TO STDOUT"); } + + /* The binary option for replication is supported since v16 */ + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 16 && + MySubscription->binary) + { + appendStringInfoString(, " WITH (FORMAT binary)"); + options = lappend(options, makeDefElem("format", (Node *) makeString("binary"), -1)); + } Logical replication binary mode was introduced in v14, so the old comment ("The binary option for replication is supported since v14") was correct. Unfortunately, after changing the code check to 16000, I think the new comment ("The binary option for replication is supported since v16") became incorrect, and so it needs some rewording. Maybe it should say something like below: SUGGESTION If the publisher is v16 or later, then any initial table synchronization will use the same format as specified by the subscription binary mode. If the publisher is before v16, then any initial table synchronization will use text format regardless of the subscription binary mode. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Allow logical replication to copy tables in binary format
On Fri, 17 Mar 2023 at 17:55, Melih Mutlu wrote: > > Hi, > > Sharing v17. > > Amit Kapila , 17 Mar 2023 Cum, 03:02 tarihinde şunu > yazdı: >> >> I think to reduce the risk of breakage, let's change the check to >> >=v16. Also, accordingly, update the doc and commit message. > > > Done. > > Peter Smith , 17 Mar 2023 Cum, 04:58 tarihinde şunu > yazdı: >> >> IMO the sentence "However, logical replication in binary format is >> more restrictive." should just be plain text. > > > Done. > > shiy.f...@fujitsu.com , 17 Mar 2023 Cum, 05:26 > tarihinde şunu yazdı: >> >> It looks that you forgot to pass `offset` into wait_for_log(). > > > Yes, I somehow didn't include those lines into the patch. Thanks for > noticing. Fixed them now. Thanks for the updated patch, few comments: 1) Currently we refer the link to the beginning of create subscription page, this can be changed to refer to binary option contents in create subscription: + + See the binary option of + CREATE SUBSCRIPTION + for details about copying pre-existing data in binary format. + 2) Running pgperltidy shows the test script 014_binary.pl could be slightly improved as in the attachment. Regards, Vignesh diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml index bad25e54cd..9c96f900b4 100644 --- a/doc/src/sgml/logical-replication.sgml +++ b/doc/src/sgml/logical-replication.sgml @@ -253,7 +253,7 @@ will be filled with the default value as specified in the definition of the target table. However, logical replication in binary format is more restrictive. See the binary option of - CREATE SUBSCRIPTION + CREATE SUBSCRIPTION for details. diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index ccd4ed6f59..fb0ddd2d7f 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -179,7 +179,7 @@ ALTER SUBSCRIPTION name RENAME TO < See the binary option of - CREATE SUBSCRIPTION + CREATE SUBSCRIPTION for details about copying pre-existing data in binary format. diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml index e0be2b586f..edf52fa7d4 100644 --- a/doc/src/sgml/ref/create_subscription.sgml +++ b/doc/src/sgml/ref/create_subscription.sgml @@ -185,7 +185,7 @@ CREATE SUBSCRIPTION subscription_name - + binary (boolean) diff --git a/src/test/subscription/t/014_binary.pl b/src/test/subscription/t/014_binary.pl index d20f94b8d3..f5fa5976e1 100644 --- a/src/test/subscription/t/014_binary.pl +++ b/src/test/subscription/t/014_binary.pl @@ -57,16 +57,18 @@ $node_publisher->safe_psql( my $publisher_connstring = $node_publisher->connstr . ' dbname=postgres'; $node_subscriber->safe_psql('postgres', - "CREATE SUBSCRIPTION tsub CONNECTION '$publisher_connstring' " + "CREATE SUBSCRIPTION tsub CONNECTION '$publisher_connstring' " . "PUBLICATION tpub WITH (slot_name = tpub_slot, binary = true)"); # Ensure the COPY command is executed in binary format on the publisher -$node_publisher->wait_for_log(qr/LOG: ( [A-Z0-9]+:)? statement: COPY (.+)? TO STDOUT WITH \(FORMAT binary\)/); +$node_publisher->wait_for_log( + qr/LOG: ( [A-Z0-9]+:)? statement: COPY (.+)? TO STDOUT WITH \(FORMAT binary\)/ +); # Ensure nodes are in sync with each other $node_subscriber->wait_for_subscription_sync($node_publisher, 'tsub'); -my $sync_check = qq( +my $sync_check = qq( SELECT a, b, c, d FROM test_numerical ORDER BY a; SELECT a, b, c FROM test_arrays ORDER BY a; ); @@ -207,10 +209,12 @@ $node_publisher->safe_psql( my $offset = -s $node_subscriber->logfile; # Refresh the publication to trigger the tablesync -$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tsub REFRESH PUBLICATION"); +$node_subscriber->safe_psql('postgres', + "ALTER SUBSCRIPTION tsub REFRESH PUBLICATION"); # It should fail -$node_subscriber->wait_for_log(qr/ERROR: ( [A-Z0-9]+:)? no binary input function available for type/, +$node_subscriber->wait_for_log( + qr/ERROR: ( [A-Z0-9]+:)? no binary input function available for type/, $offset); # Create and set send/rcv functions for the custom type @@ -231,9 +235,10 @@ $node_subscriber->safe_psql('postgres', $ddl); $node_subscriber->wait_for_subscription_sync($node_publisher, 'tsub'); # Check the synced data on the subscriber -$result = $node_subscriber->safe_psql('postgres', 'SELECT a FROM test_myvarchar;'); +$result = + $node_subscriber->safe_psql('postgres', 'SELECT a FROM test_myvarchar;'); -is( $result, 'a', 'check synced data on subscriber with custom type'); +is($result, 'a', 'check synced data on subscriber with custom type'); # - # Test mismatched column types with/without binary mode @@ -241,7
Re: Allow logical replication to copy tables in binary format
Hi, Sharing v17. Amit Kapila , 17 Mar 2023 Cum, 03:02 tarihinde şunu yazdı: > I think to reduce the risk of breakage, let's change the check to > >=v16. Also, accordingly, update the doc and commit message. > Done. Peter Smith , 17 Mar 2023 Cum, 04:58 tarihinde şunu yazdı: > IMO the sentence "However, logical replication in binary format is > more restrictive." should just be plain text. > Done. shiy.f...@fujitsu.com , 17 Mar 2023 Cum, 05:26 tarihinde şunu yazdı: > It looks that you forgot to pass `offset` into wait_for_log(). Yes, I somehow didn't include those lines into the patch. Thanks for noticing. Fixed them now. Thanks, -- Melih Mutlu Microsoft v17-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data
Re: Allow logical replication to copy tables in binary format
On Fri, Mar 17, 2023 at 6:42 AM Peter Smith wrote: > > On Thu, Mar 16, 2023 at 1:55 PM Amit Kapila wrote: > > > > On Wed, Mar 15, 2023 at 3:33 PM Melih Mutlu wrote: > > > > > > Amit Kapila , 15 Mar 2023 Çar, 12:31 tarihinde > > > şunu yazdı: > > >> > > >> On Tue, Mar 14, 2023 at 4:32 PM Melih Mutlu > > >> wrote: > > > > > > > > >> > > >> What purpose does this test serve w.r.t this patch? Before checking > > >> the sync for different column orders, the patch has already changed > > >> binary to false, so it doesn't seem to test the functionality of this > > >> patch. Am, I missing something? > > > > > > > > > I missed that binary has changed to false before testing column orders. I > > > moved that test case up before changing binary to false. > > > Please see v14 [1]. > > > > > > > After thinking some more about this test, I don't think we need this > > test as this doesn't add any value to this patch. This tests the > > column orders which is well-established functionality of the apply > > worker. > > > > I agree that different column order is a "well-established > functionality of the apply worker". > > But when I searched the TAP tests I could not find any existing tests > that check the combination of > - different column orders > - CREATE SUBSCRIPTION with parameters binary=true and copy_data=true > > So there seemed to be a gap in the test coverage, which is why I suggested it. > > I guess that test was not strictly tied to this patch. Should I post > this new test suggestion as a separate thread or do you think there is > no point because it will not get any support? > Personally, I don't think we need to test every possible combination unless it is really achieving something meaningful. In this particular case, I don't see the need or maybe I am missing something. -- With Regards, Amit Kapila.
RE: Allow logical replication to copy tables in binary format
On Thu, Mar 16, 2023 9:20 PM Melih Mutlu wrote: > > Hi, > > Please see the attached v16. > Thanks for updating the patch. +# Cannot sync due to type mismatch +$node_subscriber->wait_for_log(qr/ERROR: ( [A-Z0-9]+:)? incorrect binary data format/); +# Ensure the COPY command is executed in text format on the publisher +$node_publisher->wait_for_log(qr/LOG: ( [A-Z0-9]+:)? statement: COPY (.+)? TO STDOUT\n/); It looks that you forgot to pass `offset` into wait_for_log(). Regards, Shi Yu
Re: Allow logical replication to copy tables in binary format
On Fri, Mar 17, 2023 at 12:20 AM Melih Mutlu wrote: > > Hi, > > Please see the attached v16. > > Peter Smith , 16 Mar 2023 Per, 03:03 tarihinde şunu > yazdı: >> >> Here are some review comments for v15-0001 > > > I applied your comments in the updated patch. Thanks. I checked patchv16-0001 and have only one minor comment. == doc/src/sgml/logical-replication.sgml 1. diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml index 6b0e300adc..bad25e54cd 100644 --- a/doc/src/sgml/logical-replication.sgml +++ b/doc/src/sgml/logical-replication.sgml @@ -251,7 +251,10 @@ column of type bigint. The target table can also have additional columns not provided by the published table. Any such columns will be filled with the default value as specified in the definition of the - target table. + target table. However, logical replication in binary + format is more restrictive. See the binary option of + CREATE SUBSCRIPTION + for details. IMO the sentence "However, logical replication in binary format is more restrictive." should just be plain text. There should not be the binary markup in that 1st sentence. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Allow logical replication to copy tables in binary format
On Thu, Mar 16, 2023 at 1:55 PM Amit Kapila wrote: > > On Wed, Mar 15, 2023 at 3:33 PM Melih Mutlu wrote: > > > > Amit Kapila , 15 Mar 2023 Çar, 12:31 tarihinde > > şunu yazdı: > >> > >> On Tue, Mar 14, 2023 at 4:32 PM Melih Mutlu wrote: > > > > > >> > >> What purpose does this test serve w.r.t this patch? Before checking > >> the sync for different column orders, the patch has already changed > >> binary to false, so it doesn't seem to test the functionality of this > >> patch. Am, I missing something? > > > > > > I missed that binary has changed to false before testing column orders. I > > moved that test case up before changing binary to false. > > Please see v14 [1]. > > > > After thinking some more about this test, I don't think we need this > test as this doesn't add any value to this patch. This tests the > column orders which is well-established functionality of the apply > worker. > I agree that different column order is a "well-established functionality of the apply worker". But when I searched the TAP tests I could not find any existing tests that check the combination of - different column orders - CREATE SUBSCRIPTION with parameters binary=true and copy_data=true So there seemed to be a gap in the test coverage, which is why I suggested it. I guess that test was not strictly tied to this patch. Should I post this new test suggestion as a separate thread or do you think there is no point because it will not get any support? -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Allow logical replication to copy tables in binary format
On Thu, Mar 16, 2023 at 6:59 PM Melih Mutlu wrote: > > Amit Kapila , 16 Mar 2023 Per, 06:25 tarihinde şunu > yazdı: >> >> On Thu, Mar 16, 2023 at 8:27 AM Euler Taveira wrote: >> > >> > On Wed, Mar 8, 2023, at 11:50 PM, Amit Kapila wrote: >> > >> > It is not clear to me which version check you wanted to add because we >> > seem to have a binary option in COPY from the time prior to logical >> > replication. I feel we need a publisher version 14 check as that is >> > where we start to support binary mode transfer in logical replication. >> > See the check in function libpqrcv_startstreaming(). >> > >> > ... then you are breaking existent cases. Even if you have a convincing >> > argument, you are introducing a behavior change in prior versions (commit >> > messages should always indicate that you are breaking backward >> > compatibility). >> > >> > + >> > + /* >> > +* The binary option for replication is supported since v14 >> > +*/ >> > + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 && >> > + MySubscription->binary) >> > + { >> > + appendStringInfo(, " WITH (FORMAT binary)"); >> > + options = lappend(options, makeDefElem("format", (Node *) >> > makeString("binary"), -1)); >> > + } >> > + >> > >> > What are the arguments to support since v14 instead of the to-be-released >> > version? I read the thread but it is not clear. It was said about the >> > restrictive nature of this feature and it will be frustrating to see that >> > the >> > same setup (with the same commands) works with v14 and v15 but it doesn't >> > with >> > v16. >> > >> >> If the failure has to happen it will anyway happen later when the >> publisher will be upgraded to v16. The reason for the version checks >> as v14 was to allow the initial sync from the same version where the >> binary mode for replication was introduced. However, if we expect >> failures in the existing setup, I am fine with supporting this for >= >> v16. > > > Upgrading the subscriber to v16 and keeping the subscriber in v14 could break > existing subscriptions. I don't know how likely such a case is. > > I don't have a strong preference on this. What do you think? Should we change > it >=v16 or keep it as it is? > I think to reduce the risk of breakage, let's change the check to >=v16. Also, accordingly, update the doc and commit message. -- With Regards, Amit Kapila.
Re: Allow logical replication to copy tables in binary format
Amit Kapila , 16 Mar 2023 Per, 06:25 tarihinde şunu yazdı: > On Thu, Mar 16, 2023 at 8:27 AM Euler Taveira wrote: > > > > On Wed, Mar 8, 2023, at 11:50 PM, Amit Kapila wrote: > > > > It is not clear to me which version check you wanted to add because we > > seem to have a binary option in COPY from the time prior to logical > > replication. I feel we need a publisher version 14 check as that is > > where we start to support binary mode transfer in logical replication. > > See the check in function libpqrcv_startstreaming(). > > > > ... then you are breaking existent cases. Even if you have a convincing > > argument, you are introducing a behavior change in prior versions (commit > > messages should always indicate that you are breaking backward > compatibility). > > > > + > > + /* > > +* The binary option for replication is supported since v14 > > +*/ > > + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 && > > + MySubscription->binary) > > + { > > + appendStringInfo(, " WITH (FORMAT binary)"); > > + options = lappend(options, makeDefElem("format", (Node *) > makeString("binary"), -1)); > > + } > > + > > > > What are the arguments to support since v14 instead of the to-be-released > > version? I read the thread but it is not clear. It was said about the > > restrictive nature of this feature and it will be frustrating to see > that the > > same setup (with the same commands) works with v14 and v15 but it > doesn't with > > v16. > > > > If the failure has to happen it will anyway happen later when the > publisher will be upgraded to v16. The reason for the version checks > as v14 was to allow the initial sync from the same version where the > binary mode for replication was introduced. However, if we expect > failures in the existing setup, I am fine with supporting this for >= > v16. > Upgrading the subscriber to v16 and keeping the subscriber in v14 could break existing subscriptions. I don't know how likely such a case is. I don't have a strong preference on this. What do you think? Should we change it >=v16 or keep it as it is? Best, -- Melih Mutlu Microsoft
Re: Allow logical replication to copy tables in binary format
Hi, Please see the attached v16. Peter Smith , 16 Mar 2023 Per, 03:03 tarihinde şunu yazdı: > Here are some review comments for v15-0001 > I applied your comments in the updated patch. shiy.f...@fujitsu.com , 16 Mar 2023 Per, 05:35 tarihinde şunu yazdı: > On Thu, Mar 16, 2023 2:26 AM Melih Mutlu wrote: > > > > Right, it needs to be ordered. Fixed. > > > > Hi, > > Thanks for updating the patch. I tested some cases like toast data, > combination > of row filter and column lists, and it works well. > Thanks for testing. I changed wait_for_log lines as you suggested. houzj.f...@fujitsu.com , 16 Mar 2023 Per, 05:55 tarihinde şunu yazdı: > 1. > + { > + appendStringInfo(, " WITH (FORMAT binary)"); > > We could use appendStringInfoString here. > Done. > 2. > I think it would be better to pass the log offset when using wait_for_log, > because otherwise it will check the whole log file to find the target > message, > This might not be a big problem, but it has a risk of getting unexpected > log message > which was generated by previous commands. > You're right. I added offsets for wait_for_log's . Thanks, -- Melih Mutlu Microsoft v16-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data
Re: Allow logical replication to copy tables in binary format
On Thu, Mar 16, 2023 at 8:27 AM Euler Taveira wrote: > > On Wed, Mar 8, 2023, at 11:50 PM, Amit Kapila wrote: > > It is not clear to me which version check you wanted to add because we > seem to have a binary option in COPY from the time prior to logical > replication. I feel we need a publisher version 14 check as that is > where we start to support binary mode transfer in logical replication. > See the check in function libpqrcv_startstreaming(). > > ... then you are breaking existent cases. Even if you have a convincing > argument, you are introducing a behavior change in prior versions (commit > messages should always indicate that you are breaking backward compatibility). > > + > + /* > +* The binary option for replication is supported since v14 > +*/ > + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 && > + MySubscription->binary) > + { > + appendStringInfo(, " WITH (FORMAT binary)"); > + options = lappend(options, makeDefElem("format", (Node *) > makeString("binary"), -1)); > + } > + > > What are the arguments to support since v14 instead of the to-be-released > version? I read the thread but it is not clear. It was said about the > restrictive nature of this feature and it will be frustrating to see that the > same setup (with the same commands) works with v14 and v15 but it doesn't with > v16. > If the failure has to happen it will anyway happen later when the publisher will be upgraded to v16. The reason for the version checks as v14 was to allow the initial sync from the same version where the binary mode for replication was introduced. However, if we expect failures in the existing setup, I am fine with supporting this for >= v16. > IMO it should be >= 16 and documentation should explain that v14/v15 uses > text format during initial table synchronization even if binary = true. > Yeah, if we change the version then the corresponding text in the patch should also be changed. > Should there be a fallback mode (text) if initial table synchronization failed > because of the binary option? Maybe a different setting (auto) to support such > behavior. > I think the workaround is that the user disables binary mode for the time of initial sync. I think if we want to extend and add a fallback (text) mode then it is better to keep it as default behavior rather than introducing a new setting like 'auto'. Personally, I feel it can be added later after doing some more study. -- With Regards, Amit Kapila.
Re: Allow logical replication to copy tables in binary format
On Wed, Mar 8, 2023, at 11:50 PM, Amit Kapila wrote: > On Wed, Mar 8, 2023 at 6:17 PM Melih Mutlu wrote: > > > > On 7 Mar 2023 Tue at 04:10 Amit Kapila wrote: > >> > >> As per what I could read in this thread, most people prefer to use the > >> existing binary option rather than inventing a new way (option) to > >> binary copy in the initial sync phase. Do you agree? > > > > > > I agree. > > What do you think about the version checks? I removed any kind of check > > since it’s currently a different option. Should we check publisher version > > before doing binary copy to ensure that the publisher node supports binary > > option of COPY command? > > > > It is not clear to me which version check you wanted to add because we > seem to have a binary option in COPY from the time prior to logical > replication. I feel we need a publisher version 14 check as that is > where we start to support binary mode transfer in logical replication. > See the check in function libpqrcv_startstreaming(). ... then you are breaking existent cases. Even if you have a convincing argument, you are introducing a behavior change in prior versions (commit messages should always indicate that you are breaking backward compatibility). + + /* +* The binary option for replication is supported since v14 +*/ + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 && + MySubscription->binary) + { + appendStringInfo(, " WITH (FORMAT binary)"); + options = lappend(options, makeDefElem("format", (Node *) makeString("binary"), -1)); + } + What are the arguments to support since v14 instead of the to-be-released version? I read the thread but it is not clear. It was said about the restrictive nature of this feature and it will be frustrating to see that the same setup (with the same commands) works with v14 and v15 but it doesn't with v16. IMO it should be >= 16 and documentation should explain that v14/v15 uses text format during initial table synchronization even if binary = true. Should there be a fallback mode (text) if initial table synchronization failed because of the binary option? Maybe a different setting (auto) to support such behavior. -- Euler Taveira EDB https://www.enterprisedb.com/
Re: Allow logical replication to copy tables in binary format
On Wed, Mar 15, 2023 at 3:33 PM Melih Mutlu wrote: > > Amit Kapila , 15 Mar 2023 Çar, 12:31 tarihinde şunu > yazdı: >> >> On Tue, Mar 14, 2023 at 4:32 PM Melih Mutlu wrote: > > >> >> What purpose does this test serve w.r.t this patch? Before checking >> the sync for different column orders, the patch has already changed >> binary to false, so it doesn't seem to test the functionality of this >> patch. Am, I missing something? > > > I missed that binary has changed to false before testing column orders. I > moved that test case up before changing binary to false. > Please see v14 [1]. > After thinking some more about this test, I don't think we need this test as this doesn't add any value to this patch. This tests the column orders which is well-established functionality of the apply worker. -- With Regards, Amit Kapila.
RE: Allow logical replication to copy tables in binary format
Hi, Thanks for updating the patch, I think it is a useful feature. I looked at the v15 patch and the patch looks mostly good to me. Here are few comments: 1. + { + appendStringInfo(, " WITH (FORMAT binary)"); We could use appendStringInfoString here. 2. +# It should fail +$node_subscriber->wait_for_log(qr/ERROR: ( [A-Z0-9]+:)? no binary input function available for type/); ... +# Cannot sync due to type mismatch +$node_subscriber->wait_for_log(qr/ERROR: ( [A-Z0-9]+:)? incorrect binary data format/); ... +# Ensure the COPY command is executed in text format on the publisher +$node_publisher->wait_for_log(qr/LOG: ( [a-z0-9]+:)? COPY (.+)? TO STDOUT\n/); I think it would be better to pass the log offset when using wait_for_log, because otherwise it will check the whole log file to find the target message, This might not be a big problem, but it has a risk of getting unexpected log message which was generated by previous commands. Best Regards, Hou zj
Re: Allow logical replication to copy tables in binary format
On Thu, Mar 16, 2023 at 5:59 AM Peter Smith wrote: > > On Wed, Mar 15, 2023 at 5:31 PM Amit Kapila wrote: > > > > On Wed, Mar 15, 2023 at 11:52 AM Peter Smith wrote: > > > > > > == > > > src/backend/replication/logical/tablesync.c > > > > > > 5. > > > + > > > + /* > > > + * If the publisher version is earlier than v14, it COPY command doesn't > > > + * support the binary option. > > > + */ > > > + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 && > > > + MySubscription->binary) > > > + { > > > + appendStringInfo(, " WITH (FORMAT binary)"); > > > + options = lappend(options, makeDefElem("format", (Node *) > > > makeString("binary"), -1)); > > > + } > > > > > > Sorry, I gave a poor review comment for this previously. Now I have > > > revisited all the thread discussions about version checking. I feel > > > that some explanation should be given in the code comment so that > > > future readers of this code can understand why you decided to use v14 > > > checking. > > > > > > Something like this: > > > > > > SUGGESTION > > > If the publisher version is earlier than v14, we use text format COPY. > > > > > > > I think this isn't explicit that we supported the binary format since > > v14. So, I would prefer my version of the comment as suggested in the > > previous email. > > > > Hmm, but my *full* suggestion was bigger than what is misquoted above, > and it certainly did say " logical replication binary mode transfer > was not introduced until v14". > > SUGGESTION > If the publisher version is earlier than v14, we use text format COPY. > Note - In fact COPY syntax "WITH (FORMAT binary)" has existed since > v9, but since the logical replication binary mode transfer was not > introduced until v14 it was decided to check using the later version. > I find this needlessly verbose. > ~~ > > Anyway, the shortened comment as in the latest v15 patch is fine by me too. > Okay, then let's go with that. -- With Regards, Amit Kapila.
RE: Allow logical replication to copy tables in binary format
On Thu, Mar 16, 2023 2:26 AM Melih Mutlu wrote: > > Right, it needs to be ordered. Fixed. > Hi, Thanks for updating the patch. I tested some cases like toast data, combination of row filter and column lists, and it works well. Here is a comment: +# Ensure the COPY command is executed in binary format on the publisher +$node_publisher->wait_for_log(qr/LOG: ( [a-z0-9]+:)? COPY (.+)? TO STDOUT WITH \(FORMAT binary\)/); The test failed with `log_error_verbosity = verbose` because it couldn't match the following log: 2023-03-16 09:45:50.096 CST [2499415] pg_16398_sync_16391_7210954376230900539 LOG: 0: statement: COPY public.test_arrays (a, b, c) TO STDOUT WITH (FORMAT binary) I think we should make it pass, see commit 19408aae7f. Should it be changed to: $node_publisher->wait_for_log(qr/LOG: ( [A-Z0-9]+:)? statement: COPY (.+)? TO STDOUT WITH \(FORMAT binary\)/); Besides, for the same reason, this line also needs to be modified. +$node_publisher->wait_for_log(qr/LOG: ( [a-z0-9]+:)? COPY (.+)? TO STDOUT\n/); Regards, Shi Yu
Re: Allow logical replication to copy tables in binary format
On Wed, Mar 15, 2023 at 5:31 PM Amit Kapila wrote: > > On Wed, Mar 15, 2023 at 11:52 AM Peter Smith wrote: > > > > == > > src/backend/replication/logical/tablesync.c > > > > 5. > > + > > + /* > > + * If the publisher version is earlier than v14, it COPY command doesn't > > + * support the binary option. > > + */ > > + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 && > > + MySubscription->binary) > > + { > > + appendStringInfo(, " WITH (FORMAT binary)"); > > + options = lappend(options, makeDefElem("format", (Node *) > > makeString("binary"), -1)); > > + } > > > > Sorry, I gave a poor review comment for this previously. Now I have > > revisited all the thread discussions about version checking. I feel > > that some explanation should be given in the code comment so that > > future readers of this code can understand why you decided to use v14 > > checking. > > > > Something like this: > > > > SUGGESTION > > If the publisher version is earlier than v14, we use text format COPY. > > > > I think this isn't explicit that we supported the binary format since > v14. So, I would prefer my version of the comment as suggested in the > previous email. > Hmm, but my *full* suggestion was bigger than what is misquoted above, and it certainly did say " logical replication binary mode transfer was not introduced until v14". SUGGESTION If the publisher version is earlier than v14, we use text format COPY. Note - In fact COPY syntax "WITH (FORMAT binary)" has existed since v9, but since the logical replication binary mode transfer was not introduced until v14 it was decided to check using the later version. ~~ Anyway, the shortened comment as in the latest v15 patch is fine by me too. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Allow logical replication to copy tables in binary format
Here are some review comments for v15-0001 == doc/src/sgml/logical-replication.sgml 1. + target table. However, logical replication in binary format is more restrictive, + see binary option of + CREATE SUBSCRIPTION + for more details. IMO (and Chat-GPT agrees) the new text should be 2 sentences. Also, I changed "more details" --> "details" because none are provided here,. SUGGESTION However, logical replication in binary format is more restrictive. See the binary option of CREATE SUBSCRIPTION for details. == doc/src/sgml/ref/alter_subscription.sgml 2. + + See binary option of + for details of copying pre-existing data in binary format. + Should the link should be defined more like you did above using the markup to get the better font? SUGGESTION (also minor rewording) See the binary option of CREATE SUBSCRIPTION for details about copying pre-existing data in binary format. == doc/src/sgml/ref/create_subscription.sgml 3. - Specifies whether the subscription will request the publisher to - send the data in binary format (as opposed to text). - The default is false. - Even when this option is enabled, only data types having - binary send and receive functions will be transferred in binary. + Specifies whether the subscription will request the publisher to send + the data in binary format (as opposed to text). The default is + false. Any initial table synchronization copy + (see copy_data) also uses the same format. Binary + format can be faster than the text format, but it is less portable + across machine architectures and PostgreSQL versions. Binary format + is very data type specific; for example, it will not allow copying + from a smallint column to an integer column, even though that would + work fine in text format. Even when this option is enabled, only data + types having binary send and receive functions will be transferred in + binary. Note that the initial synchronization requires all data types + to have binary send and receive functions, otherwise the synchronization + will fail (see for more about + send/receive functions). IMO that part saying "from a smallint column to an integer column" should have markups for "smallint" and "integer". == src/backend/replication/logical/tablesync.c 4. + /* + * The binary option for replication is supported since v14 + */ + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 && + MySubscription->binary) + { + appendStringInfo(, " WITH (FORMAT binary)"); + options = lappend(options, makeDefElem("format", (Node *) makeString("binary"), -1)); + } Should this now be a single-line comment instead of spanning 3 lines? == src/test/subscription/t/014_binary.pl 5. Everything looked OK to me, but the TAP file has only small comments for each test step, which forces you to read everything from top-to-bottom to understand what is going on. I felt it might be easier to understand the tests if you add a few "bigger" comments just to break the tests into the categories being tested. For example, something like: # -- # Ensure binary mode also executes COPY in binary format # -- ~ # -- # Ensure normal binary replication works # -- ~ # -- # Use ALTER SUBSCRIPTION to change to text format and then back to binary format # -- ~ # --- # Test binary replication without and with send/receive functions # --- ~ # -- # Test different column orders on pub/sub tables # -- ~ # - # Test mismatched column types with/without binary mode # - -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Allow logical replication to copy tables in binary format
Hi, vignesh C , 15 Mar 2023 Çar, 18:12 tarihinde şunu yazdı: > One comment: > 1) There might be a chance the result order of select may vary as > "ORDER BY" is not specified, Should we add "ORDER BY" as the table > has multiple rows: > +# Check the synced data on the subscriber > +$result = $node_subscriber->safe_psql('postgres', 'SELECT a,b FROM > +public.test_col_order;'); > + > +is( $result, '1|2 > +3|4', 'check synced data on subscriber for different column order'); > Right, it needs to be ordered. Fixed. Thanks, -- Melih Mutlu Microsoft v15-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data
Re: Allow logical replication to copy tables in binary format
On Wed, 15 Mar 2023 at 15:30, Melih Mutlu wrote: > > Hi, > > Please see the attached patch. One comment: 1) There might be a chance the result order of select may vary as "ORDER BY" is not specified, Should we add "ORDER BY" as the table has multiple rows: +# Check the synced data on the subscriber +$result = $node_subscriber->safe_psql('postgres', 'SELECT a,b FROM +public.test_col_order;'); + +is( $result, '1|2 +3|4', 'check synced data on subscriber for different column order'); Regards, Vignesh
Re: Allow logical replication to copy tables in binary format
Amit Kapila , 15 Mar 2023 Çar, 12:31 tarihinde şunu yazdı: > On Tue, Mar 14, 2023 at 4:32 PM Melih Mutlu > wrote: > > What purpose does this test serve w.r.t this patch? Before checking > the sync for different column orders, the patch has already changed > binary to false, so it doesn't seem to test the functionality of this > patch. Am, I missing something? > I missed that binary has changed to false before testing column orders. I moved that test case up before changing binary to false. Please see v14 [1]. [1] https://www.postgresql.org/message-id/CAGPVpCTaXYctCUp3z%3D_BstonHiZcC5Jj7584i7B8jeZQq4RJkw%40mail.gmail.com Thanks, -- Melih Mutlu Microsoft
Re: Allow logical replication to copy tables in binary format
Hi, Please see v14 [1]. Peter Smith , 15 Mar 2023 Çar, 09:22 tarihinde şunu yazdı: > Here are some review comments for v13-0001 > > == > doc/src/sgml/logical-replication.sgml > > 1. > That's why in the previous v12 review [1] (comment #3) I suggested > that this page should just say something quite generic like "However, > replication in binary format is more restrictive", and link back to > the other page which has all the gory details. > You're right. Changed it with what you suggested. > 2. > My previous v12 review [1] (comment #6) suggested maybe updating this > page. But it was not done in v13. Did you accidentally miss the review > comment, or chose not to do it? > Sorry, I missed this. Added a line leading to CREATE SUBSCRIPTION doc. > == > doc/src/sgml/ref/create_subscription.sgml > > 3. > BEFORE > Binary format is also very data type specific, it will not allow > copying between different column types as opposed to text format. > > SUGGESTION (worded more similar to what is already on the COPY page [2]) > Binary format is very data type specific; for example, it will not > allow copying from a smallint column to an integer column, even though > that would work fine in text format. > Done. > 4. > SUGGESTION > If the publisher is a PostgreSQL version > before 14, then any initial table synchronization will use text format > even if binary = true. > Done. > SUGGESTION > If the publisher version is earlier than v14, we use text format COPY. > Note - In fact COPY syntax "WITH (FORMAT binary)" has existed since > v9, but since the logical replication binary mode transfer was not > introduced until v14 it was decided to check using the later version. > Changed it as suggested here [2]. [1] https://www.postgresql.org/message-id/CAGPVpCTaXYctCUp3z%3D_BstonHiZcC5Jj7584i7B8jeZQq4RJkw%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAA4eK1%2BC7ykvdBxh_t1BdbX5Da1bM1BgsE%3D-i2koPkd3pSid0A%40mail.gmail.com Thanks, -- Melih Mutlu Microsoft
Re: Allow logical replication to copy tables in binary format
Hi, Please see the attached patch. Takamichi Osumi (Fujitsu) , 14 Mar 2023 Sal, 18:20 tarihinde şunu yazdı: > (1) create_subscription.sgml > > + column types as opposed to text format. Even when this option > is enabled, > + only data types having binary send and receive functions will be > + transferred in binary. Note that the initial synchronization > requires > > (1-1) > > I think it's helpful to add a reference for the description about send and > receive functions (e.g. to the page of CREATE TYPE). > Done. > > (1-2) > > Also, it would be better to have a cross reference from there to this doc > as one paragraph probably in "Notes". I suggested this, because send and > receive functions are described as "optional" there and missing them leads > to error in the context of binary table synchronization. > I'm not sure whether this is necessary. In case of missing send/receive functions, error logs are already clear about what's wrong and logical replication docs also explain what could go wrong with binary. > (3) copy_table() > > + /* > +* If the publisher version is earlier than v14, it COPY command > doesn't > +* support the binary option. > +*/ > > This sentence doesn't look correct grammatically. We can replace "it COPY > command" with "subscription" for example. Kindly please fix it. > Changed this with Amit's suggestion [1]. [1] https://www.postgresql.org/message-id/CAA4eK1%2BC7ykvdBxh_t1BdbX5Da1bM1BgsE%3D-i2koPkd3pSid0A%40mail.gmail.com Thanks, -- Melih Mutlu Microsoft v14-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data
Re: Allow logical replication to copy tables in binary format
On Tue, Mar 14, 2023 at 4:32 PM Melih Mutlu wrote: > > Attached v13. > I have a question related to the below test in the patch: +# Setting binary to false should allow syncing +$node_subscriber->safe_psql( +'postgres', qq( +ALTER SUBSCRIPTION tsub SET (binary = false);)); + +# Ensure the COPY command is executed in text format on the publisher +$node_publisher->wait_for_log(qr/LOG: ( [a-z0-9]+:)? COPY (.+)? TO STDOUT\n/); + +$node_subscriber->wait_for_subscription_sync($node_publisher, 'tsub'); + +# Check the synced data on the subscriber +$result = $node_subscriber->safe_psql('postgres', 'SELECT a FROM test_mismatching_types ORDER BY a;'); + +is( $result, '1 +2', 'check synced data on subscriber with binary = false'); + +# Test syncing tables with different column order +$node_publisher->safe_psql( +'postgres', qq( +CREATE TABLE public.test_col_order ( +a bigint, b int +); +INSERT INTO public.test_col_order (a,b) +VALUES (1,2),(3,4); +)); What purpose does this test serve w.r.t this patch? Before checking the sync for different column orders, the patch has already changed binary to false, so it doesn't seem to test the functionality of this patch. Am, I missing something? -- With Regards, Amit Kapila.
RE: Allow logical replication to copy tables in binary format
Hi, On Wednesday, March 15, 2023 2:34 PM Amit Kapila wrote: > On Tue, Mar 14, 2023 at 8:50 PM Takamichi Osumi (Fujitsu) > wrote: > > > > On Tuesday, March 14, 2023 8:02 PM Melih Mutlu > wrote: > > (3) copy_table() > > > > + /* > > +* If the publisher version is earlier than v14, it COPY command > doesn't > > +* support the binary option. > > +*/ > > > > This sentence doesn't look correct grammatically. We can replace "it COPY > command" with "subscription" for example. Kindly please fix it. > > > > How about something like: "The binary option for replication is supported > since > v14."? Yes, this looks best to me. I agree with this suggestion. Best Regards, Takamichi Osumi
Re: Allow logical replication to copy tables in binary format
On Wed, Mar 15, 2023 at 11:52 AM Peter Smith wrote: > > == > src/backend/replication/logical/tablesync.c > > 5. > + > + /* > + * If the publisher version is earlier than v14, it COPY command doesn't > + * support the binary option. > + */ > + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 && > + MySubscription->binary) > + { > + appendStringInfo(, " WITH (FORMAT binary)"); > + options = lappend(options, makeDefElem("format", (Node *) > makeString("binary"), -1)); > + } > > Sorry, I gave a poor review comment for this previously. Now I have > revisited all the thread discussions about version checking. I feel > that some explanation should be given in the code comment so that > future readers of this code can understand why you decided to use v14 > checking. > > Something like this: > > SUGGESTION > If the publisher version is earlier than v14, we use text format COPY. > I think this isn't explicit that we supported the binary format since v14. So, I would prefer my version of the comment as suggested in the previous email. -- With Regards, Amit Kapila.
Re: Allow logical replication to copy tables in binary format
Here are some review comments for v13-0001 == doc/src/sgml/logical-replication.sgml 1. @@ -241,10 +241,13 @@ types of the columns do not need to match, as long as the text representation of the data can be converted to the target type. For example, you can replicate from a column of type integer to a - column of type bigint. The target table can also have - additional columns not provided by the published table. Any such columns - will be filled with the default value as specified in the definition of the - target table. + column of type bigint. However, replication in binary format is + type specific and does not allow to replicate data between different types + according to its restrictions (See binary option of + CREATE SUBSCRIPTION + for details). The target table can also have additional columns not provided + by the published table. Any such columns will be filled with the default + value as specified in the definition of the target table. I don’t really think we should mention details of what the binary problems are here, because then: i) it is just duplicating information already on the CREATE SUBSCRIPTION page ii) you will miss some restrictions. (e.g. here you said something about "type specific" but didn't mention send/receive functions would be mandatory for the copy_data option) That's why in the previous v12 review [1] (comment #3) I suggested that this page should just say something quite generic like "However, replication in binary format is more restrictive", and link back to the other page which has all the gory details. == doc/src/sgml/ref/alter_subscription.sgml 2. My previous v12 review [1] (comment #6) suggested maybe updating this page. But it was not done in v13. Did you accidentally miss the review comment, or chose not to do it? == doc/src/sgml/ref/create_subscription.sgml 3. - Specifies whether the subscription will request the publisher to - send the data in binary format (as opposed to text). - The default is false. - Even when this option is enabled, only data types having - binary send and receive functions will be transferred in binary. + Specifies whether the subscription will request the publisher to send + the data in binary format (as opposed to text). The default is + false. Any initial table synchronization copy + (see copy_data) also uses the same format. Binary + format can be faster than the text format, but it is less portable + across machine architectures and PostgreSQL versions. Binary format is + also very data type specific, it will not allow copying between different + column types as opposed to text format. Even when this option is enabled, + only data types having binary send and receive functions will be + transferred in binary. Note that the initial synchronization requires + all data types to have binary send and receive functions, otherwise + the synchronization will fail. BEFORE Binary format is also very data type specific, it will not allow copying between different column types as opposed to text format. SUGGESTION (worded more similar to what is already on the COPY page [2]) Binary format is very data type specific; for example, it will not allow copying from a smallint column to an integer column, even though that would work fine in text format. ~~~ 4. + + If the publisher is a PostgreSQL version + before 14, then any initial table synchronization will use text format + even if this option is enabled. + IMO it will be clearer to explicitly say the option instead of 'this option'. SUGGESTION If the publisher is a PostgreSQL version before 14, then any initial table synchronization will use text format even if binary = true. == src/backend/replication/logical/tablesync.c 5. + + /* + * If the publisher version is earlier than v14, it COPY command doesn't + * support the binary option. + */ + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 14 && + MySubscription->binary) + { + appendStringInfo(, " WITH (FORMAT binary)"); + options = lappend(options, makeDefElem("format", (Node *) makeString("binary"), -1)); + } Sorry, I gave a poor review comment for this previously. Now I have revisited all the thread discussions about version checking. I feel that some explanation should be given in the code comment so that future readers of this code can understand why you decided to use v14 checking. Something like this: SUGGESTION If the publisher version is earlier than v14, we use text format COPY. Note - In fact COPY syntax "WITH (FORMAT binary)" has existed since v9, but since the logical replication binary mode transfer was not introduced until v14 it was decided to check using the later version. -- [1] PS v12 review -
Re: Allow logical replication to copy tables in binary format
On Tue, Mar 14, 2023 at 8:50 PM Takamichi Osumi (Fujitsu) wrote: > > On Tuesday, March 14, 2023 8:02 PM Melih Mutlu wrote: > (3) copy_table() > > + /* > +* If the publisher version is earlier than v14, it COPY command > doesn't > +* support the binary option. > +*/ > > This sentence doesn't look correct grammatically. We can replace "it COPY > command" with "subscription" for example. Kindly please fix it. > How about something like: "The binary option for replication is supported since v14."? -- With Regards, Amit Kapila.
RE: Allow logical replication to copy tables in binary format
On Tuesday, March 14, 2023 8:02 PM Melih Mutlu wrote: > Attached v13. Hi, Thanks for sharing v13. Few minor review comments. (1) create_subscription.sgml + column types as opposed to text format. Even when this option is enabled, + only data types having binary send and receive functions will be + transferred in binary. Note that the initial synchronization requires (1-1) I think it's helpful to add a reference for the description about send and receive functions (e.g. to the page of CREATE TYPE). (1-2) Also, it would be better to have a cross reference from there to this doc as one paragraph probably in "Notes". I suggested this, because send and receive functions are described as "optional" there and missing them leads to error in the context of binary table synchronization. (3) copy_table() + /* +* If the publisher version is earlier than v14, it COPY command doesn't +* support the binary option. +*/ This sentence doesn't look correct grammatically. We can replace "it COPY command" with "subscription" for example. Kindly please fix it. Best Regards, Takamichi Osumi
Re: Allow logical replication to copy tables in binary format
Hi, Attached v13. Peter Smith , 14 Mar 2023 Sal, 03:07 tarihinde şunu yazdı: > Here are some review comments for patch v12-0001 > Thanks for reviewing. I tried to make explanations in docs better according to your comments. What do you think? Amit Kapila , 14 Mar 2023 Sal, 06:17 tarihinde şunu yazdı: > I think it would better to write the tests for this feature in the > existing test file 014_binary as that would save some time for node > setup/shutdown and also that would be a more appropriate place for > these tests. I removed 032_binary_copy.pl and added those tests into 014_binary.pl. Also added the case with different column order as Peter suggested. Best, -- Melih Mutlu Microsoft v13-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data
Re: Allow logical replication to copy tables in binary format
On Tue, Mar 14, 2023 at 6:18 AM Peter Smith wrote: > > On Tue, Mar 14, 2023 at 11:06 AM Peter Smith wrote: > > > > Here are some review comments for patch v12-0001 > > > > == > > General > > > > 1. > > There is no new test code. Are we sure that there are already > > sufficient TAP tests doing binary testing with/without copy_data and > > covering all the necessary combinations? > > > > Oops. Please ignore this comment. Somehow I missed seeing those > 032_binary_copy.pl tests earlier. > I think it would better to write the tests for this feature in the existing test file 014_binary as that would save some time for node setup/shutdown and also that would be a more appropriate place for these tests. -- With Regards, Amit Kapila.
Re: Allow logical replication to copy tables in binary format
Here are some review comments for patch v12-0001 (test code only) == src/test/subscription/t/014_binary.pl # Check the synced data on subscribers ~ There are a couple of comments like the above that say: "on subscribers" instead of "on subscriber". ~~~ I wondered if it might be useful to also include another test case that demonstrates you can still use COPY with binary format even when the table column orders are different, so long as the same names have the same data types. In other words, it shows apparently, the binary row COPY processes per column; not one single binary data copy spanning all the replicated columns. For example, # # Test syncing tables with different column order $node_publisher->safe_psql( 'postgres', qq( CREATE TABLE public.test_col_order ( a bigint, b int ); INSERT INTO public.test_col_order (a,b) VALUES (1,2),(3,4); )); $node_subscriber->safe_psql( 'postgres', qq( CREATE TABLE public.test_col_order ( b int, a bigint ); ALTER SUBSCRIPTION tsub REFRESH PUBLICATION; )); # Ensure nodes are in sync with each other $node_subscriber->wait_for_subscription_sync($node_publisher, 'tsub'); # Check the synced data on subscribers $result = $node_subscriber->safe_psql('postgres', 'SELECT a,b FROM public.test_col_order;'); is( $result, '1|2 3|4', 'check synced data on subscriber for different column order and binary = true'); # -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Allow logical replication to copy tables in binary format
On Tue, Mar 14, 2023 at 11:06 AM Peter Smith wrote: > > Here are some review comments for patch v12-0001 > > == > General > > 1. > There is no new test code. Are we sure that there are already > sufficient TAP tests doing binary testing with/without copy_data and > covering all the necessary combinations? > Oops. Please ignore this comment. Somehow I missed seeing those 032_binary_copy.pl tests earlier. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Allow logical replication to copy tables in binary format
Here are some review comments for patch v12-0001 == General 1. There is no new test code. Are we sure that there are already sufficient TAP tests doing binary testing with/without copy_data and covering all the necessary combinations? == Commit Message 2. Without this patch, table are copied in text format even if the subscription is created with binary option enabled. This patch allows logical replication to perform in binary format starting from initial sync. When binary format is beneficial to use, allowing the subscription to copy tables in binary in table sync phase may reduce the time spent on copy depending on column types. ~ a. "table" -> "tables" b. I don't think need to keep referring to the initial table synchronization many times. SUGGESTION Without this patch, table synchronization COPY uses text format even if the subscription is created with the binary option enabled. Copying tables in binary format may reduce the time spent depending on column types. == doc/src/sgml/logical-replication.sgml 3. @@ -241,10 +241,11 @@ types of the columns do not need to match, as long as the text representation of the data can be converted to the target type. For example, you can replicate from a column of type integer to a - column of type bigint. The target table can also have - additional columns not provided by the published table. Any such columns - will be filled with the default value as specified in the definition of the - target table. + column of type bigint. However, replication in binary format is + type specific and does not allow to replicate data between different types + according to its restrictions. The target table can also have additional + columns not provided by the published table. Any such columns will be filled + with the default value as specified in the definition of the target table. I am not sure if there is enough information here about the binary restrictions. - e.g. does the column order matter for tablesync COPY binary? - e.g. no mention of the send/receive function requirements of tablesync COPY. But maybe here is not the place to write all such details anyway; Instead of duplicating information IMO here should give a link to the CREATE SUBSCRIPTION notes -- something like: SUGGESTION Note that replication in binary format is more restrictive. See CREATE SUBSCRIPTION binary subscription parameter for details. == doc/src/sgml/ref/create_subscription.sgml 4. @@ -189,11 +189,17 @@ CREATE SUBSCRIPTION subscription_namebinary (boolean) - Specifies whether the subscription will request the publisher to - send the data in binary format (as opposed to text). - The default is false. - Even when this option is enabled, only data types having - binary send and receive functions will be transferred in binary. + Specifies whether the subscription will both copy the initial data to + synchronize relations and request the publisher to send the data in + binary format (as opposed to text). The default is false. + Binary format can be faster than the text format, but it is less portable + across machine architectures and PostgreSQL versions. Binary format is + also very data type specific, it will not allow copying between different + column types as opposed to text format. Even when this option is enabled, + only data types having binary send and receive functions will be + transferred in binary. Note that the initial synchronization requires + all data types to have binary send and receive functions, otherwise + the synchronization will fail. There seems to be a small ambiguity because this wording comes more from our code-level understanding, rather than what the user sees. e.g. I think "will be transferred" could mean also the COPY phase as far as the user is concerned. Maybe some slight rewording can help. There is also some use of "copy" (e.g. "will not allow copying") which can be confused with the initial tablesync phase which is not what was intended. SUGGESTION Specifies whether the subscription will request the publisher to send the data in binary format (as opposed to text). The default is false. Any initial table synchronization copy [link to copy_data] also uses the same format. Using binary format can be faster than the text format, but it is less portable across machine architectures and PostgreSQL versions. Binary format is also data type specific, it will not allow transfer between different column types as opposed to text format. Even when the binary option is enabled, only data types having binary send/receive functions can be transferred in binary format. If these functions don't exist then the publisher send will revert to sending text format. Note that the binary initial table synchronization copy requires all data types to
Re: Allow logical replication to copy tables in binary format
Hi, Attached v12 with a unified option. Setting binary = true now allows the initial sync to happen in binary format. Thanks, -- Melih Mutlu Microsoft v12-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data
Re: Allow logical replication to copy tables in binary format
On Wed, Mar 8, 2023 at 6:17 PM Melih Mutlu wrote: > > On 7 Mar 2023 Tue at 04:10 Amit Kapila wrote: >> >> As per what I could read in this thread, most people prefer to use the >> existing binary option rather than inventing a new way (option) to >> binary copy in the initial sync phase. Do you agree? > > > I agree. > What do you think about the version checks? I removed any kind of check since > it’s currently a different option. Should we check publisher version before > doing binary copy to ensure that the publisher node supports binary option of > COPY command? > It is not clear to me which version check you wanted to add because we seem to have a binary option in COPY from the time prior to logical replication. I feel we need a publisher version 14 check as that is where we start to support binary mode transfer in logical replication. See the check in function libpqrcv_startstreaming(). -- With Regards, Amit Kapila.
Re: Allow logical replication to copy tables in binary format
Hi, On 7 Mar 2023 Tue at 04:10 Amit Kapila wrote: > As per what I could read in this thread, most people prefer to use the > existing binary option rather than inventing a new way (option) to > binary copy in the initial sync phase. Do you agree? I agree. What do you think about the version checks? I removed any kind of check since it’s currently a different option. Should we check publisher version before doing binary copy to ensure that the publisher node supports binary option of COPY command? Thanks, -- Melih Mutlu Microsoft
RE: Allow logical replication to copy tables in binary format
Dear Melih, >> I think we should add description to doc that it is more likely happen to >> fail >> the initial copy user should enable binary format after synchronization if >> tables have original datatype. > > I tried to explain when binary copy can cause failures in the doc. What > exactly > do you think is missing? I assumed here that "copy_format" and "binary" were combined into one option. Currently the binary option has descriptions : ``` Even when this option is enabled, only data types having binary send and receive functions will be transferred in binary ``` But this is not suitable for initial data sync, as we knew. I meant to say that it must be updated if options are combined. Note that following is not necessary for PG16, just an improvement for newer version. Is it possible to automatically switch the binary option from 'true' to 'false' when data transfer fails? As we found that while synchronizing the initial data with binary format may lead another error, and it can be solved if the options is changed. When DBAs check a log after synchronization and find the output like "binary option was changed and worker will restart..." or something, they can turn "binary" on again. Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Allow logical replication to copy tables in binary format
On Wed, Mar 1, 2023 at 7:58 PM Melih Mutlu wrote: > > Bharath Rupireddy , 1 Mar 2023 Çar, > 15:02 tarihinde şunu yazdı: >> > > That was my intention in the beginning with this patch. Then the new option > also made some sense at some point, and I added copy_binary option according > to reviews. > The earlier versions of the patch didn't have that. Without the new option, > this patch would also be smaller. > > But before changing back to the point where these are all tied to binary > option without a new option, I think we should decide if that's really the > ideal way to do it. > As per what I could read in this thread, most people prefer to use the existing binary option rather than inventing a new way (option) to binary copy in the initial sync phase. Do you agree? If so, it is better to update the patch accordingly as this is the last CF for this release and we have a limited time left. -- With Regards, Amit Kapila.
Re: Allow logical replication to copy tables in binary format
On Thu, Mar 2, 2023 at 4:00 PM Amit Kapila wrote: > > On Thu, Mar 2, 2023 at 7:27 AM Peter Smith wrote: > > ... > > IIUC most people seem to be coming down in favour of there being a > > single unified option (the existing 'binary==true/false) which would > > apply to both the COPY and the data replication parts. > > > > I also agree > > - Yes, it is simpler. > > - Yes, there are various workarounds in case the COPY part failed > > > > But, AFAICT the main question remains unanswered -- Are we happy to > > break existing applications already using binary=true. E.g. I think > > there might be cases where applications are working *only* because > > their binary=true is internally (and probably unbeknownst to the user) > > reverting to text. So if we unified everything under one 'binary' > > option then binary=true will force COPY binary so now some previously > > working applications will get COPY errors requiring workarounds. Is > > that acceptable? > > > > I think one can look at this from another angle also where users would > be expecting that when binary = true and copy_data = true, all the > data transferred between publisher and subscriber should be in binary > format. Users have a workaround to set binary=true only after the > initial sync. Also, if at all, the behaviour change would be after > major version upgrade which shouldn't be a problem. > > > TBH I am not sure anymore if the complications justify the patch. > > > > It seems we have to choose from 2 bad choices: > > - separate options = this works but would be more confusing for the user > > - unified option = this would be simpler and faster, but risks > > breaking existing applications currently using 'binary=true' > > > > I would prefer a unified option as apart from other things you and > others mentioned that will be less of a maintenance burden in the > future. My concern was mostly just about the potential to break the behaviour of existing binary=true applications in some edge cases. If you are happy that doing so shouldn't be a problem, then I am also +1 to use the unified option. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Allow logical replication to copy tables in binary format
On Thu, Mar 2, 2023 at 10:30 AM Amit Kapila wrote: > > > TBH I am not sure anymore if the complications justify the patch. > > > > It seems we have to choose from 2 bad choices: > > - separate options = this works but would be more confusing for the user > > - unified option = this would be simpler and faster, but risks > > breaking existing applications currently using 'binary=true' > > > > I would prefer a unified option as apart from other things you and > others mentioned that will be less of a maintenance burden in the > future. > +1 When someone sets the binary=true while creating a subscription, the expectation would be that the data transfer will happen in binary mode if binary in/out functions are available. As per current implementation, that's not happening in the table-sync phase. So, it makes sense to fix that behaviour in a major version release. For the existing applications that are using (or unknowingly misusing) the feature, as Amit mentioned, they have a workaround. -- Thanks & Regards, Kuntal Ghosh
Re: Allow logical replication to copy tables in binary format
On Thu, Mar 2, 2023 at 7:27 AM Peter Smith wrote: > > On Thu, Mar 2, 2023 at 5:10 AM Melih Mutlu wrote: > > > > Hi, > > > > Hayato Kuroda (Fujitsu) , 1 Mar 2023 Çar, 18:40 > > tarihinde şunu yazdı: > >> > >> Dear Melih, > >> > >> If we do not have to treat the case Shi pointed out[1] as code-level, I > >> agreed to > >> same option binary because it is simpler. > > > > > > How is this an issue if we let the binary option do binary copy and not an > > issue if we have a separate copy_binary option? > > You can easily have the similar errors when you set copy_binary=true if a > > type is missing binary send/receive functions. > > And also, as Amit mentioned, the same issue can easily be avoided if > > binary=false until the initial sync is done. It can be set to true later. > > > >> > > IIUC most people seem to be coming down in favour of there being a > single unified option (the existing 'binary==true/false) which would > apply to both the COPY and the data replication parts. > > I also agree > - Yes, it is simpler. > - Yes, there are various workarounds in case the COPY part failed > > But, AFAICT the main question remains unanswered -- Are we happy to > break existing applications already using binary=true. E.g. I think > there might be cases where applications are working *only* because > their binary=true is internally (and probably unbeknownst to the user) > reverting to text. So if we unified everything under one 'binary' > option then binary=true will force COPY binary so now some previously > working applications will get COPY errors requiring workarounds. Is > that acceptable? > I think one can look at this from another angle also where users would be expecting that when binary = true and copy_data = true, all the data transferred between publisher and subscriber should be in binary format. Users have a workaround to set binary=true only after the initial sync. Also, if at all, the behaviour change would be after major version upgrade which shouldn't be a problem. > TBH I am not sure anymore if the complications justify the patch. > > It seems we have to choose from 2 bad choices: > - separate options = this works but would be more confusing for the user > - unified option = this would be simpler and faster, but risks > breaking existing applications currently using 'binary=true' > I would prefer a unified option as apart from other things you and others mentioned that will be less of a maintenance burden in the future. -- With Regards, Amit Kapila.
Re: Allow logical replication to copy tables in binary format
On Thu, Mar 2, 2023 at 5:10 AM Melih Mutlu wrote: > > Hi, > > Hayato Kuroda (Fujitsu) , 1 Mar 2023 Çar, 18:40 > tarihinde şunu yazdı: >> >> Dear Melih, >> >> If we do not have to treat the case Shi pointed out[1] as code-level, I >> agreed to >> same option binary because it is simpler. > > > How is this an issue if we let the binary option do binary copy and not an > issue if we have a separate copy_binary option? > You can easily have the similar errors when you set copy_binary=true if a > type is missing binary send/receive functions. > And also, as Amit mentioned, the same issue can easily be avoided if > binary=false until the initial sync is done. It can be set to true later. > >> IIUC most people seem to be coming down in favour of there being a single unified option (the existing 'binary==true/false) which would apply to both the COPY and the data replication parts. I also agree - Yes, it is simpler. - Yes, there are various workarounds in case the COPY part failed But, AFAICT the main question remains unanswered -- Are we happy to break existing applications already using binary=true. E.g. I think there might be cases where applications are working *only* because their binary=true is internally (and probably unbeknownst to the user) reverting to text. So if we unified everything under one 'binary' option then binary=true will force COPY binary so now some previously working applications will get COPY errors requiring workarounds. Is that acceptable? TBH I am not sure anymore if the complications justify the patch. It seems we have to choose from 2 bad choices: - separate options = this works but would be more confusing for the user - unified option = this would be simpler and faster, but risks breaking existing applications currently using 'binary=true' -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Allow logical replication to copy tables in binary format
Hi, Hayato Kuroda (Fujitsu) , 1 Mar 2023 Çar, 18:40 tarihinde şunu yazdı: > Dear Melih, > > If we do not have to treat the case Shi pointed out[1] as code-level, I > agreed to > same option binary because it is simpler. How is this an issue if we let the binary option do binary copy and not an issue if we have a separate copy_binary option? You can easily have the similar errors when you set copy_binary=true if a type is missing binary send/receive functions. And also, as Amit mentioned, the same issue can easily be avoided if binary=false until the initial sync is done. It can be set to true later. > I read the use-cases addressed by Bharath[2] > and I cannot find advantage for case 1 and 3 expect the case that binary > functions > are not implemented. > Note that case 3 is already how it works on HEAD. Its advantages, as you already mentioned, is when some types are missing the binary functions. I think that's why case 3 should be allowed even if a new option is added or not. Previously I said that the combination of "copy_format = binary" and > "copy_data = false" > seemed strange[3], but this approach could solve it and other related ones > automatically. > I think it is quite similar to the case where binary=true and enabled=false. In that case, the format is set but the subscription does not replicate anything. And this is allowed. copy_binary=true and copy_data=false combination also sets the copy format but does not copy anything. Even if any table will not be copied at that moment, tables which might be added later might need to be copied (by ALTER SUBSCRIPTION). And setting the copy format beforehand can be useful in such cases. > I think we should add description to doc that it is more likely happen to > fail > the initial copy user should enable binary format after synchronization if > tables have original datatype. > I tried to explain when binary copy can cause failures in the doc. What exactly do you think is missing? Best, -- Melih Mutlu Microsoft
RE: Allow logical replication to copy tables in binary format
Dear Melih, If we do not have to treat the case Shi pointed out[1] as code-level, I agreed to same option binary because it is simpler. I read the use-cases addressed by Bharath[2] and I cannot find advantage for case 1 and 3 expect the case that binary functions are not implemented. Previously I said that the combination of "copy_format = binary" and "copy_data = false" seemed strange[3], but this approach could solve it and other related ones automatically. I think we should add description to doc that it is more likely happen to fail the initial copy user should enable binary format after synchronization if tables have original datatype. [1]: https://www.postgresql.org/message-id/OSZPR01MB6310B58F069FF8E148B247FDFDA49%40OSZPR01MB6310.jpnprd01.prod.outlook.com [2] https://www.postgresql.org/message-id/CALj2ACXiUsJoXt%3DfMpa4yYseB5h3un_syVh-J3RxL4-6r9Dx2A%40mail.gmail.com [3]: https://www.postgresql.org/message-id/TYAPR01MB5866968CF42FBAB73E8EEDF8F5AA9%40TYAPR01MB5866.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Allow logical replication to copy tables in binary format
Hi, Bharath Rupireddy , 1 Mar 2023 Çar, 15:02 tarihinde şunu yazdı: > On Wed, Mar 1, 2023 at 4:47 PM Dilip Kumar wrote: > > I agree with this thought, basically adding an extra option will > > always complicate things for the user. And logically it doesn't make > > much sense to copy data in text mode and then stream in binary mode > > (except in some exception cases and for that, we can always alter the > > subscription). So IMHO it makes more sense that if the binary option > > is selected then ideally it should choose to do the initial sync also > > in the binary mode. > I agree that copying in text then streaming in binary does not have a good use-case. I think I was suggesting earlier to use a separate option for binary > table sync copy based on my initial knowledge of binary COPY. Now that > I have a bit more understanding of binary COPY and subscription's > existing binary option, +1 for using the same option for table sync > too. > > If used the existing subscription binary option for the table sync, > there can be following possibilities for the users: > 1. users might want to enable the binary option for table sync and > disable it for subsequent replication > 2. users might want to enable the binary option for both table sync > and for subsequent replication > 3. users might want to disable the binary option for table sync and > enable it for subsequent replication > 4. users might want to disable binary option for both table sync and > for subsequent replication > > Binary copy use-cases are a bit narrower compared to the existing > subscription binary option, it works only if: > a) the column data types have appropriate binary send/receive functions > b) not replicating between different major versions or different platforms > c) both publisher and subscriber tables have the exact same column > types (not when replicating from smallint to int or numeric to int8 > and so on) > d) both publisher and subscriber supports COPY with binary option > > Now if one enabled the binary option for table sync, that means, they > must have ensured all (a), (b), (c), and (d) are met. The point is if > one decides to use binary copy for table sync, it means that the > subsequent binary replication works too without any problem. If > required, one can disable it for normal replication i.e. post-table > sync. > That was my intention in the beginning with this patch. Then the new option also made some sense at some point, and I added copy_binary option according to reviews. The earlier versions of the patch didn't have that. Without the new option, this patch would also be smaller. But before changing back to the point where these are all tied to binary option without a new option, I think we should decide if that's really the ideal way to do it. I believe that the patch is all good now with the binary_copy option which is not tied to anything, explanations in the doc and separate tests etc. But I also agree that binary=true should make everything in binary and binary=false should do them in text format. It makes more sense. Best, -- Melih Mutlu Microsoft
Re: Allow logical replication to copy tables in binary format
On Wed, Mar 1, 2023 at 4:47 PM Dilip Kumar wrote: > > > > walsender ERROR: no binary output function available for type > > > public.myvarchar > > > walsender STATEMENT: COPY public.tbl1 (a) TO STDOUT WITH (FORMAT binary) > > > > > > > Thanks for sharing the example. I think to address this user can > > create a SUBSCRIPTION with 'binary = false' and then after the initial > > copy enables it with ALTER SUBSCRIPTION. Personally, I feel it is not > > required to have a separate option to allow copy in binary mode. Note, > > where there is some use for it but having more options for similar > > work is also confusing as users need to pay attention to different > > options and their values. It won't be difficult to add such an option > > in the future if we see such cases and or users specifically require > > something like this. > > I agree with this thought, basically adding an extra option will > always complicate things for the user. And logically it doesn't make > much sense to copy data in text mode and then stream in binary mode > (except in some exception cases and for that, we can always alter the > subscription). So IMHO it makes more sense that if the binary option > is selected then ideally it should choose to do the initial sync also > in the binary mode. I think I was suggesting earlier to use a separate option for binary table sync copy based on my initial knowledge of binary COPY. Now that I have a bit more understanding of binary COPY and subscription's existing binary option, +1 for using the same option for table sync too. If used the existing subscription binary option for the table sync, there can be following possibilities for the users: 1. users might want to enable the binary option for table sync and disable it for subsequent replication 2. users might want to enable the binary option for both table sync and for subsequent replication 3. users might want to disable the binary option for table sync and enable it for subsequent replication 4. users might want to disable binary option for both table sync and for subsequent replication Binary copy use-cases are a bit narrower compared to the existing subscription binary option, it works only if: a) the column data types have appropriate binary send/receive functions b) not replicating between different major versions or different platforms c) both publisher and subscriber tables have the exact same column types (not when replicating from smallint to int or numeric to int8 and so on) d) both publisher and subscriber supports COPY with binary option Now if one enabled the binary option for table sync, that means, they must have ensured all (a), (b), (c), and (d) are met. The point is if one decides to use binary copy for table sync, it means that the subsequent binary replication works too without any problem. If required, one can disable it for normal replication i.e. post-table sync. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Allow logical replication to copy tables in binary format
On Mon, Feb 27, 2023 at 2:31 PM Amit Kapila wrote: > > On Mon, Feb 20, 2023 at 3:37 PM shiy.f...@fujitsu.com > wrote: > > > > On Thu, Feb 16, 2023 8:48 PM Amit Kapila wrote: > > > > > > So, doesn't this mean that there is no separate failure mode during > > > the initial copy? I am clarifying this to see if the patch really > > > needs a separate copy_format option for initial sync? > > > > > > > In the case that the data type doesn't have binary output function, for > > apply > > phase, the column will be sent in text format (see > > logicalrep_write_tuple()) and > > it works fine. But with copy_format = binary, the walsender exits with an > > error. > > > ... > ... > > > > Then I got the following error in the publisher log. > > > > walsender ERROR: no binary output function available for type > > public.myvarchar > > walsender STATEMENT: COPY public.tbl1 (a) TO STDOUT WITH (FORMAT binary) > > > > Thanks for sharing the example. I think to address this user can > create a SUBSCRIPTION with 'binary = false' and then after the initial > copy enables it with ALTER SUBSCRIPTION. Personally, I feel it is not > required to have a separate option to allow copy in binary mode. Note, > where there is some use for it but having more options for similar > work is also confusing as users need to pay attention to different > options and their values. It won't be difficult to add such an option > in the future if we see such cases and or users specifically require > something like this. I agree with this thought, basically adding an extra option will always complicate things for the user. And logically it doesn't make much sense to copy data in text mode and then stream in binary mode (except in some exception cases and for that, we can always alter the subscription). So IMHO it makes more sense that if the binary option is selected then ideally it should choose to do the initial sync also in the binary mode. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Allow logical replication to copy tables in binary format
Hi, Attached v11. vignesh C , 28 Şub 2023 Sal, 13:59 tarihinde şunu yazdı: > Thanks for the patch, Few comments: > 1) Are primary key required for the tables, if not required we could > remove the primary key which will speed up the test by not creating > the indexes and inserting in the indexes. Even if required just create > for one of the tables: > I think that having a primary key in tables for logical replication tests is good for checking if log. rep. duplicates any row. Other tests also have primary keys in almost all tables. Bharath Rupireddy , 28 Şub 2023 Sal, 15:27 tarihinde şunu yazdı: > 1. > + > + If true, initial data synchronization will be performed in binary > format > + > It's not just the initial table sync right? The table sync can happen > at any other point of time when ALTER SUBSCRIPTION ... REFRESH > PUBLICATION WITH (copy = true) is run. > How about - "If true, the subscriber requests publication for > pre-existing data in binary format"? > I changed it as you suggested. I sometimes feel like the phrase "initial sync" is used for initial sync of a table, not a subscription. Or table syncs triggered by ALTER SUBSCRIPTION are ignored in some places where "initial sync" is used. 2. > Perhaps, this should cover the recommended cases for enabling this new > option - something like below (may not need to have exact wording, but > the recommended cases?): > "It is recommended to enable this option only when 1) the column data > types have appropriate binary send/receive functions, 2) not > replicating between different major versions or different platforms, > 3) both publisher and subscriber tables have the exact same column > types (not when replicating from smallint to int or numeric to int8 > and so on), 4) both publisher and subscriber supports COPY with binary > option, otherwise the table copy can fail." > I added a line stating that binary format is less portable across machine architectures and versions as stated in COPY [1]. I don't think we should add line saying "recommended", but state the restrictions clearly instead. It's also similar in COPY docs as well. I think the explanation now covers all your points, right? Jelte Fennema , 28 Şub 2023 Sal, 16:25 tarihinde şunu yazdı: > > 3. I think the newly added tests must verify if the binary COPY is > > picked up when enabled. Perhaps, looking at the publisher's server log > > for 'COPY ... WITH BINARY format'? Maybe it's an overkill, otherwise, > > we have no way of testing that the option took effect. > > Another way to test that BINARY is enabled could be to trigger one > of the failure cases. Yes, there is already a failure case for binary copy which resolves with swithcing binary_copy to false. But I also added checks for publisher logs now too. [1] https://www.postgresql.org/docs/devel/sql-copy.html Thanks, -- Melih Mutlu Microsoft v11-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data
Re: Allow logical replication to copy tables in binary format
> 3. I think the newly added tests must verify if the binary COPY is > picked up when enabled. Perhaps, looking at the publisher's server log > for 'COPY ... WITH BINARY format'? Maybe it's an overkill, otherwise, > we have no way of testing that the option took effect. Another way to test that BINARY is enabled could be to trigger one of the failure cases.
Re: Allow logical replication to copy tables in binary format
On Tue, Feb 28, 2023 at 1:22 AM Melih Mutlu wrote: > > Hi, > > Thanks for all of your reviews! > > So, I made some changes in the v10 according to your comments. Thanks. Some quick comments on v10: 1. + + If true, initial data synchronization will be performed in binary format + It's not just the initial table sync right? The table sync can happen at any other point of time when ALTER SUBSCRIPTION ... REFRESH PUBLICATION WITH (copy = true) is run. How about - "If true, the subscriber requests publication for pre-existing data in binary format"? 2. + Specifies whether pre-existing data on the publisher will be copied + to the subscriber in binary format. The default is false. + Binary format is very data type specific, it will not allow copying + between different column types as opposed to text format. Note that + if this option is enabled, all data types which will be copied during + the initial synchronization should have binary send and receive functions. + If this option is disabled, data format for the initial synchronization + will be text. Perhaps, this should cover the recommended cases for enabling this new option - something like below (may not need to have exact wording, but the recommended cases?): "It is recommended to enable this option only when 1) the column data types have appropriate binary send/receive functions, 2) not replicating between different major versions or different platforms, 3) both publisher and subscriber tables have the exact same column types (not when replicating from smallint to int or numeric to int8 and so on), 4) both publisher and subscriber supports COPY with binary option, otherwise the table copy can fail." 3. I think the newly added tests must verify if the binary COPY is picked up when enabled. Perhaps, looking at the publisher's server log for 'COPY ... WITH BINARY format'? Maybe it's an overkill, otherwise, we have no way of testing that the option took effect. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Allow logical replication to copy tables in binary format
On Tue, 28 Feb 2023 at 01:22, Melih Mutlu wrote: > > Hi, > > Thanks for all of your reviews! > > So, I made some changes in the v10 according to your comments. > > 1- copy_format is changed to a boolean parameter copy_binary. > It sounds easier to use a boolean to enable/disable binary copy. Default > value is false, so nothing changes in the current behaviour if copy_binary is > not specified. > It's still persisted into the catalog. This is needed since its value will be > needed by tablesync workers later. And tablesync workers fetch subscription > configurations from the catalog. > In the copy_data case, it is not directly stored anywhere but it affects the > state of the table which is stored in the catalog. So, I guess this is the > convenient way if we decide to go with a new parameter. > > 2- copy_binary is not tied to another parameter > The patch does not disallow any combination of copy_binary with binary and > copy_data options. I tried to explain what binary copy can and cannot do in > the docs. Rest is up to the user now. > > 3- Removed version check for copy_binary > HEAD already does not have any check for binary option. Making binary copy > work only if publisher and subscriber are the same major version can be too > restricted. > > 4- Added separate test file > Although I believe 002_types.pl and 014_binary.pl can be improved more for > binary enabled subscription cases, this patch might not be the best place to > make those changes. > 032_binary_copy.pl now has the tests for binary copy option. There are also > some regression tests in subscription.sql. > > Finally, some other small fixes are done according to the reviews. > > Also, thanks Bharath for performance testing [1]. It can be seen that the > patch has some benefits. > > I appreciate further thoughts/reviews. Thanks for the patch, Few comments: 1) Are primary key required for the tables, if not required we could remove the primary key which will speed up the test by not creating the indexes and inserting in the indexes. Even if required just create for one of the tables: +# Create tables on both sides of the replication +my $ddl = qq( + CREATE TABLE public.test_numerical ( + a INTEGER PRIMARY KEY, + b NUMERIC, + c FLOAT, + d BIGINT + ); + CREATE TABLE public.test_arrays ( + a INTEGER[] PRIMARY KEY, + b NUMERIC[], + c TEXT[] + ); +CREATE TABLE public.test_range_array ( + a INTEGER PRIMARY KEY, + b TSTZRANGE, + c int8range[] + ); +CREATE TYPE public.test_comp_basic_t AS (a FLOAT, b TEXT, c INTEGER); +CREATE TABLE public.test_one_comp ( + a INTEGER PRIMARY KEY, + b public.test_comp_basic_t + );); + 2) We can have the Insert/Select of tables in the same order so that it is easy to verify. test_range_array/test_one_comp insertion/selection order was different. +# Insert some content and before creating a subscription +$node_publisher->safe_psql( + 'postgres', qq( +INSERT INTO public.test_numerical (a, b, c, d) VALUES + (1, 1.2, 1.3, 10), +(2, 2.2, 2.3, 20); + INSERT INTO public.test_arrays (a, b, c) VALUES + ('{1,2,3}', '{1.1, 1.2, 1.3}', '{"one", "two", "three"}'), +('{3,1,2}', '{1.3, 1.1, 1.2}', '{"three", "one", "two"}'); +INSERT INTO test_range_array (a, b, c) VALUES + (1, tstzrange('Mon Aug 04 00:00:00 2014 CEST'::timestamptz, 'infinity'), '{"[1,2]", "[10,20]"}'), + (2, tstzrange('Sat Aug 02 00:00:00 2014 CEST'::timestamptz, 'Mon Aug 04 00:00:00 2014 CEST'::timestamptz), '{"[2,3]", "[20,30]"}'); + INSERT INTO test_one_comp (a, b) VALUES + (1, ROW(1.0, 'a', 1)), + (2, ROW(2.0, 'b', 2)); + )); + +# Create the subscription with copy_binary = true +my $publisher_connstring = $node_publisher->connstr . ' dbname=postgres'; +$node_subscriber->safe_psql('postgres', + "CREATE SUBSCRIPTION tsub CONNECTION '$publisher_connstring' " + . "PUBLICATION tpub WITH (slot_name = tpub_slot, copy_binary = true)"); + +# Ensure nodes are in sync with each other +$node_subscriber->wait_for_subscription_sync($node_publisher, 'tsub'); + +my $sync_check = qq( + SET timezone = '+2'; + SELECT a, b, c, d FROM test_numerical ORDER BY a; + SELECT a, b, c FROM test_arrays ORDER BY a; + SELECT a, b FROM test_one_comp ORDER BY a; + SELECT a, b, c FROM test_range_array ORDER BY a; +); 3) Should we check the results for test_myvarchar table only, since there is no change in other tables, we need not check other tables again: +# Now tablesync should succeed +$node_subscriber->wait_for_subscription_sync($node_publisher, 'tsub'); + +$sync_check = qq( + SET timezone = '+2'; + SELECT a, b, c, d FROM test_numerical ORDER BY a; + SELECT a, b, c
Re: Allow logical replication to copy tables in binary format
Hi, Thanks for all of your reviews! So, I made some changes in the v10 according to your comments. 1- copy_format is changed to a boolean parameter copy_binary. It sounds easier to use a boolean to enable/disable binary copy. Default value is false, so nothing changes in the current behaviour if copy_binary is not specified. It's still persisted into the catalog. This is needed since its value will be needed by tablesync workers later. And tablesync workers fetch subscription configurations from the catalog. In the copy_data case, it is not directly stored anywhere but it affects the state of the table which is stored in the catalog. So, I guess this is the convenient way if we decide to go with a new parameter. 2- copy_binary is not tied to another parameter The patch does not disallow any combination of copy_binary with binary and copy_data options. I tried to explain what binary copy can and cannot do in the docs. Rest is up to the user now. 3- Removed version check for copy_binary HEAD already does not have any check for binary option. Making binary copy work only if publisher and subscriber are the same major version can be too restricted. 4- Added separate test file Although I believe 002_types.pl and 014_binary.pl can be improved more for binary enabled subscription cases, this patch might not be the best place to make those changes. 032_binary_copy.pl now has the tests for binary copy option. There are also some regression tests in subscription.sql. Finally, some other small fixes are done according to the reviews. Also, thanks Bharath for performance testing [1]. It can be seen that the patch has some benefits. I appreciate further thoughts/reviews. [1] https://www.postgresql.org/message-id/CALj2ACUfE08ZNjKK-nK9JiwGhwUMRLM%2BqRhNKTVM9HipFk7Fow%40mail.gmail.com Thanks, -- Melih Mutlu Microsoft v10-0001-Allow-logical-replication-to-copy-table-in-binar.patch Description: Binary data
Re: Allow logical replication to copy tables in binary format
On Tue, Feb 21, 2023 at 7:18 PM Bharath Rupireddy wrote: > > On Mon, Feb 20, 2023 at 5:17 PM Melih Mutlu wrote: > > > > Thanks for letting me know. > > Attached the fixed version of the patch. > > Thanks. I have few comments on v9 patch: > > 1. > +/* Do not allow binary = false with copy_format = binary > */ > +if (!opts.binary && > +sub->copyformat == LOGICALREP_COPY_AS_BINARY && > +!IsSet(opts.specified_opts, SUBOPT_COPY_FORMAT)) > +ereport(ERROR, > + > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot set %s for a > subscription with %s", > +"binary = false", > "copy_format = binary"))); > > I don't understand why we'd need to tie an option (binary) that deals > with data types at column-level with another option (copy_format) that > requests the entire table data to be in binary. This'd essentially > make one to set binary = true to use copy_format = binary, no? IMHO, > this inter-dependency is not good for better usability. > > 2. Why can't the tests that this patch adds be simple? Why would it > need to change the existing tests at all? I'm thinking to create a new > 00X_binary_copy_format.pl or such and setting up logical replication > with copy_format = binary and letting table sync worker request > publisher in binary format - you can verify this via publisher server > logs - look for COPY with BINARY option. If required, have the table > with different data types. This greatly reduces the patch's footprint. I've done performance testing with the v9 patch. I can constantly observe 1.34X improvement or 25% improvement in table sync/copy performance with the patch: HEAD binary = false Time: 214398.637 ms (03:34.399) PATCHED binary = true, copy_format = binary: Time: 160509.262 ms (02:40.509) There's a clear reduction (5.68% to 4.49%) in the CPU cycles spent in copy_table with the patch: perf report HEAD: -5.68% 0.00% postgres postgres [.] copy_table - copy_table - 5.67% CopyFrom - 4.26% NextCopyFrom - 2.16% NextCopyFromRawFields - 1.55% CopyReadLine - 1.52% CopyReadLineText - 0.76% CopyLoadInputBuf 0.50% CopyConvertBuf 0.60% CopyReadAttributesText - 1.93% InputFunctionCall 0.69% timestamptz_in 0.53% byteain - 0.73% CopyMultiInsertInfoFlush - 0.73% CopyMultiInsertBufferFlush - 0.66% table_multi_insert 0.65% heap_multi_insert perf report PATCHED: -4.49% 0.00% postgres postgres [.] copy_table - copy_table - 4.48% CopyFrom - 2.36% NextCopyFrom - 1.81% CopyReadBinaryAttribute 1.47% ReceiveFunctionCall - 1.21% CopyMultiInsertInfoFlush - 1.21% CopyMultiInsertBufferFlush - 1.11% table_multi_insert - 1.09% heap_multi_insert - 0.71% RelationGetBufferForTuple - 0.63% ReadBufferBI - 0.62% ReadBufferExtended 0.62% ReadBuffer_common I've tried to choose the table columns such that the binary send/recv vs non-binary/plain/text copy has some benefit. The table has 100mn rows, and is of 11GB size. I've measured the benefit using Melih's helper function wait_for_rep(). Note that I had to compile source code with -ggdb3 -O0 for perf report, otherwise with -O3 for performance numbers: wal_level = 'logical' shared_buffers = '8GB' wal_buffers = '1GB' max_wal_size = '32GB' create table foo(i int, n numeric, v varchar, b bytea, t timestamp with time zone default current_timestamp); insert into foo select i, i+1, md5(i::text), md5(i::text)::bytea from generate_series(1, 1) i; CREATE OR REPLACE PROCEDURE wait_for_rep() LANGUAGE plpgsql AS $$ BEGIN WHILE (SELECT count(*) != 0 FROM pg_subscription_rel WHERE srsubstate <> 'r') LOOP COMMIT; END LOOP; END; $$; -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Allow logical replication to copy tables in binary format
On Fri, Feb 24, 2023 at 8:32 AM Peter Smith wrote: > > Here is my summary of this thread so far, plus some other thoughts. Thanks. > 1. Initial Goal > -- > > Currently, the CREATE SUBSCRIPTION ... WITH(binary=true) mode does > data replication in binary mode, but tablesync COPY phase is still > only in text mode. IIUC Melih just wanted to unify those phases so > binary=true would mean also do the COPY phase in binary [1]. > > Actually, this was my very first expectation too. > > 2. Objections to unification > --- > > Bharath posted suggesting tying the COPY/replication parts is not a > good idea [2]. But if these are not to be unified then it requires a > new subscription option to be introduced, and currently, the thread > refers to this new option as copy_format=binary. Looking closely at the existing binary=true option and COPY's binary protocol, essentially they depend on the data type's binary send and receive functions. > 3. A problem: binary replication is not always binary! > -- > > Shi-san reported [3] that under special circumstances (e.g. if the > datatype has no binary output function) the current HEAD binary=true > mode for replication has the ability to fall back to text replication. > Since the binary COPY doesn't do this, it means binary COPY could fail > in some cases where the binary=true replication will be OK. Right. Apply workers can fallback to text mode transparently, whereas with binary COPY it's a bit difficult to do so. > AFAIK, this means that even if we *wanted* to unify everything with > just 'binary=true' it can't be done like that. Hm, it looks like that. > 4. New option is needed > - > > OK, so let's assume these options must be separated (because of the > problem of #3). > > ~ > > 4a. New string option called copy_format? > > Currently, the patch/thread describes a new 'copy_format' string > option with values 'text' (default) and 'binary'. > > Why? If there are only 2 values then IMO it would be better to have a > *boolean* option called something like binary_copy=true/false. > > ~ > > 4b. Or, we could extend copy_data > > Jelte suggested [4] we could extend copy_data = 'text' (aka on/true) > OR 'binary' OR 'none' (aka off/false). How about copy_binary = {true/false}? So, the options for the user are: copy_binary - defaults to false and when true, the subscriber requests publisher to send pre-existing table's data in binary format (as opposed to text) during data synchronization/table copy phase. It is recommended to enable this option only when 1) the column data types have appropriate binary send/receive functions, 2) not replicating between different major versions or different platforms, 3) both publisher and subscriber tables have the exact same column types (not when replicating from smallint to int or numeric to int8 and so on), 4) both publisher and subscriber supports COPY with binary option, otherwise the table copy can fail. binary - defaults to false and when true, the subscriber requests publisher to send table's data in binary format (as opposed to text) during normal replication phase. <> Note that with this we made users responsible to choose copy_binary rather than we being smart. > AFAIK currently the patch disallows some combinations: > > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("%s and %s are mutually exclusive options", > + "binary = false", "copy_format = binary"))); > > 6. pub/sub version checking > > > There were some discussions about doing some server checking to reject > some PG combinations from being allowed to use the copy_format=binary. IMHO, these restrictions make the feature more complicated to use and be removed and the options be made simple to use (usability and simplicity clearly wins the race). If there's any kind of feedback from the users/developers, we can always come back and improve. > But if the publication was defined as FOR ALL TABLES, or as ALL TABLES > IN SCHEMA, then I think the tablesync can still crash if a user > creates a new table that suffers the same COPY binary trouble Shi-san > described earlier [3]. > > What is the user supposed to do when that tablesync fails? > > They had no way to predict it could happen, And it will be too painful > to have to DROP and re-create the entire SUBSCRIPTION again now that > it has happened. Can't ALTER SUBSCRIPTION SET copy_format = 'text'; and ALTER SUBSCRIPTION ... REFRESH PUBLICATION WITH (copy_data = true); work here instead of drop and recreate subscription? > 7a. A thought bubble > > I wondered (perhaps this is naive) would it be it possible to enhance > the COPY command to enhance the "binary" mode so it can be made to > fall back to text mode if it needs to in the same way that binary > replication does. > > If such an enhanced COPY format mode worked,
Re: Allow logical replication to copy tables in binary format
On Mon, Feb 20, 2023 at 3:37 PM shiy.f...@fujitsu.com wrote: > > On Thu, Feb 16, 2023 8:48 PM Amit Kapila wrote: > > > > So, doesn't this mean that there is no separate failure mode during > > the initial copy? I am clarifying this to see if the patch really > > needs a separate copy_format option for initial sync? > > > > In the case that the data type doesn't have binary output function, for apply > phase, the column will be sent in text format (see logicalrep_write_tuple()) > and > it works fine. But with copy_format = binary, the walsender exits with an > error. > ... ... > > Then I got the following error in the publisher log. > > walsender ERROR: no binary output function available for type > public.myvarchar > walsender STATEMENT: COPY public.tbl1 (a) TO STDOUT WITH (FORMAT binary) > Thanks for sharing the example. I think to address this user can create a SUBSCRIPTION with 'binary = false' and then after the initial copy enables it with ALTER SUBSCRIPTION. Personally, I feel it is not required to have a separate option to allow copy in binary mode. Note, where there is some use for it but having more options for similar work is also confusing as users need to pay attention to different options and their values. It won't be difficult to add such an option in the future if we see such cases and or users specifically require something like this. -- With Regards, Amit Kapila.
RE: Allow logical replication to copy tables in binary format
On Monday, February 20, 2023 8:47 PM Melih Mutlu wrote: > Thanks for letting me know. > Attached the fixed version of the patch. Hi, Melih Thanks for updating the patch. Minor comments on v9. (1) commit message "The patch introduces a new parameter, copy_format, to CREATE SUBSCRIPTION to allow to choose the format used in initial table synchronization." This patch introduces the new parameter not only to CREATE SUBSCRIPTION and ALTER SUBSCRIPTION, then this description should be more general, something like below. "The patch introduces a new parameter, copy_format, as subscription option to allow user to choose the format of initial table synchronization." (2) copy_table We don't need to check the publisher's version below. + + /* If the publisher is v16 or later, specify the format to copy data. */ + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 16) + { + char *format = MySubscription->copyformat == LOGICALREP_COPY_AS_BINARY ? "binary" : "text"; + appendStringInfo(, " WITH (FORMAT %s)", format); + options = lappend(options, makeDefElem("format", (Node *) makeString(format), -1)); + } + We don't have this kind of check for "binary" option and it seems this is user's responsibility to avoid any errors during replication. If we want to add this kind of check, then we can add checks for both "binary" and "copy_format" option together as an independent patch. (3) subscription.sql/out The other existing other subscription options check the invalid input for newly created option (e.g. "foo" for disable_on_error, streaming mode). So, I think we can add this type of test for this feature. Best Regards, Takamichi Osumi
Re: Allow logical replication to copy tables in binary format
Here is my summary of this thread so far, plus some other thoughts. (I wrote this mostly for my own internal notes/understanding, but maybe it is a helpful summary for others so I am posting it here too). ~~ 1. Initial Goal -- Currently, the CREATE SUBSCRIPTION ... WITH(binary=true) mode does data replication in binary mode, but tablesync COPY phase is still only in text mode. IIUC Melih just wanted to unify those phases so binary=true would mean also do the COPY phase in binary [1]. Actually, this was my very first expectation too. 2. Objections to unification --- Bharath posted suggesting tying the COPY/replication parts is not a good idea [2]. But if these are not to be unified then it requires a new subscription option to be introduced, and currently, the thread refers to this new option as copy_format=binary. 3. A problem: binary replication is not always binary! -- Shi-san reported [3] that under special circumstances (e.g. if the datatype has no binary output function) the current HEAD binary=true mode for replication has the ability to fall back to text replication. Since the binary COPY doesn't do this, it means binary COPY could fail in some cases where the binary=true replication will be OK. AFAIK, this means that even if we *wanted* to unify everything with just 'binary=true' it can't be done like that. 4. New option is needed - OK, so let's assume these options must be separated (because of the problem of #3). ~ 4a. New string option called copy_format? Currently, the patch/thread describes a new 'copy_format' string option with values 'text' (default) and 'binary'. Why? If there are only 2 values then IMO it would be better to have a *boolean* option called something like binary_copy=true/false. ~ 4b. Or, we could extend copy_data Jelte suggested [4] we could extend copy_data = 'text' (aka on/true) OR 'binary' OR 'none' (aka off/false). That was interesting, although - my first impression was to worry about backward compatibility issues for existing application code. I don't know if those are easily solved. - AFAIK such "hybrid" boolean/enum options are kind of frowned upon so I am not sure if a committer would be in favour of introducing another one. 5. Combining options -- Because options are separated, it means combinations become possible... ~~ 5a. Combining option: "copy_format = binary" and "copy_data = false" Kuroda [5] wrote such a combination should not be allowed. I kind of disagree with that. IMO everything should be flexible as possible. The patch might end up accidentally stopping something that has a valid use case. Anyway, such restrictions are easy to add later. ~~ 5b. Combining options: binary=true/copy_format=binary, and binary=false/copy_format=binary become possible. AFAIK currently the patch disallows some combinations: + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("%s and %s are mutually exclusive options", + "binary = false", "copy_format = binary"))); I kind of disagree with introducing such rules/restrictions. IMO all this patch needs to do is clearly document all necessary precautions etc. But if the user still wants to do something, we should just let them do it. If they manage to shoot themselves in the foot, well it was their choice after reading the docs, and it's their foot. 6. pub/sub version checking There were some discussions about doing some server checking to reject some PG combinations from being allowed to use the copy_format=binary. Jelte suggested [5] that it is the "responsibility of the operator to evaluate the risk". I agree. Yes, the patch certainly needs to *document* all precautions, but having too many restrictions might end up accidentally stopping something useful. Anyway, this seems like #5a. I prefer KISS Principle. More restrictions can be added later if found necessary. 7. More doubts & a thought bubble - 7a. What is user action for this scenario? I am unsure about this scenario - imagine that everything is working properly and the copy_format=binary/copy_data=true is all working nicely for weeks for a certain pub/sub... But if the publication was defined as FOR ALL TABLES, or as ALL TABLES IN SCHEMA, then I think the tablesync can still crash if a user creates a new table that suffers the same COPY binary trouble Shi-san described earlier [3]. What is the user supposed to do when that tablesync fails? They had no way to predict it could happen, And it will be too painful to have to DROP and re-create the entire SUBSCRIPTION again now that it has happened. ~~ 7a. A thought bubble I wondered (perhaps this is naive) would it be it possible to enhance the COPY command to enhance the "binary" mode so it can be made to fall back to text mode if it
Re: Allow logical replication to copy tables in binary format
> This is because copy_data is not something stored in pg_subscription > or another catalog. But this is not an issue for copy_fornat since its > value will be stored in the catalog. This can allow users to set the > format even if copy_data=false and no initial sync is needed at that > moment. One other approach that might make sense is to expand the values that copy_data accepts to include the value "binary" (and probably "text" for clarity). That way people could easily choose for each sync if they want to use binary copy, text copy or no copy. Based on your message, this would mean that copy_format=binary would not be stored in catalogs anymore, does that have any bad side-effects for the implementation?
Re: Allow logical replication to copy tables in binary format
shiy.f...@fujitsu.com , 23 Şub 2023 Per, 12:29 tarihinde şunu yazdı: > On Thu, Feb 23, 2023 12:40 PM Kuroda, Hayato/黒田 隼人 < > kuroda.hay...@fujitsu.com> wrote: > > > > > > I'm not sure the combination of "copy_format = binary" and > "copy_data = > > false" > > > > should be accepted or not. How do you think? > > > > > > It seems quite useless indeed to specify the format of a copy that > won't > > happen. > > > > I understood that the conbination of "copy_format = binary" and > "copy_data = > > false" > > should be rejected in parse_subscription_options() and > AlterSubscription(). Is it > > right? > > I'm expecting that is done in next version. > > > > The copy_data option only takes effect once in CREATE SUBSCIPTION or ALTER > SUBSCIPTION REFRESH PUBLICATION command, but the copy_format option can > take > affect multiple times if the subscription is refreshed multiple times. > Even if > the subscription is created with copy_date=false, copy_format can take > affect > when executing ALTER SUBSCIPTION REFRESH PUBLICATION. So, I am not sure we > want > to reject this usage. > I believe the places copy_data and copy_format are needed are pretty much the same. I couldn't think of a case where copy_format is needed but copy_data isn't. Please let me know if I'm missing something. CREATE SUBSCRIPTION, ALTER SUBSCRIPTION SET/ADD/REFRESH PUBLICATION are all the places where initial sync can happen. For all these commands, copy_data needs to be given as a parameter or it will be set to the default value which is true. Even if copy_data=false when the sub was created, REFRESH PUBLICATION (without an explicit copy_data parameter) will copy some tables if needed regardless of what copy_data was in CREATE SUBSCRIPTION. This is because copy_data is not something stored in pg_subscription or another catalog. But this is not an issue for copy_fornat since its value will be stored in the catalog. This can allow users to set the format even if copy_data=false and no initial sync is needed at that moment. So that future initial syncs which can be triggered by ALTER SUBSCRIPTION will be performed in the correct format. So, I also think we should allow setting copy_format even if copy_data=false. Another way to deal with this issue could be expecting the user to specify format every time copy_format is needed, similar to the case for copy_data, and moving on with text (default) format if it's not specified for the current CREATE/ALTER SUBSCRIPTION execution. But I don't think this would make things easier. Best, -- Melih Mutlu Microsoft
RE: Allow logical replication to copy tables in binary format
On Thu, Feb 23, 2023 12:40 PM Kuroda, Hayato/黒田 隼人 wrote: > > > > I'm not sure the combination of "copy_format = binary" and "copy_data = > false" > > > should be accepted or not. How do you think? > > > > It seems quite useless indeed to specify the format of a copy that won't > happen. > > I understood that the conbination of "copy_format = binary" and "copy_data = > false" > should be rejected in parse_subscription_options() and AlterSubscription(). > Is it > right? > I'm expecting that is done in next version. > The copy_data option only takes effect once in CREATE SUBSCIPTION or ALTER SUBSCIPTION REFRESH PUBLICATION command, but the copy_format option can take affect multiple times if the subscription is refreshed multiple times. Even if the subscription is created with copy_date=false, copy_format can take affect when executing ALTER SUBSCIPTION REFRESH PUBLICATION. So, I am not sure we want to reject this usage. Besides, here are my comments on the v9 patch. 1. src/bin/pg_dump/pg_dump.c if (fout->remoteVersion >= 16) - appendPQExpBufferStr(query, " s.suborigin\n"); + { + appendPQExpBufferStr(query, " s.suborigin,\n"); + appendPQExpBufferStr(query, " s.subcopyformat\n"); + } else - appendPQExpBuffer(query, " '%s' AS suborigin\n", LOGICALREP_ORIGIN_ANY); + { + appendPQExpBuffer(query, " '%s' AS suborigin,\n", LOGICALREP_ORIGIN_ANY); + appendPQExpBuffer(query, " '%c' AS subcopyformat\n", LOGICALREP_COPY_AS_TEXT); + } src/bin/psql/describe.c if (pset.sversion >= 16) + { appendPQExpBuffer(, ", suborigin AS \"%s\"\n", gettext_noop("Origin")); + /* Copy format is only supported in v16 and higher */ + appendPQExpBuffer(, + ", subcopyformat AS \"%s\"\n", + gettext_noop("Copy Format")); + } I think we can call only once appendPQExpBuffer() for the two options which are supported in v16. For example, if (pset.sversion >= 16) { appendPQExpBuffer(, ", suborigin AS \"%s\"\n" ", subcopyformat AS \"%s\"\n", gettext_noop("Origin"), gettext_noop("Copy Format")); } 2. src/bin/psql/tab-complete.c @@ -1926,7 +1926,7 @@ psql_completion(const char *text, int start, int end) /* ALTER SUBSCRIPTION SET ( */ else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) && TailMatches("SET", "(")) COMPLETE_WITH("binary", "disable_on_error", "origin", "slot_name", - "streaming", "synchronous_commit"); + "streaming", "synchronous_commit", "copy_format"); /* ALTER SUBSCRIPTION SKIP ( */ else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) && TailMatches("SKIP", "(")) COMPLETE_WITH("lsn"); @@ -3269,7 +3269,8 @@ psql_completion(const char *text, int start, int end) else if (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", "(")) COMPLETE_WITH("binary", "connect", "copy_data", "create_slot", "disable_on_error", "enabled", "origin", "slot_name", - "streaming", "synchronous_commit", "two_phase"); + "streaming", "synchronous_commit", "two_phase", + "copy_format"); The options should be listed in alphabetical order. See commit d547f7cf5ef. Regards, Shi Yu
RE: Allow logical replication to copy tables in binary format
Dear Jelte, > I don't think it's necessary to check versions. Yes, there are > situations where binary will fail across major versions. But in many > cases it does not. To me it seems the responsibility of the operator > to evaluate this risk. And if the operator chooses wrong and uses > binary copy across incompatible versions, then it will still fail hard > in that case during the copy phase (so still a very early error). So I > don't see a reason to check pre-emptively, afaict it will only > disallow some valid usecases and introduce more code. > > Furthermore no major version check is done for "binary = true" either > (afaik). The only additional failure scenario that copy_format=binary > introduces is when one of the types does not implement a send function > on the source. With binary=true, this would continue to work, but with > copy_format=binary this stops working. All other failure scenarios > that binary encoding of types introduces apply to both binary=true and > copy_format=binary (the only difference being in which phase of the > replication these failures happen, the apply or the copy phase). I thought that current specification was lack of consideration, but you meant to say that it is intentional one to keep the availability, right? Indeed my suggestion seems to be too pessimistic, but I want to listen to other opinions more... > > I'm not sure the combination of "copy_format = binary" and "copy_data = > > false" > > should be accepted or not. How do you think? > > It seems quite useless indeed to specify the format of a copy that won't > happen. I understood that the conbination of "copy_format = binary" and "copy_data = false" should be rejected in parse_subscription_options() and AlterSubscription(). Is it right? I'm expecting that is done in next version. Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Allow logical replication to copy tables in binary format
> If in future version the general data type is added, the copy command in > binary > format will not work well, right? It is because the inferior version does not > have > recv/send functions for added type. It means that there is a possibility that > replication between different versions may be failed if binary format is > specified. > Therefore, I think we should accept copy_format = binary only when the major > version of servers are the same. I don't think it's necessary to check versions. Yes, there are situations where binary will fail across major versions. But in many cases it does not. To me it seems the responsibility of the operator to evaluate this risk. And if the operator chooses wrong and uses binary copy across incompatible versions, then it will still fail hard in that case during the copy phase (so still a very early error). So I don't see a reason to check pre-emptively, afaict it will only disallow some valid usecases and introduce more code. Furthermore no major version check is done for "binary = true" either (afaik). The only additional failure scenario that copy_format=binary introduces is when one of the types does not implement a send function on the source. With binary=true, this would continue to work, but with copy_format=binary this stops working. All other failure scenarios that binary encoding of types introduces apply to both binary=true and copy_format=binary (the only difference being in which phase of the replication these failures happen, the apply or the copy phase). > I'm not sure the combination of "copy_format = binary" and "copy_data = false" > should be accepted or not. How do you think? It seems quite useless indeed to specify the format of a copy that won't happen.
RE: Allow logical replication to copy tables in binary format
Dear Melih, Thank you for updating the patch! Followings are my comments. 01. catalogs.sgml ``` If true, the subscription will request that the publisher send data - in binary format ``` I'm not clear here. subbinary does not directly mean that whether the worker requests to send data or not. How about: If true, the subscription will request that the publisher send data in binary format, except initial data synchronization 02. create_subscription.sgml ``` + the binary format is very data type specific, it will not allow copying + between different column types as opposed to text format. Note that ``` The name of formats are not specified as , whereas in previous sentence they are. We can use format either of them. 03. parse_subscription_options() I'm not sure the combination of "copy_format = binary" and "copy_data = false" should be accepted or not. How do you think? 04. parse_subscription_options() ``` + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("%s and %s are mutually exclusive options", + "binary = false", "copy_format = binary"))); + } ``` A comment for translator seemed to be missed. 05. CreateSubscription() ``` values[Anum_pg_subscription_suborigin - 1] = CStringGetTextDatum(opts.origin); + values[Anum_pg_subscription_subcopyformat - 1] = CharGetDatum(opts.copy_format); ``` I think this should be done the same ordering with FormData_pg_subscription. Maybe after the line? ``` values[Anum_pg_subscription_subdisableonerr - 1] = BoolGetDatum(opts.disableonerr); ``` 06. AlterSubscription() If we decided not to accept "copy_format = binary" and "copy_data = false", here should be also fixed. 07. AlterSubscription() ``` + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), +errmsg("cannot set %s for a subscription with %s", + "binary = false", "copy_format = binary"))); ... + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), +errmsg("cannot set %s for a subscription with %s", + "copy_format = binary", "binary = false"))); ``` Comments for translator seemed to be missed. 08. defGetCopyFormat() ``` + /* +* If no parameter value given, set it to text format. +*/ + if (!def->arg) + return LOGICALREP_COPY_AS_TEXT; ``` I think the case no parameter is given should be rejected. It should be accepted only when the parameter has boolean data type. Note that defGetStreamingMode() is accepted no-parameter style due to the compatibility. At first streaming is boolean, and then "parallel" is added. 09. describeSubscriptions ``` + /* Copy format is only supported in v16 and higher */ ``` I think this comment should be atop of if statement, and it can mention about Origin too. 10. pg_subscription.h ``` + charsubcopyformat BKI_DEFAULT(LOGICALREP_COPY_AS_TEXT); /* Copy format * ``` I'm not sure whether BKI_DEFAULT() is needed or not. Other options like twophase does not have the default value as catalog level. The default is set in parse_subscription_options() and then the value will be set to catalog. 11. typedef struct Subscription In catalog entry the subcopyformat is aligned just after subdisableonerr, but in struct Subscription, copyformat is added at the end. Can we place just after disableonerr? 12. Reply > Since binary copy relies on COPY command, we may have problems across > different server versions in cases where COPY is not portable. > What I understand from this [1], COPY works across server versions later > than 7.4. This shouldn't be a problem for logical replication. > Currently the patch also does not allow binary copy if the publisher > version is older than 16. If in future version the general data type is added, the copy command in binary format will not work well, right? It is because the inferior version does not have recv/send functions for added type. It means that there is a possibility that replication between different versions may be failed if binary format is specified. Therefore, I think we should accept copy_format = binary only when the major version of servers are the same. Note that this comments is not the request to the patch. Maybe the modification should be done not only for copy_format but also binary, and it may be out of scope the
Re: Allow logical replication to copy tables in binary format
On Mon, Feb 20, 2023 at 5:17 PM Melih Mutlu wrote: > > Thanks for letting me know. > Attached the fixed version of the patch. Thanks. I have few comments on v9 patch: 1. +/* Do not allow binary = false with copy_format = binary */ +if (!opts.binary && +sub->copyformat == LOGICALREP_COPY_AS_BINARY && +!IsSet(opts.specified_opts, SUBOPT_COPY_FORMAT)) +ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot set %s for a subscription with %s", +"binary = false", "copy_format = binary"))); I don't understand why we'd need to tie an option (binary) that deals with data types at column-level with another option (copy_format) that requests the entire table data to be in binary. This'd essentially make one to set binary = true to use copy_format = binary, no? IMHO, this inter-dependency is not good for better usability. 2. Why can't the tests that this patch adds be simple? Why would it need to change the existing tests at all? I'm thinking to create a new 00X_binary_copy_format.pl or such and setting up logical replication with copy_format = binary and letting table sync worker request publisher in binary format - you can verify this via publisher server logs - look for COPY with BINARY option. If required, have the table with different data types. This greatly reduces the patch's footprint. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Allow logical replication to copy tables in binary format
Amit Kapila , 16 Şub 2023 Per, 15:47 tarihinde şunu yazdı: > On Mon, Jan 30, 2023 at 4:19 PM Melih Mutlu > wrote: > >> 8. Note that the COPY binary format isn't portable across platforms > >> (Windows to Linux for instance) or major versions > >> https://www.postgresql.org/docs/devel/sql-copy.html, whereas, logical > >> replication is > https://www.postgresql.org/docs/devel/logical-replication.html. > >> I don't see any handling of such cases in copy_table but only a check > >> for the publisher version. I think we need to account for all the > >> cases - allow binary COPY only when publisher and subscriber are of > >> same versions, architectures, platforms. The trick here is how we > >> identify if the subscriber is of the same type and taste > >> (architectures and platforms) as the publisher. Looking for > >> PG_VERSION_STR of publisher and subscriber might be naive, but I'm not > >> sure if there's a better way to do it. > > > > > > I think having the "copy_format" option with text as default, like I > replied to your 3rd review above, will keep things as they are now. > > The patch now will only allow users to choose binary copy as well, if > they want it and acknowledge the limitations that come with binary copy. > > COPY command's portability issues shouldn't be an issue right now, since > the patch still supports text format. Right? > > > > One thing that is not completely clear from above is whether we will > have any problem if the subscription uses binary mode for copying > across the server versions. Do we use binary file during the copy used > in logical replication? > Since binary copy relies on COPY command, we may have problems across different server versions in cases where COPY is not portable. What I understand from this [1], COPY works across server versions later than 7.4. This shouldn't be a problem for logical replication. Currently the patch also does not allow binary copy if the publisher version is older than 16. > > Logical replication between different types like int and smallint is already not > working properly on HEAD too. > > Yes, the scenario you shared looks like working. But you didn't create the > subscription with binary=true. The patch did not change subscription with > binary=false case. I believe what you should experiment is binary=true case > which already fails in the apply phase on HEAD. > > > > Well, with this patch, it will begin to fail in the table copy phase. But I don't > think this is a problem because logical replication in binary format is already > broken for replications between different data types. > > > > So, doesn't this mean that there is no separate failure mode during > the initial copy? I am clarifying this to see if the patch really > needs a separate copy_format option for initial sync? > It will fail in a case such as [2] while it would work on HEAD. What I meant by my above comment was that binary enabled subscriptions are not already working properly if they replicate between different types. So, the failure caused by replicating, for example, from smallint to int is not really introduced by this patch. Such subscriptions would fail in apply phase anyway. With this patch they will fail while binary copy. [1] https://www.postgresql.org/docs/current/sql-copy.html [2] https://www.postgresql.org/message-id/OSZPR01MB6310B58F069FF8E148B247FDFDA49%40OSZPR01MB6310.jpnprd01.prod.outlook.com Best, -- Melih Mutlu Microsoft
Re: Allow logical replication to copy tables in binary format
Hi, Hayato Kuroda (Fujitsu) , 20 Şub 2023 Pzt, 10:12 tarihinde şunu yazdı: > Dear Melih, > > Thank you for updating the patch. Before reviewing, I found that > cfbot have not accepted v8 patch [1]. > Thanks for letting me know. Attached the fixed version of the patch. Best, -- Melih Mutlu Microsoft v9-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data
RE: Allow logical replication to copy tables in binary format
On Thu, Feb 16, 2023 8:48 PM Amit Kapila wrote: > > On Mon, Jan 30, 2023 at 4:19 PM Melih Mutlu > wrote: > > > > Logical replication between different types like int and smallint is > > already not > working properly on HEAD too. > > Yes, the scenario you shared looks like working. But you didn't create the > subscription with binary=true. The patch did not change subscription with > binary=false case. I believe what you should experiment is binary=true case > which already fails in the apply phase on HEAD. > > > > Well, with this patch, it will begin to fail in the table copy phase. But I > > don't > think this is a problem because logical replication in binary format is > already > broken for replications between different data types. > > > > So, doesn't this mean that there is no separate failure mode during > the initial copy? I am clarifying this to see if the patch really > needs a separate copy_format option for initial sync? > In the case that the data type doesn't have binary output function, for apply phase, the column will be sent in text format (see logicalrep_write_tuple()) and it works fine. But with copy_format = binary, the walsender exits with an error. For example: -- create table on publisher and subscriber CREATE TYPE myvarchar; CREATE FUNCTION myvarcharin(cstring, oid, integer) RETURNS myvarchar LANGUAGE internal IMMUTABLE PARALLEL SAFE STRICT AS 'varcharin'; CREATE FUNCTION myvarcharout(myvarchar) RETURNS cstring LANGUAGE internal IMMUTABLE PARALLEL SAFE STRICT AS 'varcharout'; CREATE TYPE myvarchar ( input = myvarcharin, output = myvarcharout, alignment = integer, storage = main ); CREATE TABLE tbl1 (a myvarchar); -- create publication and insert some data on publisher create publication pub for table tbl1; INSERT INTO tbl1 values ('a'); -- create subscription on subscriber create subscription sub connection 'dbname=postgres port=5432' publication pub with(binary, copy_format = binary); Then I got the following error in the publisher log. walsender ERROR: no binary output function available for type public.myvarchar walsender STATEMENT: COPY public.tbl1 (a) TO STDOUT WITH (FORMAT binary) Regards, Shi Yu
RE: Allow logical replication to copy tables in binary format
Dear Melih, Thank you for updating the patch. Before reviewing, I found that cfbot have not accepted v8 patch [1]. IIUC src/psql/describe.c has been modified in v8, but src/test/regress/expected/subscription.out has not been changed accordingly. [1]: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/42/3840 Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Allow logical replication to copy tables in binary format
On Mon, Jan 30, 2023 at 4:19 PM Melih Mutlu wrote: > > Hi Bharath, > > Thanks for reviewing. > > Bharath Rupireddy , 18 Oca 2023 Çar, > 10:17 tarihinde şunu yazdı: >> >> On Thu, Jan 12, 2023 at 1:53 PM Melih Mutlu wrote: >> 1. The performance numbers posted upthread [1] look impressive for the >> use-case tried, that's a 2.25X improvement or 55.6% reduction in >> execution times. However, it'll be great to run a few more varied >> tests to confirm the benefit. > > > Sure, do you have any specific test case or suggestion in mind? > >> >> 2. It'll be great to capture the perf report to see if the time spent >> in copy_table() is reduced with the patch. > > > Will provide that in another email soon. > >> >> 3. I think blending initial table sync's binary copy option with >> data-type level binary send/receive is not good. Moreover, data-type >> level binary send/receive has its own restrictions per 9de77b5453. >> IMO, it'll be good to have a new option, say copy_data_format synonyms >> with COPY command's FORMAT option but with only text (default) and >> binary values. > > > Added a "copy_format" option for subscriptions with text as default value. So > it would be possible to create a binary subscription but copy tables in text > format to avoid restrictions that you're concerned about. > >> >> 4. Why to add tests to existing 002_types.pl? Can we add a new file >> with all the data types covered? > > > Since 002_types.pl is where the replication of data types are covered. I > thought it would be okay to test replication with the binary option in that > file. > Sure, we can add a new file with different data types for testing > subscriptions with binary option. But then I feel like it would have too many > duplicated lines with 002_types.pl. > If you think that 002_types.pl lacks some data types needs to be tested, then > we should add those into 002_types.pl too whether we test subscription with > binary option in that file or some other place, right? > >> >> 5. It's not clear to me as to why these rows were removed in the patch? >> -(1, '{1, 2, 3}'), >> -(2, '{2, 3, 1}'), >> (3, '{3, 2, 1}'), >> (4, '{4, 3, 2}'), >> (5, '{5, NULL, 3}'); >> >> -- test_tbl_arrays >> INSERT INTO tst_arrays (a, b, c, d) VALUES >> -('{1, 2, 3}', '{"a", "b", "c"}', '{1.1, 2.2, 3.3}', '{"1 >> day", "2 days", "3 days"}'), >> -('{2, 3, 1}', '{"b", "c", "a"}', '{2.2, 3.3, 1.1}', '{"2 >> minutes", "3 minutes", "1 minute"}'), > > > Previously, it wasn't actually testing the initial table sync since all > tables were empty when subscription was created. > I just simply split the data initially inserted to test initial table sync. > > With this patch, it inserts the first two rows for all data types before > subscriptions get created. > You can see these lines: >> >> +# Insert initial test data >> +$node_publisher->safe_psql( >> + 'postgres', qq( >> + -- test_tbl_one_array_col >> + INSERT INTO tst_one_array (a, b) VALUES >> + (1, '{1, 2, 3}'), >> + (2, '{2, 3, 1}'); >> + >> + -- test_tbl_arrays >> + INSERT INTO tst_arrays (a, b, c, d) VALUES > > > >> >> 6. BTW, the subbinary description is missing in pg_subscription docs >> https://www.postgresql.org/docs/devel/catalog-pg-subscription.html? >> - Specifies whether the subscription will request the publisher to >> - send the data in binary format (as opposed to text). >> + Specifies whether the subscription will copy the initial data to >> + synchronize relations in binary format and also request the >> publisher >> + to send the data in binary format too (as opposed to text). > > > Done. > >> >> 7. A nitpick - space is needed after >= before 16. >> +if (walrcv_server_version(LogRepWorkerWalRcvConn) >=16 && > > > Done. > >> >> 8. Note that the COPY binary format isn't portable across platforms >> (Windows to Linux for instance) or major versions >> https://www.postgresql.org/docs/devel/sql-copy.html, whereas, logical >> replication is >> https://www.postgresql.org/docs/devel/logical-replication.html. >> I don't see any handling of such cases in copy_table but only a check >> for the publisher version. I think we need to account for all the >> cases - allow binary COPY only when publisher and subscriber are of >> same versions, architectures, platforms. The trick here is how we >> identify if the subscriber is of the same type and taste >> (architectures and platforms) as the publisher. Looking for >> PG_VERSION_STR of publisher and subscriber might be naive, but I'm not >> sure if there's a better way to do it. > > > I think having the "copy_format" option with text as default, like I replied > to your 3rd review above, will keep things as they are now. > The patch now will only allow users to choose binary copy as well, if they > want it and acknowledge the limitations that come with binary
Re: Allow logical replication to copy tables in binary format
Hi, Thanks for reviewing. Please see the v8 here [1] Takamichi Osumi (Fujitsu) , 1 Şub 2023 Çar, 06:05 tarihinde şunu yazdı: > (1) general comment > > I wondered if the addition of the new option/parameter can introduce some > confusion to the users. > > case 1. When binary = true and copy_format = text, the table sync is > conducted by text. > case 2. When binary = false and copy_format = binary, the table sync is > conducted by binary. > (Note that the case 2 can be accepted after addressing the 3rd comment of > Bharath-san in [1]. > I agree with the 3rd comment by itself.) > I replied to Bharath's comment [1], can you please check to see if that makes sense? > The name of the new subscription parameter looks confusing. > How about "init_sync_format" or something ? > The option to enable initial sync is named "copy_data", so I named the new parameter as "copy_format" to refer to that copy meant by "copy_data". Maybe "copy_data_format" would be better. I can change it if you think it's better. > (2) The commit message doesn't get updated. > Done > (3) whitespace errors. > Done > (4) pg_dump.c > Done > (5) describe.c > Done > (6) > > + * Extract the copy format value from a DefElem. > + */ > +char > +defGetCopyFormat(DefElem *def) > > Shouldn't this function be static and remove the change of > subscriptioncmds.h ? > I wanted to make "defGetCopyFormat" be consistent with "defGetStreamingMode" since they're basically doing the same work for different parameters. And that function isn't static, so I didn't make "defGetCopyFormat" static too. > (7) catalogs.sgml > Done (8) create_subscription.sgml > Done Also; The latest patch doesn't get updated for more than two weeks > after some review comments. If you don't have time, > I would like to help updating the patch for you and other reviewers. > > Kindly let me know your status. > Sorry for the delay. This patch is currently one of my priorities. Hopefully I will share quicker updates from now on. [1] https://www.postgresql.org/message-id/CAGPVpCQYi9AYQSS%3DRmGgVNjz5ZEnLB8mACwd9aioVhLmbgiAMA%40mail.gmail.com Thanks, -- Melih Mutlu Microsoft
Re: Allow logical replication to copy tables in binary format
Hi, Please see the attached patch for following changes. Bharath Rupireddy , 30 Oca 2023 Pzt, 15:34 tarihinde şunu yazdı: > On Mon, Jan 30, 2023 at 4:19 PM Melih Mutlu > wrote: It'd be better and clearer to have a separate TAP test file IMO since > the focus of the feature here isn't to just test for data types. With > separate tests, you can verify "ERROR: incorrect binary data format > in logical replication column 1" cases. > Moved some tests from 002_types.pl to 014_binary.pl since this is where most binary features are tested. It covers now "incorrect data format" cases too. Also added some regression tests for copy_format parameter. > With the above said, do you think checking for publisher versions is > needed? The user can decide to enable/disable binary COPY based on the > publisher's version no? > +/* If the publisher is v16 or later, specify the format to copy data. > */ > +if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 16) > +{ > If the user decides to enable it, then it might be nice to not allow it for previous versions. But I'm not sure. I'm okay to remove it if you all agree. > Few more comments on v7: > 1. > + Specifies the format in which pre-existing data on the > publisher will > + copied to the subscriber. Supported formats are > + text and binary. The > default is > + text. > It'll be good to call out the cases in the documentation as to where > copy_format can be enabled and needs to be disabled. > Modified that description a bit. Can you check if that's okay now? > 2. > + errmsg("%s value should be either \"text\" or \"binary\"", > How about "value must be either "? > Done > 3. > Why should the subscription's binary option and copy_format option be > tied at all? Tying these two options hurts usability. Is there a > fundamental reason? I think they both are/must be independent. One > deals with data types and another deals with how initial table data is > copied. > My initial purpose with this patch was just making subscriptions with binary option enabled fully binary from initial copy to apply. Things have changed a bit when we decided to move binary copy behind a parameter. I didn't actually think there would be any use case where a user wants the initial copy to be in binary format for a sub with binary = false. Do you think it would be useful to copy in binary even for a sub with binary disabled? Thanks, -- Melih Mutlu Microsoft v8-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data
RE: Allow logical replication to copy tables in binary format
Hi, Melih On Monday, January 30, 2023 7:50 PM Melih Mutlu wrote: > Thanks for reviewing. ... > Well, with this patch, it will begin to fail in the table copy phase... The latest patch doesn't get updated for more than two weeks after some review comments. If you don't have time, I would like to help updating the patch for you and other reviewers. Kindly let me know your status. Best Regards, Takamichi Osumi
RE: Allow logical replication to copy tables in binary format
On Monday, January 30, 2023 7:50 PM Melih Mutlu wrote: > Thanks for reviewing. ... > Please see [1] and you'll get the following error in your case: > "ERROR: incorrect binary data format in logical replication column 1" Hi, thanks for sharing v7. (1) general comment I wondered if the addition of the new option/parameter can introduce some confusion to the users. case 1. When binary = true and copy_format = text, the table sync is conducted by text. case 2. When binary = false and copy_format = binary, the table sync is conducted by binary. (Note that the case 2 can be accepted after addressing the 3rd comment of Bharath-san in [1]. I agree with the 3rd comment by itself.) The name of the new subscription parameter looks confusing. How about "init_sync_format" or something ? (2) The commit message doesn't get updated. The commit message needs to mention the new subscription option. (3) whitespace errors. $ git am v7-0001-Allow-logical-replication-to-copy-table-in-binary.patch Applying: Allow logical replication to copy table in binary .git/rebase-apply/patch:95: trailing whitespace. copied to the subscriber. Supported formats are .git/rebase-apply/patch:101: trailing whitespace. that data will not be copied if copy_data = false. warning: 2 lines add whitespace errors. (4) pg_dump.c if (fout->remoteVersion >= 16) - appendPQExpBufferStr(query, " s.suborigin\n"); + appendPQExpBufferStr(query, " s.suborigin,\n"); else - appendPQExpBuffer(query, " '%s' AS suborigin\n", LOGICALREP_ORIGIN_ANY); + appendPQExpBuffer(query, " '%s' AS suborigin,\n", LOGICALREP_ORIGIN_ANY); + + if (fout->remoteVersion >= 16) + appendPQExpBufferStr(query, " s.subcopyformat\n"); + else + appendPQExpBuffer(query, " '%c' AS subcopyformat\n", LOGICALREP_COPY_AS_TEXT); This new branch for v16 can be made together with the previous same condition. (5) describe.c + + /* Copy format is only supported in v16 and higher */ + if (pset.sversion >= 16) + appendPQExpBuffer(, + ", subcopyformat AS \"%s\"\n", + gettext_noop("Copy Format")); Similarly to (4), this creates a new branch for v16. Please see the above codes of this part. (6) + * Extract the copy format value from a DefElem. + */ +char +defGetCopyFormat(DefElem *def) Shouldn't this function be static and remove the change of subscriptioncmds.h ? (7) catalogs.sgml The subcopyformat should be mentioned here and the current description for subbinary should be removed. (8) create_subscription.sgml + text. + + binary format can be selected only if Unnecessary blank line. [1] - https://www.postgresql.org/message-id/CALj2ACW5Oa7_v25iZb326UXvtM_tjQfw0Tc3hPJ8zN4FZqc9cw%40mail.gmail.com Best Regards, Takamichi Osumi
Re: Allow logical replication to copy tables in binary format
On Mon, Jan 30, 2023 at 4:19 PM Melih Mutlu wrote: > Thanks for providing an updated patch. >> On Thu, Jan 12, 2023 at 1:53 PM Melih Mutlu wrote: >> 1. The performance numbers posted upthread [1] look impressive for the >> use-case tried, that's a 2.25X improvement or 55.6% reduction in >> execution times. However, it'll be great to run a few more varied >> tests to confirm the benefit. > > Sure, do you have any specific test case or suggestion in mind? Have a huge amount of publisher's table (with mix of columns like int, text, double, bytea and so on) prior data so that the subscriber's table sync workers have to do a "good" amount of work to copy, then measure the copy_table time with and without patch. >> 2. It'll be great to capture the perf report to see if the time spent >> in copy_table() is reduced with the patch. > > Will provide that in another email soon. Thanks. >> 4. Why to add tests to existing 002_types.pl? Can we add a new file >> with all the data types covered? > > Since 002_types.pl is where the replication of data types are covered. I > thought it would be okay to test replication with the binary option in that > file. > Sure, we can add a new file with different data types for testing > subscriptions with binary option. But then I feel like it would have too many > duplicated lines with 002_types.pl. > If you think that 002_types.pl lacks some data types needs to be tested, then > we should add those into 002_types.pl too whether we test subscription with > binary option in that file or some other place, right? > > Previously, it wasn't actually testing the initial table sync since all > tables were empty when subscription was created. > I just simply split the data initially inserted to test initial table sync. > > With this patch, it inserts the first two rows for all data types before > subscriptions get created. > You can see these lines: It'd be better and clearer to have a separate TAP test file IMO since the focus of the feature here isn't to just test for data types. With separate tests, you can verify "ERROR: incorrect binary data format in logical replication column 1" cases. >> 8. Note that the COPY binary format isn't portable across platforms >> (Windows to Linux for instance) or major versions >> https://www.postgresql.org/docs/devel/sql-copy.html, whereas, logical >> replication is >> https://www.postgresql.org/docs/devel/logical-replication.html. >> I don't see any handling of such cases in copy_table but only a check >> for the publisher version. I think we need to account for all the >> cases - allow binary COPY only when publisher and subscriber are of >> same versions, architectures, platforms. The trick here is how we >> identify if the subscriber is of the same type and taste >> (architectures and platforms) as the publisher. Looking for >> PG_VERSION_STR of publisher and subscriber might be naive, but I'm not >> sure if there's a better way to do it. > > I think having the "copy_format" option with text as default, like I replied > to your 3rd review above, will keep things as they are now. > The patch now will only allow users to choose binary copy as well, if they > want it and acknowledge the limitations that come with binary copy. > COPY command's portability issues shouldn't be an issue right now, since the > patch still supports text format. Right? With the above said, do you think checking for publisher versions is needed? The user can decide to enable/disable binary COPY based on the publisher's version no? +/* If the publisher is v16 or later, specify the format to copy data. */ +if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 16) +{ Few more comments on v7: 1. + Specifies the format in which pre-existing data on the publisher will + copied to the subscriber. Supported formats are + text and binary. The default is + text. It'll be good to call out the cases in the documentation as to where copy_format can be enabled and needs to be disabled. 2. + errmsg("%s value should be either \"text\" or \"binary\"", How about "value must be either "? 3. +if (!opts->binary && +opts->copy_format == LOGICALREP_COPY_AS_BINARY) +{ +ereport(ERROR, +(errcode(ERRCODE_SYNTAX_ERROR), + errmsg("%s and %s are mutually exclusive options", +"binary = false", "copy_format = binary"))); +"CREATE SUBSCRIPTION tap_sub_binary CONNECTION '$publisher_connstr' PUBLICATION tap_pub WITH (slot_name = tap_sub_binary_slot, binary = true, copy_format = 'binary')" Why should the subscription's binary option and copy_format option be tied at all? Tying these two options hurts usability. Is there a fundamental reason? I think they both are/must be independent. One deals with data types and another deals with how initial table data is copied. -- Bharath Rupireddy PostgreSQL Contributors
Re: Allow logical replication to copy tables in binary format
Hi Bharath, Thanks for reviewing. Bharath Rupireddy , 18 Oca 2023 Çar, 10:17 tarihinde şunu yazdı: > On Thu, Jan 12, 2023 at 1:53 PM Melih Mutlu > wrote: > 1. The performance numbers posted upthread [1] look impressive for the > use-case tried, that's a 2.25X improvement or 55.6% reduction in > execution times. However, it'll be great to run a few more varied > tests to confirm the benefit. > Sure, do you have any specific test case or suggestion in mind? > 2. It'll be great to capture the perf report to see if the time spent > in copy_table() is reduced with the patch. > Will provide that in another email soon. > 3. I think blending initial table sync's binary copy option with > data-type level binary send/receive is not good. Moreover, data-type > level binary send/receive has its own restrictions per 9de77b5453. > IMO, it'll be good to have a new option, say copy_data_format synonyms > with COPY command's FORMAT option but with only text (default) and > binary values. > Added a "copy_format" option for subscriptions with text as default value. So it would be possible to create a binary subscription but copy tables in text format to avoid restrictions that you're concerned about. > 4. Why to add tests to existing 002_types.pl? Can we add a new file > with all the data types covered? > Since 002_types.pl is where the replication of data types are covered. I thought it would be okay to test replication with the binary option in that file. Sure, we can add a new file with different data types for testing subscriptions with binary option. But then I feel like it would have too many duplicated lines with 002_types.pl. If you think that 002_types.pl lacks some data types needs to be tested, then we should add those into 002_types.pl too whether we test subscription with binary option in that file or some other place, right? > 5. It's not clear to me as to why these rows were removed in the patch? > -(1, '{1, 2, 3}'), > -(2, '{2, 3, 1}'), > (3, '{3, 2, 1}'), > (4, '{4, 3, 2}'), > (5, '{5, NULL, 3}'); > > -- test_tbl_arrays > INSERT INTO tst_arrays (a, b, c, d) VALUES > -('{1, 2, 3}', '{"a", "b", "c"}', '{1.1, 2.2, 3.3}', '{"1 > day", "2 days", "3 days"}'), > -('{2, 3, 1}', '{"b", "c", "a"}', '{2.2, 3.3, 1.1}', '{"2 > minutes", "3 minutes", "1 minute"}'), > Previously, it wasn't actually testing the initial table sync since all tables were empty when subscription was created. I just simply split the data initially inserted to test initial table sync. With this patch, it inserts the first two rows for all data types before subscriptions get created. You can see these lines: > +# Insert initial test data > +$node_publisher->safe_psql( > + 'postgres', qq( > + -- test_tbl_one_array_col > + INSERT INTO tst_one_array (a, b) VALUES > + (1, '{1, 2, 3}'), > + (2, '{2, 3, 1}'); > + > + -- test_tbl_arrays > + INSERT INTO tst_arrays (a, b, c, d) VALUES > 6. BTW, the subbinary description is missing in pg_subscription docs > https://www.postgresql.org/docs/devel/catalog-pg-subscription.html? > - Specifies whether the subscription will request the publisher to > - send the data in binary format (as opposed to text). > + Specifies whether the subscription will copy the initial data to > + synchronize relations in binary format and also request the > publisher > + to send the data in binary format too (as opposed to text). > Done. > 7. A nitpick - space is needed after >= before 16. > +if (walrcv_server_version(LogRepWorkerWalRcvConn) >=16 && > Done. > 8. Note that the COPY binary format isn't portable across platforms > (Windows to Linux for instance) or major versions > https://www.postgresql.org/docs/devel/sql-copy.html, whereas, logical > replication is > https://www.postgresql.org/docs/devel/logical-replication.html. > I don't see any handling of such cases in copy_table but only a check > for the publisher version. I think we need to account for all the > cases - allow binary COPY only when publisher and subscriber are of > same versions, architectures, platforms. The trick here is how we > identify if the subscriber is of the same type and taste > (architectures and platforms) as the publisher. Looking for > PG_VERSION_STR of publisher and subscriber might be naive, but I'm not > sure if there's a better way to do it. > I think having the "copy_format" option with text as default, like I replied to your 3rd review above, will keep things as they are now. The patch now will only allow users to choose binary copy as well, if they want it and acknowledge the limitations that come with binary copy. COPY command's portability issues shouldn't be an issue right now, since the patch still supports text format. Right? > Also, the COPY binary format is very data type specific - per the docs > "for example it will not
Re: Allow logical replication to copy tables in binary format
On Thu, Jan 12, 2023 at 1:53 PM Melih Mutlu wrote: > > Hi, > > Thanks for your reviews. Thanks. I have some comments: 1. The performance numbers posted upthread [1] look impressive for the use-case tried, that's a 2.25X improvement or 55.6% reduction in execution times. However, it'll be great to run a few more varied tests to confirm the benefit. 2. It'll be great to capture the perf report to see if the time spent in copy_table() is reduced with the patch. 3. I think blending initial table sync's binary copy option with data-type level binary send/receive is not good. Moreover, data-type level binary send/receive has its own restrictions per 9de77b5453. IMO, it'll be good to have a new option, say copy_data_format synonyms with COPY command's FORMAT option but with only text (default) and binary values. 4. Why to add tests to existing 002_types.pl? Can we add a new file with all the data types covered? 5. It's not clear to me as to why these rows were removed in the patch? -(1, '{1, 2, 3}'), -(2, '{2, 3, 1}'), (3, '{3, 2, 1}'), (4, '{4, 3, 2}'), (5, '{5, NULL, 3}'); -- test_tbl_arrays INSERT INTO tst_arrays (a, b, c, d) VALUES -('{1, 2, 3}', '{"a", "b", "c"}', '{1.1, 2.2, 3.3}', '{"1 day", "2 days", "3 days"}'), -('{2, 3, 1}', '{"b", "c", "a"}', '{2.2, 3.3, 1.1}', '{"2 minutes", "3 minutes", "1 minute"}'), 6. BTW, the subbinary description is missing in pg_subscription docs https://www.postgresql.org/docs/devel/catalog-pg-subscription.html? - Specifies whether the subscription will request the publisher to - send the data in binary format (as opposed to text). + Specifies whether the subscription will copy the initial data to + synchronize relations in binary format and also request the publisher + to send the data in binary format too (as opposed to text). 7. A nitpick - space is needed after >= before 16. +if (walrcv_server_version(LogRepWorkerWalRcvConn) >=16 && 8. Note that the COPY binary format isn't portable across platforms (Windows to Linux for instance) or major versions https://www.postgresql.org/docs/devel/sql-copy.html, whereas, logical replication is https://www.postgresql.org/docs/devel/logical-replication.html. I don't see any handling of such cases in copy_table but only a check for the publisher version. I think we need to account for all the cases - allow binary COPY only when publisher and subscriber are of same versions, architectures, platforms. The trick here is how we identify if the subscriber is of the same type and taste (architectures and platforms) as the publisher. Looking for PG_VERSION_STR of publisher and subscriber might be naive, but I'm not sure if there's a better way to do it. Also, the COPY binary format is very data type specific - per the docs "for example it will not work to output binary data from a smallint column and read it into an integer column, even though that would work fine in text format.". I did a small experiment [2], the logical replication works with compatible data types (int -> smallint, even int -> text), whereas the COPY binary format doesn't. I think it'll complicate things a bit to account for the above cases and allow COPY with binary format for logical replication. [1] https://www.postgresql.org/message-id/CAGPVpCQEKDVKQPf6OFQ-9WiRYB1YRejm--YJTuwgzuvj1LEJ2A%40mail.gmail.com [2] DROP TABLE foo; DROP PUBLICATION mypub; CREATE TABLE foo(c1 bigint, c2 int, c3 smallint); INSERT INTO foo SELECT i , i+1, i+2 FROM generate_series(1, 5) i; CREATE PUBLICATION mypub FOR TABLE foo; SELECT COUNT(*) FROM foo; SELECT * FROM foo; DROP SUBSCRIPTION mysub; DROP TABLE foo; CREATE TABLE foo(c1 smallint, c2 smallint, c3 smallint); -- works without any problem -- OR CREATE TABLE foo(c1 smallint, c2 text, c3 smallint); -- works without any problem CREATE SUBSCRIPTION mysub CONNECTION 'port=5432 dbname=postgres user=ubuntu' PUBLICATION mypub; SELECT COUNT(*) FROM foo; SELECT * FROM foo; drop table foo; create table foo(c1 bigint, c2 int, c3 smallint); insert into foo select i, i+1, i+2 from generate_series(1, 10) i; copy foo(c1, c2, c3) to '/home/ubuntu/postgres/inst/bin/data/foo.text' with (format 'text'); copy foo(c1, c2, c3) to '/home/ubuntu/postgres/inst/bin/data/foo.binary' with (format 'binary'); drop table bar; create table bar(c1 smallint, c2 smallint, c3 smallint); -- or create table bar(c1 smallint, c2 text, c3 smallint); copy bar(c1, c2, c3) from '/home/ubuntu/postgres/inst/bin/data/foo.text' with (format 'text'); copy bar(c1, c2, c3) from '/home/ubuntu/postgres/inst/bin/data/foo.binary' with (format 'binary'); -- produces "ERROR: incorrect binary data format" -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Allow logical replication to copy tables in binary format
Hi, Thanks for your reviews. Takamichi Osumi (Fujitsu) , 12 Oca 2023 Per, 06:07 tarihinde şunu yazdı: > On Wednesday, January 11, 2023 7:45 PM Melih Mutlu > wrote: > (1-1) There is no need to log the version and the query by elog here. > (1-2) Also, I suggest we can remove the server_version variable itself, > because we have only one actual reference for it. > There is a style that we call walrcv_server_version in the > 'if condition' directly like existing codes in > fetch_remote_table_info(). > (1-3) Suggestion to improve comments. > FROM: > /* If the publisher is v16 or later, copy in binary format.*/ > TO: > /* If the publisher is v16 or later, copy data in the required data > format. */ > Forgot to remove that LOG line. Removed it now and applied other suggestions too. > I think if we change the order of this part of check like below, then > it would look more aligned with other existing test codes introduced by > this patch. > Right. Changed it to make it more aligned with the rest. Thanks, -- Melih Mutlu Microsoft v6-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data
RE: Allow logical replication to copy tables in binary format
On Wednesday, January 11, 2023 7:45 PM Melih Mutlu wrote: > Thanks for your review. > Done. Hi, minor comments on v5. (1) publisher's version check + /* If the publisher is v16 or later, copy in binary format.*/ + server_version = walrcv_server_version(LogRepWorkerWalRcvConn); + if (server_version >=16 && MySubscription->binary) + { + appendStringInfoString(, " WITH (FORMAT binary)"); + options = lappend(options, makeDefElem("format", (Node *) makeString("binary"), -1)); + } + + elog(LOG, "version: %i, %s", server_version, cmd.data); (1-1) There is no need to log the version and the query by elog here. (1-2) Also, I suggest we can remove the server_version variable itself, because we have only one actual reference for it. There is a style that we call walrcv_server_version in the 'if condition' directly like existing codes in fetch_remote_table_info(). (1-3) Suggestion to improve comments. FROM: /* If the publisher is v16 or later, copy in binary format.*/ TO: /* If the publisher is v16 or later, copy data in the required data format. */ (2) Minor suggestion for some test code alignment. $result = $node_subscriber->safe_psql('postgres', "SELECT sum(a) FROM tst_dom_constr"); -is($result, '21', 'sql-function constraint on domain'); +is($result, '33', 'sql-function constraint on domain'); + +$result_binary = + $node_subscriber->safe_psql('postgres', + "SELECT sum(a) FROM tst_dom_constr"); +is($result_binary, '33', 'sql-function constraint on domain'); I think if we change the order of this part of check like below, then it would look more aligned with other existing test codes introduced by this patch. --- my $domain_check = 'SELECT sum(a) FROM tst_dom_constr'; $result = $node_subscriber->safe_psql('postgres', $domain_check); $result_binary = $node_subscriber->safe_psql('postgres', $domain_check); is($result, '33', 'sql-function constraint on domain'); is($result_binary, '33', 'sql-function constraint on domain in binary'); --- Best Regards, Takamichi Osumi
Re: Allow logical replication to copy tables in binary format
On Wed, 11 Jan 2023 at 16:14, Melih Mutlu wrote: > > Hi, > > Thanks for your review. > > shiy.f...@fujitsu.com , 11 Oca 2023 Çar, 11:56 > tarihinde şunu yazdı: >> >> On Mon, Nov 14, 2022 8:08 PM Melih Mutlu wrote: >> 1. >> +# Binary enabled subscription should fail >> +$node_subscriber_binary->wait_for_log("ERROR: insufficient data left in >> message"); >> >> Should it be changed to "ERROR: ( [A-Z0-9]+:)? ", like other subscription >> tests. > > > Done. > >> >> 2. >> +# Binary disabled subscription should succeed >> +$node_publisher->wait_for_catchup('tap_sub'); >> >> If we want to wait for table synchronization to finish, should we call >> wait_for_subscription_sync()? > > > Done. > >> >> 3. >> I also think it might be better to support copy binary only for publishers of >> v16 or later. Do you plan to implement it in the patch? > > > Done. For some reason CFBot is not able to apply the patch as in [1], Could you have a look and post an updated patch if required: === Applying patches on top of PostgreSQL commit ID c96de2ce1782116bd0489b1cd69ba88189a495e8 === === applying patch ./v5-0001-Allow-logical-replication-to-copy-table-in-binary.patch gpatch: Only garbage was found in the patch input. [1] - http://cfbot.cputube.org/patch_41_3840.log Regards, Vignesh
Re: Allow logical replication to copy tables in binary format
Hi, Thanks for your review. shiy.f...@fujitsu.com , 11 Oca 2023 Çar, 11:56 tarihinde şunu yazdı: > On Mon, Nov 14, 2022 8:08 PM Melih Mutlu wrote: > 1. > +# Binary enabled subscription should fail > +$node_subscriber_binary->wait_for_log("ERROR: insufficient data left in > message"); > > Should it be changed to "ERROR: ( [A-Z0-9]+:)? ", like other subscription > tests. > Done. > 2. > +# Binary disabled subscription should succeed > +$node_publisher->wait_for_catchup('tap_sub'); > > If we want to wait for table synchronization to finish, should we call > wait_for_subscription_sync()? > Done. > 3. > I also think it might be better to support copy binary only for publishers > of > v16 or later. Do you plan to implement it in the patch? > Done. Thanks, -- Melih Mutlu Microsoft v5-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data
RE: Allow logical replication to copy tables in binary format
On Mon, Nov 14, 2022 8:08 PM Melih Mutlu wrote: > > Attached patch with updated version of this patch. Thanks for your patch. I tried to do a performance test for this patch, the result looks good to me. (The steps are similar to what Melih shared [1].) The time to synchronize about 1GB data in binary (the average of the middle eight was taken): HEAD: 16.854 s Patched: 6.805 s Besides, here are some comments. 1. +# Binary enabled subscription should fail +$node_subscriber_binary->wait_for_log("ERROR: insufficient data left in message"); Should it be changed to "ERROR: ( [A-Z0-9]+:)? ", like other subscription tests. 2. +# Binary disabled subscription should succeed +$node_publisher->wait_for_catchup('tap_sub'); If we want to wait for table synchronization to finish, should we call wait_for_subscription_sync()? 3. I also think it might be better to support copy binary only for publishers of v16 or later. Do you plan to implement it in the patch? [1] https://www.postgresql.org/message-id/CAGPVpCQEKDVKQPf6OFQ-9WiRYB1YRejm--YJTuwgzuvj1LEJ2A%40mail.gmail.com Regards, Shi yu
Re: Allow logical replication to copy tables in binary format
Hello, osumi.takami...@fujitsu.com , 12 Eki 2022 Çar, 04:36 tarihinde şunu yazdı: > > I agree with the direction to support binary copy for v16 and > later. > > > > IIUC, the binary format replication with different data types > fails even > > during apply phase on HEAD. > > I thought that means, the upgrade concern only applies to a > scenario > > that the user executes > > only initial table synchronizations between the publisher and > subscriber > > and doesn't replicate any data at apply phase after that. I would > say > > this isn't a valid scenario and your proposal makes sense. > > > > No, logical replication in binary does not fail on apply phase if data > types are > > different. > With HEAD, I observe in some case we fail at apply phase because of > different data types like > integer vs. bigint as written scenario in [1]. In short, I think we can > slightly > adjust your documentation and make it more general so that the description > applies to > both table sync phase and apply phase. > Yes, you're right. I somehow had the impression that HEAD supports replication between different types in binary. But as can be shown in the scenario you mentioned, it does not work. I'll suggest a below change for your sentence of logical-replication.sgml. > FROM: > In binary case, it is not allowed to replicate data between different > types due to restrictions inherited from COPY. > TO: > Binary format is type specific and does not allow to replicate data > between different types according to its > restrictions. > In this case, this change makes sense since this patch does actually not introduce this issue. It already exists in HEAD too. > If my idea above is correct, then I feel we can remove all the fixes for > create_subscription.sgml. > I'm not sure if I should pursue this perspective of the document > improvement > any further after this email, since this isn't essentially because of this > patch. > I'm only keeping the following change in create_subscription.sgml to indicate binary option copies in binary format now. > - Specifies whether the subscription will request the publisher to > - send the data in binary format (as opposed to text). > + Specifies whether the subscription will copy the initial data to > + synchronize relations in binary format and also request the > publisher > + to send the data in binary format too (as opposed to text). > > The concern with upgrade (if data types are not the same) would be not > being > > able to create a new subscription with binary enabled or replicate new > tables > > added into publication. > > Replication of tables from existing subscriptions would not be affected > by this > > change since they will already be in the apply phase, not tablesync. > > Do you think this would still be an issue? > Okay, thanks for explaining this. I understand that > the upgrade concern applies to the table sync that is executed > between text format (before the patch) and binary format (after the patch). > I was thinking apply would work with different types in binary format. Since apply also would not work, then the scenario that I tried to explain earlier is not a concern anymore. Attached patch with updated version of this patch. Thanks, Melih v4-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data