On Mon, Jan 30, 2023 at 4:19 PM Melih Mutlu <m.melihmu...@gmail.com> wrote:
>

Thanks for providing an updated patch.

>> On Thu, Jan 12, 2023 at 1:53 PM Melih Mutlu <m.melihmu...@gmail.com> wrote:
>> 1. The performance numbers posted upthread [1] look impressive for the
>> use-case tried, that's a 2.25X improvement or 55.6% reduction in
>> execution times. However, it'll be great to run a few more varied
>> tests to confirm the benefit.
>
> Sure, do you have any specific test case or suggestion in mind?

Have a huge amount of publisher's table (with mix of columns like int,
text, double, bytea and so on) prior data so that the subscriber's
table sync workers have to do a "good" amount of work to copy, then
measure the copy_table time with and without patch.

>> 2. It'll be great to capture the perf report to see if the time spent
>> in copy_table() is reduced with the patch.
>
> Will provide that in another email soon.

Thanks.

>> 4. Why to add tests to existing 002_types.pl? Can we add a new file
>> with all the data types covered?
>
> Since 002_types.pl is where the replication of data types are covered. I 
> thought it would be okay to test replication with the binary option in that 
> file.
> Sure, we can add a new file with different data types for testing 
> subscriptions with binary option. But then I feel like it would have too many 
> duplicated lines with 002_types.pl.
> If you think that 002_types.pl lacks some data types needs to be tested, then 
> we should add those into 002_types.pl too whether we test subscription with 
> binary option in that file or some other place, right?
>
> Previously, it wasn't actually testing the initial table sync since all 
> tables were empty when subscription was created.
> I just simply split the data initially inserted to test initial table sync.
>
> With this patch, it inserts the first two rows for all data types before 
> subscriptions get created.
> You can see these lines:

It'd be better and clearer to have a separate TAP test file IMO since
the focus of the feature here isn't to just test for data types. With
separate tests, you can verify "ERROR:  incorrect binary data format
in logical replication column 1" cases.

>> 8. Note that the COPY binary format isn't portable across platforms
>> (Windows to Linux for instance) or major versions
>> https://www.postgresql.org/docs/devel/sql-copy.html, whereas, logical
>> replication is 
>> https://www.postgresql.org/docs/devel/logical-replication.html.
>> I don't see any handling of such cases in copy_table but only a check
>> for the publisher version. I think we need to account for all the
>> cases - allow binary COPY only when publisher and subscriber are of
>> same versions, architectures, platforms. The trick here is how we
>> identify if the subscriber is of the same type and taste
>> (architectures and platforms) as the publisher. Looking for
>> PG_VERSION_STR of publisher and subscriber might be naive, but I'm not
>> sure if there's a better way to do it.
>
> I think having the "copy_format" option with text as default, like I replied 
> to your 3rd review above, will keep things as they are now.
> The patch now will only allow users to choose binary copy as well, if they 
> want it and acknowledge the limitations that come with binary copy.
> COPY command's portability issues shouldn't be an issue right now, since the 
> patch still supports text format. Right?

With the above said, do you think checking for publisher versions is
needed? The user can decide to enable/disable binary COPY based on the
publisher's version no?
+    /* If the publisher is v16 or later, specify the format to copy data. */
+    if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 160000)
+    {

Few more comments on v7:
1.
+          Specifies the format in which pre-existing data on the publisher will
+          copied to the subscriber. Supported formats are
+          <literal>text</literal> and <literal>binary</literal>. The default is
+          <literal>text</literal>.
It'll be good to call out the cases in the documentation as to where
copy_format can be enabled and needs to be disabled.

2.
+             errmsg("%s value should be either \"text\" or \"binary\"",
How about "value must be either ...."?

3.
+    if (!opts->binary &&
+        opts->copy_format == LOGICALREP_COPY_AS_BINARY)
+    {
+        ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("%s and %s are mutually exclusive options",
+                                "binary = false", "copy_format = binary")));

+    "CREATE SUBSCRIPTION tap_sub_binary CONNECTION
'$publisher_connstr' PUBLICATION tap_pub WITH (slot_name =
tap_sub_binary_slot, binary = true, copy_format = 'binary')"
Why should the subscription's binary option and copy_format option be
tied at all? Tying these two options hurts usability. Is there a
fundamental reason? I think they both are/must be independent. One
deals with data types and another deals with how initial table data is
copied.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Reply via email to