On Mon, Jan 30, 2023 at 4:19 PM Melih Mutlu <m.melihmu...@gmail.com> wrote: >
Thanks for providing an updated patch. >> On Thu, Jan 12, 2023 at 1:53 PM Melih Mutlu <m.melihmu...@gmail.com> wrote: >> 1. The performance numbers posted upthread [1] look impressive for the >> use-case tried, that's a 2.25X improvement or 55.6% reduction in >> execution times. However, it'll be great to run a few more varied >> tests to confirm the benefit. > > Sure, do you have any specific test case or suggestion in mind? Have a huge amount of publisher's table (with mix of columns like int, text, double, bytea and so on) prior data so that the subscriber's table sync workers have to do a "good" amount of work to copy, then measure the copy_table time with and without patch. >> 2. It'll be great to capture the perf report to see if the time spent >> in copy_table() is reduced with the patch. > > Will provide that in another email soon. Thanks. >> 4. Why to add tests to existing 002_types.pl? Can we add a new file >> with all the data types covered? > > Since 002_types.pl is where the replication of data types are covered. I > thought it would be okay to test replication with the binary option in that > file. > Sure, we can add a new file with different data types for testing > subscriptions with binary option. But then I feel like it would have too many > duplicated lines with 002_types.pl. > If you think that 002_types.pl lacks some data types needs to be tested, then > we should add those into 002_types.pl too whether we test subscription with > binary option in that file or some other place, right? > > Previously, it wasn't actually testing the initial table sync since all > tables were empty when subscription was created. > I just simply split the data initially inserted to test initial table sync. > > With this patch, it inserts the first two rows for all data types before > subscriptions get created. > You can see these lines: It'd be better and clearer to have a separate TAP test file IMO since the focus of the feature here isn't to just test for data types. With separate tests, you can verify "ERROR: incorrect binary data format in logical replication column 1" cases. >> 8. Note that the COPY binary format isn't portable across platforms >> (Windows to Linux for instance) or major versions >> https://www.postgresql.org/docs/devel/sql-copy.html, whereas, logical >> replication is >> https://www.postgresql.org/docs/devel/logical-replication.html. >> I don't see any handling of such cases in copy_table but only a check >> for the publisher version. I think we need to account for all the >> cases - allow binary COPY only when publisher and subscriber are of >> same versions, architectures, platforms. The trick here is how we >> identify if the subscriber is of the same type and taste >> (architectures and platforms) as the publisher. Looking for >> PG_VERSION_STR of publisher and subscriber might be naive, but I'm not >> sure if there's a better way to do it. > > I think having the "copy_format" option with text as default, like I replied > to your 3rd review above, will keep things as they are now. > The patch now will only allow users to choose binary copy as well, if they > want it and acknowledge the limitations that come with binary copy. > COPY command's portability issues shouldn't be an issue right now, since the > patch still supports text format. Right? With the above said, do you think checking for publisher versions is needed? The user can decide to enable/disable binary COPY based on the publisher's version no? + /* If the publisher is v16 or later, specify the format to copy data. */ + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 160000) + { Few more comments on v7: 1. + Specifies the format in which pre-existing data on the publisher will + copied to the subscriber. Supported formats are + <literal>text</literal> and <literal>binary</literal>. The default is + <literal>text</literal>. It'll be good to call out the cases in the documentation as to where copy_format can be enabled and needs to be disabled. 2. + errmsg("%s value should be either \"text\" or \"binary\"", How about "value must be either ...."? 3. + if (!opts->binary && + opts->copy_format == LOGICALREP_COPY_AS_BINARY) + { + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("%s and %s are mutually exclusive options", + "binary = false", "copy_format = binary"))); + "CREATE SUBSCRIPTION tap_sub_binary CONNECTION '$publisher_connstr' PUBLICATION tap_pub WITH (slot_name = tap_sub_binary_slot, binary = true, copy_format = 'binary')" Why should the subscription's binary option and copy_format option be tied at all? Tying these two options hurts usability. Is there a fundamental reason? I think they both are/must be independent. One deals with data types and another deals with how initial table data is copied. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com