On Mon, 19 Feb 2024 at 11:15, Hayato Kuroda (Fujitsu) <kuroda.hay...@fujitsu.com> wrote: > > Dear hackers, > > > Since it may be useful, I will post top-up patch on Monday, if there are no > > updating. > > And here are top-up patches. Feel free to check and include. > > v22-0001: Same as v21-0001. > === rebased patches === > v22-0002: Update docs per recent changes. Same as v20-0002. > v22-0003: Add check versions of the target. Extracted from v20-0003. > v22-0004: Remove -S option. Mostly same as v20-0009, but commit massage was > slightly changed. > === Newbie === > V22-0005: Addressed my comments which seems to be trivial[1]. > Comments #1, 3, 4, 8, 10, 14, 17 were addressed here. > v22-0006: Consider the scenario when commands are failed after the recovery. > drop_subscription() is removed and some messages are added per [2]. > V22-0007: Revise server_is_in_recovery() per [1]. Comments #5, 6, 7, were > addressed here. > V22-0008: Fix a strange report when physical_primary_slot is null. Per > comment #9 [1]. > V22-0009: Prohibit reuse publications when it has already existed. Per > comments #11 and 12 [1]. > V22-0010: Avoid to call PQclear()/PQfinish()/pg_free() if the process exits > soon. Per comment #15 [1]. > V22-0011: Update testcode. Per comments #17- [1]. > > I did not handle below points because I have unclear points. > > a. > This patch set cannot detect the disconnection between the target (standby) > and > source (primary) during the catch up. Because the connection status must be > gotten > at the same time (=in the same query) with the recovery status, but now it is > now an > independed function (server_is_in_recovery()). > > b. > This patch set cannot detect the inconsistency reported by Shubham [3]. I > could not > come up with solutions without removing -P...
Few comments: 1) The below code can lead to assertion failure if the publisher is stopped while dropping the replication slot: + if (primary_slot_name != NULL) + { + conn = connect_database(dbinfo[0].pubconninfo); + if (conn != NULL) + { + drop_replication_slot(conn, &dbinfo[0], primary_slot_name); + } + else + { + pg_log_warning("could not drop replication slot \"%s\" on primary", + primary_slot_name); + pg_log_warning_hint("Drop this replication slot soon to avoid retention of WAL files."); + } + disconnect_database(conn); + } pg_createsubscriber: error: connection to database failed: connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such file or directory Is the server running locally and accepting connections on that socket? pg_createsubscriber: warning: could not drop replication slot "standby_1" on primary pg_createsubscriber: hint: Drop this replication slot soon to avoid retention of WAL files. pg_createsubscriber: pg_createsubscriber.c:432: disconnect_database: Assertion `conn != ((void *)0)' failed. Aborted (core dumped) This is happening because we are calling disconnect_database in case of connection failure case too which has the following assert: +static void +disconnect_database(PGconn *conn) +{ + Assert(conn != NULL); + + PQfinish(conn); +} 2) There is a CheckDataVersion function which does exactly this, will we be able to use this: + /* Check standby server version */ + if ((ver_fd = fopen(versionfile, "r")) == NULL) + pg_fatal("could not open file \"%s\" for reading: %m", versionfile); + + /* Version number has to be the first line read */ + if (!fgets(rawline, sizeof(rawline), ver_fd)) + { + if (!ferror(ver_fd)) + pg_fatal("unexpected empty file \"%s\"", versionfile); + else + pg_fatal("could not read file \"%s\": %m", versionfile); + } + + /* Strip trailing newline and carriage return */ + (void) pg_strip_crlf(rawline); + + if (strcmp(rawline, PG_MAJORVERSION) != 0) + { + pg_log_error("standby server is of wrong version"); + pg_log_error_detail("File \"%s\" contains \"%s\", which is not compatible with this program's version \"%s\".", + versionfile, rawline, PG_MAJORVERSION); + exit(1); + } + + fclose(ver_fd); 3) Should this be added to typedefs.list: +enum WaitPMResult +{ + POSTMASTER_READY, + POSTMASTER_STILL_STARTING +}; 4) pgCreateSubscriber should be mentioned after pg_controldata to keep the ordering consistency: diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml index aa94f6adf6..c5edd244ef 100644 --- a/doc/src/sgml/reference.sgml +++ b/doc/src/sgml/reference.sgml @@ -285,6 +285,7 @@ &pgCtl; &pgResetwal; &pgRewind; + &pgCreateSubscriber; &pgtestfsync; 5) Here pg_replication_slots should be pg_catalog.pg_replication_slots: + if (primary_slot_name) + { + appendPQExpBuffer(str, + "SELECT 1 FROM pg_replication_slots " + "WHERE active AND slot_name = '%s'", + primary_slot_name); 6) Here pg_settings should be pg_catalog.pg_settings: + * - max_worker_processes >= 1 + number of dbs to be converted + *------------------------------------------------------------------------ + */ + res = PQexec(conn, + "SELECT setting FROM pg_settings WHERE name IN (" + "'max_logical_replication_workers', " + "'max_replication_slots', " + "'max_worker_processes', " + "'primary_slot_name') " + "ORDER BY name"); Regards, Vignesh