Dear Euler,

Thanks for updating the patch!

> One open item that is worrying me is how to handle the pg_ctl timeout. This
> patch does nothing and the user should use PGCTLTIMEOUT environment variable 
> to
> avoid that the execution is canceled after 60 seconds (default for pg_ctl).
> Even if you set a high value, it might not be enough for cases like
> time-delayed replica. Maybe pg_ctl should accept no timeout as --timeout
> option. I'll include this caveat into the documentation but I'm afraid it is
> not sufficient and we should provide a better way to handle this situation.

I felt you might be confused a bit. Even if the recovery_min_apply_delay is set,
e.g., 10h., the pg_ctl can start and stop the server. This is because the
walreceiver serialize changes as soon as they received. The delay is done by the
startup process. There are no unflushed data, so server instance can be turned 
off.
If you meant the combination of recovery-timeout and time-delayed replica, yes,
it would be likely to occur. But in the case, using like --no-timeout option is
dangerous. I think we should overwrite recovery_min_apply_delay to zero. 
Thought?

Below part contains my comments for v11-0001. Note that the ordering is random.

01. doc
```
    <group choice="req">
     <arg choice="plain"><option>-D</option> </arg>
     <arg choice="plain"><option>--pgdata</option></arg>
    </group>
```

According to other documentation like pg_upgrade, we do not write both longer
and shorter options in the synopsis section.

02. doc
```
  <para>
   <application>pg_createsubscriber</application> takes the publisher and 
subscriber
   connection strings, a cluster directory from a physical replica and a list of
   database names and it sets up a new logical replica using the physical
   recovery process.
  </para>

```

I found that you did not include my suggestion without saying [1]. Do you 
dislike
the comment or still considering?

03. doc
```
      <term><option>-P  <replaceable 
class="parameter">connstr</replaceable></option></term>
```

Too many blank after -P.

04. doc
```
    <para>
     <application>pg_createsubscriber</application> checks if the target data
     directory is used by a physical replica. Stop the physical replica if it is
     running. One of the next steps is to add some recovery parameters that
     requires a server start. This step avoids an error.
    </para>
```

I think doc is not updated. Currently (not sure it is good) the physical replica
must be running before doing pg_createsubscriber.

05. doc
```
     each specified database on the source server. The replication slot name
     contains a <literal>pg_createsubscriber</literal> prefix. These replication
     slots will be used by the subscriptions in a future step.  A temporary
```

According to the "Replication Slot Management" subsection in 
logical-replication.sgml,
the format of names should be described expressly, e.g.:

```
These replication slots have generated names:"pg_createsubscriber_%u_%d" 
(parameters: Subscription oid, Process id pid).
```

Publications, a temporary slot, and subscriptions should be described as well.

06. doc
```
     Next, <application>pg_createsubscriber</application> creates one 
publication
     for each specified database on the source server. Each publication
     replicates changes for all tables in the database. The publication name

```

Missing update. Publications are created before creating replication slots.

07. general
I think there are some commenting conversions in PG, but this file breaks it.

* Comments must be one-line as much as possible
* Periods must not be added when the comment is one-line.
* Initial character must be capital.

08. general
Some pg_log_error() + exit(1) can be replaced with pg_fatal().


09. LogicalRepInfo
```
        char       *subconninfo;        /* subscription connection string for 
logical
                                                                 * replication 
*/
```

As I posted in comment#8[2], I don't think it is "subscription connection". 
Also,
"for logical replication" is bit misreading because it would not be passed to
workers.

10. get_base_conninfo
```
static char *
get_base_conninfo(char *conninfo, char *dbname, const char *noderole)
...
                /*
                 * If --database option is not provided, try to obtain the 
dbname from
                 * the publisher conninfo. If dbname parameter is not 
available, error
                 * out.
                 */

```

I'm not sure getting dbname from the conninfo improves user-experience. I felt
it may trigger an unintended targeting.
(I still think the publisher-server should be removed)

11. check_data_directory
```
/*
 * Is it a cluster directory? These are preliminary checks. It is far from
 * making an accurate check. If it is not a clone from the publisher, it will
 * eventually fail in a future step.
 */
static bool
check_data_directory(const char *datadir)
```

We shoud also check whether pg_createsubscriber can create a file and a 
directory.
GetDataDirectoryCreatePerm() verifies it.

12. main
```
        /* Register a function to clean up objects in case of failure. */
        atexit(cleanup_objects_atexit);
```

According to the manpage, callback functions would not be called when it exits
due to signals:

> Functions  registered  using atexit() (and on_exit(3)) are not called if a
> process terminates abnormally because of the delivery of a signal.

Do you have a good way to handle the case? One solution is to output created
objects in any log level, but the consideration may be too much. Thought?

13, main
```
        /*
         * Create a temporary logical replication slot to get a consistent LSN.
```

Just to clarify - I still think the process exits before here in case of dry 
run.
In case of pg_resetwal, the process exits before doing actual works like
RewriteControlFile().

14. main
```
         * XXX we might not fail here. Instead, we provide a warning so the user
         * eventually drops this replication slot later.
```

But there are possibilities to exit(1) in drop_replication_slot(). Is it 
acceptable?

15. wait_for_end_recovery
```
                /*
                 * Bail out after recovery_timeout seconds if this option is 
set.
                 */
                if (recovery_timeout > 0 && timer >= recovery_timeout)
```

Hmm, IIUC, it should be enabled by default [3]. Do you have anything in your 
mind?

16. main
```
        /*
         * Create the subscription for each database on subscriber. It does not
         * enable it immediately because it needs to adjust the logical
         * replication start point to the LSN reported by consistent_lsn (see
         * set_replication_progress). It also cleans up publications created by
         * this tool and replication to the standby.
         */
        if (!setup_subscriber(dbinfo, consistent_lsn))
```

Subscriptions would be created and replication origin would be moved forward 
here,
but latter one can be done only by the superuser. I felt that this should be
checked in check_subscriber().

17. main
```
        /*
         * Change system identifier.
         */
        modify_sysid(pg_resetwal_path, subscriber_dir);
```

Even if I executed without -v option, an output from pg_resetwal command 
appears.
It seems bit strange.

```
$ pg_createsubscriber -D data_N2/ -P "port=5431 user=postgres" -S "port=5432 
user=postgres" -d postgres
Write-ahead log reset
$
```

[1]: 
https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2]: 
https://www.postgresql.org/message-id/TY3PR01MB9889C362FF76102C88FA1C29F56F2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[3]: 
https://www.postgresql.org/message-id/CAA4eK1JRgnhv_ySzuFjN7UaX9qxz5Hqcwew7Fk%3D%2BAbJbu0Kd9w%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 



Reply via email to