It looks like there's an inconsistency in error handling during the
START_REPLICATION command of the replication protocol:

$ psql postgres://localhost/psycopg2test?replication=database
psql (9.6devel)
Type "help" for help.

psycopg2test=# IDENTIFY_SYSTEM;
      systemid       | timeline |  xlogpos  |    dbname
 6235978519197579707 |        1 | 0/2CE0F78 | psycopg2test
(1 row)

psycopg2test=# START_REPLICATION SLOT "TEST1" LOGICAL 0/0 ("invalid"
ERROR:  syntax error

1) syntax errors are reported and the client can retry with a corrected
command:

psycopg2test=# START_REPLICATION SLOT "TEST1" LOGICAL 0/0 ("invalid"
ERROR:  replication slot name "TEST1" contains invalid character
HINT:  Replication slot names may only contain lower case letters, numbers,
and the underscore character.
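
A client could catch this class of error before sending the command at all.
The following is only a minimal sketch based on the rule quoted in the HINT
above; the helper name is hypothetical, and the server remains the authority
on what it accepts (it may enforce further limits not checked here).

```python
import re

# Assumption: this mirrors only the rule quoted in the server's HINT
# (lower case letters, numbers, and the underscore character).
_SLOT_NAME_RE = re.compile(r"[a-z0-9_]+")


def is_valid_slot_name(name: str) -> bool:
    """Return True if `name` uses only characters allowed by the HINT."""
    return _SLOT_NAME_RE.fullmatch(name) is not None
```

With this check, "TEST1" is rejected on the client side, while "test1" passes
and is sent to the server.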

2) further errors are reported and we can retry:

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0 ("invalid"
ERROR:  replication slot "test1" does not exist

psycopg2test=# CREATE_REPLICATION_SLOT "test1" LOGICAL "test_decoding";
 slot_name | consistent_point | snapshot_name | output_plugin
 test1     | 0/2CF5340        | 0000088C-1    | test_decoding
(1 row)

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0 ("invalid"
unexpected PQresultStatus: 8

The last command results in the following output sent to the server log:

ERROR:  option "invalid" = "value" is unknown
CONTEXT:  slot "test1", output plugin "test_decoding", in the startup

But the client has no way to learn about the error, nor can it retry with a
corrected command (the server has already entered COPY protocol mode):

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0;
PQexec not allowed during COPY BOTH
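
For reference, the status 8 in the "unexpected PQresultStatus" message above
corresponds to PGRES_COPY_BOTH in libpq's ExecStatusType enum, which matches
the "COPY BOTH" wording of the last message. A small sketch for decoding such
codes follows; the enum values are copied from libpq-fe.h, while the helper
name is made up for illustration.

```python
from enum import IntEnum


class ExecStatusType(IntEnum):
    """Numeric values of libpq's ExecStatusType, as listed in libpq-fe.h."""
    PGRES_EMPTY_QUERY = 0
    PGRES_COMMAND_OK = 1
    PGRES_TUPLES_OK = 2
    PGRES_COPY_OUT = 3
    PGRES_COPY_IN = 4
    PGRES_BAD_RESPONSE = 5
    PGRES_NONFATAL_ERROR = 6
    PGRES_FATAL_ERROR = 7
    PGRES_COPY_BOTH = 8
    PGRES_SINGLE_TUPLE = 9


def describe_status(code: int) -> str:
    """Hypothetical helper: turn a numeric result status into its name."""
    try:
        return ExecStatusType(code).name
    except ValueError:
        return f"unknown status {code}"
```

So the connection has already been switched to COPY BOTH mode by the time the
plugin rejects the option, which is presumably why a plain PQexec is refused
afterwards.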

I recall Craig Ringer mentioning this issue in the context of the
pglogical_output plugin, but I thought that had something to do with the
startup packet.  The behavior above doesn't strike me as very consistent:
we should be able to detect and report errors during output plugin startup
and let the client retry with a corrected command, as we do for syntax and
other errors.

I haven't looked at the code yet, but if someone knows off the top of their
head the reason for this behavior, I'd be glad to learn it.

*Oleksandr "Alex" Shulgin*
*Database Engineer*

Mobile: +49 160 84-90-639
Email: oleksandr.shul...@zalando.de
