On Tue, Jan 3, 2023 at 2:14 PM Michail Nikolaev <michail.nikol...@gmail.com> wrote:
>
> > The point which is not completely clear from your description is the
> > timing of missing records. In one of your previous emails, you seem to
> > have indicated that the data missed from Table B is from the time when
> > the initial sync for Table B was in-progress, right? Also, from your
> > description, it seems there is no error or restart that happened
> > during the time of initial sync for Table B. Is that understanding
> > correct?
>
> Yes and yes.
> * B sync started - 08:08:34
> * lost records are created - 09:49:xx
> * B initial sync finished - 10:19:08
> * I/O error with WAL - 10:19:22
> * SIGTERM - 10:35:20
>
> "Finished" here is `logical replication table synchronization worker
> for subscription "cloud_production_main_sub_v4", table "B" has
> finished`.
> As far as I know, it is about COPY command.
>
> > I am not able to see how these steps can lead to the problem.
>
> One idea I have here - it is something related to the patch about
> forbidding of canceling queries while waiting for synchronous
> replication acknowledgement [1].
> It is applied to Postgres in the cloud we were using [2]. We started
> to see such errors in 10:24:18:
>
> `The COMMIT record has already flushed to WAL locally and might
> not have been replicated to the standby. We must wait here.`
>

Does that by any chance mean you are using a non-community version of
Postgres which has some other changes?

> I wonder could it be some tricky race because of downtime of
> synchronous replica and queries stuck waiting for ACK forever?
>

It is possible but ideally, in that case, the client should request
such a transaction again.

--
With Regards,
Amit Kapila.