First phase: Postgres reads the WAL files and generates 1420 snap files.
Second phase: this is my guess, and maybe you can clarify this point, but I think Postgres has to decode the snap files and remove them if no statement needs to be applied to a replicated table. It is from this point on that the worker process exits after the 1-minute timeout.
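
To tell the two phases apart while reproducing, the number of files in the slot directory can be sampled from the publisher side. This is only a minimal sketch: the helper name `count_spill_files`, the suffix list, and the directory passed on the command line are assumptions of mine, so adjust them to whatever the files under pg_replslot are actually named on your system.

```python
import os
import sys


def count_spill_files(slot_dir, suffixes=(".snap", ".spill")):
    """Count files in slot_dir whose names end with one of the given suffixes.

    The suffix list is an assumption; pass the suffix you actually see
    in your slot directory. A missing directory counts as zero files.
    """
    try:
        names = os.listdir(slot_dir)
    except FileNotFoundError:
        return 0
    wanted = tuple(suffixes)
    return sum(1 for name in names if name.endswith(wanted))


if __name__ == "__main__":
    # Hypothetical usage: python count_spill.py <path to the slot directory>
    if len(sys.argv) == 2:
        print(count_spill_files(sys.argv[1]))
```

Run repeatedly during the test, the count should climb during the first phase and, if my guess about the second phase is right, start to drop once decoding begins.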

On Wed, Jan 12, 2022 at 11:54 AM Amit Kapila <amit.kapil...@gmail.com> wrote:

> On Tue, Jan 11, 2022 at 8:13 PM Fabrice Chapuis <fabrice636...@gmail.com>
> wrote:
>
>> Can you explain why you think this will help in solving your current
>> problem?
>>
>> Indeed you are right, this function won't help, we have to look elsewhere.
>>
>> It is still not clear to me why the problem happened? IIUC, after
>> restoring 4096 changes from snap files, we send them to the subscriber, and
>> then apply worker should apply those one by one. Now, is it taking one
>> minute to restore 4096 changes due to which apply worker is timed out?
>>
>> Now I can easily reproduce the problem.
>> In a first phase, snap files are generated and stored in pg_replslot.
>> This process ends when 1420 files are present in pg_replslot (this is in
>> relation with statements that must be replayed from WAL). In the
>> pg_stat_replication view, the state field is set to *catchup*.
>> In a 2nd phase, the snap files must be decoded. However after one minute
>> (wal_receiver_timeout parameter set to 1 minute) the worker process stops
>> with a timeout.
>>
>
> What exactly do you mean by the first and second phase in the above
> description?
>
> --
> With Regards,
> Amit Kapila.
>