On Mon, 10 Feb 2020 at 01:29, Arseny Sher <a.s...@postgrespro.ru> wrote:
>
> Hi,
>
> While jumping around partically decoded xacts questions [1], I've read
> through the copy replication slots code (9f06d79ef) and found a couple
> of issues.
>
> 1) It seems quite reckless to me to dive into
> DecodingContextFindStartpoint without actual WAL reservation (donors
> slot restart_lsn is used, but it is not acquired). Why risking erroring
> out with WAL removal error if the function advances new slot position to
> updated donors one in the end anyway?

Good catch. It's possible that DecodingContextFindStartpoint could
fail when the restart_lsn of the source slot is advanced and removed
required WAL.

>
> 2) In the end, restart_lsn of new slot is set to updated donors
> one. However, confirmed_flush field is not updated. This is just wrong
> -- we could start decoding too early and stream partially decoded
> transaction.

I think you are right.

>
> I'd probably avoid doing DecodingContextFindStartpoint at all. Its only
> purpose is to assemble consistent snapshot (and establish corresponding
> <restart_lsn, confirmed_flush_lsn> pair), but donor slot must have
> already done that and we could use it as well. Was this considered?

Skipping doing DecodingContextFindStartpoint while creating a new
destination logical slot seems sensible to me.

I've attached the draft patch fixing this issue but I'll continue
investigating it more deeply.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: fix_copy_slot.patch
Description: Binary data

Reply via email to