On Mon, 10 Feb 2020 at 01:29, Arseny Sher <a.s...@postgrespro.ru> wrote: > > Hi, > > While jumping around partically decoded xacts questions [1], I've read > through the copy replication slots code (9f06d79ef) and found a couple > of issues. > > 1) It seems quite reckless to me to dive into > DecodingContextFindStartpoint without actual WAL reservation (donors > slot restart_lsn is used, but it is not acquired). Why risking erroring > out with WAL removal error if the function advances new slot position to > updated donors one in the end anyway?
Good catch. It's possible that DecodingContextFindStartpoint could fail when the restart_lsn of the source slot is advanced and removed required WAL. > > 2) In the end, restart_lsn of new slot is set to updated donors > one. However, confirmed_flush field is not updated. This is just wrong > -- we could start decoding too early and stream partially decoded > transaction. I think you are right. > > I'd probably avoid doing DecodingContextFindStartpoint at all. Its only > purpose is to assemble consistent snapshot (and establish corresponding > <restart_lsn, confirmed_flush_lsn> pair), but donor slot must have > already done that and we could use it as well. Was this considered? Skipping doing DecodingContextFindStartpoint while creating a new destination logical slot seems sensible to me. I've attached the draft patch fixing this issue but I'll continue investigating it more deeply. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
fix_copy_slot.patch
Description: Binary data