On 2021-Aug-30, Andres Freund wrote:

> I'm doubtful that the approach of adding awareness of record boundaries
> is a good path to go down:

Honestly, I do not like it one bit, and if I can avoid relying on them
while making the whole thing work correctly, I am happy. Clearly it
wasn't a problem for the ancient recovery-only WAL design, but as soon
as we added replication on top, the whole issue of continuation records
became a bug. I do think that the code should be first correct and
second performant, though.

> - There are very similar issues with promotions of replicas (consider
> what happens if we need to promote with the end of local WAL spanning
> a segment boundary, and what happens to cascading replicas). We have
> some logic to try to deal with that, but it's pretty grotty and I
> think incomplete.

Ouch, I hadn't thought of cascading replicas.

> - It seems to make some future optimizations harder - we should work
> towards replicating data sooner, rather than the opposite. Right now
> that's a major bottleneck around syncrep.

Absolutely.

> I think a better approach might be to handle this on the WAL layout
> level. What if we never overwrite partial records but instead just
> skipped over them during decoding?

Maybe this is a workable approach; let's work it out fully. Let me see
if I understand what you mean:

* We would remove the logic to inhibit archiving and streaming-replicating
  the tail end of a split WAL record; that logic deals with bytes only,
  so it doesn't have to be aware of record boundaries.

* On WAL replay, we ignore records that are split across a segment
  boundary and whose checksum does not match.

* On WAL write ... ?

How do we detect after recovery that a record that was being written,
and potentially was sent to the archive, needs to be "skipped"?

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/