Hi there, hope to find you well.
I have a follow-up question to this already long thread.
Upon deploying my PostgreSQL logical replication fed application on a stale
database, I ended up running out of space, as the replication slot is being
held back till the next time that we receive a data-changing event, and we
advance to that new LSN offset.
I think that the solution for this is to advance our LSN offset every time a
keep-alive message is received ('k' // 107).
My doubt is, can the keep-alive messages be received in between open
transaction events? I think not, but I would like to get your input to be extra
sure as if this happens, and I commit that offset, I may introduce again faulty
logic leading to data loss.
In sum, something like this wouldn't happen:
BEGIN LSN001
INSERT LSN002
KEEP LIVE LSN003
UPDATE LSN004
COMMIT LSN005
Correct? It has to be either:
KEEP LIVE LSN001
BEGIN LSN002
INSERT LSN003
UPDATE LSN004
COMMIT LSN005
Or:
BEGIN LSN001
INSERT LSN002
UPDATE LSN004
COMMIT LSN005
KEEP LIVE LSN006
LSNXXX are mere representations of LSN offsets.
Thank you again.
Regards,
José Neves
________________________________
De: Amit Kapila <[email protected]>
Enviado: 8 de agosto de 2023 14:37
Para: José Neves <[email protected]>
Cc: Andres Freund <[email protected]>; [email protected]
<[email protected]>
Assunto: Re: CDC/ETL system on top of logical replication with pgoutput, custom
client
On Mon, Aug 7, 2023 at 1:46 PM José Neves <[email protected]> wrote:
>
> Humm, that's... challenging. I faced some issues after "the fix" because I
> had a couple of transactions with 25k updates, and I had to split it to be
> able to push to our event messaging system, as our max message size is 10MB.
> Relying on commit time would mean that all transaction operations will have
> the same timestamp. If something goes wrong while my worker is pushing that
> transaction data chunks, I will duplicate some data in the next run, so...
> this wouldn't allow me to deal with data duplication.
> Is there any other way that you see to deal with it?
>
> Right now I only see an option, which is to store all processed LSNs on the
> other side of the ETL. I'm trying to avoid that overhead.
>
Sorry, I don't understand your system enough to give you suggestions
but if you have any questions related to how logical replication work
then I might be able to help.
--
With Regards,
Amit Kapila.