RE: CDC/ETL system on top of logical replication with pgoutput, custom client

2023-10-25 Thread José Neves
Ok, I see. In that situation it is indeed safe, as the offset is lower than the 
current transaction's commit.
But I think I asked the wrong question. I guess the right question is: can we 
receive a keep-alive message with an LSN offset greater than the commit LSN of 
the open or following transactions?
Something like:

BEGIN LSN001
INSERT LSN002
KEEPALIVE LSN006
UPDATE LSN004
COMMIT LSN005

Or:

KEEPALIVE LSN006
BEGIN LSN001
INSERT LSN002
UPDATE LSN004
COMMIT LSN005
KEEPALIVE LSN007

Or is the sequence ensured not only between commits but also with keep-alive 
messages?

From: Ashutosh Bapat 
Sent: 25 October 2023 11:42
To: José Neves 
Cc: Amit Kapila ; Andres Freund ; 
pgsql-hack...@postgresql.org 
Subject: Re: CDC/ETL system on top of logical replication with pgoutput, custom 
client

On Tue, Oct 24, 2023 at 8:53 PM José Neves  wrote:
>
> Hi there, hope to find you well.
>
> I have a follow-up question to this already long thread.
>
> Upon deploying my PostgreSQL logical-replication-fed application on a stale 
> database, I ended up running out of space, as the replication slot is held 
> back until the next data-changing event arrives and we advance to that new 
> LSN offset.
> I think the solution for this is to advance our LSN offset every time a 
> keep-alive message is received ('k' // 107).
> My doubt is: can keep-alive messages be received in between an open 
> transaction's events? I think not, but I would like your input to be extra 
> sure, because if this happens and I commit that offset, I may again introduce 
> faulty logic leading to data loss.
>
> In sum, something like this wouldn't happen:
> BEGIN LSN001
> INSERT LSN002
> KEEPALIVE LSN003
> UPDATE LSN004
> COMMIT LSN005
>

If the downstream acknowledges receipt of LSN003 and saves it locally
and crashes, upon restart the upstream will resend all the
transactions that committed after LSN003, including the one ending at
LSN005. So this is safe.
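In client terms that just means the worker must tolerate a resent transaction
after a restart, e.g. by comparing commit LSNs against what the target already
holds. A minimal, hypothetical Go sketch (LSNs as plain integers):

package main

import "fmt"

// shouldApply filters a replayed stream: the server resends every
// transaction that committed after the acknowledged LSN, which can
// include a transaction the target already stored if we crashed
// between storing it and acknowledging it.
func shouldApply(commitLSN, lastStoredCommitLSN uint64) bool {
    return commitLSN > lastStoredCommitLSN
}

func main() {
    lastStored := uint64(5) // target already holds the LSN005 transaction
    for _, commit := range []uint64{5, 7} {
        fmt.Printf("commit LSN%03d apply=%v\n", commit,
            shouldApply(commit, lastStored))
    }
}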

--
Best Wishes,
Ashutosh


RE: CDC/ETL system on top of logical replication with pgoutput, custom client

2023-10-24 Thread José Neves
Hi there, hope to find you well.

I have a follow-up question to this already long thread.

Upon deploying my PostgreSQL logical-replication-fed application on a stale 
database, I ended up running out of space, as the replication slot is held 
back until the next data-changing event arrives and we advance to that new 
LSN offset.
I think the solution for this is to advance our LSN offset every time a 
keep-alive message is received ('k' // 107).
My doubt is: can keep-alive messages be received in between an open 
transaction's events? I think not, but I would like your input to be extra 
sure, because if this happens and I commit that offset, I may again introduce 
faulty logic leading to data loss.

In sum, something like this wouldn't happen:
BEGIN LSN001
INSERT LSN002
KEEPALIVE LSN003
UPDATE LSN004
COMMIT LSN005

Correct? It has to be either:

KEEPALIVE LSN001
BEGIN LSN002
INSERT LSN003
UPDATE LSN004
COMMIT LSN005

Or:
BEGIN LSN001
INSERT LSN002
UPDATE LSN004
COMMIT LSN005
KEEPALIVE LSN006

LSNXXX are mere representations of LSN offsets.
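For reference, the keep-alive frame itself is tiny; this is how I'm decoding 
it, following only the wire layout documented for the streaming replication 
protocol - Byte1('k'), Int64 walEnd, Int64 serverTime, Byte1 replyRequested - 
with struct and helper names of my own (a sketch, not authoritative):

package main

import (
    "encoding/binary"
    "errors"
    "fmt"
    "time"
)

// Server timestamps are microseconds since 2000-01-01 00:00:00 UTC.
var postgresEpoch = time.Date(2000, 1, 1, 0, 0, 0, 0, time.UTC)

type primaryKeepalive struct {
    WALEnd         uint64    // current end of WAL on the server
    ServerTime     time.Time // server clock at transmission time
    ReplyRequested bool      // true = client should reply ASAP
}

func parsePrimaryKeepalive(msg []byte) (primaryKeepalive, error) {
    var k primaryKeepalive
    if len(msg) < 18 || msg[0] != 'k' { // 'k' == 107
        return k, errors.New("not a primary keepalive message")
    }
    k.WALEnd = binary.BigEndian.Uint64(msg[1:9])
    micros := int64(binary.BigEndian.Uint64(msg[9:17]))
    k.ServerTime = postgresEpoch.Add(time.Duration(micros) * time.Microsecond)
    k.ReplyRequested = msg[17] == 1
    return k, nil
}

func main() {
    // Hand-built example frame, for illustration only.
    frame := make([]byte, 18)
    frame[0] = 'k'
    binary.BigEndian.PutUint64(frame[1:9], 0x16B3E0D88188) // made-up LSN
    frame[17] = 1
    fmt.Println(parsePrimaryKeepalive(frame))
}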

Thank you again.
Regards,
José Neves


From: Amit Kapila 
Sent: 8 August 2023 14:37
To: José Neves 
Cc: Andres Freund ; pgsql-hack...@postgresql.org 
Subject: Re: CDC/ETL system on top of logical replication with pgoutput, custom 
client

On Mon, Aug 7, 2023 at 1:46 PM José Neves  wrote:
>
> Humm, that's... challenging. I faced some issues after "the fix" because I 
> had a couple of transactions with 25k updates, and I had to split them to be 
> able to push to our event messaging system, as our max message size is 10MB. 
> Relying on commit time would mean that all of a transaction's operations have 
> the same timestamp. If something goes wrong while my worker is pushing that 
> transaction's data chunks, I will duplicate some data in the next run, so... 
> this wouldn't allow me to deal with data duplication.
> Is there any other way that you see to deal with it?
>
> Right now I see only one option: storing all processed LSNs on the other 
> side of the ETL. I'm trying to avoid that overhead.
>

Sorry, I don't understand your system well enough to give you suggestions,
but if you have any questions about how logical replication works,
I might be able to help.

--
With Regards,
Amit Kapila.


RE: CDC/ETL system on top of logical replication with pgoutput, custom client

2023-08-07 Thread José Neves
Hi Amit.

Humm, that's... challenging. I faced some issues after "the fix" because I had 
a couple of transactions with 25k updates, and I had to split them to be able 
to push to our event messaging system, as our max message size is 10MB. Relying 
on commit time would mean that all of a transaction's operations have the same 
timestamp. If something goes wrong while my worker is pushing that 
transaction's data chunks, I will duplicate some data in the next run, so... 
this wouldn't allow me to deal with data duplication.
Is there any other way that you see to deal with it?

Right now I see only one option: storing all processed LSNs on the other side 
of the ETL. I'm trying to avoid that overhead.
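The best alternative I've come up with so far is to tag every chunk with a 
deterministic key, so the destination can discard duplicates on its own. A 
rough Go sketch (names and the 10MB constant are mine; it assumes the 
destination can deduplicate by key):

package main

import "fmt"

// change is one decoded INSERT/UPDATE/DELETE, already serialized.
type change struct{ payload []byte }

// chunk groups changes so each message stays under the broker limit.
// Its key is deterministic: re-processing the same transaction after
// a crash produces the same keys, so duplicates are detectable.
type chunk struct {
    Key     string // "<commit LSN>/<chunk index>"
    Changes []change
}

const maxChunkBytes = 10 << 20 // our 10MB message-size limit

func splitTransaction(commitLSN uint64, changes []change) []chunk {
    var out []chunk
    var cur []change
    size := 0
    flush := func() {
        if len(cur) == 0 {
            return
        }
        out = append(out, chunk{
            Key:     fmt.Sprintf("%X/%d", commitLSN, len(out)),
            Changes: cur,
        })
        cur, size = nil, 0
    }
    for _, c := range changes {
        if size+len(c.payload) > maxChunkBytes {
            flush()
        }
        cur = append(cur, c)
        size += len(c.payload)
    }
    flush()
    return out
}

func main() {
    chunks := splitTransaction(0x4000, make([]change, 3))
    for _, c := range chunks {
        fmt.Println(c.Key, len(c.Changes))
    }
}

But that still pushes the deduplication responsibility downstream, which is 
the overhead I was hoping to avoid.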

Thanks.
Regards,
José Neves

From: Amit Kapila 
Sent: 7 August 2023 05:59
To: José Neves 
Cc: Andres Freund ; pgsql-hack...@postgresql.org 
Subject: Re: CDC/ETL system on top of logical replication with pgoutput, custom 
client

On Sun, Aug 6, 2023 at 7:54 PM José Neves  wrote:
>
> A follow-up on this. Indeed, a new commit-based approach solved my missing 
> data issues.
> But, getting back to the previous examples, how are server times expected to 
> be logged for the xlogs containing these records?
>

I think it should be based on commit_time because as far as I see we
can only get that on the client.

--
With Regards,
Amit Kapila.


RE: CDC/ETL system on top of logical replication with pgoutput, custom client

2023-08-06 Thread José Neves
Hi there, hope to find you all well.

A follow-up on this. Indeed, a new commit-based approach solved my missing data 
issues.
But, getting back to the previous examples, how are server times expected to be 
logged for the xlogs containing these records?

With these 2 transactions:
T-1
INSERT LSN1-1000
UPDATE LSN2-2000
UPDATE LSN3-3000
COMMIT  LSN4-4000

T-2
INSERT LSN1-500
UPDATE LSN2-1500
UPDATE LSN3-2500
COMMIT  LSN4-5500

Arriving this way:
BEGIN
INSERT LSN1-1000
UPDATE LSN2-2000
UPDATE LSN3-3000
COMMIT  LSN4-4000

BEGIN
INSERT LSN1-500
UPDATE LSN2-1500
UPDATE LSN3-2500
COMMIT  LSN4-5500

Are server times for them expected to be:

BEGIN
INSERT LSN1-1000 - 2
UPDATE LSN2-2000 - 4
UPDATE LSN3-3000 - 6
COMMIT  LSN4-4000 - 7

BEGIN
INSERT LSN1-500 - 1
UPDATE LSN2-1500 - 3
UPDATE LSN3-2500 - 5
COMMIT  LSN4-5500 - 8

Or:

BEGIN
INSERT LSN1-1000 - 1
UPDATE LSN2-2000 - 2
UPDATE LSN3-3000 - 3
COMMIT  LSN4-4000 - 4

BEGIN
INSERT LSN1-500 - 5
UPDATE LSN2-1500 - 6
UPDATE LSN3-2500 - 7
COMMIT  LSN4-5500 - 8

I'm asking because although I'm no longer missing data, I have a second async 
process that can fail (publishing data to an event messaging service), and 
therefore there is a possibility of data duplication. Worse, I have to split 
large transactions, as message sizes are limited. It would be nice if I could 
rely on server-time timestamps to discard duplicated data...
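(For context: the only per-transaction timestamp pgoutput hands me is the one 
in the Begin message, so every operation in the transaction shares it. A 
decoding sketch following the documented layout - Byte1('B'), Int64 final LSN, 
Int64 commit timestamp, Int32 xid - with names of my own:)

package main

import (
    "encoding/binary"
    "errors"
    "fmt"
    "time"
)

var postgresEpoch = time.Date(2000, 1, 1, 0, 0, 0, 0, time.UTC)

// beginMsg mirrors the pgoutput Begin message, which arrives inside an
// XLogData ('w') frame. The commit timestamp is microseconds since
// 2000-01-01 UTC and is the only timestamp for the whole transaction.
type beginMsg struct {
    FinalLSN   uint64
    CommitTime time.Time
    XID        uint32
}

func parseBegin(msg []byte) (beginMsg, error) {
    var b beginMsg
    if len(msg) < 21 || msg[0] != 'B' {
        return b, errors.New("not a Begin message")
    }
    b.FinalLSN = binary.BigEndian.Uint64(msg[1:9])
    micros := int64(binary.BigEndian.Uint64(msg[9:17]))
    b.CommitTime = postgresEpoch.Add(time.Duration(micros) * time.Microsecond)
    b.XID = binary.BigEndian.Uint32(msg[17:21])
    return b, nil
}

func main() {
    _, err := parseBegin([]byte{'B'}) // too short: returns an error
    fmt.Println(err)
}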

Thanks.
Regards,
José Neves

From: José Neves 
Sent: 1 August 2023 10:13
To: Andres Freund 
Cc: Amit Kapila ; pgsql-hack...@postgresql.org 
Subject: RE: CDC/ETL system on top of logical replication with pgoutput, custom 
client

Hi Andres.

Owh, I see the error of my ways... :(

By ignoring commits, and committing individual operation LSNs, I was 
effectively rolling back the subscription. In the previous example, if I 
committed the LSN of the first insert of the second transaction (LSN1-500), I 
was basically telling Postgres to send everything again, including the already 
processed T1.

> what you mean with the "different from the numeric order"
I'm probably lacking terminology. I mean that LSN4-5500 > LSN4-4000 > LSN3-3000 
> LSN3-2500...

But, if I'm understanding correctly, I can only rely on the incremental 
sequence to be true for the commit events. Which explains my pain.
The world makes sense again.

Thank you very much. I will try to implement this new logic, and hopefully not 
trip over this issue again.
Regards,
José Neves

From: Andres Freund 
Sent: 1 August 2023 00:21
To: José Neves 
Cc: Amit Kapila ; pgsql-hack...@postgresql.org 
Subject: Re: CDC/ETL system on top of logical replication with pgoutput, custom 
client

Hi,

On 2023-07-31 21:25:06 +, José Neves wrote:
> Ok, if I understood you correctly, I start to see where my logic is faulty. 
> Just to make sure that I got it right, taking the following example again:
>
> T-1
> INSERT LSN1-1000
> UPDATE LSN2-2000
> UPDATE LSN3-3000
> COMMIT  LSN4-4000
>
> T-2
> INSERT LSN1-500
> UPDATE LSN2-1500
> UPDATE LSN3-2500
> COMMIT  LSN4-5500
>
> Where data will arrive in this order:
>
> INSERT LSN1-500
> INSERT LSN1-1000
> UPDATE LSN2-1500
> UPDATE LSN2-2000
> UPDATE LSN3-2500
> UPDATE LSN3-3000
> COMMIT  LSN4-4000
> COMMIT  LSN4-5500

No, they won't arrive in that order. They will arrive as

BEGIN
INSERT LSN1-1000
UPDATE LSN2-2000
UPDATE LSN3-3000
COMMIT  LSN4-4000
BEGIN
INSERT LSN1-500
UPDATE LSN2-1500
UPDATE LSN3-2500
COMMIT  LSN4-5500

Because T1 committed before T2. Changes are only streamed out at commit /
prepare transaction (*). Within a transaction, they however *will* be ordered
by LSN.

(*) Unless you use streaming mode, in which case it'll all be more
complicated, as you'll also receive changes for transactions that might still
abort.


> You are saying that LSN3-3000 will never be missing: either the entire
> connection will fail at that point, or everything will be received in the
> expected order (which is different from the "numeric order" of LSNs).

I'm not quite sure what you mean with the "different from the numeric order"
bit...


> If the connection is down, upon restart, I will receive the entire T-1
> transaction again (well, all example data again).

Yes, unless you already acknowledged receipt up to LSN4-4000 and/or are only
asking for newer transactions when reconnecting.


> In addition to that, if I commit LSN4-4000, even though that LSN has a "bigger
> numeric value" than the ones representing INSERT and UPDATE events on T-2, I
> will be receiving the entire T-2 transaction again, as the LSN4-5500 is
> still uncommitted.

I don't quite know what you mean with "commit LSN4-4000" here.


> This makes sense to me, but just to be extra clear, I will never receive a
> transaction commit before receiving all other events for that transaction.

Correct.

Greetings,

Andres Freund


RE: CDC/ETL system on top of logical replication with pgoutput, custom client

2023-08-01 Thread José Neves
Hi Andres.

Owh, I see the error of my ways... :(

By ignoring commits, and committing individual operation LSNs, I was 
effectively rolling back the subscription. In the previous example, if I 
committed the LSN of the first insert of the second transaction (LSN1-500), I 
was basically telling Postgres to send everything again, including the already 
processed T1.

> what you mean with the "different from the numeric order"
I'm probably lacking terminology. I mean that LSN4-5500 > LSN4-4000 > LSN3-3000 
> LSN3-2500...

But, if I'm understanding correctly, I can only rely on the incremental 
sequence to be true for the commit events. Which explains my pain.
The world makes sense again.

Thank you very much. I will try to implement this new logic, and hopefully not 
trip over this issue again.
Regards,
José Neves

From: Andres Freund 
Sent: 1 August 2023 00:21
To: José Neves 
Cc: Amit Kapila ; pgsql-hack...@postgresql.org 
Subject: Re: CDC/ETL system on top of logical replication with pgoutput, custom 
client

Hi,

On 2023-07-31 21:25:06 +, José Neves wrote:
> Ok, if I understood you correctly, I start to see where my logic is faulty. 
> Just to make sure that I got it right, taking the following example again:
>
> T-1
> INSERT LSN1-1000
> UPDATE LSN2-2000
> UPDATE LSN3-3000
> COMMIT  LSN4-4000
>
> T-2
> INSERT LSN1-500
> UPDATE LSN2-1500
> UPDATE LSN3-2500
> COMMIT  LSN4-5500
>
> Where data will arrive in this order:
>
> INSERT LSN1-500
> INSERT LSN1-1000
> UPDATE LSN2-1500
> UPDATE LSN2-2000
> UPDATE LSN3-2500
> UPDATE LSN3-3000
> COMMIT  LSN4-4000
> COMMIT  LSN4-5500

No, they won't arrive in that order. They will arrive as

BEGIN
INSERT LSN1-1000
UPDATE LSN2-2000
UPDATE LSN3-3000
COMMIT  LSN4-4000
BEGIN
INSERT LSN1-500
UPDATE LSN2-1500
UPDATE LSN3-2500
COMMIT  LSN4-5500

Because T1 committed before T2. Changes are only streamed out at commit /
prepare transaction (*). Within a transaction, they however *will* be ordered
by LSN.

(*) Unless you use streaming mode, in which case it'll all be more
complicated, as you'll also receive changes for transactions that might still
abort.


> You are saying that LSN3-3000 will never be missing: either the entire
> connection will fail at that point, or everything will be received in the
> expected order (which is different from the "numeric order" of LSNs).

I'm not quite sure what you mean with the "different from the numeric order"
bit...


> If the connection is down, upon restart, I will receive the entire T-1
> transaction again (well, all example data again).

Yes, unless you already acknowledged receipt up to LSN4-4000 and/or are only
asking for newer transactions when reconnecting.


> In addition to that, if I commit LSN4-4000, even though that LSN has a "bigger
> numeric value" than the ones representing INSERT and UPDATE events on T-2, I
> will be receiving the entire T-2 transaction again, as the LSN4-5500 is
> still uncommitted.

I don't quite know what you mean with "commit LSN4-4000" here.


> This makes sense to me, but just to be extra clear, I will never receive a
> transaction commit before receiving all other events for that transaction.

Correct.

Greetings,

Andres Freund


RE: CDC/ETL system on top of logical replication with pgoutput, custom client

2023-07-31 Thread José Neves
Hi Andres, thanks for your reply.

Ok, if I understood you correctly, I start to see where my logic is faulty. 
Just to make sure that I got it right, taking the following example again:

T-1
INSERT LSN1-1000
UPDATE LSN2-2000
UPDATE LSN3-3000
COMMIT  LSN4-4000

T-2
INSERT LSN1-500
UPDATE LSN2-1500
UPDATE LSN3-2500
COMMIT  LSN4-5500

Where data will arrive in this order:

INSERT LSN1-500
INSERT LSN1-1000
UPDATE LSN2-1500
UPDATE LSN2-2000
UPDATE LSN3-2500
UPDATE LSN3-3000
COMMIT  LSN4-4000
COMMIT  LSN4-5500

You are saying that LSN3-3000 will never be missing: either the entire 
connection will fail at that point, or everything will be received in the 
expected order (which is different from the "numeric order" of LSNs). If the 
connection is down, upon restart, I will receive the entire T-1 transaction 
again (well, all example data again).
In addition to that, if I commit LSN4-4000, even though that LSN has a "bigger 
numeric value" than the ones representing INSERT and UPDATE events on T-2, I 
will be receiving the entire T-2 transaction again, as the LSN4-5500 is still 
uncommitted.
This makes sense to me, but just to be extra clear, I will never receive a 
transaction commit before receiving all other events for that transaction.
Are these statements correct?

>Are you using the 'streaming' mode / option to pgoutput?
No.

>Not sure what you mean with "unordered offsets"?
Ordered: EB53/E0D88188, EB53/E0D88189, EB53/E0D88190
Unordered: EB53/E0D88190, EB53/E0D88188, EB53/E0D88189

Extra question: when I get a begin message, I learn of a transaction starting 
at LSN-1000 and ending at LSN-2000. But as the example above shows, I can have 
data points from other transactions with LSNs in that interval. I have no way 
to identify which transaction they belong to, correct?

Thanks again. Regards,
José Neves

From: Andres Freund 
Sent: 31 July 2023 21:39
To: José Neves 
Cc: Amit Kapila ; pgsql-hack...@postgresql.org 
Subject: Re: CDC/ETL system on top of logical replication with pgoutput, custom 
client

Hi,

On 2023-07-31 14:16:22 +, José Neves wrote:
> Hi Amit, thanks for the reply.
>
> In our worker (custom pg replication client), we care only about INSERT,
> UPDATE, and DELETE operations, which - sure - may be part of the issue.

That seems likely. Postgres streams out changes in commit order, not in order
of the changes having been made (that'd not work due to rollbacks etc). If you
just disregard transactions entirely, you'll get something bogus after
retries.

You don't need to store the details for each commit in the target system, just
up to which LSN you have processed *commit records*. E.g. if you have received
and safely stored up to commit 0/1000, you need to remember that.
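One crash-safe way to do that, if the target is itself a SQL database, is to 
write the data and the new commit LSN in the same target-side transaction. A 
hypothetical sketch (the target_events and cdc_progress tables are invented, 
and it assumes a driver using $n placeholders):

package etl

import "database/sql"

// applyTransaction stores a decoded source transaction and the new
// progress marker atomically: either both land or neither does, so
// after a crash the stored LSN always matches the stored data.
func applyTransaction(db *sql.DB, rows [][]byte, commitLSN int64) error {
    tx, err := db.Begin()
    if err != nil {
        return err
    }
    defer tx.Rollback() // no-op after a successful Commit

    for _, r := range rows {
        if _, err := tx.Exec(
            `INSERT INTO target_events (payload) VALUES ($1)`, r); err != nil {
            return err
        }
    }
    // One row per slot, holding the last commit LSN whose transaction
    // is fully stored.
    if _, err := tx.Exec(
        `INSERT INTO cdc_progress (slot, commit_lsn) VALUES ('myslot', $1)
         ON CONFLICT (slot) DO UPDATE SET commit_lsn = EXCLUDED.commit_lsn`,
        commitLSN); err != nil {
        return err
    }
    return tx.Commit()
}

On restart, read cdc_progress and resume the stream from that LSN.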


Are you using the 'streaming' mode / option to pgoutput?


> 1. We have no way to match LSN operations with their respective commit, as
> they have unordered offsets.

Not sure what you mean with "unordered offsets"?


> Assuming that all of them were received in order, we would commit all data 
> with the commit message LSN4-4000, as the other events would fall within 
> that transaction's start and end LSN interval.

Logical decoding sends out changes in a deterministic order and you won't see
out of order data when using TCP (the entire connection can obviously fail
though).

Andres


RE: CDC/ETL system on top of logical replication with pgoutput, custom client

2023-07-31 Thread José Neves
Hi Euler, thank you for your reply.

That is exactly how my output does not look; I don't get such an order - at 
least not under heavy load.
The conditions under which this occurs make it challenging to provide detailed 
information, and it also takes a while to trigger. I've sent a previous email 
explaining what my output looks like, from an earlier debugging session.

I can gather more information if need be, but I was interested in this bit:
> Instead, Postgres provides a replication progress mechanism [1] to do it.
It's not 100% clear to me what that would look like at the code level; can you 
provide a high-level algorithm for how such code would work? For reference, our 
implementation - to the bones - is very similar to this: 
https://adam-szpilewicz.pl/cdc-replication-from-postgresql-using-go-golang

Thanks for your help. Regards,
José Neves


From: Euler Taveira 
Sent: 31 July 2023 15:27
To: José Neves ; pgsql-hackers 
Subject: Re: CDC/ETL system on top of logical replication with pgoutput, custom 
client

On Sat, Jul 29, 2023, at 8:07 PM, José Neves wrote:
I'm attempting to develop a CDC on top of Postgres, currently using 12, the 
latest minor release, with a custom client, and I'm running into issues with 
data loss caused by out-of-order logical replication messages.

Can you provide a test case to show this issue? Did you try in a newer version?

The problem is as follows: Postgres streams logical replication events A, B, D, 
G, K, I, P. Upon an exit signal we stop consuming new events at LSN K, and we 
wait 30s for out-of-order events. Let's say that we only got A (and K, of 
course), so in the following 30s we get B and D; however, for whatever reason, 
G never arrived. As with pgoutput-based logical replication we have no way to 
calculate the next LSN, we have no idea that G is missing, so we assume that it 
all arrived, commit K to the Postgres slot, and shut down. In the next run, our 
worker will start receiving data from K forward, and G is lost forever...
Meanwhile Postgres moves forward with archiving, and we can't go back to check 
whether we lost anything. And even if we could, it would be extremely 
inefficient.

Logical decoding provides the changes to the output plugin at commit time. You
mentioned the logical replication events but didn't say which are part of the
same transaction. Let's say A, B, D and K are changes from the same transaction
and G, I and P are changes from another transaction. The first transaction will
be available when it processes K. The second transaction will be provided when
the logical decoding processes P.

You didn't say how your consumer works. Are you sure your consumer doesn't
get the second transaction? If your consumer is advancing the replication slot
*after* receiving K (using pg_replication_slot_advance), it is doing it wrong.
Another common problem with consumers is that they use
pg_logical_slot_get_changes() but crash *before* using the data; in this
case, the data is lost.
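At the wire level, reporting progress is just the Standby Status Update ('r') 
message that the walsender expects - Byte1('r'), three Int64 positions 
(written, flushed, applied), Int64 client time, Byte1 reply-requested. A sketch 
of building one (illustrative only; the key point is to report as flushed only 
what you have durably stored):

package main

import (
    "encoding/binary"
    "fmt"
    "time"
)

var postgresEpoch = time.Date(2000, 1, 1, 0, 0, 0, 0, time.UTC)

// standbyStatusUpdate builds the 'r' feedback message. For a logical
// slot the server advances the confirmed position from the *flushed*
// value, so only report there what is durably stored downstream.
func standbyStatusUpdate(written, flushed, applied uint64, reply bool) []byte {
    msg := make([]byte, 34)
    msg[0] = 'r'
    binary.BigEndian.PutUint64(msg[1:9], written)
    binary.BigEndian.PutUint64(msg[9:17], flushed)
    binary.BigEndian.PutUint64(msg[17:25], applied)
    micros := time.Since(postgresEpoch).Microseconds()
    binary.BigEndian.PutUint64(msg[25:33], uint64(micros))
    if reply {
        msg[33] = 1
    }
    return msg
}

func main() {
    // Acknowledge only up to the last durably stored commit LSN.
    fmt.Printf("% x\n", standbyStatusUpdate(0x5500, 0x4000, 0x4000, false))
}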

It is hard to say where the problem is if you didn't provide enough information
about the consumer logic and the WAL information (pg_waldump output) around the
time you detect the data loss.

In sum, the issue comes from the fact that Postgres will stream events with 
unordered LSNs on highly transactional systems, and that pgoutput doesn't have 
access to enough information to calculate the next or last LSN, so we have no 
way to check whether we have received all the data that we are supposed to 
receive, risking committing an offset that we shouldn't because we haven't yet 
received the preceding data.

It seems very weird to me that none of the open-source CDC projects that I 
looked into care about this. They always assume that the next LSN received 
is... well, the next one, and commit that one, so upon restart they are 
vulnerable to the same issue. So... either I'm missing something... or we have 
a generalized assumption causing data loss under certain conditions all over.

Let me illustrate the current behavior. Let's say there are 3 concurrent
transactions.

Session A
==

euler=# SELECT pg_create_logical_replication_slot('repslot1', 'wal2json');
 pg_create_logical_replication_slot
------------------------------------
 (repslot1,0/369DF088)
(1 row)

euler=# create table foo (a int primary key);
CREATE TABLE
euler=# BEGIN;
BEGIN
euler=*# INSERT INTO foo (a) SELECT generate_series(1, 2);
INSERT 0 2

Session B
==

euler=# BEGIN;
BEGIN
euler=*# INSERT INTO foo (a) SELECT generate_series(11, 12);
INSERT 0 2

Session C
==

euler=# BEGIN;
BEGIN
euler=*# INSERT INTO foo (a) SELECT generate_series(21, 22);
INSERT 0 2

Session A
==

euler=*# INSERT INTO foo (a) VALUES(3);
INSERT 0 1

Session B
==

euler=*# INSERT INTO foo (a) VALUES(13);
INSERT 0 1

Session C
==

euler=*# INSERT INTO foo (a) VALUES(23);
INSERT 0 1
euler=*# COMMIT;
COMMIT

Session B
==

euler=*# COMMIT;
COMMIT

Session A
==

euler=*# COMMIT;
COMMIT

RE: CDC/ETL system on top of logical replication with pgoutput, custom client

2023-07-31 Thread José Neves
Hi Amit, thanks for the reply.

In our worker (a custom pg replication client), we care only about INSERT, 
UPDATE, and DELETE operations, which - sure - may be part of the issue.
I can only reproduce this with production-level load, so it's not easy to get 
a real example, but as I understand the issue (and building upon your 
exposition), we are seeing the following:

T-1
INSERT LSN1-1000
UPDATE LSN2-2000
UPDATE LSN3-3000
COMMIT  LSN4-4000

T-2
INSERT LSN1-500
UPDATE LSN2-1500
UPDATE LSN3-2500
COMMIT  LSN4-5500

If we miss LSN3-3000 - let's say, due to a bad network - and we have already 
received all other LSNs, we will commit LSN4-5500 to Postgres before 
restarting. LSN3-3000 will never be resent. And there are a couple of issues 
with this scenario:
1. We have no way to match LSN operations with their respective commit, as they 
have unordered offsets. Assuming that all of them were received in order, we 
would commit all data with the commit message LSN4-4000, as the other events 
would fall within that transaction's start and end LSN interval.
2. We still have no way to verify that we got all the data for a given 
transaction; we would never notice that LSN3-3000 of the first transaction is 
missing until we look at and analyze the resulting data.

So the question: how can we prevent our worker from committing LSN4-5500 
without having received LSN3-3000? Do we even have enough information out of 
pgoutput to do that?
PS: when I say bad network, my suspicion is that this situation may be caused 
by network saturation during high-QPS periods. The data would still arrive 
eventually, but by that time our worker is no longer listening.

Thanks again. Regards,
José Neves





From: Amit Kapila 
Sent: 31 July 2023 14:31
To: José Neves 
Cc: pgsql-hack...@postgresql.org 
Subject: Re: CDC/ETL system on top of logical replication with pgoutput, custom 
client

On Mon, Jul 31, 2023 at 3:06 PM José Neves  wrote:
>
> Hi there, hope to find you well.
>
> I'm attempting to develop a CDC on top of Postgres, currently using 12, the 
> latest minor release, with a custom client, and I'm running into issues with 
> data loss caused by out-of-order logical replication messages.
>
> The problem is as follows: Postgres streams logical replication events A, B, 
> D, G, K, I, P. Upon an exit signal we stop consuming new events at LSN K, 
> and we wait 30s for out-of-order events. Let's say that we only got A (and 
> K, of course), so in the following 30s we get B and D; however, for whatever 
> reason, G never arrived. As with pgoutput-based logical replication we have 
> no way to calculate the next LSN, we have no idea that G is missing, so we 
> assume that it all arrived, commit K to the Postgres slot, and shut down. In 
> the next run, our worker will start receiving data from K forward, and G is 
> lost forever...
> Meanwhile Postgres moves forward with archiving, and we can't go back to 
> check whether we lost anything. And even if we could, it would be extremely 
> inefficient.
>
> In sum, the issue comes from the fact that Postgres will stream events with 
> unordered LSNs on highly transactional systems, and that pgoutput doesn't 
> have access to enough information to calculate the next or last LSN, so we 
> have no way to check whether we have received all the data that we are 
> supposed to receive, risking committing an offset that we shouldn't because 
> we haven't yet received the preceding data.
>

As per my understanding, we stream the data in the commit LSN order
and for a particular transaction, all the changes are per their LSN
order. Now, it is possible that for a parallel transaction, we send
some changes from a prior LSN after sending the commit of another
transaction. Say we have changes as follows:

T-1
change1 LSN1-1000
change2 LSN2-2000
commit   LSN3-3000

T-2
change1 LSN1-500
change2 LSN2-1500
commit   LSN3-4000

In such a case, all the changes including the commit of T-1 are sent
and then all the changes including the commit of T-2 are sent. So, one
can say that some of T-2's changes with earlier LSNs arrived after
T-1's commit, but that shouldn't be a problem, because if a restart
happens after we received a partial T-2, we will receive the entire
T-2 again.
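A sketch of the consumer logic this implies (hypothetical Go types, not from 
any real client): buffer changes per transaction and treat only commit 
boundaries as restart points, so a crash mid-transaction simply replays the 
whole transaction.

package main

import "fmt"

type LSN uint64

type event struct {
    Kind string // "BEGIN", "INSERT", "UPDATE", "DELETE", "COMMIT"
    LSN  LSN
}

// consumer buffers the open transaction and only moves ackLSN at
// COMMIT; the LSNs of individual changes are never acknowledged.
type consumer struct {
    open   []event
    ackLSN LSN
}

func (c *consumer) handle(e event) {
    switch e.Kind {
    case "BEGIN":
        c.open = c.open[:0]
    case "COMMIT":
        deliver(c.open, e.LSN) // push the complete transaction downstream
        c.ackLSN = e.LSN       // restart point: the commit's LSN
    default:
        c.open = append(c.open, e)
    }
}

func deliver(changes []event, commitLSN LSN) {
    fmt.Printf("tx ending at LSN %d: %d changes delivered\n",
        commitLSN, len(changes))
}

func main() {
    // The T-1 / T-2 interleaving from above: T-2's change LSNs are
    // numerically lower, but T-2 arrives whole, after T-1's commit.
    c := &consumer{}
    for _, e := range []event{
        {"BEGIN", 1000}, {"INSERT", 1000}, {"UPDATE", 2000}, {"COMMIT", 3000},
        {"BEGIN", 500}, {"INSERT", 500}, {"UPDATE", 1500}, {"COMMIT", 4000},
    } {
        c.handle(e)
    }
    fmt.Println("ack =", c.ackLSN) // 4000: T-2's commit LSN
}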

It is possible that you are seeing something else but if so then
please try to share a more concrete example.

--
With Regards,
Amit Kapila.


CDC/ETL system on top of logical replication with pgoutput, custom client

2023-07-31 Thread José Neves
Hi there, hope to find you well.

I'm attempting to develop a CDC on top of Postgres, currently using 12, the 
latest minor release, with a custom client, and I'm running into issues with 
data loss caused by out-of-order logical replication messages.

The problem is as follows: Postgres streams logical replication events A, B, D, 
G, K, I, P. Upon an exit signal we stop consuming new events at LSN K, and we 
wait 30s for out-of-order events. Let's say that we only got A (and K, of 
course), so in the following 30s we get B and D; however, for whatever reason, 
G never arrived. As with pgoutput-based logical replication we have no way to 
calculate the next LSN, we have no idea that G is missing, so we assume that it 
all arrived, commit K to the Postgres slot, and shut down. In the next run, our 
worker will start receiving data from K forward, and G is lost forever...
Meanwhile Postgres moves forward with archiving, and we can't go back to check 
whether we lost anything. And even if we could, it would be extremely 
inefficient.

In sum, the issue comes from the fact that Postgres will stream events with 
unordered LSNs on highly transactional systems, and that pgoutput doesn't have 
access to enough information to calculate the next or last LSN, so we have no 
way to check whether we have received all the data that we are supposed to 
receive, risking committing an offset that we shouldn't because we haven't yet 
received the preceding data.

It seems very weird to me that none of the open-source CDC projects that I 
looked into care about this. They always assume that the next LSN received 
is... well, the next one, and commit that one, so upon restart they are 
vulnerable to the same issue. So... either I'm missing something... or we have 
a generalized assumption causing data loss under certain conditions all over.

Am I missing any Postgres mechanism that would allow me to at least detect 
that I'm missing data?

Thanks in advance for any clues on how to deal with this. It has been driving 
me nuts.


Regards,
José Neves