On Thu, Jun 4, 2026 at 12:55 PM Dilip Kumar <[email protected]> wrote:
>
> On Wed, Jun 3, 2026 at 3:25 PM Nisha Moond <[email protected]> wrote:
> >
> >
> > 2) As ProcessPendingConflictLogTuple() may perform a heap_insert, it
> > could fail for various reasons, such as table access errors, write
> > failures, WAL write failures, or ownership-related issues.
> > I looked into the error handling when CLT insert fails:
> >
> > 2a) When disable_on_error = false
> >
> > /* Try to allocate a worker for the streaming transaction. */
> > @@ -5666,6 +5674,7 @@ start_apply(XLogRecPtr origin_startpos)
> >   */
> >   AbortOutOfAnyTransaction();
> >   pgstat_report_subscription_error(MySubscription->oid);
> > + ProcessPendingConflictLogTuple();
> >
> >   PG_RE_THROW();
> >   }
> >
> > If ProcessPendingConflictLogTuple() fails, PG_RE_THROW() is never
> > reached, so the original conflict error is lost and only the secondary
> > CLT-related error is reported, e.g.:
> >
> >   ERROR:  could not open relation with OID 16421
> >   LOG:  background worker "logical replication apply worker" exited
> > with exit code 1
> >
> > If the failure persists, the apply worker will keep retrying and users
> > may never see the actual conflict error. The same concern applies to
> > the parallel worker case too.
> >
> > Is there a clean way to ensure the original apply/conflict error is
> > still logged even if ProcessPendingConflictLogTuple() itself fails?
>
> Does this happening when destination is table or with all as well?
>

It can happen with both conflict_log_destination = 'table' and 'all',
since for conflicts such as insert_exists (ERROR), we call
ProcessPendingConflictLogTuple() to insert the conflict into the CLT
in both cases.

I used the following steps to reproduce the error:
 - Attach gdb to the apply worker and set a breakpoint in
ProcessPendingConflictLogTuple().
 - Trigger an insert_exists conflict.
 - After execution passes StartTransactionCommand(), change
conflict_log_destination to log from another session.
 - The conflict log table is dropped, causing table_open() to fail and
resulting in the above error.

I think Shveta's suggestion #4 in [1] should address this issue.

> > ~~
> >
> > 2c) When "conflict_log_destination = table" the log reports:
> >
> >   ERROR:  conflict detected on relation "public.t1": conflict=insert_exists
> >   DETAIL:  Conflict details are logged to the conflict log table:
> > pg_conflict_log_16403
> >
> > However, the subsequent CLT write can still fail:
> >   ...
> >   ERROR:  could not open relation with OID 16426
> >   LOG:  background worker "logical replication apply worker" exited
> > with exit code 1
> >
> > Since the CLT write has not yet occurred when this message is emitted,
> > would it make sense to change it to:
> >   "Conflict details will be logged to the conflict log table: ..."
> >
> > to avoid implying that the logging has already succeeded?
>
> IMHO "Conflict details are logged to the conflict log table" doesn't
> mean a conflict has been logged to the table; what I interpret from
> this message is that conflict logging to the table is enabled.
>

I agree it can be read either way, but I felt the intent might be
ambiguous from a user's perspective, so I raised it.
That said, I'm fine with it if others also don't see any ambiguity.

> > ~~~
> >
> > 3) The v45-002 commit message still refers to the old conflict log
> > table name, pg_conflict.pg_conflict_log_for_subid_16396. It should be
> > updated to pg_conflict_log_16396.
> > ~~~
> >
> > 4) 035_conflicts.pl: I see that v45-002 has corrected the CLT name  in
> > below test -
> > +# Verify the contents of the Conflict Log Table (CLT)
> > +# This section ensures that the CLT contains the expected
> > +# type and specific key data.
> > +my $subid = $node_subscriber->safe_psql('postgres',
> > + "SELECT oid FROM pg_subscription WHERE subname = 'sub_tab';");
> > +my $clt = "pg_conflict.pg_conflict_log_$subid";
> > +
> > +# Wait for the conflict to be logged in the CLT
> > +my $log_check = $node_subscriber->poll_query_until(
> > + 'postgres',
> > + "SELECT count(*) > 0 FROM $clt;"
> > +);
> >
> > I wonder why this wasn't caught in the v44 tests. It seems the check
> > below is silently broken because poll_query_until() keeps retrying
> > until timeout when the table does not exist.
> > Should we replace it with an ok() test instead? something like -
> >
> > # Wait for the conflict to be logged in the CLT
> > ok($node_subscriber->poll_query_until(
> >     'postgres',
> >     "SELECT count(*) > 0 FROM $clt;"),
> >     'conflict appeared in CLT after insert');
>
> Why would CLT not exist, IIUC this is positive test where we are
> trying to validate the conflict data from CLT, no?
>

Ideally, the CLT should exist. I initially assumed the tests were
clean in v44 despite the incorrect table name being used.
I re-ran the tests and confirmed that poll_query_until() does report a
failure if it times out because the CLT is unavailable. Sorry for the
noise.

[1] 
https://www.postgresql.org/message-id/CAJpy0uAA_XszCAcoBuCUM3VobD39DbMDwCPUT%2BXW7wFfE%2B_E8w%40mail.gmail.com

--
Thanks,
Nisha


Reply via email to