On Wed, Jul 24, 2024 at 12:25 PM Hayato Kuroda (Fujitsu)
<kuroda.hay...@fujitsu.com> wrote:
>
> While creating a patch which allows ALTER SUBSCRIPTION SET (two_phase) [1],
> we found some issues related with logical replication and two_phase. I think 
> this
> can happen not only HEAD but PG14+, but for now I shared patches for HEAD.
>
> Issue #1
>
> When handling a PREPARE message, the subscriber mistook the wrong lsn position
> (the end position of the last commit) as the end position of the current 
> prepare.
> This can be fixed by adding a new global variable to record the end position 
> of
> the last prepare. 0001 patch fixes the issue.
>
> Issue #2
>
> When the subscriber enables two-phase commit but doesn't set 
> max_prepared_transaction >0
> and a transaction is prepared on the publisher, the apply worker reports an 
> ERROR
> on the subscriber. After that, the prepared transaction is not replayed, which
> means it's lost forever. Attached script can emulate the situation.
>
> --
> ERROR:  prepared transactions are disabled
> HINT:  Set "max_prepared_transactions" to a nonzero value.
> --
>
> The reason is that we advanced the origin progress when aborting the
> transaction as well (RecordTransactionAbort->replorigin_session_advance). So,
> after setting replorigin_session_origin_lsn, if any ERROR happens when 
> preparing
> the transaction, the transaction aborts which incorrectly advances the origin 
> lsn.
>
> An easiest fix is to reset session replication origin before calling the
> RecordTransactionAbort(). I think this can happen when 1) 
> LogicalRepApplyLoop()
> raises an ERROR or 2) apply worker exits. 0002 patch fixes the issue.
>

Can we start a separate thread to issue 2? I understand that this one
is also related to two_phase but since both are different issues it is
better to discuss in separate threads. This will also help us refer to
the discussion in future if required.

BTW, why did the 0002 patch change the below code:
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -164,7 +164,8 @@ typedef struct ParallelApplyWorkerShared

  /*
  * XactLastCommitEnd or XactLastPrepareEnd from the parallel apply worker.
- * This is required by the leader worker so it can update the lsn_mappings.
+ * This is required by the leader worker so it can update the
+ * lsn_mappings.
  */
  XLogRecPtr last_commit_end;

-- 
With Regards,
Amit Kapila.


Reply via email to