Dear Hou,

> Here is the V5 patch set which addressed above comments.

Thanks for updating the patch! While reviewing yours, I found a corner case that
a recently deleted tuple cannot be detected when index scan is chosen.
This can happen when indices are re-built during the replication.
Unfortunately, I don't have any solutions for it.

Found issue
========

When indices are built with the CONCURRENTLY option, a standard MVCC snapshot
is used to list up the tuples of the table, which means the new index ignores
recently deleted tuples.

This can cause the update_deleted to be wrongly detected as update_missing.
Assuming that we have a bidirectional cluster like case 1 [1] (id is a primary 
key),
and REINDEX CONCURRENTLY happens after executing DELETE but before receiving 
the UPDATE.
A primary key of t will be re-built by the REINDEX command but the dead tuples 
by
DELETE will be missed because it is invisible from the transaction.
Then, the apply worker receives the UPDATE and recognizes the target tuple is 
removed.
It scans with snapshotany to find the deleted tuple via the index, but it fails.
This event is reported as update_missing.

Reproduce steps
===========

This can be reproduced with v5 patch set:

1. constructed a 2-way replication by running attached.
2. stopped an apply worker on node2
3. executed `UPDATE tab SET a = 2;` on node1.
4. executed `DELETE FROM tab;` on node 2
5. executed `REINDEX TABLE CONCURRENTLY tab;` on node2
6. resumed the stopped apply worker
7. the worker should detect update_deleted, but it detected update_missing

```
LOG:  conflict detected on relation "public.tab": conflict=update_missing
DETAIL:  Could not find the row to be updated.
        Remote tuple (2); replica identity (a)=(1).
``

[1]: 
https://www.postgresql.org/message-id/OS0PR01MB5716A78EE3D6F2F4754E1209946C2%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment: test_1025.sh
Description: test_1025.sh

Reply via email to