Hi,

I am continuing to test REPACK, and I found another issue.

In check_concurrent_repack_requirements(), if a table has no replica identity 
index, the code falls back to using the primary key if one exists. The problem 
is that a deferrable primary key cannot be used for this purpose. WAL 
generation does not consider a deferrable primary key to be a replica identity, 
so concurrent mode may not receive enough old tuple information to replay 
concurrent changes.

I tested this with the following procedure.

1 - Create a table
```
create table t (id int,  v text, primary key (id) deferrable initially 
deferred);
insert into t values (1, 'a');
```

2 -  Attach a debugger to session 1's backend process. I used vscode. Add a 
breakpoint at the first process_concurrent_changes() call. This blocks the 
REPACK process and gives session 2 time to issue a DELETE.

3 - In session 1, issue a repack, it will stop at the breakpoint
```
repack (concurrently) t;
```

4 - In session 2
```
delete from t where id=1;
```

5 - Detach session 1 from the debugger, so that repack continues and tries to 
re-apply the delete from session 2.

6 - repack fails with:
```
evantest=# repack (concurrently) t;
ERROR:  incomplete delete info
CONTEXT:  slot "repack_96468", output plugin "pgrepack", in the change 
callback, associated LSN 0/2A5717F0
REPACK decoding worker
```

The error comes from this code in pgrepack.c:
```
                case REORDER_BUFFER_CHANGE_DELETE:
                        {
                                HeapTuple       oldtuple;

                                oldtuple = change->data.tp.oldtuple;

                                if (oldtuple == NULL)
                                        elog(ERROR, "incomplete delete info");

                                repack_store_change(ctx, relation, 
CHANGE_DELETE, oldtuple);
                        }
                        break;
```

The root cause is that repack.c assumes rel->rd_pkindex is usable as an 
identity index, but logical decoding does not treat a deferrable primary key as 
replica identity. As a result, the decoding worker may not get the old tuple 
needed to re-apply the delete.

To fix the problem, I think we should just not fall back to deferrable primary 
key in the first place. See the attached patch.

With this patch, repack will quickly for the test:
```
evantest=# repack (concurrently) t;
ERROR:  cannot process relation "t"
HINT:  Relation "t" has a deferrable primary key.
```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/




Attachment: v1-0001-Reject-deferrable-primary-key-fallback-in-REPACK-.patch
Description: Binary data

Reply via email to