Hi Jian,
Haritabh and I have been reviewing this patch. The core optimization
is sound — the BEFORE ROW UPDATE trigger guard, the generated column
handling via expand_generated_columns_in_expr + ExecGetAllUpdatedCols,
and the whole-row reference check are all correct.
We independently found the same MERGE issue you fixed in v6, where
ri_CheckConstraintExprs is shared between INSERT and UPDATE actions on
the same ResultRelInfo. Nice catch on the fix.
That said, I think v6's approach of disabling the optimization entirely
for MERGE is more conservative than necessary. A MERGE with only WHEN
MATCHED THEN UPDATE (no INSERT action) would still benefit from skipping
unaffected constraints, but v6 disables it for all MERGE operations.
An alternative that preserves the optimization for MERGE UPDATE actions
would be to follow the existing ri_GeneratedExprsI/ri_GeneratedExprsU
pattern — split into two separate cached arrays:
```
/* array of expr states for checking check constraints */
ExprState **ri_CheckConstraintExprsI; /* for INSERT */
ExprState **ri_CheckConstraintExprsU; /* for UPDATE */
```
Then in ExecRelCheck, select the appropriate array based on cmdtype:
```
ExprState **checkExprs;
checkExprs = (cmdtype == CMD_UPDATE)
? resultRelInfo->ri_CheckConstraintExprsU
: resultRelInfo->ri_CheckConstraintExprsI;
if (checkExprs == NULL)
{
Bitmapset *updatedCols = NULL;
if (cmdtype == CMD_UPDATE &&
!(rel->trigdesc && rel->trigdesc->trig_update_before_row))
updatedCols = ExecGetAllUpdatedCols(resultRelInfo, estate);
/* ... alloc and populate checkExprs ... */
if (cmdtype == CMD_UPDATE)
resultRelInfo->ri_CheckConstraintExprsU = checkExprs;
else
resultRelInfo->ri_CheckConstraintExprsI = checkExprs;
}
```
This way INSERT always compiles and checks all constraints, UPDATE gets
the skip optimization even during MERGE, and both can safely coexist on
the same ResultRelInfo. The lazy-init cost for the second array only
applies when both code paths are actually taken, which matches the
generated-columns precedent. It also makes INSERT ON CONFLICT DO UPDATE
structurally safe rather than relying on INSERT always running first.
A few other items on v6:
1. The MERGE test case only tests MERGE with a single UPDATE action and
verifies the optimization is disabled. It doesn't test the actual
dangerous scenario — MERGE with both INSERT and UPDATE actions where
the INSERT row violates a constraint. Without that, a future refactor
could reintroduce the original bug without any test failing. Something
like:
```
CREATE TABLE merge_cc (id int PRIMARY KEY, a int, b int,
CONSTRAINT cc CHECK (a > 0));
INSERT INTO merge_cc VALUES (1, 10, 10);
MERGE INTO merge_cc t
USING (VALUES (1, 99, 20), (2, -5, 30)) AS s(id, a, b)
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET b = s.b
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.a, s.b);
-- must ERROR on cc (a = -5), not silently succeed
```
2. The cross-partition update test comment says "cannot be skipped", but
cross-partition UPDATE goes through ExecCrossPartitionUpdate which
does DELETE + INSERT. The constraint check happens via ExecInsert with
CMD_INSERT on the destination partition, so the optimization was never
applicable. The test doesn't exercise anything specific to this patch.
Cheers,
Florin
--
* Florin Irion *
* https://www.enterprisedb.com <https://www.enterprisedb.com/>*