On Tue, Mar 31, 2026 at 2:38 AM Amit Kapila <[email protected]> wrote:
> On Tue, Mar 31, 2026 at 12:02 AM Roberto Mello <[email protected]> > wrote: > > > > The subscriber receives WAL from both publications for the same table. > > Which column set should it apply? It cannot apply both as they disagree > on > > whether email is included. This is exactly the situation the "cannot use > > different column lists" check was designed to prevent. > > > > I think we need to consider the cases where current permissive > behavior helps. For example, consider the cases where the schema is > I'm sorry but you completely dismissed the points that I brought up, addressing none of the concerns, and went on a tangent about "permissive behavior". Permissive behavior is fine and dandy until it bites you in the @$$, and there's a line where permissive is wrong, bad, or dangerous. I pointed out why I think the current behavior is wrong, potentially dangerous. MySQL is king of permissive behavior but I don't think that's much to brag about. PostgreSQL historically favored correctness over permissiveness. The deeper issue is that the "permissive" behavior isn't actually permissive: it's silently inconsistent. The two publications have different replication contracts stored in the catalog (prattrs = NULL vs prattrs = {1,2}), and pgoutput.c honors that difference at WAL decode time. The old code just hid the difference from the check that was supposed to detect it. static. Now let us consider another case where a user would actually > need to define such publication combinations for a subscription. One > of the more common ways this conflict happens is accidental: User has > pub_1 for TABLE t (col1, col2). User later decides to replicate the > entire database to a new subscription and creates pub_2 FOR ALL > TABLES. User adds pub_2 to the subscription. Currently, the user can > add pub_2 and then later realize they need to drop pub_1 to clean > How would they "realize" that if the view is showing them the exact same data for the two publications? > things up. If ALTER SUBSCRIPTION blocks this, the user is stuck in a > Catch-22 where they can't add the "All Tables" publication because a > single specific table has a column list. They would have to drop the > specific publication first, potentially losing replication coverage > for that table during the gap. > I don't think it actually traps the user. The sequence that works today and would continue to work: -- Starting from sub has pub_1 with explicit column list ALTER PUBLICATION pub_1 DROP TABLE t; -- on publisher ALTER SUBSCRIPTION sub REFRESH PUBLICATION; -- on subscriber -- Now add pub_2 FOR ALL TABLES ALTER SUBSCRIPTION sub ADD PUBLICATION pub_2; Or even simpler, just drop the column list from pub_1 before adding pub_2: ALTER PUBLICATION pub_1 SET TABLE t; -- removes the column list ALTER SUBSCRIPTION sub ADD PUBLICATION pub_2; There's no gap in replication coverage in either case because the subscription is still active throughout. The table remains subscribed via pub_1 until pub_2 takes over. For the static schema case: if the schema never changes, the two publications do produce identical column sets and there's no practical problem. But the publication system doesn't know the schema is static. It has to handle the general case. And the general case is that NULL means "all current and future columns" while an explicit list means "exactly these, nothing more." Those are different things (contracts) and being "permissive" with interpretation of what those contracts mean in my head versus someone else's could be dangerous. > Now, considering the other cases where replication later ERRORs out > (like the one you mentioned) when we allow such combinations, we can > give a WARNING at the time subscriber DDLs when they lead to such > combinations. > A WARNING at ALTER SUBSCRIPTION time would be better than silence, but it still allows the subscriber into a state where a future ALTER TABLE ADD COLUMN on the publisher will cause replication to break. At that point the user gets the ERROR anyway, but now it's during replication rather than at setup time, which is harder to diagnose and recover from. Erroring early, when the user is actively making the change and can fix it, is the safer default. That said, if the consensus is that the migration path is too disruptive, a middle ground could be to ERROR by default but provide a subscription-level option to downgrade it to WARNING. For the reasons I pointed out, I think the right call is to keep the ERROR as the default though, since it catches a real inconsistency. Roberto Mello Snowflake
