On Mon, Dec 1, 2025 at 2:04 PM Dilip Kumar <[email protected]> wrote: > > On Mon, Dec 1, 2025 at 1:57 PM shveta malik <[email protected]> wrote: > > > > On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <[email protected]> wrote: > > > > > > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <[email protected]> > > > wrote: > > > > > > > > > Few observations related to publication. > > > > > ------------------------------ > > > > > > Thanks Shveta, for testing and sharing your thoughts. IMHO for > > > conflict log tables it should be good enough if we restrict it when > > > ALL TABLE options are used, I don't think we need to put extra effort > > > to completely restrict it even if users want to explicitly list it > > > into the publication. > > > > > > > > > > > > > (In the below comments, clt/CLT implies Conflict Log Table) > > > > > > > > > > 1) > > > > > 'select pg_relation_is_publishable(clt)' returns true for > > > > > conflict-log table. > > > > > > This function is used while publishing every single change and I don't > > > think we want to add a cost to check each subscription to identify > > > whether the table is listed as CLT. > > > > > > > > 2) > > > > > '\d+ clt' shows all-tables publication name. I feel we should not > > > > > show that for clt. > > > > > > I think we should fix this. > > > > > > > > 3) > > > > > I am able to create a publication for clt table, should it be allowed? > > > > > > I believe we should not do any specific handling to restrict this but > > > I am open for the opinions. > > > > > > > > create subscription sub1 connection '...' publication pub1 > > > > > WITH(conflict_log_table='clt'); > > > > > create publication pub3 for table clt; > > > > > > > > > > 4) > > > > > Is there a reason we have not made '!IsConflictHistoryRelid' check as > > > > > part of is_publishable_class() itself? If we do so, other code-logics > > > > > will also get clt as non-publishable always (and will solve a few of > > > > > the above issues I think). IIUC, there is no place where we want to > > > > > mark CLT as publishable or is there any? > > > > > > IMHO the main reason is performance. > > > > > > > > 5) Also, I feel we can add some documentation now to help others to > > > > > understand/review the patch better without going through the long > > > > > thread. > > > > > > Make sense, I will do that in the next version. > > > > > > > > > > > > > Few observations related to conflict-logging: > > > > > ------------------------------ > > > > > 1) > > > > > I found that for the conflicts which ultimately result in Error, we do > > > > > not insert any conflict-record in clt. > > > > > > > > > > a) > > > > > Example: insert_exists, update_Exists > > > > > create table tab1 (i int primary key, j int); > > > > > sub: insert into tab1 values(30,10); > > > > > pub: insert into tab1 values(30,10); > > > > > ERROR: conflict detected on relation "public.tab1": > > > > > conflict=insert_exists > > > > > No record in clt. > > > > > > > > > > sub: > > > > > <some pre-data needed> > > > > > update tab1 set i=40 where i = 30; > > > > > pub: update tab1 set i=40 where i = 20; > > > > > ERROR: conflict detected on relation "public.tab1": > > > > > conflict=update_exists > > > > > No record in clt. > > > > > > Yeah that interesting need to put thought on how to commit this record > > > when an outer transaction is aborted as we do not have autonomous > > > transactions which are generally used for this kind of logging. But > > > we can explore more options like inserting into conflict log tables > > > outside the outer transaction. > > > > > > > > b) > > > > > Another question related to this is, since these conflicts (which > > > > > results in error) keep on happening until user resolves these or skips > > > > > these or 'disable_on_error' is set. Then are we going to insert these > > > > > multiple times? We do count these in 'confl_insert_exists' and > > > > > 'confl_update_exists' everytime, so it makes sense to log those each > > > > > time in clt as well. Thoughts? > > > > > > I think it make sense to insert every time we see the conflict, but it > > > would be good to have opinion from others as well. > > > > Since there is a concern that multiple rows for > > multiple_unique_conflicts can cause data-bloat, it made me rethink > > that this is actually more prone to causing data-bloat if it is not > > resolved on time, as it seems a far more frequent scenario. So shall > > we keep inserting the record or insert it once and avoid inserting it > > again based on lsn? Thoughts? > > I agree, this is the real problem related to bloat so maybe we can see > if the same tuple exists we can avoid inserting it again, although I > haven't put thought on how to we distinguish between the new conflict > on the same row vs the same conflict being inserted multiple times due > to worker restart. >
If there is consensus on this approach, IMO, it appears safe to rely on 'remote_origin' and 'remote_commit_lsn' as the comparison keys for the given 'conflict_type' before we insert a new record. thanks Shveta
