On Wed, Aug 9, 2023 at 8:28 AM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote:
>
> On Thursday, August 3, 2023 7:30 PM Melih Mutlu <m.melihmu...@gmail.com> wrote:
> >
> > Right. I attached the v26 as you asked.
>
> Thanks for posting the patches.
>
> While reviewing the patch, I noticed one rare case where it's possible that
> there are two table sync workers for the same table at the same time.
>
> The patch relies on LogicalRepWorkerLock to prevent concurrent access, but the
> apply worker will start a new worker after releasing the lock. So, at the
> point [1] where the lock is released and the new table sync worker has not yet
> been started, it seems possible that another old table sync worker will be
> reused for the same table.
>
> 				/* Now safe to release the LWLock */
> 				LWLockRelease(LogicalRepWorkerLock);
> *[1]
> 				/*
> 				 * If there are free sync worker slot(s), start a new sync
> 				 * worker for the table.
> 				 */
> 				if (nsyncworkers < max_sync_workers_per_subscription)
> 				...
> 					logicalrep_worker_launch(MyLogicalRepWorker->dbid,
Yeah, this is a problem. I think one idea to solve this is to extend the lock duration till we launch the tablesync worker, but we should also consider changing this locking scheme so that there is a better way to indicate that a tablesync is in progress for a particular rel. Currently, the code in TablesyncWorkerMain() also acquires the lock in exclusive mode even when the tablesync for a rel is already in progress, which I guess could easily hurt us for larger values of max_logical_replication_workers. So, that could be another motivation to think about a different locking scheme.

-- 
With Regards,
Amit Kapila.
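To make the window concrete, here is a minimal sketch of the check-then-act pattern under discussion. This is plain Python, not PostgreSQL code; WorkerRegistry, _syncing, and launch_tablesync are hypothetical names standing in for the worker array, and the lock stands in for LogicalRepWorkerLock. The point is only that the "is a sync worker already running for this rel?" check and the launch must happen under one lock acquisition, as in the extend-the-lock-duration idea above:

```python
# Illustrative sketch only, not PostgreSQL code. The lock plays the role of
# LogicalRepWorkerLock; the set plays the role of the shared worker registry.
import threading

class WorkerRegistry:
    def __init__(self):
        self._lock = threading.Lock()   # stand-in for LogicalRepWorkerLock
        self._syncing = set()           # rels with a tablesync in progress

    def launch_tablesync(self, relid):
        """Start a sync worker for relid unless one is already running.

        The existence check and the registration are done under a single
        lock acquisition. If the lock were released between them (as in
        the reported race), two callers could both see "no worker" and
        both launch one for the same rel.
        """
        with self._lock:
            if relid in self._syncing:
                return False            # another worker already owns this rel
            self._syncing.add(relid)    # "launch" while still holding the lock
            return True

reg = WorkerRegistry()
print(reg.launch_tablesync(16384))      # first launch for the rel succeeds
print(reg.launch_tablesync(16384))      # duplicate launch is refused
```

A scheme that instead marks "tablesync in progress" per rel (rather than serializing everything on one exclusive lock) would let unrelated rels proceed without contending, which is the second motivation mentioned above.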