On Wed, Jul 12, 2017 at 11:19 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Sat, Jul 8, 2017 at 5:19 AM, Petr Jelinek
> <petr.jeli...@2ndquadrant.com> wrote:
>> Hi,
>>
>> I have done some review of subscription handling (well self-review) and
>> here is the result of that (It's slightly improved version from another
>> thread [1]).
>
> Thank you for working on this and patches!
>
>> I split it into several patches:
>>
>> 0001 - Makes subscription worker exit nicely when there is subscription
>> missing (ie was removed while it was starting) and also makes disabled
>> message look same as the message when disabled state was detected while
>> worker is running. This is mostly just nicer behavior for user (ie no
>> unexpected errors in log when you just disabled the subscription).
>>
>> 0002 - This is bugfix - the sync worker should exit when waiting for
>> apply and apply dies otherwise there is possibility of not being
>> correctly synchronized.
>>
>> 0003 - Splits the schizophrenic SetSubscriptionRelState function into
>> two which don't try to do broken upsert and report proper errors instead.
>>
>> 0004 - Solves the issue which Masahiko Sawada reported [2] about ALTER
>> SUBSCRIPTION handling of workers not being transactional - we now do
>> killing of workers transactionally (and we do the same when possible in
>> DROP SUBSCRIPTION).
>>
>> 0005 - Bugfix to allow syscache access to subscription without being
>> connected to database - this should work since subscription is pinned
>> catalog, it wasn't caught because nothing in core is using it (yet).
>>
>> 0006 - Makes the locking of subscriptions more granular (this depends on
>> 0005). This removes the ugly AccessExclusiveLock on the pg_subscription
>> catalog from DROP SUBSCRIPTION and also solves some potential race
>> conditions between launcher, ALTER SUBSCRIPTION and
>> process_syncing_tables_for_apply(). Also simplifies memory handling in
>> launcher as well as logicalrep_worker_stop() function. This basically
>> makes subscriptions behave like every other object in the database in
>> terms of locking.
>>
>> Only the 0002, 0004 and 0005 are actual bug fixes, but I'd still like to
>> get it all into PG10 as especially the locking now behaves really
>> differently than everything else and that does not seem like a good idea.
>>
>
> I'm now planing to review 0002, 0004 and 0005 patches first as they
> are bug fixes.

I've reviewed 0002, 0004 and 0005 patches briefly. Here are some comments.

--
0002-Exit-in-sync-worker-if-relation-was-removed-during-s.patch

As Petr mentioned, if the table subscription is removed before the
table sync worker gets the subscription relation entry,
GetSubscriptionRelState could returns SUBREL_STATE_UNKNOWN and this
issue happens.
Actually we now can handle this case properly by commit
8dc7c338129d22a52d4afcf2f83a73041119efda. So this seems to be an
improvement and not be a release blocker. Therefore for this patch, I
think it's better to do ereport in the
switch(MyLogicalRepWorker->relstate) block.

BTW, since missing_ok of GetSubscriptionRelState() is always set to
true I think that we can remove it.
Also because the reason I mention at later part this fix might not be necessary.

--
0004-Only-kill-sync-workers-at-commit-time-in-SUBSCRIPTIO.patch

+       /*
+        * If we are dropping slot, stop all the subscription workers
immediately
+        * so that the slot is accessible, otherwise just shedule the
stop at the
+        * end of the transaction.
+        *
+        * New workers won't be started because we hold exclusive lock on the
+        * subscription till the end of transaction.
+        */
+       LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+       subworkers = logicalrep_sub_workers_find(subid, false);
+       LWLockRelease(LogicalRepWorkerLock);
+       foreach (lc, subworkers)
+       {
+               LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+               if (sub->slotname)
+                       logicalrep_worker_stop(w->subid, w->relid);
+               else
+                       logicalrep_worker_stop_at_commit(w->subid, w->relid);
+       }
+       list_free(subworkers);

I think if we kill the all workers including table sync workers then
the fix by 0002 patch is not necessary actually, because the table
sync worker will not see that the subscription relation state has been
removed.
Also, logicalrep_sub_workers_find() function is defined in 0006 patch
but it would be better to move it to 0004 patch.

--
0005-Allow-syscache-access-to-subscriptions-in-database-l.patch

Do we need to update the comment at the top of IndexScanOK()?

To summary, I think we now have only one issue; ALTER SUBSCRIPTION is
not transactional, 0004 patch is addressing this issue . 0002 patch
seems an improvement patch to me, and it might be resolved by 0004
patch. 0005 patch is required by 0006 patch which is an improvement
patch. Am I missing something?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to