On Fri, Apr 14, 2017 at 7:09 AM, Petr Jelinek <petr.jeli...@2ndquadrant.com> wrote: > On 13/04/17 12:23, Masahiko Sawada wrote: >> On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada <sawada.m...@gmail.com> >> wrote: >>> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut >>> <peter.eisentr...@2ndquadrant.com> wrote: >>>> On 4/12/17 00:48, Masahiko Sawada wrote: >>>>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut >>>>>> Perhaps instead of a global last_start_time, we store a per relation >>>>>> last_start_time in SubscriptionRelState? >>>>> >>>>> I was thinking the same. But a problem is that the list of >>>>> SubscriptionRelState is refreshed whenever the syncing table state >>>>> becomes invalid (table_state_valid = false). I guess we need to >>>>> improve these logic including GetSubscriptionNotReadyRelations(). >>>> >>>> The table states are invalidated on a syscache callback from >>>> pg_subscription_rel, which happens roughly speaking when a table >>>> finishes the initial sync. So if we're worried about failing tablesync >>>> workers relaunching to quickly, this would only be a problem if a >>>> tablesync of another table finishes right in that restart window. That >>>> doesn't seem a terrible issue to me. >>>> >>> >>> I think the table states are invalidated whenever the table sync >>> worker starts, because the table sync worker updates its status of >>> pg_subscription_rel and commits it before starting actual copy. So we >>> cannot rely on that. I thought we can store last_start_time into >>> pg_subscription_rel but it might be overkill. I'm now thinking to >>> change GetSubscriptionNotReadyRealtions so that last_start_time in >>> SubscriptionRelState is taken over to new list. >>> >> >> Attached the latest patch. It didn't actually necessary to change >> GetSubscriptionNotReadyRelations. I just changed the logic refreshing >> the sync table state list. >> Please give me feedback. >> > > Hmm this might work. Although I was actually wondering if we could store > the last start timestamp in the worker shared memory and do some magic > with that (ie, not clearing subid and relid and try to then do rate > limiting based on subid+relid+timestamp stored in shmem). That would > then work same way for the main apply workers as well. It would have the > disadvantage that if some tables were consistently failing, no other > tables could get synchronized as the amount of workers is limited.
Hmm I guess that it's not a good design that a table sync worker and a apply worker for a relation takes sole possession of a worker slot until it successes. It would bother each other. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers