On Fri, Apr 14, 2017 at 9:14 PM, Petr Jelinek
<petr.jeli...@2ndquadrant.com> wrote:
> On 14/04/17 12:18, Masahiko Sawada wrote:
>> On Fri, Apr 14, 2017 at 7:09 AM, Petr Jelinek
>> <petr.jeli...@2ndquadrant.com> wrote:
>>> On 13/04/17 12:23, Masahiko Sawada wrote:
>>>> On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada <sawada.m...@gmail.com> 
>>>> wrote:
>>>>> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
>>>>> <peter.eisentr...@2ndquadrant.com> wrote:
>>>>>> On 4/12/17 00:48, Masahiko Sawada wrote:
>>>>>>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
>>>>>>>> Perhaps instead of a global last_start_time, we store a per relation
>>>>>>>> last_start_time in SubscriptionRelState?
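
For reference, SubscriptionRelState (src/include/catalog/pg_subscription_rel.h)
currently carries just relid, lsn and state, so this suggestion amounts to one
extra timestamp field. A minimal sketch (the new field and its name are
illustrative only, not from any posted patch):

    typedef struct SubscriptionRelState
    {
        Oid         relid;              /* relation to sync */
        XLogRecPtr  lsn;                /* remote LSN of the state change */
        char        state;              /* sync state of the relation */
        TimestampTz last_start_time;    /* sketch: last time a tablesync
                                         * worker was launched for this rel */
    } SubscriptionRelState;
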
>>>>>>> I was thinking the same. But a problem is that the list of
>>>>>>> SubscriptionRelState is refreshed whenever the syncing table state
>>>>>>> becomes invalid (table_states_valid = false). I guess we need to
>>>>>>> improve this logic, including GetSubscriptionNotReadyRelations().
>>>>>> The table states are invalidated on a syscache callback from
>>>>>> pg_subscription_rel, which happens roughly speaking when a table
>>>>>> finishes the initial sync.  So if we're worried about failing tablesync
>>>>>> workers relaunching too quickly, this would only be a problem if a
>>>>>> tablesync of another table finishes right in that restart window.  That
>>>>>> doesn't seem a terrible issue to me.
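
(For context, the callback in question is invalidate_syncing_table_states()
in src/backend/replication/logical/tablesync.c, registered for the
SUBSCRIPTIONRELMAP syscache, roughly like this:

    CacheRegisterSyscacheCallback(SUBSCRIPTIONRELMAP,
                                  invalidate_syncing_table_states,
                                  (Datum) 0);

    void
    invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
    {
        /* just mark the cached list stale; it is rebuilt lazily */
        table_states_valid = false;
    }

so the list is only rebuilt the next time the apply worker looks at it.)
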
>>>>> I think the table states are invalidated whenever the table sync
>>>>> worker starts, because the table sync worker updates its status in
>>>>> pg_subscription_rel and commits it before starting the actual copy. So
>>>>> we cannot rely on that. I thought we could store last_start_time in
>>>>> pg_subscription_rel, but it might be overkill. I'm now thinking to
>>>>> change GetSubscriptionNotReadyRelations so that last_start_time in
>>>>> SubscriptionRelState is carried over to the new list.
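
The carry-over is roughly along these lines (sketch only; find_table_state()
is a hypothetical helper that looks up a relid in the previous list, and
last_start_time is the field sketched above):

    if (!table_states_valid)
    {
        List       *rstates;
        ListCell   *lc;

        rstates = GetSubscriptionNotReadyRelations(MySubscription->oid);
        foreach(lc, rstates)
        {
            SubscriptionRelState *rstate = (SubscriptionRelState *) lfirst(lc);
            SubscriptionRelState *old;

            /* find_table_state() is hypothetical: find relid in old list */
            old = find_table_state(table_states, rstate->relid);
            if (old != NULL)
                rstate->last_start_time = old->last_start_time;
        }
        table_states = rstates;
        table_states_valid = true;
    }
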
>>>> Attached is the latest patch. It actually wasn't necessary to change
>>>> GetSubscriptionNotReadyRelations; I just changed the logic that
>>>> refreshes the sync table state list.
>>>> Please give me feedback.
>>> Hmm, this might work. Although I was actually wondering if we could store
>>> the last start timestamp in the worker shared memory and do some magic
>>> with that (i.e., not clearing subid and relid, and then doing rate
>>> limiting based on subid+relid+timestamp stored in shmem). That would
>>> then work the same way for the main apply workers as well. It would have
>>> the disadvantage that if some tables were consistently failing, no other
>>> tables could get synchronized, as the number of workers is limited.
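
To make that concrete, the shmem variant could look something like this in
logicalrep_worker_launch() (pure sketch: it assumes the LogicalRepWorker slot
keeps subid/relid after the worker exits and grows a last_start_time field,
neither of which exists today):

    for (i = 0; i < max_logical_replication_workers; i++)
    {
        LogicalRepWorker *w = &LogicalRepCtx->workers[i];

        /* refuse to relaunch the same subid+relid too soon; the
         * last_start_time field is the assumed addition */
        if (!w->in_use &&
            w->subid == subid && w->relid == relid &&
            !TimestampDifferenceExceeds(w->last_start_time,
                                        GetCurrentTimestamp(),
                                        wal_retrieve_retry_interval))
            return;
    }
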
>> Hmm, I guess it's not a good design for a table sync worker and an
>> apply worker for a relation to take sole possession of a worker slot
>> until it succeeds; they would get in each other's way.
> Not sure what you mean by an apply worker for a relation; I meant the
> main subscription apply worker. The "per relation apply worker" is the
> same as the tablesync worker.

Oops, my mistake. I meant just an apply worker.

> Thinking about the possible failure scenarios more, I guess you are
> right. If there is a "global" event for the publisher/subscriber
> connection (i.e., the network goes down or something), it will affect the
> main worker as well, so it won't launch anything. If there is a
> table-specific issue, it will affect only that table.
> The corner cases with doing it per table, where we launch tablesync
> needlessly, are pretty much just a) the connection limit on the publisher
> being reached, and b) the slot limit on the publisher being reached;
> that's probably okay. We might be able to catch these two specifically
> and do some global limiting based on them (something like your original
> patch, except the start-time global variable would only be set on a
> specific error and stored in shmem), but that does not need to be solved
> immediately.
> --
>   Petr Jelinek                  http://www.2ndQuadrant.com/
>   PostgreSQL Development, 24x7 Support, Training & Services
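
For the archives, that error-specific global limiting could look something
like this (pure sketch: the last_failure_time field in shmem and the
classification of the publisher error are both hypothetical):

    /* In the failing worker's error path, only on resource-limit errors
     * from the publisher (e.g. connection or slot limit reached);
     * last_failure_time is a hypothetical shmem field: */
    LogicalRepCtx->last_failure_time = GetCurrentTimestamp();

    /* In the launcher, before starting any tablesync worker: */
    if (!TimestampDifferenceExceeds(LogicalRepCtx->last_failure_time,
                                    GetCurrentTimestamp(),
                                    wal_retrieve_retry_interval))
        return;     /* back off globally for a while */
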


Masahiko Sawada
NTT Open Source Software Center
